Skip to main content

AI agents waste 99% of their context scanning raw source code. Cortex is a zero-dependency Rust engine that pre-computes local call graphs, traces taint paths, and persists agent memory over MCP.

Cortex: Your AI Agent Is Reading Your Code Wrong
34 mins

The problem with how agents read codeh2

Every AI coding agent in 2026 does the same thing when you ask a structural question: it reads your files. Every single time.

Ask Claude Code, Cursor, or Cline something like “What calls processOrder?” and watch the terminal. The agent opens a file, scans it line by line, finds an import, opens another file, and keeps going. It can easily burn 20,000 tokens on raw source code just to trace a single caller relationship. Start a new session five minutes later and ask the same question, and it does the whole thing again from scratch. It retained nothing.

This is not a model intelligence problem. Frontier models like GPT-5, Claude Sonnet 4.5, or Gemini 3.1 Pro reason incredibly well. The issue is the interface layer between the agent and your codebase. The agent receives unstructured text when it needs structure. It gets raw files when it needs call relationships. A massive stream of data, when what it actually wanted was a single connection.


Why bigger context windows are not the answerh2

The obvious fix is a larger context window. A million tokens sounds like plenty. But research keeps showing that throwing more context at agents does not make them better, it often makes them worse.

A Chroma Research study, “Context Rot: How Increasing Input Tokens Impacts LLM Performance” (July 2025) tested 18 frontier models across repository-scale tasks. Every model showed measurable performance degradation as input length grew — not a sudden cliff at the end of the context window, but a steady, continuous decline in reasoning quality. A separate 2025 study on long-context models found this degradation happens even when retrieval is perfect. The problem is not whether the right information is in the context. It is how much irrelevant noise surrounds it.

When you flood an agent’s context with raw file content, it has to spend its attention parsing boilerplate code, comments, and imports just to find the one relationship it needed. The more files it reads, the more its accuracy on the next step drops.

The academic research on this has been converging on a common answer. GraphCoder (Liu et al., 2024) proposed retrieving code contexts via a structural graph rather than text similarity, yielding a +6.06 improvement in exact-match code completion while using less time and memory. CodexGraph (Liu et al., 2024) built on that by replacing file-reading with structured graph queries against a codebase-wide code graph. Both studies reached the same conclusion: for repository-scale tasks, structure beats raw text.


The Jail House Lock problemh2

If you have read or watched JoJo’s Bizarre Adventure Stone Ocean, you know exactly what this failure mode feels like.

There is a Stand named Jail House Lock, used by Miu Miu. Its ability is simple but terrifying: it limits its victim to holding only three new pieces of information in their working memory at a time. Learn a fourth thing and the first one disappears. The victim is not made less intelligent. Their reasoning remains intact. They just cannot hold enough context to act on what they know.

Jail House Lock Stand ability
Jolyne’s memory fragmenting under Jail House Lock.

This is exactly what happens to AI coding agents today.

A medium-sized codebase is typically on the order of hundreds of thousands to a few million tokens (roughly 500k–2M), depending on language and code density. An agent cannot fit all of that into context, so it reads files one at a time, forgets what it read three files ago, and makes architectural decisions based on whatever fragment happens to be in its memory right now.

Miu Miu’s strategy in the story is to flood Jolyne with distractions, like counting individual bullets or reading prison signs, so the crucial details get pushed out. Every file an agent reads that is not directly relevant to the task works the same way, pushing out the structural map of your codebase.

Jolyne’s solution is elegant. She stops trying to memorize individual facts and instead learns to look at a reflected image in a mirror to see all the bullets at once, compressing many facts into a single unit.

That is the core idea behind Cortex. It pre-computes the structural relationships in your codebase and serves them as a single, compressed unit. The agent gets a complete answer about callers, callees, or blast radius in one response costing a few hundred tokens. Its context window stays free for actual reasoning.


How agents work today vs. how they should workh2

Standard AI agent context consumption vs. Cortex graph-based traversal
How standard AI agents consume codebases (by scanning files textually) versus the Cortex graph-based traversal approach

The difference is not incremental. If you ask an agent what breaks if you change a core database pool function, the standard approach forces the agent to trace imports recursively across dozens of files. That can easily cost tens of thousands of tokens (e.g., 20k–100k). Cortex answers that same query with a breadth-first traversal in less than 1,000 tokens. That represents a 100x reduction on a single query.

Over a typical coding session where an agent makes 20 to 40 structural queries, the savings compound. The agent stays within its context budget and can actually reason about the answers instead of forgetting them.


What is actually in the landscape right nowh2

Before looking at how Cortex works, it is worth exploring what else exists in the developer tool space and where each approach makes compromises.

  • Repomix: The most popular tool by stars, Repomix packs your entire repository into a single, clean text file with token counting and security checks. The constraint is that it is a one-shot dump. The agent still reads everything. There is no session memory, no delta awareness, and no structural query interface. It works well for small projects, but it quickly fills up the context window on larger codebases.
  • LeanCTX: A highly feature-complete tool. It includes delta reads, shell output compression, session checkpoints, and a dashboard showing token savings. Its main challenge is complexity: it exposes ~58 different tools (as of Q1 2026), which creates a large system prompt overhead for the agent.
  • codebase-memory-mcp: Built in Zig, this tool performs call graph tracing, cross-service HTTP linking, and dead code detection. It has a fast setup but lacks cross-session memory and shell output compression.
  • Engram: Operating entirely locally without GPUs or cloud APIs, Engram intercepts file reads and returns structural summaries instead of full file content. It is extremely efficient but is limited to a few core languages.
  • Serena: This tool uses the Language Server Protocol (LSP) to provide highly precise type information. It excels at type-level queries but requires setting up and running active language servers for each language in your project, making it more complex to deploy.

None of these tools combine graph-based structural intelligence, cross-session memory with automatic staleness invalidation, built-in security analysis, and a zero-dependency binary that configures itself for multiple agents.


What Cortex actually ish2

Cortex is a single Rust binary. It does not require a Python runtime, Docker, cloud API keys, or active language servers. When you run cortex index, it uses tree-sitter to parse your repository, extracting functions, classes, call edges, import relationships, and data flows, storing them in a local SQLite database. When you run cortex serve, it exposes this structural graph over the Model Context Protocol (MCP) using a standard JSON-RPC 2.0 stdio interface.

Any MCP-compatible agent can connect to it immediately. This includes Claude Code, Cursor, Copilot, Cline, Zed, JetBrains, and many others. The agent can use 32 granular tools, or run in smart mode where a single ask meta-tool routes requests internally.

The graph stays updated automatically. A background file watcher listens for native OS file events and triggers incremental re-indexing whenever you save a file. Re-indexing a changed file takes less than 15 milliseconds, ensuring the agent always queries current code.

Terminal window
# Set up Cortex in under 30 seconds
npx @1337xcode/cortex install # downloads binary, detects agents, writes config
cortex index # builds the call graph (127 files in 535ms)

The installer detects your active IDEs and agents, writing the correct configuration entries automatically without requiring manual JSON editing.


Who uses this and whyh2

Cortex is designed for a few distinct development workflows:

  • Vibe Coders: Developers who rely heavily on AI agents to write code. For these users, Cortex prevents the agent from getting lost in large codebases. The agent has a structural map, so it makes fewer mistakes and stays oriented during long coding sessions.
  • Senior Architects: Teams that need to map out legacy systems. Cortex can generate architectural overviews, find dead code, and run community detection to visualize how modules are coupled in reality versus their intended design.
  • Migration Engineers: Developers tasking agents with refactoring legacy code. Cortex helps the agent calculate the blast radius of changing a function, showing exactly how many upstream files will be affected before a line of code is modified.
  • Security & Platform Teams: Teams that want basic security scans without setting up heavy SAST pipelines. Cortex can scan for OWASP patterns, trace user input to sensitive sinks, and generate SPDX SBOMs locally.

Architecture: three subsystems in one processh2

Cortex runs three concurrent subsystems inside a single process:

Cortex architecture
Cortex architecture: File Watcher, Indexer, and MCP subsystems running concurrently inside a single process, backed by SQLite WAL mode
  • The Indexer: Walks the filesystem, parses source files in parallel using Rayon, extracts symbols and edges, and writes them to SQLite. It uses a two-pass resolution strategy for cross-file call edges: the first pass collects definitions to build a symbol table, and the second pass resolves call targets against it.
  • The File Watcher: Uses the notify crate to listen to native OS events like inotify, FSEvents, or ReadDirectoryChangesW. When you save a file, it tells the indexer to re-process only that file. If the SHA-256 hash of the file is unchanged, the update completes in about 13 milliseconds.
  • The MCP Server: Runs on a Tokio async runtime to handle concurrent tool requests. It communicates over stdio using JSON-RPC 2.0. SQLite’s Write-Ahead Logging (WAL) mode enables readers to query the database concurrently without blocking the indexer writes.
Sequence diagram of the file change and query resolution pipeline
Sequence diagram showing the asynchronous file watcher triggering single-file re-indexing in SQLite WAL mode while the MCP server concurrently serves the agent

The Indexer Pipelineh2

Cortex indexer pipeline
The Cortex indexer pipeline: from file change detection and AST parsing to FQN resolution and observation invalidation

The indexer supports 29 languages. 26 use compiled tree-sitter grammars that are statically linked into the binary (no runtime downloads, no external grammar files). Three languages (Kotlin, SQL, Perl) use regex-based extraction as a fallback because their tree-sitter grammar crates have version conflicts with the rest of the tree-sitter ecosystem.

Languages are tiered by extraction quality based on how much structural information the grammar queries can extract:

  • Tier 1 (Full call graph, imports, routes): Python, TypeScript, JavaScript, Rust, Go, and Java. These are the languages where Cortex’s structural analysis is most complete. If your codebase is primarily in one of these languages, you get the full benefit of blast radius analysis, taint flow tracing, and cross-file call resolution.
  • Tier 2 (Symbols + partial call edges): C#, C++, Ruby, Swift, Scala, PHP, and Dart. The graph is useful but may miss some indirect calls or complex dispatch patterns.
  • Tier 3 (Symbol extraction, limited edges): The remaining languages (Haskell, Elixir, Lua, Zig, Bash, R, Objective-C, OCaml, Julia, HCL/Terraform, YAML). You still get function and class definitions indexed, searchable, and visible in the architecture overview. However, call graph edges are sparse.

Parsing is parallelized with Rayon. Each file gets its own tree-sitter parser instance on a thread pool worker. The work-stealing scheduler means fast files do not block slow files. For a 3,500-file Python project (the CPython standard library), full indexing completes in under 60 seconds. A typical 100-file web application takes about 500ms. Large repositories (50K+ files) are processed in batches of 500 to avoid memory exhaustion.

The tree-sitter grammars are compiled from source at build time via the cc crate (each grammar crate includes its own build.rs). No network downloads occur during compilation. All 26 grammar C sources are vendored within their respective crates.


The database schemah2

Cortex uses SQLite in WAL (Write-Ahead Logging) mode. WAL allows concurrent readers while a single writer operates, which maps perfectly to the architecture: many MCP tool calls reading simultaneously, one indexer writing.

Cortex database schema
SQLite database schema: tables representing nodes, edges, file snapshots, observations, taint paths, and security findings

On top of the core tables, Cortex maintains:

  • FTS5 virtual table (nodes_fts) for full-text search over FQN, kind, file, and attributes. Uses the unicode61 tokenizer with BM25 ranking (k1=1.2, b=0.75). Kept in sync via INSERT/UPDATE/DELETE triggers.
  • Vector embeddings table (node_embeddings) for optional semantic search via sqlite-vec and a local ONNX model (nomic-embed-text-v1.5, 768-dimensional embeddings).
  • SBOM entries for dependency tracking extracted from lock files.

Smart tool routing & the 32 MCP toolsh2

Exposing 32 different tools to an agent creates a lot of context overhead. Each tool definition, along with its schema, description, and arguments, consumes tokens just to be present in the system prompt. Exposing all 32 tools costs approximately 4,000 to 6,000 tokens of the agent’s context window before the session even starts.

Smart mode (cortex serve --smart-tools) reduces this to just 5 tools. The ask meta-tool accepts a natural language question, extracts the user’s intent and symbol references, routes it internally to the appropriate graph queries, and returns a unified answer. The agent interacts with one tool instead of 32, dropping context overhead by 89%.

// Simplified routing logic from src/mcp/ask.rs
fn route_question(question: &str) -> Vec<InternalTool> {
if contains_pattern(question, &["what calls", "who calls"]) {
vec![InternalTool::TraceCallers]
} else if contains_pattern(question, &["what does", "call"]) {
vec![InternalTool::TraceCallees]
} else if contains_pattern(question, &["breaks", "impact", "change"]) {
vec![InternalTool::BlastRadius]
} else if contains_pattern(question, &["security", "taint", "vuln"]) {
vec![InternalTool::FindTaintPaths, InternalTool::ScanOwasp]
} else if contains_pattern(question, &["dead code", "unused"]) {
vec![InternalTool::FindDeadCode]
} else if contains_pattern(question, &["architecture", "overview"]) {
vec![InternalTool::GetArchitecture]
} else {
// Fallback: search symbols + full text
vec![InternalTool::SearchSymbols, InternalTool::SearchText]
}
}

The tool categories cover:

CategoryToolsWhat they answer
Structuralsearch_symbols, trace_callers, trace_callees, get_file_context, get_architecture, find_dead_code, blast_radius, detect_changes, get_code_snippet, query_graph”What is this? Who uses it? What depends on it?”
Searchsearch_text, semantic_search”Find me something by name or concept”
HTTPget_http_routes, trace_http_call”What endpoints exist? Where does this call go?”
Securityfind_taint_paths, scan_owasp, generate_sbom, check_dependencies”Is this code vulnerable? What are we importing?”
Memorywrite_observation, read_observations, write_adr, read_adrs, prune_observations”Remember this. What did we learn before?”
Analysisdecompose_boundaries, get_complexity_hotspots, get_task_context, generate_steering, get_class_hierarchy, get_git_hotspots, get_import_graph, find_similar_functions”How is this codebase organized? Where are the risks?”

Cross-session memory and why staleness mattersh2

This is the feature that separates Cortex from a static index. Every other tool in this space treats each session as independent. The agent starts fresh every time, forgetting what it learned about your codebase yesterday.

Cortex maintains a persistent memory layer. When an agent learns something about your code, it can write that observation to Cortex, linked to the specific code symbol’s fully qualified name. The observation persists in SQLite across sessions, agent restarts, and machine reboots.

The engineering challenge here is trust. If an agent wrote “this function uses HMAC-SHA256 for token validation” three weeks ago, and a developer has since changed the function to use Ed25519, that old observation is now wrong. An agent that blindly trusts old observations will make incorrect decisions.

Cortex solves this with staleness invalidation. Every observation records the node_hash_at_write, which is the SHA-256 hash of the linked code at the time the observation was written. When the indexer re-processes a file and detects that a node’s content hash has changed, it marks all linked observations as stale. The observation still surfaces in query results, but with is_stale: true so the agent knows to verify it before relying on it.

Cross-session memory lifecycle
Cross-session memory lifecycle: writing observations, detecting code changes, marking observations as stale, and subsequent agent reading

This creates a feedback loop. Agents build institutional knowledge about the codebase over time, and that knowledge degrades gracefully when the code changes. Stale observations are not deleted because they might still be partially correct or historically useful. They are simply flagged so the agent can make an informed decision.

Architectural Decision Records (ADRs) work the same way. An agent can record “we chose connection pooling over per-request connections because the connection setup latency was dominating request time” linked to the database module. Future agents can read these decisions and understand the reasoning behind the current architecture without asking the developer to explain it again.

For teams where multiple agents work on the same codebase, the memory layer becomes shared institutional knowledge. Agent A’s observations about the API contract are visible to Agent B when it is working on the frontend that consumes that API.


Security analysis on the structural graphh2

Most security scanning tools operate on one of two levels. Static analysis tools (Semgrep, CodeQL) pattern-match against source text or AST patterns within single files. Dynamic analysis tools probe running applications. Both require significant setup and often cloud infrastructure.

Cortex takes a different approach. It runs security analysis on the structural call graph that already exists from indexing. There is no additional scanning pass, no cloud APIs, and no paid subscriptions.

  • Taint flow analysis: Traces data from user-input sources through the call graph to sensitive sinks. Sources include HTTP request parameters (Flask request.args, FastAPI path parameters, Express req.body, Go r.FormValue), environment variables, and file reads. Sinks include raw SQL queries, file writes with user-controlled paths, and shell command execution (os.system, subprocess.run, exec). The analysis follows call edges up to depth 5 for inter-procedural propagation. If function A reads user input and passes it to function B which passes it to function C which executes a SQL query, Cortex traces that full path.
  • OWASP Top 10 pattern detection: Runs against the structural graph, not just regex over source text. It detects patterns for A01 (Broken Access Control, functions that access resources without checking permissions), A02 (Cryptographic Failures, use of weak algorithms or hardcoded keys), A03 (Injection, unsanitized input reaching query construction), and A04 (Insecure Design, missing validation on trust boundaries). Each finding includes a confidence score and CWE classification.
  • SBOM generation: Generates software bills of materials in SPDX 2.3 format by extracting dependencies from the import graph and cross-referencing them with lock files. It reads Cargo.lock, package-lock.json, go.sum, requirements.txt, pyproject.toml, and Gemfile to get exact versions. The check_dependencies tool then queries OSV.dev for known CVEs against those versions.
Terminal window
# Full security workflow
cortex security scan # taint flows + OWASP patterns
cortex security sbom # SPDX 2.3 dependency list
cortex security vulns # cross-reference against OSV.dev (requires network)
cortex security report # human-readable summary of all findings
# CI integration with quality gates
cortex ci --fail-on-taint --fail-on-owasp # exit code 1 if issues found
cortex ci --fail-on-dead-code-above 15 # exit 1 if dead code exceeds 15%
cortex ci --format json # machine-readable output for CI dashboards

The taint analysis is structural, not symbolic execution. It will not catch every vulnerability that a dedicated SAST tool like CodeQL would find. It does not reason about string concatenation patterns or complex control flow within a single function. But it catches the common inter-procedural patterns (unsanitized HTTP input reaching SQL queries across function boundaries, user data flowing through three function calls to reach command execution) with zero configuration and zero cloud dependencies. No other tool in this space offers any security analysis at all.


Hybrid search and why three layers matterh2

Search in Cortex is not a single mechanism. It is three layers that activate in sequence based on result quality.

  • Layer 1: The graph index. When an agent calls search_symbols with a pattern like *UserService*, Cortex first tries exact and glob pattern matching against fully qualified names in the nodes table. This is fast and precise. If it returns 3 or more results, the search stops here.
  • Layer 2: FTS5 BM25. If the graph index returns fewer than 3 results, Cortex automatically falls back to full-text search over the FTS5 virtual table. This catches cases where the agent’s search term does not match the FQN pattern but does appear in the symbol name, file path, or attributes. BM25 ranking produces relevance-sorted results.
  • Layer 3: Vector similarity (optional). When enabled via cortex semantic enable, the semantic_search tool performs cosine similarity search over 768-dimensional embeddings generated by a local ONNX model (nomic-embed-text-v1.5). This handles conceptual queries like “find functions that handle authentication” where the word “authentication” might not appear in any symbol name.

Results from layers 1 and 2 are merged and deduplicated by FQN, sorted by confidence descending. The response includes _meta.retrieval_method (“graph”, “fts5”, or “hybrid”) so the agent knows how the results were found and can adjust its confidence accordingly.

The model for semantic search runs locally with no network calls after the initial 138 MB download, meaning it works in air-gapped environments. The embeddings are stored in SQLite via sqlite-vec, which is compiled as a loadable extension and statically linked into the binary. No external vector database is needed.


Multi-repo federation and cross-service intelligenceh2

Real projects are rarely a single repository. You have a frontend, a backend API, a shared library, maybe an auth service and a notification service. Each lives in its own repo. An agent working in the frontend repo has no visibility into the backend’s structure unless you give it that visibility.

Terminal window
cortex federate add ../auth-service
cortex federate add ../shared-lib
cortex federate add ../notification-service
cortex federate list

Each federated repository must have its own .cortex/ directory (run cortex index there first). Once federated, all MCP queries search across all member repos transparently. “What calls AuthService.validateToken?” returns results from every repository in the federation.

The cross-service HTTP linking is where this gets interesting for microservice architectures. If the frontend makes a fetch("/api/users") call and the backend has a route handler registered for GET /api/users, Cortex creates an HttpLink edge between them with a confidence score. The agent can trace a request from the frontend button click through the API gateway to the backend handler to the database query, across repository boundaries, in one tool call.

For system engineers managing microservice architectures, this answers questions that normally require grepping through 15 repositories. “Which services are affected if I change the /api/auth/refresh endpoint?” becomes a single blast_radius query across the federation.


Build system awareness and module boundary detectionh2

Cortex understands workspace structures for five build systems:

  • Cargo workspaces (Rust) by reading Cargo.toml [workspace] members
  • npm workspaces (Node.js) by reading package.json workspaces field
  • Go workspaces by reading go.work file
  • Gradle multi-module by reading settings.gradle
  • Maven multi-module by reading parent pom.xml

cortex modules --build-system shows workspace members as defined by the build configuration. This is the intended module structure.

cortex modules (without the flag) runs Leiden community detection on the call graph and shows clusters of tightly-coupled code based on actual call relationships. This is the actual module structure.

Comparing the two reveals architectural drift: code that the build system says belongs in module A but the call graph shows is actually tightly coupled to module B. This is the kind of insight that normally requires a senior architect spending a week with a whiteboard. Cortex produces it from graph analysis in under a second.


Git intelligence and risk scoringh2

Code that changes frequently and has many callers is the highest-risk code in any repository. A bug in that code affects the most downstream consumers, and a breaking change in that code requires the most coordination.

Cortex combines git commit history with the call graph to surface these hotspots:

Terminal window
cortex hotspots --months 6 --limit 20

Risk score formula: churn_count * caller_count. A function that changed 15 times in 6 months and has 30 callers scores 450. A function that changed once and has 2 callers scores 2. The high-scoring functions are where bugs are most likely to appear and where changes have the widest blast radius. This is a direct measurement of volatility multiplied by impact.

Coverage gap analysis cross-references LCOV test coverage data with the call graph:

Terminal window
cortex coverage --lcov coverage.lcov

This produces a ranked list of untested functions sorted by how many other functions call them. An untested utility function with 50 callers is a higher-priority coverage gap than an untested leaf function with 0 callers. Standard coverage tools tell you what percentage of lines are covered. Cortex tells you which uncovered functions are the most dangerous to leave untested.

It supports LCOV files generated by cargo-tarpaulin, jest, pytest-cov, gcov, llvm-cov, istanbul, and any tool that produces standard LCOV format.


3D visualizationh2

Terminal window
cortex viz --export graph.html

This generates a standalone HTML file with an embedded 3D force-directed graph using 3d-force-graph. Nodes are colored by Leiden community assignment and sized by caller count. You can rotate, zoom, click nodes to see their details, and visually identify clusters and bottlenecks.

The visualization is self-contained. No server is needed after export; you can open the HTML file in any browser. It is useful for onboarding new team members, architecture review meetings, and identifying god classes (oversized nodes with dozens of edges).

For the interactive version during development, cortex viz --port 9749 starts a local HTTP server with live updates as the graph changes.


The bundle format and team sharingh2

cortex bundle export serializes the full graph (nodes, edges, observations, ADRs, security findings) to a JSON file called cortex.json. This file can be committed to git.

Why JSON and not the SQLite database file? Three reasons:

  1. JSON is diffable in pull requests. You can see what structural relationships changed between commits. “This PR added 3 new call edges to the auth module” is visible in the diff.
  2. Adding fields is backward-compatible. New versions of Cortex can add fields to the bundle without breaking old bundles. Old versions ignore unknown fields.
  3. Developers can read it. Open cortex.json in any text editor and see the observations your team’s agents wrote, the architectural decisions recorded, and the security findings flagged. No SQLite client is needed.
Terminal window
# Developer A indexes and exports
cortex index && cortex bundle export
git add cortex.json && git commit -m "update graph bundle"
# Developer B pulls and imports (skips indexing entirely)
git pull && cortex bundle import
cortex serve # ready to query immediately

For teams where not everyone wants to install Cortex locally, the bundle means the graph is available as a readable JSON artifact in the repository. CI can generate it on every push. The bundle also supports CCG (Code Context Graph) export format via --format ccg for interoperability with other tools.


Design decisions and the reasoning behind themh2

Why Rusth3

Not for the meme, but for three concrete engineering reasons:

  • Single binary distribution: cargo build --release produces one executable with no runtime dependencies. No Python interpreter version conflicts, no Node.js, no JVM, and no Docker. The user downloads a binary and it works on their machine regardless of what else is installed. This matters because Cortex needs to run on developer machines with unpredictable environments, in CI containers, and on air-gapped workstations.
  • Tree-sitter FFI: Tree-sitter is a C library, and Rust’s FFI with C is zero-cost at runtime. The tree-sitter grammar crates compile their C sources via the cc crate at build time and link statically. No dynamic library loading, no PATH issues, and no version conflicts between grammars.
  • Rayon for parallelism: Parsing 3,500 files needs to be fast. Rayon’s work-stealing thread pool makes parallel file parsing trivial. The code is essentially files.par_iter().map(|f| parse(f)).collect(). There is no manual thread management, no async complexity for CPU-bound work, and no race conditions on the parser instances because each thread gets its own.

Why SQLite over a graph databaseh3

The obvious choice for a call graph would be Neo4j or DGraph or some purpose-built graph database. Cortex uses SQLite instead:

  • Zero configuration: No server process to start, no connection strings to configure, no Docker compose files, and no port allocation. The database is a single file at .cortex/graph.db.
  • WAL mode gives the exact concurrency pattern needed: Multiple concurrent readers (MCP tool calls) alongside a single writer (the indexer). No reader ever blocks, and the writer does not block readers.
  • FTS5 is built in: Full-text search with BM25 ranking without deploying Elasticsearch or Meilisearch. One less dependency, one less process, and one less thing that can break.
  • sqlite-vec for vectors: Optional semantic search without Pinecone or Qdrant. The vector index lives in the same database file.
  • Portability: The database file works on every OS without conversion.

The tradeoff is that SQLite’s single-writer constraint means indexing writes are serial. In practice, this does not matter because the bottleneck is parsing (parallelized with Rayon), not writing. The write phase for a typical index run is under 50ms, while the parse phase is 500ms to 60 seconds depending on codebase size.

Why MCP over stdio instead of HTTPh3

The Model Context Protocol uses JSON-RPC 2.0 over stdio (stdin/stdout). This is the simplest possible transport:

  • No port allocation conflicts (multiple Cortex instances can run simultaneously for different projects).
  • No TLS certificate management.
  • No firewall rules or CORS configuration.
  • The agent process spawns Cortex as a child process and communicates via pipes.

Every MCP-compatible agent already knows how to do this. Cortex reads JSON from stdin, writes JSON to stdout, and logs to stderr. The protocol is stateless at the transport level, and state lives in the SQLite database, not in the connection.

Why not LSPh3

The Language Server Protocol (LSP) is designed for IDE features like autocomplete, go-to-definition, and hover information. It is optimized for single-file, cursor-position queries: “What is the type of the variable at line 42, column 15?”

Cortex answers cross-codebase structural questions: “What is the blast radius of changing this function?” “What are the module boundaries?” “Where does user input flow to SQL queries?” LSP has no concept of blast radius, community detection, taint flow analysis, or cross-session memory.

LSP also requires per-language server implementations (rust-analyzer for Rust, pyright for Python, tsserver for TypeScript, gopls for Go). Each is a separate process with its own memory footprint and startup time. Cortex handles 29 languages with one binary because tree-sitter grammars are language-agnostic at the API level.

The tradeoff is precision. LSP-based tools like Serena have exact type information. They know that foo is a Vec<String> and can resolve generic type parameters. Cortex knows that foo is a function that calls bar and is called by baz. For structural questions (callers, callees, blast radius, dead code), tree-sitter extraction is sufficient. For type-level questions, LSP is more accurate.


The tech stack in detailh2

ComponentCrate / TechnologyVersionPurpose
CLIclap (derive macros)4.xArgument parsing, subcommand routing
Databaserusqlite (bundled)0.32SQLite with WAL, FTS5, compiled from source
Serializationserde + serde_json1.xJSON-RPC protocol, bundle format, config
Parsingtree-sitter + 26 grammar crates0.25AST extraction for 26 languages
Parallelismrayon1.xWork-stealing thread pool for file parsing
Async runtimetokio (full features)1.xMCP server, concurrent tool handling
File watchingnotify7.xNative OS filesystem events
Hashingsha20.10Content-based change detection
File walkingwalkdir2.xRecursive directory traversal with .gitignore
Pattern matchingregex1.xSecurity pattern detection, taint source/sink matching
Error handlinganyhow + thiserror1.x / 2.xApplication errors vs. typed library errors
Loggingtracing + tracing-subscriber0.1 / 0.3Structured logging with env-filter and JSON output
Configtoml0.8.cortex/config.toml parsing
IDsuuid (v4)1.xObservation and ADR identifiers
HTTP serveraxum + tower-http (optional)0.7 / 0.5Visualizer endpoint, CORS
ML inferenceort + tokenizers + ndarray (optional)2.0-rc / 0.21 / 0.16ONNX runtime for semantic embeddings

The bundled feature on rusqlite means SQLite itself is compiled from C source and statically linked. This prevents system SQLite version conflicts, and the binary works on a fresh OS install with nothing else installed.

Optional features are gated behind Cargo feature flags. --features visualizer enables the axum HTTP server for the 3D graph UI. --features semantic enables ONNX inference for vector search. The default build includes neither, keeping the binary size around 25 MB with all 26 tree-sitter grammars statically linked.


Performance numbersh2

These are measured values from actual usage, not theoretical estimates:

WhatMeasuredContext
Full index, 127 files, ~30K lines535msTypical web application
Full index, 3,500 files (CPython stdlib)<60sLarge Python project
Incremental re-index, no changes13msHash comparison only
Incremental re-index, 1 file changed<15msSingle file re-parse + write
trace_callers query, depth 3<5msBFS over indexed edges
get_architecture response size~1,000 tokensRegardless of codebase size
get_file_context response size500-800 tokensvs. 15,000+ tokens for raw file
Token reduction on structural queries100x200 tokens vs. 20,000 tokens
Binary size, release build, stripped~25 MBAll grammars statically linked
Database size, 127-file project~2 MBNodes + edges + FTS5 index
First build from source3-5 minutesCompiling all tree-sitter C sources
Incremental Rust build20-40 secondsAfter initial compilation
MCP server startup<100msDatabase open + schema check
Concurrent read connections1-16 (default 4)Configurable via pool_size

Full feature comparisonh2

This table compares Cortex against every relevant tool in the space based on what each tool actually ships today, not roadmap items:

FeatureCortexLeanCTXcodebase-memory-mcpRepomix
ArchitectureRust binary, SQLiteRust binaryC binary, SQLiteNode.js CLI
Languages2921155N/A
MCP tools32 (or 5 smart)59140
Smart routingYesNoNoN/A
Call graph tracingBFS depth 5PartialBFS depth 5No
HTTP linkingYesYesYesNo
Dead codeYesNoYesNo
Taint flowYesNoNoNo
OWASP scanYesNoNoNo
SBOM generationYesNoNoNo
Dependency checkYesNoNoNo
Cross-session memoryYesYesNoNo
Staleness invalidationYesYesNoNo
Multi-repo federationYesNoNoNo
Hybrid searchYesYesGraph onlyNo
CI quality gatesYesNoNoNo
3D visualizationYesNoYesNo
Git hotspotsYesNoNoNo
Coverage gap analysisYesNoNoNo
Community detectionYesNoNoNo
Build system workspacesYesNoNoNo
Shell output compressionNoYesNoNo
Delta readsNoYesPartialNo
Token dashboardNoYesNoPartial
Session resumeNoYesNoNo
Single binary, zero depsYesYesYesNo
Auto IDE config25 agents28 agents8 agentsN/A
Portable bundleYesNoNoYes
CLAUDE.md generationYesNoNoNo
LicenseMITMIT + Apache 2.0MITMIT

Cortex is strongest compared to the field in security analysis, staleness-aware memory, CI quality gates, coverage gap analysis, and the combination of graph intelligence with zero-dependency deployment.

Cortex is weaker in shell output compression (LeanCTX’s strongest feature), delta reads for repeated file access, token savings dashboards, and supporting fewer languages than codebase-memory-mcp.


Specific combinations that create compounding valueh2

Individual features are table stakes. The real engineering value comes from specific combinations that no single competitor offers in one binary:

  • Graph intelligence + security analysis + CI gates: Cortex is the only tool that can answer “what calls this function?”, “does user input reach this SQL query?”, and “fail the build if taint flows exist” from the same graph in the same binary. Running cortex ci --fail-on-taint in a GitHub Actions workflow means every PR gets structural security analysis without configuring heavy third-party enterprise security tooling.
  • Cross-session memory + staleness invalidation + ADRs: No other tool combines persistent agent memory with automatic trust degradation. This combination means an agent that has worked on your codebase for months builds genuine institutional knowledge that degrades gracefully instead of becoming silently wrong.
  • Federation + HTTP linking + blast radius: For microservice architectures, this combination answers “if I change this endpoint in service A, what breaks in services B, C, and D?” in one tool call across repository boundaries.
  • Leiden community detection + build system awareness + hotspot analysis: This combination produces a structural health report that would take a senior architect a week to assemble manually. You can see your actual module boundaries, where they diverge from intended boundaries, and where the highest-risk code lives, in one command, with zero LLM cost.
  • Smart tool routing + token budget management + task context extraction: The agent describes its task. Cortex returns exactly the relevant subgraph sized to fit the agent’s remaining context budget. The agent starts working with full structural awareness without having read a single file.

Real usage patternsh2

Here is what using Cortex looks like in practice across different workflows.

Refactoring a legacy moduleh3

Before touching anything, understand the impact:

Terminal window
cortex impact LegacyAuth.validateSession --depth 5
# Output: 47 functions across 12 files depend on this
# Risk: 3 taint paths flow through this function
# Hotspot score: 340 (changed 17 times in 6 months, 20 callers)
# Check what the agent remembers about this code
cortex memory show
# Output: 2 observations (1 stale), 1 ADR about the auth migration plan
# Start the refactoring session with full context
cortex serve --smart-tools
# Agent asks: "explain LegacyAuth.validateSession"
# Cortex returns: callers, callees, observations, security flags in ~800 tokens

Onboarding to an unfamiliar codebaseh3

Terminal window
cortex index # 535ms for a typical project
cortex report # generates CORTEX_REPORT.md with full architecture overview
cortex viz --export graph.html # visual map of the codebase structure
cortex modules # shows module boundaries and coupling scores
# In your agent session:
# "What is the architecture of this project?"
# Cortex returns: languages, entry points, module structure, ~1000 tokens
# vs. the agent reading 20+ files to figure this out

Security audit before a releaseh3

Terminal window
cortex security scan # taint flows + OWASP patterns
cortex security vulns # dependency CVEs from OSV.dev
cortex ci --fail-on-taint --fail-on-owasp --format json > security-report.json
# In CI (GitHub Actions):
# - name: Security gate
# run: cortex ci --fail-on-taint --fail-on-owasp
# Exit code 1 blocks the merge if issues exist

Microservice architecture analysish3

Terminal window
cortex federate add ../user-service
cortex federate add ../payment-service
cortex federate add ../notification-service
# Now queries span all services
# "What calls PaymentService.processRefund?"
# Returns callers from user-service AND notification-service
# "What HTTP routes exist across all services?"
# Returns unified route map with cross-service links

Finding coverage gaps that matterh3

Terminal window
cortex coverage --lcov coverage.lcov --limit 20
# Output ranked by risk:
# 1. db::connection_pool::acquire | 0% covered | 47 callers
# 2. auth::token::refresh | 0% covered | 31 callers
# 3. api::middleware::rate_limit | 0% covered | 28 callers
# ...
# These are the functions where a bug would affect the most code

Configuration and tuningh2

Cortex reads configuration from environment variables (prefix CORTEX_) and a configuration file at .cortex/config.toml in the repository root. Environment variables override file values.

# .cortex/config.toml - all fields optional, defaults shown
repo_root = "."
data_dir = ".cortex"
log_level = "info"
max_traversal_depth = 5 # max BFS depth for callers/callees/blast_radius
max_graph_query_results = 500 # cap on query results
auto_index = true # re-index on file changes
update_check = true # check for new versions on startup
auto_bundle_export = true # export cortex.json after each index
pool_size = 4 # read connections (1-16)
additional_repos = [] # paths for multi-repo federation

For most projects, the defaults work without any configuration file. The only tuning most developers do is increasing pool_size if they have multiple agents querying simultaneously, or setting additional_repos for federation.


Supported platforms (all 25)h2

PlatformInstall commandConfig location
Claude Code (Linux/Mac)cortex install~/.claude/settings.json
Claude Code (Windows)cortex install --platform claude-code%APPDATA%\Claude\claude_desktop_config.json
Cursorcortex cursor install~/.cursor/mcp.json
VS Code Copilot Chatcortex vscode install.vscode/mcp.json
GitHub Copilot CLIcortex install --platform copilot~/.config/github-copilot/mcp.json
Windsurfcortex install --platform windsurf~/.codeium/windsurf/mcp_config.json
Kiro IDEcortex kiro install~/.kiro/settings/mcp.json
Zedcortex install --platform zed~/.config/zed/settings.json
JetBrainscortex install --platform jetbrains~/.config/github-copilot/mcp.json
Cline/Roocortex install --platform cline.cline/mcp.json
OpenAI Codexcortex install --platform codex~/.codex/mcp.json
OpenCodecortex install --platform opencodedetected automatically
OpenClawcortex install --platform openclawdetected automatically
Factory Droidcortex install --platform droiddetected automatically
Traecortex install --platform traedetected automatically
Trae CNcortex install --platform trae-cndetected automatically
Gemini CLIcortex install --platform geminidetected automatically
Hermescortex install --platform hermesdetected automatically
Kimi Codecortex install --platform kimidetected automatically
Pi coding agentcortex install --platform pidetected automatically
Google Antigravitycortex antigravity installdetected automatically
Aidercortex install --platform aiderdetected automatically
Continue.devcortex install --platform continuedetected automatically
Supermavencortex install --platform supermavendetected automatically
Tabninecortex install --platform tabninedetected automatically

The install command is idempotent. Running it again does not duplicate configuration; it merges the Cortex MCP server entry into existing config files without overwriting other settings.


What the future looks likeh2

Cortex 1.0 shipped in May 2026. The roadmap from here is informed by what the competitive landscape validates as high-value and what Cortex’s architecture makes uniquely possible.

Near-term (already in progress):

  • Kotlin tree-sitter support (waiting for upstream updates to resolve version conflicts with older grammars).
  • More granular taint propagation (tracking data flow through struct fields, not just function boundaries).
  • Incremental bundle export (diff-based updates instead of full serialization).

Medium-term:

  • Live collaboration mode: Multiple agents querying the same Cortex instance simultaneously with conflict-free observation writes.
  • Graph diffing between branches: Not just “what files changed” but “what structural relationships changed” between main and a feature branch.
  • Custom tree-sitter queries: Let users define their own extraction patterns for domain-specific constructs (e.g., extracting GraphQL resolvers, React component props, database migration steps).

Long-term:

  • Distributed federation: Query across repositories on different machines (currently federation requires local filesystem access).
  • Temporal graph: Track how the call graph evolves over time, not just its current state. “When did this function gain 20 new callers?” “When did this module become coupled to that one?”
  • Agent coordination: When multiple agents work on the same codebase, Cortex could mediate their observations and flag conflicts.

The core thesis does not change: agents need structure, not text. As models get larger context windows, the token savings become less about fitting within limits and more about signal-to-noise ratio. A 200-token graph result is not just cheaper than a 20,000-token file dump. It is also cleaner. The agent does not have to parse irrelevant code to find the structural answer.


Try ith2

Terminal window
# Install (downloads binary, detects your agents, writes MCP config)
npx @1337xcode/cortex install
# Index your repository
cortex index
# Start using it - ask your agent structural questions
# "What calls processOrder?"
# "What breaks if I change DatabasePool.acquire?"
# "Show me the security findings"
# "What are the module boundaries?"
# "Find dead code in this project"

Closing thoughtsh2

The AI coding agent space in 2026 is crowded with model improvements, prompt engineering techniques, and context window expansions. Most of that work happens at the model layer or the prompt layer. Almost nobody is working seriously on the data layer, the question of what information the agent actually receives and in what form.

Repomix gives agents everything at once. LeanCTX compresses what agents receive. Engram intercepts and summarizes. Context7 injects documentation. Each approach has merit. Cortex takes a different position: it pre-computes the structural relationships that agents need most frequently and serves them as compressed, queryable, persistent knowledge.

The problem Cortex solves is prominent but undertouched. Agents are memory-constrained like Jolyne under Jail House Lock. The solution is engineering, not bigger models. Give the agent a mirror that shows all the bullets at once instead of making it memorize them one by one.

The code is open. The license is MIT. The binary is free. If you are building AI coding tools, working with AI coding agents daily, or just tired of watching your agent burn 50,000 tokens to answer a question that should cost 200, give it a try. The graph does not lie.

NOTE

Cortex is open source and licensed under the MIT license. The source code, npm package, and documentation are available on GitHub.


Further readingh2

Author: 1337XCode
Post: Cortex: Your AI Agent Is Reading Your Code Wrong
License: CC BY-NC 4.0