AI Agents Burning Too Many Tokens? Context Engineering Is the Answer

Karify98 & Amy 🌸

When AI Agents Read Code Like They're Blind

Using AI coding agents like Claude Code, Cursor, or Codex is increasingly common. But there's a problem few people talk about: the way agents currently "read" codebases is terribly inefficient.

Every time an agent needs to understand where a function is called from, it greps the entire project. Every time it needs to know which class imports another, it reads file by file. The result: dozens of tool calls, thousands of tokens burned — just to find files.

Two open-source projects trending on GitHub on May 26, 2026 tackle this from two angles: data structure and behavior control.

CodeGraph: A Knowledge Graph for AI Agents

The first project is CodeGraph by codegraph-ai, a tool that builds a semantic graph of your entire codebase, parses 37 languages via tree-sitter, and exposes 45 MCP (Model Context Protocol) tools for AI agents to query.

Instead of grepping, agents call codegraph_find_symbol to locate a function, codegraph_get_callers to see what calls it, and codegraph_get_dependencies to inspect module dependencies. Everything runs 100% locally, with no code uploaded to the cloud.

The core value: CodeGraph reduces both tool calls and token consumption. Instead of reading file after file to reconstruct the project structure, the agent queries the knowledge graph directly and gets a structured answer back.
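To make the "query a graph instead of grepping" idea concrete, here is a minimal sketch of a caller index. It is not CodeGraph's implementation: it handles only Python source via the standard ast module (CodeGraph uses tree-sitter across many languages), and the names build_call_index and get_callers are hypothetical helpers invented for this illustration.

```python
import ast
from collections import defaultdict


def build_call_index(sources: dict[str, str]) -> dict[str, set[tuple[str, str]]]:
    """Map each called name to the (file, enclosing function) pairs that call it.

    Toy assumption: calls are matched by bare name only, so two different
    functions that share a name are not distinguished.
    """
    callers: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for path, code in sources.items():
        tree = ast.parse(code)
        for func in ast.walk(tree):
            if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            for node in ast.walk(func):
                if not isinstance(node, ast.Call):
                    continue
                target = node.func
                # Plain calls look like foo(); attribute calls look like obj.foo().
                if isinstance(target, ast.Name):
                    name = target.id
                else:
                    name = getattr(target, "attr", None)
                if name:
                    callers[name].add((path, func.name))
    return callers


def get_callers(index: dict[str, set[tuple[str, str]]], symbol: str) -> list[tuple[str, str]]:
    """Answer "who calls this?" with a dictionary lookup instead of a project-wide scan."""
    return sorted(index.get(symbol, set()))


# Hypothetical three-file project, indexed once up front.
SOURCES = {
    "billing.py": "def charge(user):\n    return validate(user)\n",
    "signup.py": "def register(user):\n    validate(user)\n    return charge(user)\n",
    "util.py": "def validate(user):\n    return bool(user)\n",
}

index = build_call_index(SOURCES)
print(get_callers(index, "validate"))
```

The indexing cost is paid once; after that, each "who calls X?" question is a single lookup rather than another pass over every file, which is where the savings in tool calls would come from.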

CodeGraph supports Claude Code, Cursor, Windsurf, Codex, Cline, OpenCode, and Hermes Agent. Setup is straightforward: register the server in ~/.claude.json or install the VS Code extension, and the server auto-indexes the current directory.

andrej-karpathy-skills: Controlling Agent Behavior via CLAUDE.md

The second project is andrej-karpathy-skills by multica-ai — a single CLAUDE.md file designed to improve Claude Code behavior, derived from Andrej Karpathy's observations on common LLM coding pitfalls.

From Karpathy's post:

"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."

"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code… implement a bloated construction over 1000 lines when 100 would do."

The CLAUDE.md file is organized into four principles:

1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs. If uncertain — stop and ask.

2. Simplicity First

Minimum code that solves the problem. No abstractions for single-use code. No error handling for impossible scenarios. If 200 lines could be 50 — rewrite.

3. Surgical Changes

Only touch lines directly related to the task. Don't "improve" adjacent code. Don't delete others' comments. Clean up only the mess you create.

4. Goal-Driven Execution

Every step must have verifiable success criteria. Turn "do X" into "do X, run tests, confirm output." Don't loop endlessly; if you lose control of the task, stop.

Context Engineering: The Missing Piece

What's interesting is that these two projects solve the same problem from opposite directions:

  • CodeGraph → provides structured context
  • andrej-karpathy-skills → controls how agents use that context

This is context engineering — the practice of designing and optimizing the information fed into LLMs. Not old-school prompt engineering (writing better prompts), but building a pipeline that delivers the right information, at the right time, in the right format.

In practice, context engineering for AI coding agents includes:

  • Pre-indexing: Index the codebase before the agent runs (like CodeGraph)
  • Rule files: Define behavioral rules for the agent (like andrej-karpathy-skills)
  • Tool profiles: Limit the tool surface the agent sees to avoid prompt context overload
  • Memory layer: Cache previous analysis results for reuse
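As an illustration of the memory layer item above, here is a minimal sketch of a cache keyed by file path and content hash, so an analysis is recomputed only when the file actually changes. The AnalysisCache class, its method names, and the count_lines analysis are hypothetical and not part of either project.

```python
import hashlib
from typing import Any, Callable


class AnalysisCache:
    """Reuse analysis results for files whose content has not changed."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], Any] = {}
        self.misses = 0  # how many times the analysis actually had to run

    def get_or_compute(self, path: str, content: str, analyze: Callable[[str], Any]) -> Any:
        # Hashing the content means an edited file gets a new key automatically.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = (path, digest)
        if key not in self._store:
            self.misses += 1
            self._store[key] = analyze(content)
        return self._store[key]


def count_lines(content: str) -> int:
    """Stand-in for an expensive analysis step."""
    return len(content.splitlines())


cache = AnalysisCache()
first = cache.get_or_compute("app.js", "const a = 1;\nconst b = 2;\n", count_lines)
second = cache.get_or_compute("app.js", "const a = 1;\nconst b = 2;\n", count_lines)  # cache hit
third = cache.get_or_compute("app.js", "const a = 1;\n", count_lines)  # content changed
print(first, second, third, cache.misses)
```

This sketch never evicts old entries; a real cache would need a size limit or some other expiry policy.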

A typical Node.js project has 500-2000 files. Dumping everything into a prompt is not viable. Context engineering ensures agents only see what they need.
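A rough back-of-the-envelope estimate shows why. Every constant in the sketch below is an illustrative assumption rather than a measurement: 150 lines per file, 10 tokens per line, and a 200,000-token context window are placeholders you should replace with numbers from your own project and model.

```python
# All constants are illustrative assumptions, not measurements.
AVG_LINES_PER_FILE = 150
TOKENS_PER_LINE = 10
CONTEXT_WINDOW_TOKENS = 200_000


def estimated_tokens(file_count: int) -> int:
    """Estimate the tokens needed to paste `file_count` files into a prompt."""
    return file_count * AVG_LINES_PER_FILE * TOKENS_PER_LINE


for files in (500, 2000):
    total = estimated_tokens(files)
    ratio = total / CONTEXT_WINDOW_TOKENS
    print(f"{files} files: ~{total:,} tokens, {ratio:.1f}x the assumed window")
```

Under these assumptions even the 500-file case comes to 750,000 tokens, several times the assumed window, before the agent has been asked to do anything.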

Note: for small projects (<50 files), the setup cost of CodeGraph may not be worth it. But once grep stops being effective, this investment pays off.

Applying This to Daily Workflow

If you're using Claude Code or Cursor, you can start today:

Step 1: Install CodeGraph

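Add the following to ~/.claude.json, merging it into any mcpServers entry you already have and replacing /path/to/codegraph-server with the location of the server binary on your machine: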
{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph-server",
      "args": ["--mcp"]
    }
  }
}

Step 2: Create a CLAUDE.md for your project, applying the four principles from andrej-karpathy-skills. You can copy directly from the multica-ai/andrej-karpathy-skills repo — the file is designed to be reusable.

Step 3: Limit the MCP tool profile when you don't need the full surface. CodeGraph ships with a core profile (8 tools) for chat sessions and a graph profile (16 tools) for refactoring.
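The idea behind a tool profile is an allow-list applied before the tool descriptions reach the model. The sketch below illustrates that filtering step only. The profile names "lookup" and "refactor", their contents, and the visible_tools helper are hypothetical; they are not CodeGraph's actual core or graph profiles, and this is not how CodeGraph selects a profile. Only the three tool names come from the article.

```python
# Hypothetical profiles: each maps a profile name to the tool names it exposes.
PROFILES: dict[str, set[str]] = {
    "lookup": {"codegraph_find_symbol"},
    "refactor": {
        "codegraph_find_symbol",
        "codegraph_get_callers",
        "codegraph_get_dependencies",
    },
}

# Tool descriptions as they might be advertised to an agent (descriptions are made up).
ALL_TOOLS = [
    {"name": "codegraph_find_symbol", "description": "Locate a symbol definition"},
    {"name": "codegraph_get_callers", "description": "List callers of a function"},
    {"name": "codegraph_get_dependencies", "description": "Inspect module dependencies"},
]


def visible_tools(all_tools: list[dict[str, str]], profile: str) -> list[dict[str, str]]:
    """Return only the tools that the chosen profile allows the agent to see."""
    allowed = PROFILES[profile]
    return [tool for tool in all_tools if tool["name"] in allowed]


lookup_names = [tool["name"] for tool in visible_tools(ALL_TOOLS, "lookup")]
refactor_names = [tool["name"] for tool in visible_tools(ALL_TOOLS, "refactor")]
print(lookup_names)
print(refactor_names)
```

Every tool description the agent sees takes up prompt space, so a smaller profile means fewer tokens spent before the first question is asked.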

Result: faster agent responses, lower token usage, cleaner code output.

Conclusion

AI coding agents are evolving fast, but context remains the primary bottleneck. It's not that models aren't smart enough — it's that they're not given the right information.

CodeGraph and andrej-karpathy-skills are small projects, but they point in the right direction. Instead of waiting for smarter models, give your models better context.

