This context window naturally caps the size of a code base that an LLM can process at one time, and if you feed the AI model many large code files (which the LLM must re-evaluate every time you send another message), you can burn through tokens or usage limits quite quickly.
Tricks of the trade
To get around these limitations, the creators of coding agents use several tricks. For example, AI models are fine-tuned to write code that offloads work to other software tools: rather than passing an entire file through an LLM, they can write Python scripts to extract data from images or files, which saves tokens and avoids inaccurate results.
Anthropic's documentation notes that Claude Code also uses this approach to perform complex data analysis on large databases, writing targeted queries and using Bash commands such as "head" and "tail" to analyze large volumes of data without ever loading the full data objects into context.
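The idea behind those "head" and "tail" commands can be illustrated with a short sketch. This is a hypothetical example, not code from any actual agent: it samples only the first and last few lines of a large file, so an agent could inspect a huge log or dataset without ever loading the whole thing into an LLM's context.

```python
# Hypothetical sketch: instead of loading an entire large file into an
# LLM's context, an agent can emit a small script that samples just the
# head and tail, mimicking the head/tail approach described above.
from pathlib import Path
from collections import deque

def sample_file(path, n=5):
    """Return the first and last n lines without reading the file into memory."""
    head = []
    tail = deque(maxlen=n)  # deque keeps only the last n lines seen
    with open(path) as f:
        for i, line in enumerate(f):
            if i < n:
                head.append(line.rstrip("\n"))
            tail.append(line.rstrip("\n"))
    return head, list(tail)

# Usage: build a demo file with 10,000 lines, then sample it.
p = Path("big.log")
p.write_text("\n".join(f"line {i}" for i in range(10_000)))
h, t = sample_file(p, n=3)
print(h)  # ['line 0', 'line 1', 'line 2']
print(t)  # ['line 9997', 'line 9998', 'line 9999']
```

Only six short lines reach the model here, regardless of whether the file holds ten thousand lines or ten million.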
(In a way, these AI agents are guided but semi-autonomous, tool-using programs, a major extension of a concept we first saw in early 2023.)
Another major advance in the field of agents came from dynamic context management. Proprietary coding models do not fully disclose how they handle this, but we know the most important technique they use: context compression.
[Image: The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards]
When a coding LLM approaches its context limit, this technique compresses the context history by summarizing it, losing details in the process but condensing the history to its key points. Anthropic's documentation describes this "compaction" as distilling contextual content in a high-fidelity manner, preserving key details such as architectural decisions and unresolved bugs while removing redundant tool output.
This means that AI coding agents periodically "forget" much of what they are doing each time this compression occurs, but unlike older LLM-based systems, they are not left completely without a clue about what happened and can quickly reorient themselves by reading existing code, notes left in files, change logs, and so on.
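The compaction pattern described above can be sketched in a few lines. Everything here is a simplified assumption: the `summarize()` function is a stand-in for a real LLM summarization call, and the "budget" is measured in characters rather than tokens, but the shape matches what the article describes, with older turns collapsed into a summary while the most recent turns survive verbatim.

```python
# Hypothetical sketch of context "compaction": when a conversation history
# nears a budget, older turns are collapsed into a summary while recent
# turns are kept verbatim. summarize() stands in for an LLM call; here it
# just keeps each turn's first sentence.
def summarize(turns):
    # Stand-in for an LLM summarization call.
    return "Summary: " + " ".join(t.split(".")[0] + "." for t in turns)

def compact(history, budget, keep_recent=2):
    """Compress history (a list of strings) if its total length exceeds budget."""
    if sum(len(t) for t in history) <= budget:
        return history  # still fits; nothing to do
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [
    "Chose SQLite for storage. Discussed trade-offs at length...",
    "Found a bug in the parser. Stack trace follows: ...",
    "Fixed the parser bug.",
    "Now adding tests.",
]
compacted = compact(history, budget=80)
print(compacted)
```

After compaction, the agent's working context contains a one-line summary of the early turns ("Chose SQLite for storage. Found a bug in the parser.") plus the last two turns intact, which is exactly the kind of lossy-but-oriented memory the article describes.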
