This page documents Claude Code's context window management system, including token tracking, automatic and manual compaction strategies, and memory preservation across conversation boundaries. For information about memory scopes that persist across sessions, see Memory System. For details on how MCP tools interact with context limits, see MCP Server Integration.
Context window management is critical to maintaining long-running Claude Code sessions. The system monitors token usage in real time, automatically compacts conversations when approaching limits, and provides manual controls for optimization. Context compaction creates a summarized version of the conversation history while preserving essential information such as session names, plan mode state, and custom configurations.
Diagram: Context Window Management Architecture
The system tracks tokens from multiple sources and triggers compaction when approaching the effective context window limit. State preservation ensures critical session metadata survives the compaction process.
Sources: CHANGELOG.md38 CHANGELOG.md73-76 CHANGELOG.md258 CHANGELOG.md314 CHANGELOG.md400 CHANGELOG.md452 CHANGELOG.md458 CHANGELOG.md468
The system continuously monitors token consumption and displays usage statistics in the status line. Token counting covers all conversation elements including user messages, assistant responses, tool results, and tool descriptions.
| Context Component | Token Impact | Notes |
|---|---|---|
| System Prompt | Fixed overhead | Date moved out to improve cache hit rates |
| User Messages | Variable | Includes text and attachment content |
| Assistant Messages | Variable | Thinking blocks count toward usage |
| Tool Results | Variable | Large outputs saved to disk with file references |
| Images | High | Dimension limits enforced, suggest /compact on errors |
| PDF Documents | High | Stripped before compaction API call |
| MCP Tool Descriptions | Grows with servers | Auto-deferred when exceeding 10% threshold |
| Skill Descriptions | 2% of context | Scales with context window size |
The status line exposes context_window.used_percentage and context_window.remaining_percentage fields for scripts and extensions to query current usage.
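As a rough illustration of how a script might consume these fields, the sketch below parses a JSON payload and formats a short usage string. The payload shape (a top-level `context_window` object holding the two percentage fields) is an assumption inferred from the field names, and `format_context_usage` is a hypothetical helper, not part of Claude Code.

```python
import json


def format_context_usage(status_json: str) -> str:
    """Build a short usage string from a status-line JSON payload.

    Assumes the payload nests the two percentage fields under a
    "context_window" object; that shape is inferred from the field
    names and is not confirmed by the source.
    """
    data = json.loads(status_json)
    window = data.get("context_window", {})
    used = window.get("used_percentage")
    remaining = window.get("remaining_percentage")
    if used is None or remaining is None:
        # Fields may be absent, for example before any usage is recorded.
        return "context: n/a"
    return f"context: {used:.0f}% used, {remaining:.0f}% left"


# Hypothetical sample payload for demonstration only.
sample = json.dumps(
    {"context_window": {"used_percentage": 42.0, "remaining_percentage": 58.0}}
)
print(format_context_usage(sample))
```

Returning a fallback string when the fields are missing keeps a status line script from failing on payloads that omit them.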
Sources: CHANGELOG.md143 CHANGELOG.md230 CHANGELOG.md251 CHANGELOG.md257 CHANGELOG.md458 CHANGELOG.md468
The prompt cache system reduces token costs by reusing previously-sent context. Cache invalidation occurs when:
The system tracks cache hit rates and optimizes prompt structure to maximize cache reuse. The date was moved out of the system prompt specifically to improve caching performance.
Sources: CHANGELOG.md140 CHANGELOG.md258
Diagram: Auto-Compaction Trigger Logic
Auto-compaction triggers when token usage reaches approximately 98% of the effective context window, defined as the total context minus the tokens reserved for output. This threshold replaced two earlier, incorrect calculations: one that blocked aggressively at 65%, and a later one that used the full window without subtracting the output token reservation.
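The trigger arithmetic can be sketched as follows. This is a minimal illustration of the rule as described here, not the actual implementation; the function names and the example window and reservation sizes are hypothetical.

```python
def effective_window(total_context: int, reserved_output: int) -> int:
    """Effective window = total context minus tokens reserved for output."""
    return max(total_context - reserved_output, 0)


def should_auto_compact(
    used_tokens: int,
    total_context: int,
    reserved_output: int,
    threshold_percent: int = 98,
) -> bool:
    """Return True once usage reaches the threshold share of the effective window.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    window = effective_window(total_context, reserved_output)
    return used_tokens * 100 >= threshold_percent * window


# Hypothetical sizes: 200,000-token window with 32,000 tokens reserved for output.
# Effective window = 168,000 tokens; 98% of that is 164,640 tokens.
print(effective_window(200_000, 32_000))
print(should_auto_compact(164_639, 200_000, 32_000))
print(should_auto_compact(164_640, 200_000, 32_000))
```

With these example numbers, using the full 200,000-token window instead would move the trigger to 196,000 tokens, well past the point where the output reservation no longer fits. That is the failure mode the corrected calculation avoids.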
Sources: CHANGELOG.md314 CHANGELOG.md400 CHANGELOG.md452
Before sending conversation history to the compaction API, the system performs preprocessing:
This preprocessing prevents compaction failures that would otherwise occur with media-rich conversations.
Sources: CHANGELOG.md38 CHANGELOG.md257
Auto-compact failures are handled gracefully:
/compact to retry
Sources: CHANGELOG.md164
/compact Command
Diagram: Manual Compaction Flow
Users can explicitly trigger compaction at any time using the /compact command. This is useful for:
The /context command displays current token usage statistics using the same counts shown in the status line.
Sources: CHANGELOG.md143 CHANGELOG.md281 CHANGELOG.md384 CHANGELOG.md392 CHANGELOG.md406
Diagram: State Preservation During Compaction
Several critical bugs were fixed to ensure metadata survives compaction:
Session Name Preservation (Fixed in 2.1.47): Custom session titles set via /rename are now preserved through compaction. Previously, renamed sessions would revert to default names after auto-compact.
Plan Mode Preservation (Fixed in 2.1.47): Plan mode state is maintained across compaction, preventing the model from switching from planning to implementation unexpectedly.
Subagent Context Management: Subagent message histories use the correct model during compaction and are trimmed after task completion to manage memory usage.
Sources: CHANGELOG.md43 CHANGELOG.md73-76 CHANGELOG.md118 CHANGELOG.md273 CHANGELOG.md329 CHANGELOG.md507
The system enforces a blocking limit to prevent API errors from exceeding the context window:
Diagram: Context Window Blocking Logic
The blocking limit calculation has evolved through several fixes:
Sources: CHANGELOG.md314 CHANGELOG.md400 CHANGELOG.md452
The status line displays a "Context remaining" warning when approaching the limit. This warning:
/compact successfully
A bug was fixed where the warning would not hide immediately after manual compaction, requiring users to send another message before the UI updated.
Sources: CHANGELOG.md281 CHANGELOG.md384 CHANGELOG.md392 CHANGELOG.md474 CHANGELOG.md488
Diagram: MCP Tool Context Management
When MCP tool descriptions exceed 10% of the context window, they are automatically deferred and made discoverable through the MCPSearch tool instead of loading upfront. This default-enabled behavior (as of 2.1.7) significantly reduces context usage for users with many MCP servers configured.
Users can disable auto-defer by adding MCPSearch to the disallowedTools list in their settings, forcing all MCP tools to load into context immediately.
The system also supports custom thresholds using auto:N syntax, where N is a percentage (0-100) of the context window.
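The deferral decision and the `auto:N` syntax can be sketched as below. The source does not name the setting that carries this value, and treating a bare `auto` as "use the 10% default" is an assumption; `parse_auto_threshold` and `should_defer_mcp_tools` are hypothetical helpers.

```python
def parse_auto_threshold(value: str, default_percent: int = 10) -> int:
    """Parse "auto" or "auto:N" into a percentage of the context window.

    Accepting a bare "auto" as the 10% default is an assumption.
    """
    if value == "auto":
        return default_percent
    prefix, sep, number = value.partition(":")
    if prefix != "auto" or not sep or not number.isdigit():
        raise ValueError(f"expected 'auto' or 'auto:N', got {value!r}")
    percent = int(number)
    if not 0 <= percent <= 100:
        raise ValueError(f"threshold must be within 0-100, got {percent}")
    return percent


def should_defer_mcp_tools(
    description_tokens: int, context_window: int, setting: str = "auto"
) -> bool:
    """Defer when tool descriptions exceed the threshold share of the window."""
    percent = parse_auto_threshold(setting)
    return description_tokens * 100 > percent * context_window


# Hypothetical sizes: 25,000 tokens of descriptions in a 200,000-token window.
print(should_defer_mcp_tools(25_000, 200_000))
print(should_defer_mcp_tools(25_000, 200_000, "auto:50"))
```

The comparison is strict because the text says descriptions are deferred when they exceed the threshold, so usage exactly at the threshold still loads upfront in this sketch.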
Sources: CHANGELOG.md434 CHANGELOG.md458
The Read tool enforces token limits on file content to prevent context overflow:
Diagram: File Read Token Limiting
The CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS environment variable allows users to customize the file read token limit. Files exceeding this limit are either truncated or returned as lightweight references rather than inline content.
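A minimal sketch of how such an override might be resolved is shown below. Only the environment variable name comes from this page; the default value, the fallback behavior for invalid input, and the helper name `file_read_token_limit` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical default; the source does not state the real value.
DEFAULT_FILE_READ_MAX_TOKENS = 25_000


def file_read_token_limit(env: Optional[Mapping[str, str]] = None) -> int:
    """Resolve the file read token limit from the environment.

    Falls back to the (assumed) default when the variable is unset,
    not an integer, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get("CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS")
    if raw is None:
        return DEFAULT_FILE_READ_MAX_TOKENS
    try:
        limit = int(raw)
    except ValueError:
        return DEFAULT_FILE_READ_MAX_TOKENS
    return limit if limit > 0 else DEFAULT_FILE_READ_MAX_TOKENS


# Passing a mapping keeps the example independent of the real environment.
print(file_read_token_limit({"CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS": "50000"}))
```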
Sources: CHANGELOG.md573
Large PDFs receive special treatment:
| PDF Size | Behavior | Rationale |
|---|---|---|
| ≤10 pages | Inline content | Reasonable context usage |
| >10 pages | Lightweight reference | Prevent context overflow |
| Any size | Pages parameter support | Read specific ranges (e.g., pages: "1-5") |
When a PDF exceeds 100 pages, error messages explicitly state the limit and suggest compaction or selective page reading.
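The size rules in the table can be sketched as a small decision function. This illustrates only the 10-page split and the `pages` range format; it does not model the 100-page error path, and the helper names and mode labels are hypothetical.

```python
from typing import Optional, Tuple


def parse_page_range(pages: str) -> Tuple[int, int]:
    """Parse a range such as "1-5" (or a single page such as "3")."""
    first, sep, last = pages.partition("-")
    start = int(first)
    end = int(last) if sep else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {pages!r}")
    return start, end


def pdf_read_mode(page_count: int, pages: Optional[str] = None) -> str:
    """Choose how a PDF enters context, following the table above."""
    if pages is not None:
        parse_page_range(pages)  # validate the requested range
        return "page_range"
    if page_count <= 10:
        return "inline"
    return "reference"


print(parse_page_range("1-5"))
print(pdf_read_mode(8))
print(pdf_read_mode(40))
print(pdf_read_mode(40, pages="1-5"))
```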
Sources: CHANGELOG.md245 CHANGELOG.md251
Diagram: Tool Output Storage Strategy
Large tool outputs (bash commands and other tool results) are saved to disk instead of being truncated or included inline. This approach:
Read tool
Memory usage improvements in 2.1.25 addressed issues where shell command output caused unbounded RSS growth by implementing better streaming and cleanup.
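The idea of saving large tool outputs to disk and passing a reference instead can be sketched as follows. The size threshold, the shape of the returned record, and the helper name `store_tool_output` are all assumptions for illustration; the source does not describe the actual storage format.

```python
import tempfile
from pathlib import Path
from typing import Optional


def store_tool_output(
    output: str,
    inline_limit_chars: int = 30_000,  # hypothetical threshold
    directory: Optional[str] = None,
) -> dict:
    """Return small outputs inline; write large ones to disk and return a reference."""
    if len(output) <= inline_limit_chars:
        return {"type": "inline", "content": output}
    # delete=False keeps the file available after the handle is closed.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, dir=directory, encoding="utf-8"
    ) as handle:
        handle.write(output)
    return {"type": "file_reference", "path": handle.name, "chars": len(output)}


# A tiny limit is used here only to force the disk path in the demonstration.
result = store_tool_output("x" * 50, inline_limit_chars=10)
print(result["type"], result["chars"])
print(Path(result["path"]).read_text(encoding="utf-8") == "x" * 50)
Path(result["path"]).unlink()  # clean up the demonstration file
```

Under this scheme the full output stays retrievable from the saved file, while only the small reference record occupies context.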
Sources: CHANGELOG.md122 CHANGELOG.md534-535
Skill descriptions consume a percentage-based allocation of the context window:
Diagram: Skill Context Budget Allocation
The skill character budget scales with the context window (2% of total context). Users with larger context windows can see more skill descriptions without truncation. This scaling ensures that skill descriptions don't consume a disproportionate amount of context as more skills are added.
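A quick worked example of the 2% scaling is given below. The window sizes are hypothetical, and the source does not say whether the budget is measured in tokens or characters, so the result is expressed in the same unit as the window passed in.

```python
SKILL_BUDGET_PERCENT = 2  # share of the context window, per the text above


def skill_description_budget(context_window: int) -> int:
    """Budget for skill descriptions, in the same unit as context_window."""
    return context_window * SKILL_BUDGET_PERCENT // 100


# Hypothetical window sizes for illustration.
for window in (200_000, 1_000_000):
    print(window, skill_description_budget(window))
```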
Sources: CHANGELOG.md230
Diagram: Session Resume with Context Awareness
The session loading system implements several optimizations and fixes:
Large First Message Handling: Sessions with first messages exceeding 16KB would disappear from the resume list or fail to load. This was fixed to properly handle large prompts and array-format content.
Compaction Awareness: When resuming a session that was previously compacted, the system loads the compact summary rather than the full conversation history. A bug that caused resume to load full history instead of the compact summary was fixed in 2.1.20.
Memory Efficiency: The session index was replaced with stat-based loading and progressive enrichment, reducing memory usage by 68% for users with many sessions.
Sources: CHANGELOG.md62 CHANGELOG.md79 CHANGELOG.md93 CHANGELOG.md265 CHANGELOG.md329
| Setting | Type | Description |
|---|---|---|
| context_window.used_percentage | Read-only | Current context usage as percentage |
| context_window.remaining_percentage | Read-only | Remaining context as percentage |
| CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS | Environment | Override file read token limit |
| CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS | Environment | Disable beta features including context management |
| disallowedTools | Setting | Block specific tools (e.g., MCPSearch to force MCP tool loading) |
The context management system surfaces information through several UI components:
Status Line: Displays real-time token usage percentage and warning indicators
/context Command: Shows detailed token statistics and breakdown
/compact Command: Triggers manual compaction with progress feedback
Warning Alerts: "Context remaining" notifications when approaching limits
Sources: CHANGELOG.md278 CHANGELOG.md289 CHANGELOG.md468 CHANGELOG.md573