Context Window & Compaction

This page documents Claude Code's context window management system, including token tracking, automatic and manual compaction strategies, and memory preservation across conversation boundaries. For information about memory scopes that persist across sessions, see Memory System. For details on how MCP tools interact with context limits, see MCP Server Integration.

Overview

Context window management is critical to maintaining long-running Claude Code sessions. The system monitors token usage in real time, automatically compacts the conversation when it approaches the limit, and provides manual controls for optimization. Compaction replaces the conversation history with a summarized version while preserving essential information such as session names, plan mode state, and custom configurations.

Context Window Architecture

Diagram: Context Window Management Architecture

The system tracks tokens from multiple sources and triggers compaction when approaching the effective context window limit. State preservation ensures critical session metadata survives the compaction process.

Sources: CHANGELOG.md38 CHANGELOG.md73-76 CHANGELOG.md258 CHANGELOG.md314 CHANGELOG.md400 CHANGELOG.md452 CHANGELOG.md458 CHANGELOG.md468

Token Counting and Tracking

Real-Time Token Usage

The system continuously monitors token consumption and displays usage statistics in the status line. Token counting covers all conversation elements including user messages, assistant responses, tool results, and tool descriptions.

Context Component	Token Impact	Notes
System Prompt	Fixed overhead	Date moved out to improve cache hit rates
User Messages	Variable	Includes text and attachment content
Assistant Messages	Variable	Thinking blocks count toward usage
Tool Results	Variable	Large outputs saved to disk with file references
Images	High	Dimension limits enforced, suggest `/compact` on errors
PDF Documents	High	Stripped before compaction API call
MCP Tool Descriptions	Grows with servers	Auto-deferred when exceeding 10% threshold
Skill Descriptions	2% of context	Scales with context window size

The status line exposes context_window.used_percentage and context_window.remaining_percentage fields for scripts and extensions to query current usage.
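As a rough illustration of how a script might consume these two fields, the sketch below formats a one-line summary from a JSON-style payload. Only the field names come from the text above; the payload shape (a top-level `context_window` object) and the helper name `format_context_usage` are assumptions made for this example.

```python
import json


def format_context_usage(payload: dict) -> str:
    """Build a short usage summary from a status-line style payload.

    Assumes (hypothetically) that both fields are nested under a top-level
    "context_window" key. Missing values are reported as "n/a".
    """
    window = payload.get("context_window") or {}
    used = window.get("used_percentage")
    remaining = window.get("remaining_percentage")
    if used is None or remaining is None:
        return "context: n/a"
    return f"context: {used:.0f}% used, {remaining:.0f}% remaining"


if __name__ == "__main__":
    # Hypothetical sample payload, for illustration only.
    sample = json.loads(
        '{"context_window": {"used_percentage": 42, "remaining_percentage": 58}}'
    )
    print(format_context_usage(sample))
```

Returning a fallback string instead of raising keeps a status line script from failing when the fields are absent, for example before the first response arrives.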

Sources: CHANGELOG.md143 CHANGELOG.md230 CHANGELOG.md251 CHANGELOG.md257 CHANGELOG.md458 CHANGELOG.md468

Prompt Caching

The prompt cache system reduces token costs by reusing previously-sent context. Cache invalidation occurs when:

Tool names change
Tool descriptions change
Tool input schemas change

The system tracks cache hit rates and optimizes prompt structure to maximize cache reuse. The date was moved out of the system prompt specifically to improve caching performance.
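One way to reason about the three invalidation conditions is to fingerprint exactly those parts of each tool definition and compare fingerprints between requests. The sketch below is an illustration only, not how Claude Code or the API detects changes; the helper names are hypothetical, and the `name`, `description`, and `input_schema` keys are assumed to follow the usual tool-definition shape.

```python
import hashlib
import json


def tool_fingerprint(tools: list[dict]) -> str:
    """Hash the parts of each tool definition that the text lists as cache-relevant."""
    relevant = [
        {
            "name": tool.get("name"),
            "description": tool.get("description"),
            "input_schema": tool.get("input_schema"),
        }
        for tool in tools
    ]
    # sort_keys gives a stable serialization, so equal definitions hash equally.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_would_invalidate(previous: list[dict], current: list[dict]) -> bool:
    """Return True if any name, description, or input schema differs."""
    return tool_fingerprint(previous) != tool_fingerprint(current)
```

Because the fingerprint covers the ordered list, reordering tools also changes it in this sketch; whether ordering matters to the real cache is not stated in the source.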

Sources: CHANGELOG.md140 CHANGELOG.md258

Automatic Compaction

Trigger Conditions

Diagram: Auto-Compaction Trigger Logic

Auto-compaction triggers when token usage reaches approximately 98% of the effective context window (total context minus reserved output tokens). This calculation was corrected from an earlier implementation that aggressively blocked at 65% and later incorrectly used the full window without accounting for output token reservation.
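The threshold arithmetic described above can be sketched as follows. The function names and the example window sizes are hypothetical; only the "effective window = total minus reserved output" definition and the approximate 98% threshold come from the text.

```python
def effective_context_window(context_window: int, reserved_output_tokens: int) -> int:
    """Total context minus the tokens reserved for the model's output."""
    return max(context_window - reserved_output_tokens, 0)


def should_auto_compact(
    used_tokens: int,
    context_window: int,
    reserved_output_tokens: int,
    threshold_percent: int = 98,
) -> bool:
    """Return True once usage reaches the threshold share of the effective window.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    effective = effective_context_window(context_window, reserved_output_tokens)
    if effective == 0:
        return True
    return used_tokens * 100 >= threshold_percent * effective


# Illustrative numbers only: a 200_000-token window with 32_000 reserved
# output tokens leaves 168_000 effective tokens; 98% of that is 164_640.
```

Comparing against the full window instead of the effective one, as in the earlier bug described above, would move the trigger point past the space the model needs for its reply.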

Sources: CHANGELOG.md314 CHANGELOG.md400 CHANGELOG.md452

Content Preparation

Before sending conversation history to the compaction API, the system performs preprocessing:

Image Stripping: All image content blocks are removed to avoid dimension limit errors
PDF Document Stripping: PDF document blocks are stripped alongside images
Phantom Block Removal: Empty "(no content)" text blocks are eliminated to reduce token waste

This preprocessing prevents compaction failures that would otherwise occur with media-rich conversations.
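The three preprocessing steps can be sketched as a single filtering pass. This is a minimal sketch under stated assumptions: messages are dictionaries whose `content` is either a string or a list of typed blocks, image and PDF blocks carry the types `image` and `document`, and the function name is hypothetical. It strips every `document` block rather than checking for PDFs specifically, and it leaves emptied messages in place.

```python
STRIPPED_BLOCK_TYPES = {"image", "document"}  # assumed block type names
PHANTOM_TEXT = "(no content)"


def prepare_for_compaction(messages: list[dict]) -> list[dict]:
    """Return copies of the messages without media blocks or phantom text blocks."""
    prepared = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Plain-string content has no media blocks to strip.
            prepared.append(dict(message))
            continue
        kept = [
            block
            for block in (content or [])
            if block.get("type") not in STRIPPED_BLOCK_TYPES
            and not (
                block.get("type") == "text"
                and block.get("text", "").strip() == PHANTOM_TEXT
            )
        ]
        prepared.append({**message, "content": kept})
    return prepared
```

The function builds new lists instead of mutating its input, so the original history stays available if compaction fails.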

Sources: CHANGELOG.md38 CHANGELOG.md257

Failure Handling

Auto-compact failures are handled gracefully:

Error notifications are suppressed to avoid alarming users
The session continues operating with the existing context
Users can manually invoke /compact to retry

Sources: CHANGELOG.md164

Manual Compaction

`/compact` Command

Diagram: Manual Compaction Flow

Users can explicitly trigger compaction at any time using the /compact command. This is useful for:

Proactively reducing context before starting a large task
Recovering from auto-compact failures
Optimizing context after extensive tool usage

The /context command displays current token usage statistics using the same counts shown in the status line.

Sources: CHANGELOG.md143 CHANGELOG.md281 CHANGELOG.md384 CHANGELOG.md392 CHANGELOG.md406

State Preservation Across Compaction

Critical Metadata

Diagram: State Preservation During Compaction

Several critical bugs were fixed to ensure metadata survives compaction:

Session Name Preservation (Fixed in 2.1.47): Custom session titles set via /rename are now preserved through compaction. Previously, renamed sessions would revert to default names after auto-compact.

Plan Mode Preservation (Fixed in 2.1.47): Plan mode state is maintained across compaction, preventing the model from switching from planning to implementation unexpectedly.

Subagent Context Management: Subagent message histories use the correct model during compaction and are trimmed after task completion to manage memory usage.

Sources: CHANGELOG.md43 CHANGELOG.md73-76 CHANGELOG.md118 CHANGELOG.md273 CHANGELOG.md329 CHANGELOG.md507

Context Window Limits and Warnings

Blocking Behavior

The system enforces a blocking limit to prevent API errors from exceeding the context window:

Diagram: Context Window Blocking Logic

The blocking limit calculation has evolved through several fixes:

Original Bug: Blocked at ~65% usage (too aggressive)
First Fix: Used full context window (didn't account for output tokens)
Current Implementation: Uses effective window (full window minus max output tokens) with ~98% threshold

Sources: CHANGELOG.md314 CHANGELOG.md400 CHANGELOG.md452

Warning Display

The status line displays a "Context remaining" warning when approaching the limit. This warning:

Appears when usage reaches a threshold (typically ~70% after rate limit reset)
Disappears after running /compact successfully
Persists until context is reduced below the warning threshold

A bug was fixed where the warning would not hide immediately after manual compaction, requiring users to send another message before the UI updated.

Sources: CHANGELOG.md281 CHANGELOG.md384 CHANGELOG.md392 CHANGELOG.md474 CHANGELOG.md488

MCP Tool Descriptions and Context

Auto-Defer Mechanism

Diagram: MCP Tool Context Management

When MCP tool descriptions exceed 10% of the context window, they are automatically deferred and made discoverable through the MCPSearch tool instead of loading upfront. This default-enabled behavior (as of 2.1.7) significantly reduces context usage for users with many MCP servers configured.

Users can disable auto-defer by adding MCPSearch to the disallowedTools list in their settings, forcing all MCP tools to load into context immediately.

The system also supports custom thresholds using auto:N syntax, where N is a percentage (0-100) of the context window.
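To make the threshold rule concrete, the sketch below parses an `auto:N` value and applies it to a token count. Both function names are hypothetical, the setting that accepts `auto:N` is not named here, and rejecting anything other than `auto:N` is a choice made for this example.

```python
def parse_auto_threshold(value: str) -> int:
    """Parse "auto:N" into N, a percentage between 0 and 100 inclusive."""
    prefix, separator, number = value.strip().partition(":")
    if prefix != "auto" or not separator or not (number.isascii() and number.isdigit()):
        raise ValueError(f"expected 'auto:N', got {value!r}")
    percent = int(number)
    if not 0 <= percent <= 100:
        raise ValueError(f"threshold must be between 0 and 100, got {percent}")
    return percent


def should_defer_mcp_tools(
    description_tokens: int, context_window: int, threshold_percent: int = 10
) -> bool:
    """Return True when tool descriptions exceed the given share of the context window."""
    return description_tokens * 100 > threshold_percent * context_window
```

With the default of 10, descriptions totalling exactly 10% of the window are still loaded upfront in this sketch, because the text says deferral applies when they exceed the threshold.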

Sources: CHANGELOG.md434 CHANGELOG.md458

File Reading and Context Limits

Token-Based Truncation

The Read tool enforces token limits on file content to prevent context overflow:

Diagram: File Read Token Limiting

The CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS environment variable allows users to customize the file read token limit. Files exceeding this limit are either truncated or returned as lightweight references rather than inline content.
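A minimal sketch of how such an override might be resolved is shown below. Only the environment variable name comes from the text; the fallback value, the handling of invalid input, and the function names are assumptions made for illustration.

```python
import os

ENV_VAR = "CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS"
# Placeholder only: the real default limit is not stated in this document.
HYPOTHETICAL_DEFAULT_LIMIT = 25_000


def file_read_token_limit(environ=None) -> int:
    """Resolve the limit from the environment, falling back to the placeholder default.

    Non-numeric or non-positive values are ignored in this sketch.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "").strip()
    if raw.isascii() and raw.isdigit() and int(raw) > 0:
        return int(raw)
    return HYPOTHETICAL_DEFAULT_LIMIT


def fits_inline(estimated_tokens: int, environ=None) -> bool:
    """Return True if a file of the given estimated size fits under the limit."""
    return estimated_tokens <= file_read_token_limit(environ)
```

Passing a mapping as `environ` makes the lookup easy to exercise without changing the real process environment.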

Sources: CHANGELOG.md573

PDF-Specific Handling

Large PDFs receive special treatment:

PDF Size	Behavior	Rationale
≤10 pages	Inline content	Reasonable context usage
>10 pages	Lightweight reference	Prevent context overflow
Any size	Pages parameter support	Read specific ranges (e.g., `pages: "1-5"`)

When a PDF exceeds 100 pages, error messages explicitly state the limit and suggest compaction or selective page reading.
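The table above can be read as a small decision function. The sketch below is an illustration with hypothetical names: the 10-page cutoff and the `"1-5"` range format come from the text, while accepting a single page number such as `"3"` is an assumption. The 100-page error path is left out because the text does not specify exactly when it triggers.

```python
from typing import Optional

INLINE_PAGE_LIMIT = 10  # PDFs up to this many pages are returned inline


def parse_page_range(pages: str) -> tuple:
    """Parse "1-5" (or, as an assumption, a single page like "3") into (first, last)."""
    first_text, separator, last_text = pages.strip().partition("-")
    first = int(first_text)
    last = int(last_text) if separator else first
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {pages!r}")
    return first, last


def pdf_read_strategy(page_count: int, pages: Optional[str] = None) -> str:
    """Choose how a PDF is returned: a page range, inline content, or a reference."""
    if pages is not None:
        first, last = parse_page_range(pages)
        if last > page_count:
            raise ValueError(f"range {pages!r} exceeds {page_count} pages")
        return f"pages {first}-{last}"
    if page_count <= INLINE_PAGE_LIMIT:
        return "inline"
    return "reference"
```

An explicit page range takes priority over the size check here, matching the "Any size" row of the table.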

Sources: CHANGELOG.md245 CHANGELOG.md251

Tool Output Management

Large Output Handling

Diagram: Tool Output Storage Strategy

Large tool outputs (bash commands and other tool results) are saved to disk instead of being truncated or included inline. This approach:

Preserves the complete output for Claude to read later
Reduces immediate context consumption
Provides file references that Claude can selectively read using the Read tool
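The strategy in the list above can be sketched as a function that either returns output inline or writes it to disk and returns a reference. The threshold value, the shape of the returned dictionary, and the function name are all hypothetical; the text does not state where or how Claude Code stores these files.

```python
import tempfile
from pathlib import Path

# Placeholder threshold in characters; the real cutoff is not stated in this document.
HYPOTHETICAL_INLINE_LIMIT = 30_000


def store_tool_output(
    output: str, directory: str, inline_limit: int = HYPOTHETICAL_INLINE_LIMIT
) -> dict:
    """Return small outputs inline; write large ones to a file and return a reference."""
    if len(output) <= inline_limit:
        return {"kind": "inline", "text": output}
    # delete=False keeps the file on disk so it can be read back later.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", dir=directory, delete=False
    ) as handle:
        handle.write(output)
        path = Path(handle.name)
    return {"kind": "file_reference", "path": str(path), "characters": len(output)}
```

Because the complete output is written rather than truncated, a later read of the referenced file can recover any part of it.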

Memory usage improvements in 2.1.25 addressed issues where shell command output caused unbounded RSS growth by implementing better streaming and cleanup.

Sources: CHANGELOG.md122 CHANGELOG.md534-535

Skill and Command Context Budget

Dynamic Skill Allocation

Skill descriptions consume a percentage-based allocation of the context window:

Diagram: Skill Context Budget Allocation

The skill character budget scales with the context window (2% of total context). Users with larger context windows can see more skill descriptions without truncation. This scaling ensures that skill descriptions don't consume a disproportionate amount of context as more skills are added.
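The scaling rule can be illustrated with a short calculation. This sketch assumes the budget and the description lengths are measured in the same unit, and that descriptions are kept in order and cut once the budget runs out; the actual truncation policy is not described in the text, and both function names are hypothetical.

```python
def skill_budget(context_window: int, percent: int = 2) -> int:
    """Budget for skill descriptions as a share of the context window."""
    return context_window * percent // 100


def fit_descriptions(descriptions: list[str], budget: int) -> list[str]:
    """Keep descriptions in order, truncating the one that overflows the budget."""
    fitted = []
    remaining = budget
    for text in descriptions:
        if remaining <= 0:
            break
        kept = text[:remaining]
        fitted.append(kept)
        remaining -= len(kept)
    return fitted
```

Under this rule, a window five times larger yields a budget five times larger, so more descriptions fit before truncation.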

Sources: CHANGELOG.md230

Resume and Session Loading

Context-Aware Session Loading

Diagram: Session Resume with Context Awareness

The session loading system implements several optimizations and fixes:

Large First Message Handling: Sessions with first messages exceeding 16KB would disappear from the resume list or fail to load. This was fixed to properly handle large prompts and array-format content.

Compaction Awareness: When resuming a session that was previously compacted, the system loads the compact summary rather than the full conversation history. A bug that caused resume to load full history instead of the compact summary was fixed in 2.1.20.
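The compaction-aware behavior can be sketched as a search for the most recent summary in a stored transcript. The transcript format is not described in the text, so the entry shape and the `compact_summary` type used below are purely hypothetical, as is the function name.

```python
def entries_to_load(transcript: list[dict]) -> list[dict]:
    """Return the entries to load on resume.

    If the transcript contains a compact summary (hypothetical entry type
    "compact_summary"), load from the most recent one onward; otherwise
    load the full history.
    """
    for index in range(len(transcript) - 1, -1, -1):
        if transcript[index].get("type") == "compact_summary":
            return transcript[index:]
    return list(transcript)
```

Scanning from the end means a session that was compacted several times resumes from its latest summary rather than an earlier one.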

Memory Efficiency: The session index was replaced with stat-based loading and progressive enrichment, reducing memory usage by 68% for users with many sessions.

Sources: CHANGELOG.md62 CHANGELOG.md79 CHANGELOG.md93 CHANGELOG.md265 CHANGELOG.md329

Configuration Options

Context Management Settings

Setting	Type	Description
`context_window.used_percentage`	Read-only	Current context usage as percentage
`context_window.remaining_percentage`	Read-only	Remaining context as percentage
`CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS`	Environment	Override file read token limit
`CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS`	Environment	Disable beta features including context management
`disallowedTools`	Setting	Block specific tools (e.g., `MCPSearch` to force MCP tool loading)

User Interface Elements

The context management system surfaces information through several UI components:

Status Line: Displays real-time token usage percentage and warning indicators
/context Command: Shows detailed token statistics and breakdown
/compact Command: Triggers manual compaction with progress feedback
Warning Alerts: "Context remaining" notifications when approaching limits

Sources: CHANGELOG.md278 CHANGELOG.md289 CHANGELOG.md468 CHANGELOG.md573
