Testing Infrastructure

This document describes the testing infrastructure for the Codex repository, including CI/CD pipelines, test organization, test harnesses, mock servers, and execution patterns. For information about the build and release process, see Release Pipeline. For setting up a development environment, see Development Setup.

Overview

The Codex testing infrastructure consists of:

Multi-platform CI pipeline running lints, builds, and tests across macOS, Linux, and Windows
Hierarchical test organization with unit tests, integration tests, and end-to-end scenarios
Rich test harness providing mock servers, response builders, and assertion utilities
Parallel test execution using nextest for fast feedback
Build caching via sccache to optimize repeated CI runs

CI Pipeline Architecture

Sources: .github/workflows/rust-ci.yml:1-685

The CI pipeline detects changed files so that it can skip unnecessary work when only non-Rust files are modified. The changed job inspects the diff and sets outputs that downstream jobs check before executing.
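As a rough illustration of this kind of gating, the sketch below classifies a list of changed paths. The function name and the path prefixes it checks (`codex-rs/` and `.github/`) are assumptions made for the example, not the workflow's actual rules.

```rust
/// Returns true when at least one changed path should trigger the Rust jobs.
/// The prefixes checked here are illustrative assumptions, not the real rules.
fn rust_jobs_needed(changed_paths: &[&str]) -> bool {
    changed_paths
        .iter()
        .any(|path| path.starts_with("codex-rs/") || path.starts_with(".github/"))
}
```

A downstream job would then run only when this kind of check reports true, which is how a documentation-only change can avoid the full build matrix.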

Platform Matrix

The lint_build and tests jobs execute across multiple platform/architecture combinations:

Platform	Target	Profile	Job
macOS 15 (xlarge)	aarch64-apple-darwin	dev, release	lint_build, tests
macOS 15 (xlarge)	x86_64-apple-darwin	dev	lint_build
Ubuntu 24.04	x86_64-unknown-linux-musl	dev, release	lint_build
Ubuntu 24.04	x86_64-unknown-linux-gnu	dev	lint_build, tests
Ubuntu 24.04 ARM	aarch64-unknown-linux-musl	dev, release	lint_build
Ubuntu 24.04 ARM	aarch64-unknown-linux-gnu	dev	lint_build, tests
Windows x64	x86_64-pc-windows-msvc	dev, release	lint_build, tests
Windows ARM64	aarch64-pc-windows-msvc	dev, release	lint_build, tests

Sources: .github/workflows/rust-ci.yml:105-183, .github/workflows/rust-ci.yml:468-499

Caching Strategy

The pipeline uses two levels of caching:

Cargo Home Cache: Stores compiled dependencies, registry indices, and git databases
sccache: Caches compiled object files across builds (non-Windows only)

Sources: .github/workflows/rust-ci.yml:98-104, .github/workflows/rust-ci.yml:224-280, .github/workflows/rust-ci.yml:403-426

The sccache backend automatically detects GitHub Actions cache availability via ACTIONS_CACHE_URL and ACTIONS_RUNTIME_TOKEN environment variables. When these are present, it uses the GitHub Actions cache backend; otherwise, it falls back to local disk storage.
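The selection logic described above can be sketched as follows. This is a minimal model, not the pipeline's implementation: the enum, the function names, and the choice to treat an empty value as absent are all assumptions; only the two environment variable names come from the text.

```rust
#[derive(Debug, PartialEq)]
enum CacheBackend {
    GithubActions,
    LocalDisk,
}

/// Picks a backend from the two variables named above.
/// Treating empty values as "absent" is an assumption of this sketch.
fn select_backend(cache_url: Option<&str>, runtime_token: Option<&str>) -> CacheBackend {
    fn present(value: Option<&str>) -> bool {
        value.map_or(false, |s| !s.is_empty())
    }
    if present(cache_url) && present(runtime_token) {
        CacheBackend::GithubActions
    } else {
        CacheBackend::LocalDisk
    }
}

/// Reads the process environment and delegates to `select_backend`.
fn backend_from_env() -> CacheBackend {
    let url = std::env::var("ACTIONS_CACHE_URL").ok();
    let token = std::env::var("ACTIONS_RUNTIME_TOKEN").ok();
    select_backend(url.as_deref(), token.as_deref())
}
```

Keeping the decision in a pure function and the environment lookup in a thin wrapper makes the fallback rule easy to check without mutating process state.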

Test Organization

Directory Structure

Tests are organized into distinct layers:

codex-rs/
├── core/
│   ├── src/
│   │   ├── **/*.rs              # Unit tests inline with implementation
│   │   ├── context_manager/
│   │   │   └── history_tests.rs  # Module-level unit tests
│   │   └── unified_exec/
│   │       └── mod.rs            # Component tests at bottom
│   └── tests/
│       └── suite/
│           ├── unified_exec.rs      # Integration tests for unified exec
│           ├── compact.rs           # Compaction scenarios
│           ├── compact_remote.rs    # Remote compaction
│           ├── compact_resume_fork.rs  # Multi-operation flows
│           └── resume_warning.rs    # Resume edge cases

Sources: codex-rs/core/tests/suite/unified_exec.rs:1-935, codex-rs/core/tests/suite/compact.rs:1-1447, codex-rs/core/src/context_manager/history_tests.rs:1-1473

Test Categories

Category	Location	Purpose	Example
Unit Tests	Inline in source files	Test individual functions/methods	codex-rs/core/src/context_manager/history_tests.rs:258-278
Component Tests	Module `#[cfg(test)]` blocks	Test module interactions	codex-rs/core/src/unified_exec/mod.rs:178-506
Integration Tests	`tests/suite/*.rs`	Test cross-component flows	codex-rs/core/tests/suite/unified_exec.rs:159-285
Scenario Tests	`tests/suite/*.rs`	Test complete user workflows	codex-rs/core/tests/suite/compact_resume_fork.rs:139-236

Test Harness Infrastructure

The test harness provides a complete mock environment for testing Codex without real API calls.

Sources: codex-rs/core/tests/suite/unified_exec.rs:30-32, codex-rs/core/tests/suite/compact.rs:26-46

TestCodex Builder

The test_codex() function returns a builder that lets a test customize the configuration before spawning a test instance.

The builder configures:

Model selection
Feature flags
Approval/sandbox policies
Provider endpoints (pointing to mock server)
Working directory (temporary)

Sources: codex-rs/core/tests/suite/unified_exec.rs:164-170, codex-rs/core/tests/suite/unified_exec.rs:293-304
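To make the builder idea concrete, here is a minimal std-only sketch of the pattern. `HarnessBuilder`, `HarnessConfig`, and every method and field name are hypothetical stand-ins; the real builder returned by test_codex() may differ in both shape and naming.

```rust
use std::path::PathBuf;

/// Hypothetical configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
struct HarnessConfig {
    model: String,
    features: Vec<String>,
    approvals_enabled: bool,
    provider_base_url: String,
    cwd: PathBuf,
}

/// Hypothetical builder; every method name here is an assumption.
struct HarnessBuilder {
    config: HarnessConfig,
}

impl HarnessBuilder {
    /// Starts from test-friendly defaults and points the provider at the mock server.
    fn new(mock_server_url: &str) -> Self {
        Self {
            config: HarnessConfig {
                model: "placeholder-model".to_string(),
                features: Vec::new(),
                approvals_enabled: false,
                provider_base_url: mock_server_url.to_string(),
                cwd: std::env::temp_dir(),
            },
        }
    }

    /// Overrides the model name.
    fn model(mut self, model: &str) -> Self {
        self.config.model = model.to_string();
        self
    }

    /// Enables one named feature flag.
    fn feature(mut self, name: &str) -> Self {
        self.config.features.push(name.to_string());
        self
    }

    /// Consumes the builder and yields the final configuration.
    fn build(self) -> HarnessConfig {
        self.config
    }
}
```

Each setter takes and returns `self`, so a test can chain only the overrides it cares about and inherit defaults for everything else.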

Mock Server Infrastructure

Tests use wiremock::MockServer to simulate the OpenAI Responses API. Helper functions provide convenient response builders:

Response Builder Functions

Function	Purpose	Notes
`ev_response_created(id)`	Start of response	Creates response header
`ev_assistant_message(id, text)`	Model text output	Assistant message delta
`ev_function_call(call_id, name, args)`	Tool invocation	Triggers tool execution
`ev_completed(id)`	End of response	Marks completion
`sse(events)`	SSE response wrapper	Wraps events in SSE format
`mount_sse_sequence(server, responses)`	Sequential responses	Mounts multiple responses in order

Sources: codex-rs/core/tests/suite/unified_exec.rs:182-193, codex-rs/core/tests/suite/compact.rs:207-219
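The sketch below shows what wrapping events in SSE format amounts to at the wire level: each event becomes an `event:` line, a `data:` line, and a blank line. The struct, both function names, and the JSON payload shape are assumptions for illustration; they are not the signatures of the helpers in the table above.

```rust
/// One server-sent event: an event type plus an already-serialized JSON payload.
struct SseEvent {
    kind: String,
    data: String,
}

/// Hypothetical builder for a "response completed" event.
/// Assumes `id` contains no characters that need JSON escaping.
fn completed_event(id: &str) -> SseEvent {
    SseEvent {
        kind: "response.completed".to_string(),
        data: format!(
            r#"{{"type":"response.completed","response":{{"id":"{}"}}}}"#,
            id
        ),
    }
}

/// Serializes events using standard SSE framing.
fn sse_body(events: &[SseEvent]) -> String {
    let mut body = String::new();
    for event in events {
        body.push_str(&format!("event: {}\ndata: {}\n\n", event.kind, event.data));
    }
    body
}
```

A mock server can return a string like this as a response body with a `text/event-stream` content type, which lets the client under test exercise its real streaming parser.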

Request Inspection

The RequestLog returned by the mounting functions captures every request so that tests can assert on what was sent.

Sources: codex-rs/core/tests/suite/compact.rs:223-276, codex-rs/core/tests/suite/compact_resume_fork.rs:176-196
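A request log of this kind can be modelled as a shared, append-only list. The type and method names below are hypothetical and only illustrate the capture-then-assert idea; they do not mirror the actual RequestLog API.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared log: the mock server side records, the test side reads.
#[derive(Clone, Default)]
struct RecordedRequests {
    bodies: Arc<Mutex<Vec<String>>>,
}

impl RecordedRequests {
    /// Appends one captured request body.
    fn record(&self, body: &str) {
        self.bodies.lock().unwrap().push(body.to_string());
    }

    /// Returns a copy of every captured body, in arrival order.
    fn all(&self) -> Vec<String> {
        self.bodies.lock().unwrap().clone()
    }

    /// Reports whether the most recent body contains `needle`.
    fn last_contains(&self, needle: &str) -> bool {
        self.bodies
            .lock()
            .unwrap()
            .last()
            .map_or(false, |body| body.contains(needle))
    }
}
```

Because clones share the same underlying vector, the handle given to the server and the handle kept by the test always see the same requests.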

Event Waiting Utilities

Tests wait for specific events from the Codex thread using helper functions.

Sources: codex-rs/core/tests/suite/unified_exec.rs:222-249, codex-rs/core/tests/suite/compact.rs:246-269
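The waiting helpers can be thought of as "drain events until a predicate matches, or give up at a deadline". The sketch below is a synchronous, std-only analogue built on a channel; the `Event` enum and `wait_for` function are hypothetical, and the real helpers are presumably asynchronous.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Hypothetical event type standing in for the events a thread emits.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    TurnStarted,
    ExecOutput(String),
    TurnComplete,
}

/// Consumes events until `pred` matches one, the sender disconnects,
/// or `timeout` elapses. Non-matching events are discarded.
fn wait_for<F>(rx: &Receiver<Event>, timeout: Duration, pred: F) -> Option<Event>
where
    F: Fn(&Event) -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Stop once the deadline has passed.
        let remaining = deadline.checked_duration_since(Instant::now())?;
        match rx.recv_timeout(remaining) {
            Ok(event) if pred(&event) => return Some(event),
            Ok(_) => continue,
            Err(_) => return None,
        }
    }
}
```

Bounding the wait with a deadline matters in tests: a missing event then fails fast with a clear `None` instead of hanging the whole run.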


Test Execution

Running Tests Locally

Prerequisites

Tests require:

Rust toolchain 1.93.0 (codex-rs/rust-toolchain.toml:1-4)
cargo-nextest for parallel execution
DotSlash for some integration tests
Platform-specific sandbox support (bubblewrap on Linux)

Sources: codex-rs/rust-toolchain.toml:1-4, .github/workflows/rust-ci.yml:501-514

Basic Commands

Platform-Specific Setup

Linux: Enable unprivileged user namespaces for bubblewrap.

Sources: .github/workflows/rust-ci.yml:584-593

CI Test Execution

The tests job in CI uses specific configurations:

Environment variables:

RUST_BACKTRACE=1 - Print a backtrace when a test panics
NEXTEST_STATUS_LEVEL=leak - Report tests at the leak status level and above, which includes failing, retried, slow, and leaky tests

Sources: .github/workflows/rust-ci.yml:595-601

The ci-test Cargo profile is optimized for CI builds, balancing compilation speed with runtime performance.

Test Utilities and Patterns

Conditional Test Execution

Tests use macros to skip execution when dependencies aren't available.

Sources: codex-rs/core/tests/suite/unified_exec.rs:161-163, codex-rs/core/tests/suite/compact.rs:28
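A skip macro of this kind usually expands to an early `return` from the test body. The following sketch shows the mechanism with a hypothetical `skip_if!` macro and helper names; the repository's actual macros and the conditions they check are not reproduced here.

```rust
/// Hypothetical skip macro: returns early from the enclosing function
/// when the condition holds, printing the reason to stderr.
macro_rules! skip_if {
    ($cond:expr, $reason:expr) => {
        if $cond {
            eprintln!("skipping test: {}", $reason);
            return;
        }
    };
}

/// Hypothetical helper: true when the named environment variable is set.
fn env_flag_set(name: &str) -> bool {
    std::env::var_os(name).is_some()
}

/// Stand-in for a test body; `ran` records whether the body got past the skip.
fn network_dependent_check(network_disabled: bool, ran: &mut bool) {
    skip_if!(network_disabled, "network access is disabled");
    *ran = true;
}
```

In a real test the condition would come from the environment, for example `skip_if!(env_flag_set("SOME_HYPOTHETICAL_FLAG"), "...")`, so the same test binary passes both in restricted sandboxes and on developer machines.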

Context Snapshot Testing

The context_snapshot module provides utilities for capturing and formatting conversation context for snapshot assertions. It generates human-readable snapshots of the model-visible context at different points in a conversation flow.

Sources: codex-rs/core/tests/suite/compact_resume_fork.rs:183-197, codex-rs/core/tests/suite/compact_remote.rs:19-25
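As an illustration of what a human-readable context snapshot might look like, the sketch below renders a labelled list of context items as stable, line-oriented text. `ContextItem`, `render_snapshot`, and the output layout are assumptions, not the context_snapshot module's real format.

```rust
/// Hypothetical item in the model-visible context.
struct ContextItem {
    role: String,
    text: String,
}

/// Renders a labelled snapshot with one numbered line per item.
/// A stable textual layout keeps diffs between snapshots easy to read.
fn render_snapshot(label: &str, items: &[ContextItem]) -> String {
    let mut out = format!("## {}\n", label);
    for (index, item) in items.iter().enumerate() {
        out.push_str(&format!("{:02}: {}: {}\n", index, item.role, item.text));
    }
    out
}
```

Rendering the context at several points in a flow (for example before and after compaction) and comparing the resulting strings against stored snapshots makes unintended changes to the prompt visible in review.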

Process Lifecycle Testing

For testing long-running processes (like unified exec), tests use process ID files and lifecycle helpers.

Sources: codex-rs/core/tests/suite/unified_exec.rs:16-19

Output Parsing

Tests include utilities for parsing structured output from tools.

Sources: codex-rs/core/tests/suite/unified_exec.rs:51-131
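A parser for structured tool output typically splits metadata headers from the payload. The sketch below assumes a made-up format (header lines such as `Exit code: 0`, then a line reading `Output:`, then the body); the real tool output format and the real helper names are not shown in this document.

```rust
/// Hypothetical parsed form of a tool result.
#[derive(Debug, PartialEq)]
struct ParsedOutput {
    exit_code: Option<i32>,
    body: String,
}

/// Splits `raw` at the first "Output:" line and extracts the exit code
/// from the header. Input without that marker is treated as all body.
fn parse_tool_output(raw: &str) -> ParsedOutput {
    let (header, body) = match raw.split_once("\nOutput:\n") {
        Some((header, body)) => (header, body),
        None => ("", raw),
    };
    let exit_code = header
        .lines()
        .filter_map(|line| line.strip_prefix("Exit code: "))
        .filter_map(|value| value.trim().parse::<i32>().ok())
        .next();
    ParsedOutput {
        exit_code,
        body: body.to_string(),
    }
}
```

Parsing into a struct lets assertions target individual fields, so a test about exit codes does not break when unrelated header lines change.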

Test Configuration

Feature Flags for Tests

Tests can enable specific features during setup.

Sources: codex-rs/core/tests/suite/unified_exec.rs:165-169, codex-rs/core/tests/suite/compact.rs:227-231

Approval and Sandbox Policies

Tests typically disable approval prompts and sandboxing for deterministic execution.

Sources: codex-rs/core/tests/suite/unified_exec.rs:208-209, codex-rs/core/src/unified_exec/mod.rs:196-201

Deterministic Process IDs

The unified exec system supports deterministic process ID allocation for tests, so that process IDs are sequential (1000, 1001, 1002...) rather than random in test environments.

Sources: codex-rs/core/src/unified_exec/mod.rs:46-48, codex-rs/core/src/unified_exec/process_manager.rs:68-84
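One way to get this behaviour is an allocator with a deterministic switch, sketched below. The type, its methods, and the random range are assumptions; only the sequential values starting at 1000 come from the description above. The random branch uses `RandomState` purely as a std-only stand-in for a random number generator.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical allocator: sequential IDs in deterministic mode,
/// pseudo-random IDs otherwise.
struct ProcessIdAllocator {
    deterministic: bool,
    next: AtomicU32,
}

impl ProcessIdAllocator {
    fn new(deterministic: bool) -> Self {
        Self {
            deterministic,
            next: AtomicU32::new(1000),
        }
    }

    /// Returns 1000, 1001, 1002, ... in deterministic mode, or a value
    /// in 1000..100_000 derived from a randomly keyed hasher otherwise.
    fn allocate(&self) -> u32 {
        if self.deterministic {
            self.next.fetch_add(1, Ordering::SeqCst)
        } else {
            let hash = RandomState::new().build_hasher().finish();
            1000 + (hash % 99_000) as u32
        }
    }
}
```

With sequential IDs, a test can hard-code the ID it expects in a later tool call instead of parsing it out of earlier output.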

Test Isolation and Cleanup

Temporary Directories

Each test creates isolated temporary directories via TempDir.

Sources: codex-rs/core/tests/suite/unified_exec.rs:374-382, codex-rs/core/tests/suite/unified_exec.rs:462-463
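The isolation comes from a guard object that creates a unique directory and removes it when dropped. The sketch below is a std-only analogue of that behaviour with a hypothetical `ScratchDir` type; it is not the TempDir implementation the tests use.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter so that each guard gets a distinct directory name.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical RAII guard: creates a unique directory and deletes it on drop.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(prefix: &str) -> std::io::Result<Self> {
        let unique = COUNTER.fetch_add(1, Ordering::SeqCst);
        let name = format!("{}-{}-{}", prefix, std::process::id(), unique);
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so drop never panics.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Tying cleanup to `Drop` means the directory is removed even when an assertion fails and the test unwinds, so later tests never see leftover files.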

Thread Lifecycle Management

Tests shut down Codex threads explicitly so that state is flushed before the test ends.

Sources: codex-rs/core/tests/suite/compact.rs:363-365

Common Test Patterns

Multi-Step Flow Testing

Many tests follow a pattern of:

Mount sequential mock responses
Execute operations step-by-step
Wait for events after each step
Inspect captured requests
Assert on final state

Sources: codex-rs/core/tests/suite/compact.rs:199-276

Event Sequence Validation

Tests validate that events occur in the expected order.

Sources: codex-rs/core/tests/suite/unified_exec.rs:218-268
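Order validation often reduces to a subsequence check: the expected events must appear in order, while unrelated events in between are tolerated. The helper below is a hypothetical sketch of that check and is not taken from the test suite.

```rust
/// Returns true when every item of `expected` appears in `observed`
/// in the same relative order. Extra observed items are allowed.
fn occurs_in_order<T: PartialEq>(observed: &[T], expected: &[T]) -> bool {
    let mut remaining = observed.iter();
    // For each expected item, advance through the observed items until
    // a match is found; the iterator never rewinds, which enforces order.
    expected
        .iter()
        .all(|want| remaining.any(|got| got == want))
}
```

Allowing gaps keeps such assertions robust when a flow emits additional progress events, while still catching a begin event that arrives after its matching end event.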

Async Process Testing

Testing long-lived processes requires coordination between spawning and polling.

Sources: codex-rs/core/src/unified_exec/mod.rs:291-325
