This document describes the testing infrastructure for the Codex repository, including CI/CD pipelines, test organization, test harnesses, mock servers, and execution patterns. For information about the build and release process, see Release Pipeline. For setting up a development environment, see Development Setup.
The Codex testing infrastructure consists of:
- nextest for fast feedback
- sccache to optimize repeated CI runs

Sources: .github/workflows/rust-ci.yml1-685
The CI pipeline uses a changed file detection mechanism to skip unnecessary work when only non-Rust files are modified. The changed job inspects the diff and sets outputs that downstream jobs check before executing.
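The gating decision can be sketched in a few lines of Rust. This is only an illustration of the idea, not the workflow's actual logic: the function name and the two path prefixes treated as relevant are assumptions made for the example.

```rust
/// Decide whether the Rust jobs need to run for a set of changed paths.
/// The prefixes below are assumptions for this sketch, not values read
/// from the real workflow file.
pub fn needs_rust_ci(changed_paths: &[&str]) -> bool {
    changed_paths
        .iter()
        .any(|path| path.starts_with("codex-rs/") || path.starts_with(".github/"))
}
```

Downstream jobs would then check the resulting flag and exit early when it is false.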
The lint_build and tests jobs execute across multiple platform/architecture combinations:
| Platform | Target | Profile | Job |
|---|---|---|---|
| macOS 15 (xlarge) | aarch64-apple-darwin | dev, release | lint_build, tests |
| macOS 15 (xlarge) | x86_64-apple-darwin | dev | lint_build |
| Ubuntu 24.04 | x86_64-unknown-linux-musl | dev, release | lint_build |
| Ubuntu 24.04 | x86_64-unknown-linux-gnu | dev | lint_build, tests |
| Ubuntu 24.04 ARM | aarch64-unknown-linux-musl | dev, release | lint_build |
| Ubuntu 24.04 ARM | aarch64-unknown-linux-gnu | dev | lint_build, tests |
| Windows x64 | x86_64-pc-windows-msvc | dev, release | lint_build, tests |
| Windows ARM64 | aarch64-pc-windows-msvc | dev, release | lint_build, tests |
Sources: .github/workflows/rust-ci.yml105-183 .github/workflows/rust-ci.yml468-499
The pipeline uses two levels of caching:
Sources: .github/workflows/rust-ci.yml98-104 .github/workflows/rust-ci.yml224-280 .github/workflows/rust-ci.yml403-426
The sccache backend automatically detects GitHub Actions cache availability via ACTIONS_CACHE_URL and ACTIONS_RUNTIME_TOKEN environment variables. When these are present, it uses the GitHub Actions cache backend; otherwise, it falls back to local disk storage.
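The selection rule described above can be modeled as follows. This is a sketch of the decision only, not sccache's source; the `CacheBackend` enum and `select_backend` function are hypothetical names, and treating an empty value as absent is an assumption.

```rust
/// Storage backend a compiler cache could select (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum CacheBackend {
    GithubActions,
    LocalDisk,
}

/// Pick the backend from an environment lookup function.
/// Taking the lookup as a parameter keeps the logic testable without
/// mutating the real process environment.
pub fn select_backend<F>(lookup: F) -> CacheBackend
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an empty value counts as "not set".
    let present = |key: &str| lookup(key).map_or(false, |value| !value.is_empty());
    if present("ACTIONS_CACHE_URL") && present("ACTIONS_RUNTIME_TOKEN") {
        CacheBackend::GithubActions
    } else {
        CacheBackend::LocalDisk
    }
}
```

Against the real environment this would be called as `select_backend(|key| std::env::var(key).ok())`.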
Tests are organized into distinct layers:
codex-rs/
├── core/
│   ├── src/
│   │   ├── **/*.rs                     # Unit tests inline with implementation
│   │   ├── context_manager/
│   │   │   └── history_tests.rs        # Module-level unit tests
│   │   └── unified_exec/
│   │       └── mod.rs                  # Component tests at bottom
│   └── tests/
│       └── suite/
│           ├── unified_exec.rs         # Integration tests for unified exec
│           ├── compact.rs              # Compaction scenarios
│           ├── compact_remote.rs       # Remote compaction
│           ├── compact_resume_fork.rs  # Multi-operation flows
│           └── resume_warning.rs       # Resume edge cases
Sources: codex-rs/core/tests/suite/unified_exec.rs1-935 codex-rs/core/tests/suite/compact.rs1-1447 codex-rs/core/src/context_manager/history_tests.rs1-1473
| Category | Location | Purpose | Example |
|---|---|---|---|
| Unit Tests | Inline in source files | Test individual functions/methods | codex-rs/core/src/context_manager/history_tests.rs258-278 |
| Component Tests | Module #[cfg(test)] blocks | Test module interactions | codex-rs/core/src/unified_exec/mod.rs178-506 |
| Integration Tests | tests/suite/*.rs | Test cross-component flows | codex-rs/core/tests/suite/unified_exec.rs159-285 |
| Scenario Tests | tests/suite/*.rs | Test complete user workflows | codex-rs/core/tests/suite/compact_resume_fork.rs139-236 |
The test harness provides a complete mock environment for testing Codex without real API calls:
Sources: codex-rs/core/tests/suite/unified_exec.rs30-32 codex-rs/core/tests/suite/compact.rs26-46
The test_codex() function returns a builder that allows configuration customization before spawning a test instance:
The builder configures:
Sources: codex-rs/core/tests/suite/unified_exec.rs164-170 codex-rs/core/tests/suite/unified_exec.rs293-304
Tests use wiremock::MockServer to simulate the OpenAI Responses API. Helper functions provide convenient response builders:
| Function | Purpose | Example |
|---|---|---|
| ev_response_created(id) | Start of response | Creates response header |
| ev_assistant_message(id, text) | Model text output | Assistant message delta |
| ev_function_call(call_id, name, args) | Tool invocation | Triggers tool execution |
| ev_completed(id) | End of response | Marks completion |
| sse(events) | SSE response wrapper | Wraps events in SSE format |
| mount_sse_sequence(server, responses) | Sequential responses | Mounts multiple responses in order |
Sources: codex-rs/core/tests/suite/unified_exec.rs182-193 codex-rs/core/tests/suite/compact.rs207-219
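To make the "wraps events in SSE format" step concrete, here is a minimal standard-library sketch of the `text/event-stream` wire format. It is not the repository's `sse` helper: `SseEvent` and `render_sse` are hypothetical names, and the payloads are assumed to be JSON already serialized to strings.

```rust
/// One server-sent event: an event name plus a payload already serialized
/// to a string. (Hypothetical type for this sketch.)
pub struct SseEvent {
    pub name: String,
    pub data: String,
}

/// Render events in the `text/event-stream` wire format: an `event:` line,
/// one `data:` line per payload line, and a blank line ending each event.
pub fn render_sse(events: &[SseEvent]) -> String {
    let mut body = String::new();
    for event in events {
        body.push_str("event: ");
        body.push_str(&event.name);
        body.push('\n');
        // A payload containing newlines is split across several data: lines.
        for line in event.data.lines() {
            body.push_str("data: ");
            body.push_str(line);
            body.push('\n');
        }
        body.push('\n');
    }
    body
}
```

A mock server would return the rendered string as the response body with a `text/event-stream` content type.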
The RequestLog returned by mounting functions captures all requests for assertion:
Sources: codex-rs/core/tests/suite/compact.rs223-276 codex-rs/core/tests/suite/compact_resume_fork.rs176-196
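The capture-then-assert idea behind a request log can be sketched with shared state. This is a stand-in written for illustration, not the repository's `RequestLog`; `RecordingLog` and its methods are hypothetical.

```rust
use std::sync::{Arc, Mutex};

/// A cloneable log of request bodies. The mock server keeps one clone and
/// records into it; the test keeps another and asserts on what was seen.
#[derive(Clone, Default)]
pub struct RecordingLog {
    bodies: Arc<Mutex<Vec<String>>>,
}

impl RecordingLog {
    /// Called by the mock endpoint for every incoming request.
    pub fn record(&self, body: &str) {
        self.bodies.lock().unwrap().push(body.to_string());
    }

    /// Snapshot of all request bodies, in arrival order.
    pub fn requests(&self) -> Vec<String> {
        self.bodies.lock().unwrap().clone()
    }

    /// Whether the most recent request body contains `needle`.
    pub fn last_contains(&self, needle: &str) -> bool {
        self.bodies
            .lock()
            .unwrap()
            .last()
            .map_or(false, |body| body.contains(needle))
    }
}
```

Because the log is shared rather than copied, assertions made after the turn completes see every request the server handled.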
Tests wait for specific events from the Codex thread using helper functions:
Sources: codex-rs/core/tests/suite/unified_exec.rs222-249 codex-rs/core/tests/suite/compact.rs246-269
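A predicate-based wait helper might look like the sketch below. It uses a synchronous standard-library channel in place of the async event stream, and the `Event` enum and `wait_for_event` signature are assumptions for the example rather than the repository's API.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Simplified event type for this sketch (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    TaskStarted,
    AgentMessage(String),
    TaskComplete,
}

/// Drain events until one satisfies `predicate`, or fail once `timeout`
/// has elapsed. Non-matching events are discarded.
pub fn wait_for_event<F>(
    rx: &Receiver<Event>,
    timeout: Duration,
    predicate: F,
) -> Result<Event, String>
where
    F: Fn(&Event) -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Time left before the overall deadline; zero once it has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) => {
                if predicate(&event) {
                    return Ok(event);
                }
            }
            Err(RecvTimeoutError::Timeout) => {
                return Err("timed out waiting for event".to_string());
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("event channel closed".to_string());
            }
        }
    }
}
```

Bounding the wait with a deadline turns a hung test into a clear failure message instead of a CI timeout.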
Common patterns:
Tests require:
- cargo-nextest for parallel execution

Sources: codex-rs/rust-toolchain.toml1-4 .github/workflows/rust-ci.yml501-514
Linux: Enable unprivileged user namespaces for bubblewrap:
Sources: .github/workflows/rust-ci.yml584-593
The tests job in CI uses specific configurations:
Environment variables:
- RUST_BACKTRACE=1 - Full backtraces on failure
- NEXTEST_STATUS_LEVEL=leak - Show leaked test resources

Sources: .github/workflows/rust-ci.yml595-601
The ci-test Cargo profile is optimized for CI builds, balancing compilation speed with runtime performance.
Tests use macros to skip execution when dependencies aren't available:
Sources: codex-rs/core/tests/suite/unified_exec.rs161-163 codex-rs/core/tests/suite/compact.rs28
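An early-return skip macro can be written in a few lines. The sketch below shows the general shape only; `skip_if!` and `network_dependent_test` are hypothetical names, and the repository's own macros may check different conditions.

```rust
/// Return early from the enclosing function when `$missing` is true,
/// printing the reason so the skip is visible in test output.
macro_rules! skip_if {
    ($missing:expr, $reason:expr) => {
        if $missing {
            eprintln!("skipping test: {}", $reason);
            return;
        }
    };
}

/// Example test body; `ran` records whether execution got past the guard.
fn network_dependent_test(network_disabled: bool, ran: &mut bool) {
    skip_if!(network_disabled, "network access is unavailable");
    // The real assertions would go here.
    *ran = true;
}
```

In a real test the condition would typically come from the environment, for example by checking whether a sandbox-related variable is set. A skipped test still reports as passed, which is why printing the reason matters.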
The context_snapshot module provides utilities for capturing and formatting conversation context for snapshot assertions:
This generates human-readable snapshots of the model-visible context at different points in a conversation flow.
Sources: codex-rs/core/tests/suite/compact_resume_fork.rs183-197 codex-rs/core/tests/suite/compact_remote.rs19-25
For testing long-running processes (like unified exec), tests use process ID files and lifecycle helpers:
Sources: codex-rs/core/tests/suite/unified_exec.rs16-19
Tests include utilities for parsing structured output from tools:
Sources: codex-rs/core/tests/suite/unified_exec.rs51-131
Tests can enable specific features during setup:
Sources: codex-rs/core/tests/suite/unified_exec.rs165-169 codex-rs/core/tests/suite/compact.rs227-231
Tests typically disable approval prompts and sandboxing for deterministic execution:
Sources: codex-rs/core/tests/suite/unified_exec.rs208-209 codex-rs/core/src/unified_exec/mod.rs196-201
The unified exec system supports deterministic process ID allocation for tests:
This ensures process IDs are sequential (1000, 1001, 1002...) rather than random in test environments.
Sources: codex-rs/core/src/unified_exec/mod.rs46-48 codex-rs/core/src/unified_exec/process_manager.rs68-84
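One way to get this behavior is an allocator with a deterministic switch, sketched below. The `ProcessIdAllocator` type, the upper bound on random IDs, and the use of `RandomState` as a randomness source are all assumptions for the example; only the sequential 1000, 1001, 1002 behavior comes from the description above.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicU32, Ordering};

/// Hands out process IDs, sequentially in deterministic mode and
/// pseudo-randomly otherwise. (Hypothetical type for this sketch.)
pub struct ProcessIdAllocator {
    deterministic: bool,
    next: AtomicU32,
}

impl ProcessIdAllocator {
    pub fn new(deterministic: bool) -> Self {
        Self {
            deterministic,
            next: AtomicU32::new(1000),
        }
    }

    pub fn allocate(&self) -> u32 {
        if self.deterministic {
            // fetch_add returns the previous value: 1000, 1001, 1002, ...
            self.next.fetch_add(1, Ordering::SeqCst)
        } else {
            // Each RandomState carries randomly seeded keys, so hashing
            // nothing still yields an unpredictable number (std-only stand-in
            // for a proper RNG). The range 1000..100000 is an assumption.
            let raw = RandomState::new().build_hasher().finish();
            1000 + (raw % 99_000) as u32
        }
    }
}
```

Stable IDs let tests assert on exact values in tool output and snapshots instead of pattern-matching around a random number.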
Each test creates isolated temporary directories via TempDir:
Sources: codex-rs/core/tests/suite/unified_exec.rs374-382 codex-rs/core/tests/suite/unified_exec.rs462-463
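The isolation guarantee comes from pairing a unique directory with cleanup on drop. The sketch below is a standard-library stand-in that illustrates the mechanism; it is not the `TempDir` type the tests use, and `ScratchDir` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter so two directories created by the same process
/// never share a name.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// A uniquely named directory that is removed when the value is dropped.
pub struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    pub fn new(prefix: &str) -> io::Result<Self> {
        let unique = COUNTER.fetch_add(1, Ordering::Relaxed);
        let name = format!("{}-{}-{}", prefix, std::process::id(), unique);
        let path = std::env::temp_dir().join(name);
        // create_dir fails if the directory already exists, so a name
        // collision surfaces as an error instead of silent sharing.
        fs::create_dir(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Since each test owns its directory, tests can run in parallel without seeing each other's files, and nothing is left behind when a test finishes.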
Tests properly shut down Codex threads to flush state:
Sources: codex-rs/core/tests/suite/compact.rs363-365
Many tests follow a pattern of:
Sources: codex-rs/core/tests/suite/compact.rs199-276
Tests validate that events occur in the expected order:
Sources: codex-rs/core/tests/suite/unified_exec.rs218-268
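An ordering check can tolerate unrelated events in between by treating the expected sequence as a subsequence of what was observed. The helper below is a sketch under that assumption; the function name and the string event labels are hypothetical.

```rust
/// Check that every entry of `expected` appears in `observed` in the same
/// relative order, allowing other events in between. Returns the first
/// expected entry that could not be matched, or `None` when the order holds.
pub fn first_order_violation<'a>(observed: &[&str], expected: &[&'a str]) -> Option<&'a str> {
    let mut remaining = observed.iter();
    for want in expected {
        // `any` advances the iterator, so each match resumes the search
        // after the previous one.
        if !remaining.any(|got| got == want) {
            return Some(*want);
        }
    }
    None
}
```

Returning the offending entry rather than a bare boolean makes a failed assertion say which event was missing or out of order.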
Testing long-lived processes requires coordination between spawning and polling:
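A bounded polling loop is one common way to express that coordination. The sketch below is generic and uses only the standard library; `poll_until` is a hypothetical helper, not a function from the repository.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `probe` repeatedly until it yields a value or `timeout` elapses,
/// sleeping `interval` between attempts. The probe always runs at least once.
pub fn poll_until<T, F>(timeout: Duration, interval: Duration, mut probe: F) -> Option<T>
where
    F: FnMut() -> Option<T>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}
```

A test would spawn the process, then poll for an observable effect such as a PID file or expected output, and fail with a clear message if `None` comes back.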