Testing and CI/CD

This page describes how vLLM is tested and how its continuous integration pipelines are structured. It covers the overall layout of the test infrastructure, the Buildkite CI pipeline organization, hardware-specific testing (including AMD/ROCm), and the model correctness verification framework.

For details on individual topics, see the child pages:

Test Organization — directory layout, test categories, and test_areas YAML files: see 12.1
Buildkite CI Pipelines — pipeline generation, step configuration, and release pipeline: see 12.2
AMD-Specific Testing — ROCm hardware setup, run-amd-test.sh, multi-node AMD configurations: see 12.3
Model Correctness Testing — _HfExamplesInfo, VLM test framework, model registry: see 12.4

Overview

vLLM uses Buildkite as its CI platform. Each pull request triggers a pipeline that builds Docker images and then dispatches a large number of parallelized test steps across NVIDIA and AMD GPU pools, as well as CPU-only environments.

The pipeline configuration was migrated in early 2026 from a single monolithic file .buildkite/test-pipeline.yaml to a modular directory-based structure. The old file now serves only as a pointer:

.buildkite/test-pipeline.yaml — deprecated as of Feb 18, 2026. Content migrated to:

.buildkite/test_areas/ — test job definitions

.buildkite/image_build/ — Docker image build jobs

.buildkite/hardware_tests/ — tests for other hardware (Intel, Ascend, Arm, etc.)

.buildkite/ci_config.yaml — pipeline configuration

Sources: .buildkite/test-pipeline.yaml:1-9

CI Pipeline Architecture

Buildkite Pipeline High-Level Flow

Each test area YAML in .buildkite/test_areas/ defines a group with one or more steps. Steps depend on the image-build step completing first. Steps for NVIDIA and AMD hardware run in parallel across different agent pools.

Sources: .buildkite/test-pipeline.yaml:1-9, .buildkite/test_areas/entrypoints.yaml:1-14, .buildkite/test_areas/engine.yaml:1-39

Test Areas: Modular Pipeline Structure

Each file in .buildkite/test_areas/ is a self-contained group of test steps. The step format is uniform:

Test Step Field Reference

Field	Type	Description
`label`	`string`	Display name shown in Buildkite UI
`timeout_in_minutes`	`int`	Max runtime; step is killed if exceeded
`commands`	`list[string]`	Shell commands to execute in order
`source_file_dependencies`	`list[string]`	File path prefixes; step may be skipped if no matching changes
`working_dir`	`string`	Working directory (default: `/vllm-workspace/tests`)
`num_devices`	`int`	Number of GPUs required (default: 1)
`num_nodes`	`int`	Enables multi-node container simulation
`optional`	`bool`	If `true`, must be manually unblocked
`fast_check`	`bool`	Included in every-commit fast-check pipeline
`torch_nightly`	`bool`	Included in nightly PyTorch test pipeline
`grade`	`string`	`Blocking` means step failure blocks the pipeline
`mirror`	`object`	Configuration to run the same step on other hardware
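To make the field table concrete, here is a minimal Python sketch of how a step and its `source_file_dependencies` check could be modeled. It is not the actual pipeline generator: the `StepSpec` class, the `may_skip` helper, and the example label and commands are hypothetical, and the rule that a step without declared dependencies always runs is an assumption. Only the field names and the two defaults come from the table above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepSpec:
    """Hypothetical in-memory view of one step; field names follow the table."""

    label: str
    commands: List[str]
    timeout_in_minutes: Optional[int] = None
    source_file_dependencies: List[str] = field(default_factory=list)
    working_dir: str = "/vllm-workspace/tests"  # default listed in the table
    num_devices: int = 1  # default listed in the table
    optional: bool = False


def may_skip(step: StepSpec, changed_files: List[str]) -> bool:
    """Return True when no changed file starts with any declared path prefix.

    Assumption: a step that declares no dependencies is never skipped.
    """
    if not step.source_file_dependencies:
        return False
    return not any(
        path.startswith(prefix)
        for path in changed_files
        for prefix in step.source_file_dependencies
    )


# Illustrative values only; this is not a step copied from the repository.
step = StepSpec(
    label="Example step (hypothetical)",
    commands=["pytest -v -s engine"],
    source_file_dependencies=["vllm/", "tests/engine"],
)

print(may_skip(step, ["docs/README.md"]))       # no declared prefix matches
print(may_skip(step, ["vllm/engine/core.py"]))  # the "vllm/" prefix matches
```

Prefix matching keeps the dependency list short: one entry such as `vllm/` covers every file below that directory.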

Example step from .buildkite/test_areas/engine.yaml:

Sources: .buildkite/test_areas/engine.yaml:1-39, .buildkite/test_areas/entrypoints.yaml:1-50, .buildkite/test_areas/samplers.yaml:1-22, .buildkite/test_areas/models_language.yaml:1-50

Known Test Area Groups

The following groups exist as separate YAML files under .buildkite/test_areas/:

File	Group Name	Example Steps
`engine.yaml`	Engine	Engine, V1 e2e + engine, V1 e2e (multi-GPU)
`entrypoints.yaml`	Entrypoints	Unit Tests, LLM integration, API Server 1 & 2, Responses API
`samplers.yaml`	Samplers	Samplers Test (also with `VLLM_USE_FLASHINFER_SAMPLER=1`)
`models_language.yaml`	Models - Language	Language Models Tests (Standard), Language Models Tests (Extra Standard)

Additional groups, not listed in the table above, cover kernels, quantization, distributed tests, speculative decoding, multimodal models, and more.

Sources: .buildkite/test_areas/engine.yaml:1-39, .buildkite/test_areas/entrypoints.yaml:1-50, .buildkite/test_areas/samplers.yaml:1-22, .buildkite/test_areas/models_language.yaml:1-50

AMD/ROCm Testing Infrastructure

AMD tests are driven by two mechanisms:

test-amd.yaml — AMD-primary test definitions. Steps are defined with AMD agent pools directly (e.g., agent_pool: mi325_1). These tests use AMD-specific flags (mirror_hardwares, grade: Blocking, etc.).
mirror blocks in test_areas/*.yaml — NVIDIA-primary tests can declare an AMD mirror configuration, which re-runs the same tests on ROCm hardware.

Test execution on AMD hardware is handled by .buildkite/scripts/hardware_ci/run-amd-test.sh.

AMD Test Infrastructure Diagram

AMD Agent Pool Reference

Pool Name	GPUs
`mi325_1`	1× AMD MI325
`mi325_2`	2× AMD MI325
`mi325_4`	4× AMD MI325
`mi325_8`	8× AMD MI325

Multi-node detection in run-amd-test.sh relies on one of two signals: the NUM_NODES environment variable (preferred) or, failing that, structural detection of the bracket command syntax [node0_cmds] && [node1_cmds].
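The two detection signals can be illustrated with a short Python sketch. The real logic lives in a shell script and may differ in detail; the `is_multi_node` function, the regular expression, and the rule that a numeric NUM_NODES always overrides the structural check are assumptions made for illustration.

```python
import os
import re
from typing import Mapping, Optional

# Assumed shape of the bracket syntax: "[node0_cmds] && [node1_cmds]".
_BRACKET_PATTERN = re.compile(r"^\s*\[.*\]\s*&&\s*\[.*\]\s*$", re.DOTALL)


def is_multi_node(commands: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a command string describes a multi-node run."""
    env = os.environ if env is None else env

    # Preferred signal: an explicit node count in the environment.
    num_nodes = env.get("NUM_NODES", "")
    if num_nodes.isdigit():
        return int(num_nodes) > 1

    # Fallback signal: the command string itself uses the bracket syntax.
    return bool(_BRACKET_PATTERN.match(commands))


print(is_multi_node("pytest -v -s distributed", {}))
print(is_multi_node("[pytest a] && [pytest b]", {}))
print(is_multi_node("pytest -v -s distributed", {"NUM_NODES": "2"}))
```

Checking the environment variable first avoids misreading a single-node command that merely happens to contain brackets.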

Commands should be passed via the VLLM_TEST_COMMANDS environment variable (not positional arguments) to avoid quoting issues.
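The sketch below shows why an environment variable sidesteps quoting problems: the command string reaches the child process byte for byte, with no shell word-splitting in between. The child here is a small Python one-liner standing in for the test runner, and the command string is made up for illustration; only the variable name VLLM_TEST_COMMANDS comes from the text above.

```python
import os
import subprocess
import sys

# A command string with nested quotes and shell operators (illustrative only).
commands = 'pytest -v -s "tests/a b" -k \'not slow\' && echo "done"'

# Hypothetical stand-in for the runner: it echoes the variable back unchanged.
child_code = "import os, sys; sys.stdout.write(os.environ['VLLM_TEST_COMMANDS'])"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env={**os.environ, "VLLM_TEST_COMMANDS": commands},
    capture_output=True,
    text=True,
    check=True,
)

received = result.stdout
print(received == commands)
```

Passing the same string as positional arguments would require it to survive every layer of shell parsing between the CI agent and the container.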

Sources: .buildkite/test-amd.yaml:1-100, .buildkite/scripts/hardware_ci/run-amd-test.sh:1-120

Model Correctness Testing Framework

Model correctness tests are parameterized through a registry defined in tests/models/registry.py. This registry maps architecture class names to test metadata via _HfExamplesInfo instances.

Model Test Registry Architecture

The registries in tests/models/registry.py (_TEXT_GENERATION_EXAMPLE_MODELS, _EMBEDDING_EXAMPLE_MODELS, _MULTIMODAL_EXAMPLE_MODELS, _SEQUENCE_CLASSIFICATION_EXAMPLE_MODELS) mirror the production model registry in vllm/model_executor/models/registry.py. Any new architecture added to the production registry should have a corresponding entry in the test registry.
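The expectation that the test registries cover the production registry amounts to a set difference. The following sketch uses toy dictionaries rather than the real registries. The `missing_test_entries` helper and all architecture names are hypothetical.

```python
from typing import Iterable, List, Mapping


def missing_test_entries(
    production_archs: Iterable[str],
    test_registries: Iterable[Mapping[str, object]],
) -> List[str]:
    """Return production architectures that no test registry covers."""
    covered = set()
    for registry in test_registries:
        covered.update(registry)  # adds the registry's keys
    return sorted(set(production_archs) - covered)


# Toy data; these are placeholder names, not real vLLM architectures.
production = {"ExampleForCausalLM", "ExampleEmbeddingModel", "ExampleVisionModel"}
text_generation = {"ExampleForCausalLM": "metadata placeholder"}
embedding = {"ExampleEmbeddingModel": "metadata placeholder"}

missing = missing_test_entries(production, [text_generation, embedding])
print(missing)
```

A check of this shape reports an architecture that was added to the production registry without a matching test entry.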

_HfExamplesInfo.check_transformers_version() calls pytest.skip() or raises a RuntimeError when the installed transformers version is outside [min_transformers_version, max_transformers_version]. This allows model-specific gating without removing tests.
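The gating behavior can be sketched as follows. This is a simplified stand-in, not the method's real implementation: the `check_version` function, its `on_fail` parameter, and the numeric-only version parser are assumptions, and `unittest.SkipTest` is used in place of `pytest.skip()` so that the example needs only the standard library.

```python
import unittest
from typing import Optional, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Parse 'X.Y.Z' into a padded tuple; pre-release suffixes are unsupported."""
    parts = [int(p) for p in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def check_version(
    installed: str,
    min_version: Optional[str] = None,
    max_version: Optional[str] = None,
    on_fail: str = "error",
) -> None:
    """Skip or fail when `installed` lies outside [min_version, max_version]."""
    current = _parse(installed)
    too_old = min_version is not None and current < _parse(min_version)
    too_new = max_version is not None and current > _parse(max_version)
    if not (too_old or too_new):
        return

    msg = f"transformers=={installed} is outside [{min_version}, {max_version}]"
    if on_fail == "skip":
        raise unittest.SkipTest(msg)  # stand-in for pytest.skip()
    raise RuntimeError(msg)


check_version("4.45.0", min_version="4.40", max_version="4.50")  # in range

try:
    check_version("4.30.0", min_version="4.40", on_fail="skip")
except unittest.SkipTest as exc:
    print("skipped:", exc)
```

Keeping the bounds next to each registry entry means a model that breaks under a particular library version is skipped rather than deleted from the test suite.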

For VLM generation tests, tests/models/multimodal/generation/test_common.py uses VLMTestInfo with configurable VLMTestType values:

`VLMTestType`	Description
`IMAGE`	Single image input
`MULTI_IMAGE`	Multiple image inputs
`VIDEO`	Video input
`EMBEDDING`	Image embedding input
`CUSTOM_INPUTS`	Arbitrary custom test inputs

Tests marked with pytest.mark.core_model are included in fast-check CI pipelines. Tests marked with pytest.mark.cpu_model also run on CPU-only agents.

Sources: tests/models/registry.py:1-185, tests/models/multimodal/generation/test_common.py:1-160, tests/models/multimodal/processing/test_common.py:1-100

Test Infrastructure Code Map

[Diagram: key test infrastructure code entities mapped to their roles]

Sources: tests/models/registry.py:1-185, .buildkite/test_areas/engine.yaml:1-39, .buildkite/test-amd.yaml:1-50

Relationship to Other Subsystems

Testing exercises the entire vLLM stack. The major tested subsystems include:

Engine and scheduling — tests/engine/, tests/v1/engine/, tests/v1/e2e/
Model execution and GPU runner — tests/models/, tests/kernels/
Serving APIs — tests/entrypoints/openai/, tests/entrypoints/llm/
Distributed execution — tests/distributed/, tests/v1/distributed/
Quantization — tests/quantization/
Samplers — tests/samplers/
KV cache and prefix caching — tests/v1/core/, tests/v1/kv_connector/
Speculative decoding — tests/v1/spec_decode/, tests/spec_decode/

For details on how the model correctness test registry works, see 12.4. For the exact Buildkite step schema and pipeline generation scripts, see 12.2. For ROCm-specific configuration details, see 12.3.
