This page describes how vLLM is tested and how its continuous integration pipelines are structured. It covers the overall layout of the test infrastructure, the Buildkite CI pipeline organization, hardware-specific testing (including AMD/ROCm), and the model correctness verification framework.
For details on individual topics, see the child pages:
- test_areas YAML files: see 12.1
- run-amd-test.sh, multi-node AMD configurations: see 12.3
- _HfExamplesInfo, VLM test framework, model registry: see 12.4

vLLM uses Buildkite as its CI platform. Each pull request triggers a pipeline that builds Docker images and then dispatches a large number of parallelized test steps across NVIDIA and AMD GPU pools, as well as CPU-only environments.
The pipeline configuration was migrated in early 2026 from a single monolithic file .buildkite/test-pipeline.yaml to a modular directory-based structure. The old file now serves only as a pointer:
.buildkite/test-pipeline.yaml: deprecated as of Feb 18, 2026. Content migrated to:
- .buildkite/test_areas/: test job definitions
- .buildkite/image_build/: Docker image build jobs
- .buildkite/hardware_tests/: tests for other hardware (Intel, Ascend, Arm, etc.)
- .buildkite/ci_config.yaml: pipeline configuration
Sources: .buildkite/test-pipeline.yaml (lines 1-9)
Buildkite Pipeline High-Level Flow
Each test area YAML in .buildkite/test_areas/ defines a group with one or more steps. Steps depend on the image-build step completing first. Steps for NVIDIA and AMD hardware run in parallel across different agent pools.
Sources: .buildkite/test-pipeline.yaml (lines 1-9), .buildkite/test_areas/entrypoints.yaml (lines 1-14), .buildkite/test_areas/engine.yaml (lines 1-39)
Each file in .buildkite/test_areas/ is a self-contained group of test steps. The step format is uniform:
Test Step Field Reference
| Field | Type | Description |
|---|---|---|
| label | string | Display name shown in the Buildkite UI |
| timeout_in_minutes | int | Maximum runtime; the step is killed if exceeded |
| commands | list[string] | Shell commands to execute in order |
| source_file_dependencies | list[string] | File path prefixes; the step may be skipped if no changed file matches |
| working_dir | string | Working directory (default: /vllm-workspace/tests) |
| num_devices | int | Number of GPUs required (default: 1) |
| num_nodes | int | Enables multi-node container simulation |
| optional | bool | If true, the step must be manually unblocked |
| fast_check | bool | Included in the every-commit fast-check pipeline |
| torch_nightly | bool | Included in the nightly PyTorch test pipeline |
| grade | string | Blocking means a step failure blocks the pipeline |
| mirror | object | Configuration to run the same step on other hardware |
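As a rough illustration of the schema in the table, the following Python sketch checks a step's field types, applies the two documented defaults, and shows one plausible reading of source_file_dependencies prefix matching. The helper names (validate_step, with_defaults, should_run) and the sample step values are hypothetical and are not taken from the vLLM repository. Treating a step without source_file_dependencies as always runnable is also an assumption.

```python
# Field names and types follow the reference table; the helper names and
# the sample step below are hypothetical illustrations.
STEP_FIELD_TYPES = {
    "label": str,
    "timeout_in_minutes": int,
    "commands": list,
    "source_file_dependencies": list,
    "working_dir": str,
    "num_devices": int,
    "num_nodes": int,
    "optional": bool,
    "fast_check": bool,
    "torch_nightly": bool,
    "grade": str,
    "mirror": dict,
}

# Defaults documented in the table.
STEP_DEFAULTS = {"working_dir": "/vllm-workspace/tests", "num_devices": 1}


def validate_step(step):
    """Return a list of problems; an empty list means the step looks well-formed."""
    errors = []
    for key, value in step.items():
        expected = STEP_FIELD_TYPES.get(key)
        if expected is None:
            errors.append(f"unknown field: {key}")
        elif isinstance(value, bool) and expected is not bool:
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{key}: expected {expected.__name__}, got bool")
        elif not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def with_defaults(step):
    """Fill in the documented defaults without mutating the input."""
    return {**STEP_DEFAULTS, **step}


def should_run(step, changed_files):
    """Assumed rule: run if any changed file starts with a listed prefix."""
    prefixes = step.get("source_file_dependencies")
    if not prefixes:
        return True  # assumption: no dependency list means "always run"
    return any(path.startswith(p) for path in changed_files for p in prefixes)


# Hypothetical step, written as the dict a YAML loader would produce.
example_step = {
    "label": "Example Engine Test",
    "timeout_in_minutes": 15,
    "source_file_dependencies": ["vllm/", "tests/engine"],
    "commands": ["pytest -v -s engine"],
}
```

A check like this is only a sanity aid; the real pipeline generator may accept additional fields or apply different skip logic.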
Example step from .buildkite/test_areas/engine.yaml:
Sources: .buildkite/test_areas/engine.yaml (lines 1-39), .buildkite/test_areas/entrypoints.yaml (lines 1-50), .buildkite/test_areas/samplers.yaml (lines 1-22), .buildkite/test_areas/models_language.yaml (lines 1-50)
The following groups exist as separate YAML files under .buildkite/test_areas/:
| File | Group Name | Example Steps |
|---|---|---|
| engine.yaml | Engine | Engine, V1 e2e + engine, V1 e2e (multi-GPU) |
| entrypoints.yaml | Entrypoints | Unit Tests, LLM integration, API Server 1 & 2, Responses API |
| samplers.yaml | Samplers | Samplers Test (also with VLLM_USE_FLASHINFER_SAMPLER=1) |
| models_language.yaml | Models - Language | Language Models Tests (Standard), Language Models Tests (Extra Standard) |
Additional groups, not listed here, cover kernels, quantization, distributed tests, speculative decoding, multimodal models, and more.
Sources: .buildkite/test_areas/engine.yaml (lines 1-39), .buildkite/test_areas/entrypoints.yaml (lines 1-50), .buildkite/test_areas/samplers.yaml (lines 1-22), .buildkite/test_areas/models_language.yaml (lines 1-50)
AMD tests are driven by two mechanisms:
- test-amd.yaml: AMD-primary test definitions. Steps are defined with AMD agent pools directly (e.g., agent_pool: mi325_1). These tests use AMD-specific flags (mirror_hardwares, grade: Blocking, etc.).
- mirror blocks in test_areas/*.yaml: NVIDIA-primary tests can declare an AMD mirror configuration, which re-runs the same tests on ROCm hardware.

Test execution on AMD hardware is handled by .buildkite/scripts/hardware_ci/run-amd-test.sh.
AMD Test Infrastructure Diagram
AMD Agent Pool Reference
| Pool Name | GPUs |
|---|---|
| mi325_1 | 1× AMD MI325 |
| mi325_2 | 2× AMD MI325 |
| mi325_4 | 4× AMD MI325 |
| mi325_8 | 8× AMD MI325 |
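To make the pool naming concrete, here is a small Python sketch that maps a step's GPU requirement to one of the pools in the table. The pool names and GPU counts come from the table; the function name pick_amd_pool and the "smallest pool that fits" rule are assumptions for illustration, since the page does not say how pools are actually assigned.

```python
# Pool names and GPU counts are from the table above.
AMD_POOLS = {"mi325_1": 1, "mi325_2": 2, "mi325_4": 4, "mi325_8": 8}


def pick_amd_pool(num_devices):
    """Hypothetical helper: return the smallest pool with enough GPUs."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    candidates = [
        (gpus, name) for name, gpus in AMD_POOLS.items() if gpus >= num_devices
    ]
    if not candidates:
        raise ValueError(f"no AMD pool offers {num_devices} GPUs")
    # Tuples sort by GPU count first, so min() yields the tightest fit.
    return min(candidates)[1]
```

In the real configuration the pool is stated explicitly on each step (for example agent_pool: mi325_1), so a lookup like this would only be useful for tooling built around the pipeline.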
Multi-node detection in run-amd-test.sh uses two signals: the NUM_NODES environment variable (preferred) or structural detection of bracket command syntax [node0_cmds] && [node1_cmds].
Commands should be passed via the VLLM_TEST_COMMANDS environment variable (not positional arguments) to avoid quoting issues.
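The two detection signals can be sketched in Python as follows. This is an illustration of the logic described above, not a transcription of run-amd-test.sh (which is a shell script). The function name is_multi_node, the regular expression, and the rule that an explicit NUM_NODES value overrides structural detection are all assumptions.

```python
import re

# Heuristic for the bracket syntax "[node0_cmds] && [node1_cmds]".
# The exact pattern used by the real script is not shown in the source.
BRACKET_PAIR = re.compile(r"^\s*\[.+\]\s*&&\s*\[.+\]\s*$", re.DOTALL)


def is_multi_node(env):
    """Decide whether a test run is multi-node from an environment mapping."""
    num_nodes = env.get("NUM_NODES", "")
    if num_nodes.isdigit():
        # Preferred signal: an explicit node count wins (assumed precedence).
        return int(num_nodes) > 1
    # Fallback signal: look at the shape of the command string.
    commands = env.get("VLLM_TEST_COMMANDS", "")
    return BRACKET_PAIR.match(commands) is not None
```

Passing os.environ as env would apply the check to the current process. Reading the commands from VLLM_TEST_COMMANDS rather than from positional arguments means the string arrives intact, with no extra layer of shell quoting to undo.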
Sources: .buildkite/test-amd.yaml (lines 1-100), .buildkite/scripts/hardware_ci/run-amd-test.sh (lines 1-120)
Model correctness tests are parameterized through a registry defined in tests/models/registry.py. This registry maps architecture class names to test metadata via _HfExamplesInfo instances.
Model Test Registry Architecture
The registries in tests/models/registry.py (_TEXT_GENERATION_EXAMPLE_MODELS, _EMBEDDING_EXAMPLE_MODELS, _MULTIMODAL_EXAMPLE_MODELS, _SEQUENCE_CLASSIFICATION_EXAMPLE_MODELS) mirror the production model registry in vllm/model_executor/models/registry.py. Any new architecture added to the production registry should have a corresponding entry in the test registry.
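The mirroring requirement amounts to a set-coverage check. A minimal Python sketch, assuming each registry can be treated as a mapping keyed by architecture class name; the function name and the architecture names in the usage example are hypothetical placeholders:

```python
def missing_test_entries(production_archs, test_registries):
    """Return production architectures that no test registry covers."""
    covered = set()
    for registry in test_registries:
        covered.update(registry)  # iterating a dict yields its keys
    return sorted(set(production_archs) - covered)


# Hypothetical data standing in for the real registries.
production = ["ArchA", "ArchB", "ArchC"]
text_generation = {"ArchA": object()}
multimodal = {"ArchC": object()}
```

With this data, missing_test_entries(production, [text_generation, multimodal]) reports ArchB as lacking a test entry.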
_HfExamplesInfo.check_transformers_version() calls pytest.skip() or raises a RuntimeError when the installed transformers version is outside [min_transformers_version, max_transformers_version]. This allows model-specific gating without removing tests.
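The gating logic can be sketched without pytest as a pure function. This is an assumption-laden illustration, not the body of check_transformers_version(): the function name version_gate, its on_fail parameter, and the simplified version parsing (numeric components only, so suffixes such as .dev0 are ignored) are all hypothetical.

```python
def parse_version(text):
    """Parse the leading numeric components of a version, padded to three parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "dev0" or "rc1"
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def version_gate(installed, min_version=None, max_version=None, on_fail="skip"):
    """Return None if the version is in range, else a skip reason.

    With on_fail="error" an out-of-range version raises RuntimeError instead.
    """
    current = parse_version(installed)
    reason = None
    if min_version is not None and current < parse_version(min_version):
        reason = f"requires transformers>={min_version}, found {installed}"
    elif max_version is not None and current > parse_version(max_version):
        reason = f"requires transformers<={max_version}, found {installed}"
    if reason is None:
        return None
    if on_fail == "error":
        raise RuntimeError(reason)
    return reason
```

In a real test the returned reason would typically be handed to pytest.skip(), which keeps the test in the suite while recording why it did not run.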
For VLM generation tests, tests/models/multimodal/generation/test_common.py uses VLMTestInfo with configurable VLMTestType values:
| VLMTestType | Description |
|---|---|
| IMAGE | Single image input |
| MULTI_IMAGE | Multiple image inputs |
| VIDEO | Video input |
| EMBEDDING | Image embedding input |
| CUSTOM_INPUTS | Arbitrary custom test inputs |
Tests marked with pytest.mark.core_model are included in fast-check CI pipelines. Tests marked with pytest.mark.cpu_model also run on CPU-only agents.
Sources: tests/models/registry.py (lines 1-185), tests/models/multimodal/generation/test_common.py (lines 1-160), tests/models/multimodal/processing/test_common.py (lines 1-100)
The following diagram maps the key test infrastructure code entities to their roles:
Sources: tests/models/registry.py (lines 1-185), .buildkite/test_areas/engine.yaml (lines 1-39), .buildkite/test-amd.yaml (lines 1-50)
Testing exercises the entire vLLM stack. The major tested subsystems include:
- tests/engine/, tests/v1/engine/, tests/v1/e2e/
- tests/models/, tests/kernels/
- tests/entrypoints/openai/, tests/entrypoints/llm/
- tests/distributed/, tests/v1/distributed/
- tests/quantization/
- tests/samplers/
- tests/v1/core/, tests/v1/kv_connector/
- tests/v1/spec_decode/, tests/spec_decode/

For details on how the model correctness test registry works, see 12.4. For the exact Buildkite step schema and pipeline generation scripts, see 12.2. For ROCm-specific configuration details, see 12.3.