This document describes the Buildkite continuous integration (CI) pipeline configuration system used for automated testing in vLLM. It covers the pipeline organization, YAML configuration format, test categorization, hardware-specific pipelines, agent pool management, and Docker-based test execution environments. For general test organization and pytest configuration, see page 12.1. For AMD-specific GPU testing procedures and state management, see page 12.3.
The Buildkite CI configuration has been reorganized from a single monolithic file into a modular structure. A deprecation notice in the .buildkite/test-pipeline.yaml file records that the migration was completed on February 18, 2026, with its content split across multiple directories:
| Directory/File | Purpose |
|---|---|
| .buildkite/test_areas/ | Test job definitions organized by functional area (NVIDIA/general) |
| .buildkite/image_build/ | Docker image building jobs |
| .buildkite/hardware_tests/ | Hardware-specific test jobs (Intel, Ascend NPU, Arm, etc.) |
| .buildkite/ci_config.yaml | CI pipeline configuration settings |
| .buildkite/test-amd.yaml | AMD GPU-specific test pipeline (separate schema from test_areas/) |
| .buildkite/release-pipeline.yaml | Release pipeline: wheels, Docker images, PyPI publishing |
There are two distinct YAML schemas in use:
- test_areas/*.yaml: Used for NVIDIA/general CI. Each file defines a named group with steps that can optionally mirror to AMD hardware via a mirror: sub-key.
- test-amd.yaml: Used for the AMD GPU pipeline. Uses agent_pool:, mirror_hardwares:, grade:, fast_check:, and torch_nightly: fields not present in the test_areas schema.

Sources: .buildkite/test-pipeline.yaml1-9 .buildkite/test_areas/engine.yaml1-71 .buildkite/test_areas/entrypoints.yaml1-99
Pipeline File Map
Sources: .buildkite/test-pipeline.yaml1-9 .buildkite/test-amd.yaml1-34 .buildkite/scripts/hardware_ci/run-amd-test.sh1-50
There are two distinct YAML schemas used in the CI configuration.
Each file in .buildkite/test_areas/ defines a named group of steps for NVIDIA/general testing. The top-level group: and depends_on: fields apply to all steps in the file. Individual steps can optionally mirror to AMD hardware.
Top-level fields:
| Field | Type | Description |
|---|---|---|
| group | string | Display group name in Buildkite UI |
| depends_on | list | Step keys that must complete first (e.g., image-build) |
| steps | list | List of step definitions |
Per-step fields (test_areas/):
| Field | Type | Description |
|---|---|---|
| label | string | Step display name |
| timeout_in_minutes | int | Step timeout |
| commands | list | Shell commands to run |
| source_file_dependencies | list | Path prefixes; the step is skipped if no changed file matches |
| working_dir | string | Working directory inside the container |
| num_devices | int | Number of GPUs (1, 2, 4, 8) |
| parallelism | int | Number of parallel shards |
| optional | bool | Requires manual unblock to run |
| soft_fail | bool | Failure does not block the pipeline |
| no_gpu | bool | CPU-only test |
| torch_nightly | bool | Include in the nightly PyTorch test run |
| mirror.amd.device | string | AMD agent pool to mirror on (e.g., mi325_1) |
| mirror.amd.depends_on | list | AMD-specific step dependencies |
| mirror.amd.commands | list | Override commands for the AMD run |
Example from .buildkite/test_areas/engine.yaml:
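The file's exact contents are not reproduced here; a hypothetical step using the fields documented above might look like the following sketch (the label, paths, and commands are illustrative, not copied from engine.yaml):

```yaml
# Hypothetical test_areas/ step; label, paths, and commands are illustrative.
group: Engine
depends_on:
  - image-build
steps:
  - label: Engine Test
    timeout_in_minutes: 30
    source_file_dependencies:
      - vllm/engine
      - tests/engine
    commands:
      - pytest -v -s engine
    num_devices: 1
    mirror:
      amd:
        device: mi325_1
```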
Sources: .buildkite/test_areas/engine.yaml1-71 .buildkite/test_areas/entrypoints.yaml1-99 .buildkite/test_areas/samplers.yaml1-22
The .buildkite/test-amd.yaml file uses a different schema processed through a Jinja2 template (test-template-aws.j2). It has AMD-specific fields not present in test_areas/:
| Field | Type | Description |
|---|---|---|
| agent_pool | string | AMD GPU pool: mi325_1, mi325_2, mi325_4, mi325_8 |
| mirror_hardwares | list | Also run on: amdexperimental, amdproduction, amdtentative |
| grade | string | Blocking: step failure fails the pipeline; absent = non-blocking |
| fast_check | bool | Run on every commit in the fast-check pipeline |
| torch_nightly | bool | Include in the torch nightly pipeline |
| num_gpus | int | GPU count override (1, 2, 4, 8) |
| num_nodes | int | Simulate multi-node via multiple containers on one host |
| gpu | string | Override GPU type: a100, h100, b200 |
| autorun_on_main | bool | Run automatically on main branch merges |
| no_gpu | bool | CPU-only test |
Example step from .buildkite/test-amd.yaml:
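The file's actual steps are not reproduced here; a hypothetical step combining the fields above might look like:

```yaml
# Hypothetical test-amd.yaml step; the commands are illustrative, not
# copied from the file.
- label: Basic Correctness Test
  agent_pool: mi325_1
  mirror_hardwares:
    - amdexperimental
    - amdproduction
  grade: Blocking
  fast_check: true
  timeout_in_minutes: 20
  commands:
    - pytest -v -s basic_correctness
```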
Sources: .buildkite/test-amd.yaml9-27 .buildkite/test-amd.yaml107-123
Tests are organized into logical groups based on functionality and execution characteristics. The AMD pipeline demonstrates the standard categorization:
Tests marked with fast_check: true run on every commit to provide rapid feedback:
| Test Group | Duration | Purpose |
|---|---|---|
| Pytorch Nightly Dependency Override Check | 2 min | Verify torch nightly compatibility |
| Async Engine, Inputs, Utils, Worker Test | 10 min | Core engine functionality |
| Basic Correctness Test | 20 min | Basic inference correctness |
| Entrypoints Unit Tests | 5 min | API endpoint validation |
| Entrypoints Integration Test (LLM) | 30 min | LLM class integration |
Sources: .buildkite/test-amd.yaml36-222
Multi-GPU tests with specific parallelism configurations:
Sources: .buildkite/test-amd.yaml223-283
Performance-critical kernel tests use parallelism for faster execution:
The %N in the label is replaced with the job number, and $$BUILDKITE_PARALLEL_JOB / $$BUILDKITE_PARALLEL_JOB_COUNT environment variables enable pytest sharding.
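A sharded step might look like the following sketch (the label and path are illustrative, and the --shard-id/--num-shards options assume the pytest-shard plugin):

```yaml
# Hypothetical sharded step; %N becomes the shard number in the Buildkite UI,
# and the $$-escaped variables are resolved at runtime on the agent.
- label: Kernels Core Test %N
  parallelism: 4
  commands:
    - pytest -v -s kernels/core --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT
```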
Sources: .buildkite/test-amd.yaml650-663
The AMD GPU pipeline (.buildkite/test-amd.yaml) demonstrates hardware-specific testing patterns. Tests can target different agent pools corresponding to GPU counts:
| Agent Pool | GPU Count | Typical Use Cases |
|---|---|---|
| mi325_1 | 1 GPU | Single-GPU tests, unit tests, correctness tests |
| mi325_2 | 2 GPUs | Small distributed tests, metrics/tracing |
| mi325_4 | 4 GPUs | Distributed parallelism tests (TP, PP, DP, EP) |
| mi325_8 | 8 GPUs | Large-scale distributed tests |
Sources: .buildkite/test-amd.yaml44-306
Tests can be mirrored across multiple hardware tiers using mirror_hardwares:
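As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical step mirrored to all three AMD hardware tiers.
- label: Samplers Test
  agent_pool: mi325_1
  mirror_hardwares:
    - amdexperimental
    - amdproduction
    - amdtentative
  commands:
    - pytest -v -s samplers
```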
This runs the same test step on experimental, production, and tentative AMD GPU configurations.
In .buildkite/test_areas/ files, the equivalent is the mirror.amd block, which additionally allows overriding the commands for the AMD run.
Sources: .buildkite/test-amd.yaml42-54 .buildkite/test_areas/samplers.yaml15-21
Sources: .buildkite/test-amd.yaml44-128
Tests execute inside Docker containers with specific GPU access and resource configurations. The run-amd-test.sh script manages the container lifecycle.
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh1-238
The script ensures clean GPU state before and after test execution:
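The script's actual checks are not reproduced here; the following is a minimal sketch of the idea only. The /dev/kfd device path, the fuser-based busy check, and the retry policy are all assumptions, not the script's real logic.

```shell
# Illustrative only: wait until no process holds the AMD compute device
# before starting (or after finishing) a test run. Device path, fuser
# check, and retry policy are assumptions.
wait_for_idle_gpu() {
  local dev=${1:-/dev/kfd} tries=${2:-10} i=0
  while [ "$i" -lt "$tries" ]; do
    if ! fuser "$dev" >/dev/null 2>&1; then
      return 0   # nothing has the device open
    fi
    sleep 3
    i=$((i + 1))
  done
  return 1
}
```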
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh10-71
The script implements disk-aware cleanup to prevent storage exhaustion:
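The exact thresholds and cleanup commands live in the script; a minimal sketch of the pattern follows (the 70% threshold and the pruning action are assumptions):

```shell
# Sketch of disk-aware cleanup: act only when filesystem usage crosses a
# threshold. The threshold value and the prune command are assumptions.
disk_usage_pct() {
  # Print the Use% column (without the % sign) for the given path.
  df -P "$1" | awk 'NR==2 { gsub(/%/, ""); print $5 }'
}

maybe_prune() {
  local path=${1:-/} threshold=${2:-70} used
  used=$(disk_usage_pct "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "disk ${used}% >= ${threshold}%: pruning old Docker data"
    # docker system prune -af   # the real cleanup would run here
  else
    echo "disk ${used}% < ${threshold}%: skipping cleanup"
  fi
}
```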
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh23-45
The AMD test script (run-amd-test.sh) applies two sequential transformations before running commands inside the container:
- re_quote_pytest_markers: Restores quotes around -m / -k expressions that are stripped by shell argument passing.
- apply_rocm_test_overrides: Applies ROCm-specific --ignore flags and environment variable prepends.

run-amd-test.sh command transformation flow
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh132-331
The re_quote_pytest_markers function .buildkite/scripts/hardware_ci/run-amd-test.sh132-237 tokenizes the command string and wraps multi-word -m/-k expressions in single quotes. It detects boundaries at test paths (/), .py files, long flags (--), and command separators (&&, ||). Single-word expressions (e.g., -m cpu_test) are passed through unquoted.
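The real function is considerably more involved; the following greatly simplified sketch shows only the core idea, with the boundary detection reduced to the cases listed above:

```shell
# Greatly simplified sketch of the re-quoting idea: after -m or -k, collect
# tokens until a boundary (a --flag, a path, a .py file, or a command
# separator), then wrap multi-word expressions in single quotes.
re_quote_sketch() {
  local out="" expr="" in_expr=0 tok
  for tok in $1; do
    if [ "$in_expr" = 1 ]; then
      case "$tok" in
        --*|*/*|*.py|"&&"|"||")
          # Boundary reached: emit the collected expression, quoted if multi-word.
          case "$expr" in
            *" "*) out="$out '$expr' $tok" ;;
            *)     out="$out $expr $tok" ;;
          esac
          in_expr=0 expr="" ;;
        *) expr="${expr:+$expr }$tok" ;;
      esac
    elif [ "$tok" = "-m" ] || [ "$tok" = "-k" ]; then
      out="$out $tok"; in_expr=1
    else
      out="$out $tok"
    fi
  done
  if [ "$in_expr" = 1 ]; then
    case "$expr" in
      *" "*) out="$out '$expr'" ;;
      *)     out="$out $expr" ;;
    esac
  fi
  echo "${out# }"
}
```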
The apply_rocm_test_overrides function .buildkite/scripts/hardware_ci/run-amd-test.sh247-331 performs pattern matching on the assembled command string:
| Pattern Matched | Transformation |
|---|---|
| pytest -v -s lora | Prepend VLLM_ROCM_CUSTOM_PAGED_ATTN=0 |
| kernels/core | --ignore for fused_quant_layernorm, permute_cols |
| kernels/attention | --ignore for flash_attn, flashinfer, prefix_prefill, cascade_flash_attn, mha_attn, lightning_attn, test_attention |
| kernels/quantization | --ignore for int8_quant, machete_mm, block_fp8, marlin_gemm, cutlass_scaled_mm, int8_kernel |
| kernels/mamba | --ignore for mamba_mixer2, causal_conv1d, mamba_ssm_ssd |
| kernels/moe | --ignore for test_moe, cutlass_moe, triton_moe_ptpc_fp8 |
| entrypoints/openai | --ignore for test_audio, test_shutdown, test_completion, test_models, test_lora_adapters, test_return_tokens_as_ids, test_root_path, test_tokenization, test_prompt_validation |
| entrypoints/llm | --ignore for test_chat, test_accuracy, test_init, test_prompt_validation |
| models/test_registry.py | Add -k 'not BambaForCausalLM and not GritLM and not Mamba2ForCausalLM and not Zamba2ForCausalLM' |
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh247-331
The script prefers commands passed via the VLLM_TEST_COMMANDS environment variable (which preserves inner quoting) over positional arguments ($*). The positional-argument path is kept for backward compatibility but strips inner double quotes.
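This selection can be sketched as follows (variable names match the ones described above; the function wrapper itself is illustrative):

```shell
# Sketch of the command-source selection described in the text.
select_commands() {
  if [ -n "${VLLM_TEST_COMMANDS:-}" ]; then
    # Preferred path: the environment variable preserves inner quoting.
    printf '%s\n' "$VLLM_TEST_COMMANDS"
  else
    # Legacy path: by the time "$*" is visible here, the shell has already
    # stripped inner double-quotes from the positional arguments.
    printf '%s\n' "$*"
  fi
}
```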
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh363-400
Multi-node tests simulate distributed deployments by launching multiple Docker containers on a single host, connected via a Docker network.
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh192-219
Multi-node tests use a special bracket syntax in their command string:
prefix ; [node0_cmd1, node0_cmd2] && [node1_cmd1, node1_cmd2]
The is_multi_node function .buildkite/scripts/hardware_ci/run-amd-test.sh88-100 detects multi-node jobs via:
- NUM_NODES environment variable set to > 1, or
- [...] && [...] bracket syntax in the command string.

When detected, the script parses the bracket arrays and calls run-multi-node-test.sh for each pair of node commands:
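The detection step can be sketched as follows (the real is_multi_node also validates the bracket contents before dispatching to run-multi-node-test.sh):

```shell
# Simplified sketch of multi-node detection: either NUM_NODES > 1, or the
# command string contains the [...] && [...] bracket syntax.
is_multi_node_sketch() {
  local cmd=$1
  [ "${NUM_NODES:-1}" -gt 1 ] && return 0
  case "$cmd" in
    *"["*"]"*"&&"*"["*"]"*) return 0 ;;
  esac
  return 1
}
```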
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh422-463
After multi-node tests, cleanup_network stops each nodeN container and removes the docker-net network. The number of nodes defaults to 2 if NUM_NODES is unset.
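The teardown can be sketched as below; echoes stand in for the docker commands, and the container/network names follow the conventions stated in the text (nodeN containers, docker-net network):

```shell
# Sketch of cleanup_network: remove each nodeN container, then the
# docker-net network. NUM_NODES defaults to 2 when unset.
cleanup_network_sketch() {
  local n=${NUM_NODES:-2} i=0
  while [ "$i" -lt "$n" ]; do
    echo "removing container node${i}"   # docker rm -f "node${i}"
    i=$((i + 1))
  done
  echo "removing network docker-net"     # docker network rm docker-net
}
```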
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh76-86
The script configures GPU access, memory, and environment for test execution:
Key configuration elements:
| Option | Purpose |
|---|---|
| --device /dev/kfd | AMD GPU compute device access |
| $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES | Render devices from agent metadata |
| --group-add "$render_gid" | Add container to render group for GPU access |
| --shm-size=16gb | Shared memory for multi-process workloads |
| --network=host | Host network for distributed tests |
| -v "${HF_CACHE}:${HF_MOUNT}" | Mount HuggingFace model cache |
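The options above can be assembled into one argument list, as in this sketch (how the real script obtains render_gid and the image name is omitted here):

```shell
# Sketch: collect the docker run options listed in the table. The function
# wrapper and parameter names are illustrative.
build_docker_args() {
  local hf_cache=$1 hf_mount=$2 render_gid=$3
  printf '%s\n' \
    "--device /dev/kfd" \
    "${BUILDKITE_AGENT_META_DATA_RENDER_DEVICES:-}" \
    "--group-add $render_gid" \
    "--shm-size=16gb" \
    "--network=host" \
    "-v ${hf_cache}:${hf_mount}" \
    "-e HF_HOME=${hf_mount}"
}
```

These arguments would then be passed to `docker run` together with the test image and command string.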
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh85-237
The script mounts a persistent HuggingFace cache to avoid re-downloading models:
This is mounted into the container with -v "${HF_CACHE}:${HF_MOUNT}" and exposed via the HF_HOME environment variable.
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh85-87
The .buildkite/release-pipeline.yaml file orchestrates publishing vLLM releases. It is triggered manually or on a schedule and proceeds through three groups of steps.
.buildkite/release-pipeline.yaml step groups
| Step | Agent Queue | Key Actions |
|---|---|---|
| Build wheel (x86/arm64, CUDA 12.9/13.0/CPU) | cpu_queue_postmerge / arm64_cpu_queue_postmerge | docker build --target build -f docker/Dockerfile, upload via upload-nightly-wheels.sh |
| Build release image (x86/arm64) | cpu_queue_postmerge / arm64_cpu_queue_postmerge | docker build --target vllm-openai, push to public.ecr.aws/q9t5s3a7/vllm-release-repo |
| Create multi-arch manifest | small_cpu_queue_postmerge | docker manifest create combining x86_64 + aarch64 images |
| Publish nightly to DockerHub | small_cpu_queue_postmerge | push-nightly-builds.sh, tagged nightly or cu130-nightly |
| Upload release wheels to PyPI | small_cpu_queue_postmerge | upload-release-wheels-pypi.sh, gated by block step |
| Build ROCm Base Wheels | cpu_queue_postmerge | Builds docker/Dockerfile.rocm_base, caches in S3 by config hash |
| Build vLLM ROCm Wheel | cpu_queue_postmerge | Builds docker/Dockerfile.rocm --target export_vllm_wheel_release |
| Upload ROCm Wheels to S3 | cpu_queue_postmerge | Controlled by ROCM_UPLOAD_WHEELS env var |
The ROCm wheel build uses S3 caching keyed on the Dockerfile.rocm_base configuration hash. The cached Docker image is stored at s3://vllm-wheels/rocm/cache/<key>/rocm-base-image.tar.gz and reused if the base image hasn't changed.
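The key derivation is not spelled out here; as a sketch, assuming the key is simply a digest of the Dockerfile.rocm_base contents:

```shell
# Sketch of the S3 cache-path scheme. Hashing the Dockerfile contents to
# form the key is an assumption about how the "config hash" is computed.
cache_path_for() {
  local file=$1 key
  key=$(sha256sum "$file" | cut -d' ' -f1)
  echo "s3://vllm-wheels/rocm/cache/${key}/rocm-base-image.tar.gz"
}
```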
Sources: .buildkite/release-pipeline.yaml1-270 .buildkite/scripts/push-nightly-builds.sh1-36 .buildkite/scripts/cleanup-nightly-builds.sh1-127
The source_file_dependencies field enables selective test execution based on file modifications. Tests only run if any of their listed file prefixes have changed in the current commit. When no files match, Buildkite skips the step entirely.
This is present in both test_areas/*.yaml and test-amd.yaml schemas. An empty source_file_dependencies list causes the step to always run.
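A sketch of the field in use (the label and paths are hypothetical):

```yaml
# Hypothetical step: runs only when files under these prefixes change.
- label: Metrics Test
  source_file_dependencies:
    - vllm/metrics
    - tests/metrics
  commands:
    - pytest -v -s metrics
```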
Sources: .buildkite/test-amd.yaml366-379 .buildkite/test_areas/engine.yaml7-12
Tests marked with torch_nightly: true run against PyTorch nightly builds to detect compatibility issues early:
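As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical step opted into the nightly PyTorch run.
- label: Compile Test
  torch_nightly: true
  commands:
    - pytest -v -s compile
```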
A dedicated dependency check validates nightly compatibility:
If this test fails, it indicates PyTorch nightly incompatibility, and packages may need to be added to a whitelist in /vllm/tools/pre_commit/generate_nightly_torch_test.py.
Sources: .buildkite/test-amd.yaml38-49 .buildkite/test-amd.yaml107-123
Tests marked optional: true require manual unblocking unless running in scheduled nightly builds:
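As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical extended suite gated behind a manual unblock.
- label: Weight Loading Extended Test
  optional: true
  timeout_in_minutes: 120
  commands:
    - pytest -v -s weight_loading
```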
These typically cover extended test suites that are too time-consuming for regular CI runs.
Sources: .buildkite/test-amd.yaml945-959
Tests marked soft_fail: true report failures but don't block the pipeline:
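As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical step whose failure is reported but non-blocking.
- label: Experimental Feature Test
  soft_fail: true
  commands:
    - pytest -v -s experimental
```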
This is useful for experimental features or flaky tests that shouldn't block development.
Sources: .buildkite/test-amd.yaml38-49