This document describes the Buildkite continuous integration (CI) pipeline configuration system used for automated testing in vLLM. It covers the pipeline organization, YAML configuration format, test categorization, hardware-specific pipelines, agent pool management, and Docker-based test execution environments. For general test organization and pytest configuration, see page 12.1. For AMD-specific GPU testing procedures and state management, see page 12.3.
The Buildkite CI configuration has been reorganized from a single monolithic file into a modular structure. A deprecation notice in the .buildkite/test-pipeline.yaml file records that the migration was completed on February 18, 2026, with its content split across multiple directories:
| Directory/File | Purpose |
|---|---|
| .buildkite/test_areas/ | Test job definitions organized by functional area (NVIDIA/general) |
| .buildkite/image_build/ | Docker image building jobs |
| .buildkite/hardware_tests/ | Hardware-specific test jobs (Intel, Ascend NPU, Arm, etc.) |
| .buildkite/ci_config.yaml | CI pipeline configuration settings |
| .buildkite/test-amd.yaml | AMD GPU-specific test pipeline (separate schema from test_areas/) |
| .buildkite/release-pipeline.yaml | Release pipeline: wheels, Docker images, PyPI publishing |
There are two distinct YAML schemas in use:
- test_areas/*.yaml: Used for NVIDIA/general CI. Each file defines a named group with steps that can optionally mirror to AMD hardware via a mirror: sub-key.
- test-amd.yaml: Used for the AMD GPU pipeline. Uses agent_pool:, mirror_hardwares:, grade:, fast_check:, and torch_nightly: fields not present in the test_areas schema.

Sources: .buildkite/test-pipeline.yaml1-9 .buildkite/test_areas/engine.yaml1-71 .buildkite/test_areas/entrypoints.yaml1-99
Pipeline File Map
Sources: .buildkite/test-pipeline.yaml1-9 .buildkite/test-amd.yaml1-34 .buildkite/scripts/hardware_ci/run-amd-test.sh1-50
There are two distinct YAML schemas used in the CI configuration.
Each file in .buildkite/test_areas/ defines a named group of steps for NVIDIA/general testing. The top-level group: and depends_on: fields apply to all steps in the file. Individual steps can optionally mirror to AMD hardware.
Top-level fields:
| Field | Type | Description |
|---|---|---|
| group | string | Display group name in Buildkite UI |
| depends_on | list | Step keys that must complete first (e.g., image-build) |
| steps | list | List of step definitions |
Per-step fields (test_areas/):
| Field | Type | Description |
|---|---|---|
| label | string | Step display name |
| timeout_in_minutes | int | Step timeout |
| commands | list | Shell commands to run |
| source_file_dependencies | list | Path prefixes; the step is skipped if no changed file matches |
| working_dir | string | Working directory inside the container |
| num_devices | int | Number of GPUs (1, 2, 4, 8) |
| parallelism | int | Number of parallel shards |
| optional | bool | Requires manual unblock to run |
| soft_fail | bool | Failure does not block the pipeline |
| no_gpu | bool | CPU-only test |
| torch_nightly | bool | Include in the nightly PyTorch test run |
| mirror.amd.device | string | AMD agent pool to mirror on (e.g., mi325_1) |
| mirror.amd.depends_on | list | AMD-specific step dependencies |
| mirror.amd.commands | list | Override commands for the AMD run |
Example from .buildkite/test_areas/engine.yaml:
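The file's exact contents are not reproduced here; a hypothetical step using the fields documented above might look like the following sketch (the label, paths, and commands are illustrative, not copied from engine.yaml):

```yaml
# Hypothetical test_areas/ step; label, paths, and commands are illustrative.
group: Engine
depends_on:
  - image-build
steps:
  - label: Engine Test
    timeout_in_minutes: 30
    source_file_dependencies:
      - vllm/engine
      - tests/engine
    commands:
      - pytest -v -s engine
    num_devices: 1
    mirror:
      amd:
        device: mi325_1
```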
Sources: .buildkite/test_areas/engine.yaml1-71 .buildkite/test_areas/entrypoints.yaml1-99 .buildkite/test_areas/samplers.yaml1-22
The .buildkite/test-amd.yaml file uses a different schema processed through a Jinja2 template (test-template-aws.j2). It has AMD-specific fields not present in test_areas/:
| Field | Type | Description |
|---|---|---|
| agent_pool | string | AMD GPU pool: mi325_1, mi325_2, mi325_4, mi325_8 |
| mirror_hardwares | list | Also run on: amdexperimental, amdproduction, amdtentative |
| grade | string | Blocking: step failure fails the pipeline; absent = non-blocking |
| fast_check | bool | Run on every commit in the fast-check pipeline |
| torch_nightly | bool | Include in the torch nightly pipeline |
| num_gpus | int | GPU count override (1, 2, 4, 8) |
| num_nodes | int | Simulate multi-node via multiple containers on one host |
| gpu | string | Override GPU type: a100, h100, b200 |
| autorun_on_main | bool | Run automatically on main branch merges |
| no_gpu | bool | CPU-only test |
Example step from .buildkite/test-amd.yaml:
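The file's actual steps are not reproduced here; a hypothetical step combining the fields above might look like:

```yaml
# Hypothetical test-amd.yaml step; the commands are illustrative, not
# copied from the file.
- label: Basic Correctness Test
  agent_pool: mi325_1
  mirror_hardwares:
    - amdexperimental
    - amdproduction
  grade: Blocking
  fast_check: true
  timeout_in_minutes: 20
  commands:
    - pytest -v -s basic_correctness
```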
Sources: .buildkite/test-amd.yaml9-27 .buildkite/test-amd.yaml107-123
Tests are organized into logical groups based on functionality and execution characteristics. The AMD pipeline demonstrates the standard categorization:
Tests marked with fast_check: true run on every commit to provide rapid feedback:
| Test Group | Duration | Purpose |
|---|---|---|
| Pytorch Nightly Dependency Override Check | 2 min | Verify torch nightly compatibility |
| Async Engine, Inputs, Utils, Worker Test | 10 min | Core engine functionality |
| Basic Correctness Test | 20 min | Basic inference correctness |
| Entrypoints Unit Tests | 5 min | API endpoint validation |
| Entrypoints Integration Test (LLM) | 30 min | LLM class integration |
Sources: .buildkite/test-amd.yaml36-222
Multi-GPU tests with specific parallelism configurations:
Sources: .buildkite/test-amd.yaml223-283
Performance-critical kernel tests use parallelism for faster execution:
The %N in the label is replaced with the job number, and $$BUILDKITE_PARALLEL_JOB / $$BUILDKITE_PARALLEL_JOB_COUNT environment variables enable pytest sharding.
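A sharded step might look like the following sketch (the label and path are illustrative, and the --shard-id/--num-shards options assume the pytest-shard plugin):

```yaml
# Hypothetical sharded step; %N becomes the shard number in the Buildkite UI,
# and the $$-escaped variables are resolved at runtime on the agent.
- label: Kernels Core Test %N
  parallelism: 4
  commands:
    - pytest -v -s kernels/core --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT
```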
Sources: .buildkite/test-amd.yaml650-663
The AMD GPU pipeline (.buildkite/test-amd.yaml) demonstrates hardware-specific testing patterns. Tests can target different agent pools corresponding to GPU counts:
| Agent Pool | GPU Count | Typical Use Cases |
|---|---|---|
| mi325_1 | 1 GPU | Single-GPU tests, unit tests, correctness tests |
| mi325_2 | 2 GPUs | Small distributed tests, metrics/tracing |
| mi325_4 | 4 GPUs | Distributed parallelism tests (TP, PP, DP, EP) |
| mi325_8 | 8 GPUs | Large-scale distributed tests |
Sources: .buildkite/test-amd.yaml44-306
Tests can be mirrored across multiple hardware tiers using mirror_hardwares:
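As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical step mirrored to all three AMD hardware tiers.
- label: Samplers Test
  agent_pool: mi325_1
  mirror_hardwares:
    - amdexperimental
    - amdproduction
    - amdtentative
  commands:
    - pytest -v -s samplers
```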
This runs the same test step on experimental, production, and tentative AMD GPU configurations.
In .buildkite/test_areas/ files, the equivalent is the mirror.amd block, which additionally allows overriding the commands for the AMD run.
Sources: .buildkite/test-amd.yaml42-54 .buildkite/test_areas/samplers.yaml15-21
Sources: .buildkite/test-amd.yaml44-128
Tests execute inside Docker containers with specific GPU access and resource configurations. The run-amd-test.sh script manages the container lifecycle.
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh1-238
The script ensures clean GPU state before and after test execution:
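The script's actual checks are not reproduced here; the following is a minimal sketch of the idea only. The /dev/kfd device path, the fuser-based busy check, and the retry policy are all assumptions, not the script's real logic.

```shell
# Illustrative only: wait until no process holds the AMD compute device
# before starting (or after finishing) a test run. Device path, fuser
# check, and retry policy are assumptions.
wait_for_idle_gpu() {
  local dev=${1:-/dev/kfd} tries=${2:-10} i=0
  while [ "$i" -lt "$tries" ]; do
    if ! fuser "$dev" >/dev/null 2>&1; then
      return 0   # nothing has the device open
    fi
    sleep 3
    i=$((i + 1))
  done
  return 1
}
```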
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh10-71
The script implements disk-aware cleanup to prevent storage exhaustion:
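The exact thresholds and cleanup commands live in the script; a minimal sketch of the pattern follows (the 70% threshold and the pruning action are assumptions):

```shell
# Sketch of disk-aware cleanup: act only when filesystem usage crosses a
# threshold. The threshold value and the prune command are assumptions.
disk_usage_pct() {
  # Print the Use% column (without the % sign) for the given path.
  df -P "$1" | awk 'NR==2 { gsub(/%/, ""); print $5 }'
}

maybe_prune() {
  local path=${1:-/} threshold=${2:-70} used
  used=$(disk_usage_pct "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "disk ${used}% >= ${threshold}%: pruning old Docker data"
    # docker system prune -af   # the real cleanup would run here
  else
    echo "disk ${used}% < ${threshold}%: skipping cleanup"
  fi
}
```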
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh23-45
The AMD test script (run-amd-test.sh) applies two sequential transformations before running commands inside the container:
- re_quote_pytest_markers: Restores quotes around -m / -k expressions that are stripped by shell argument passing.
- apply_rocm_test_overrides: Applies ROCm-specific --ignore flags and environment variable prepends.

run-amd-test.sh command transformation flow
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh132-331
The re_quote_pytest_markers function .buildkite/scripts/hardware_ci/run-amd-test.sh132-237 tokenizes the command string and wraps multi-word -m/-k expressions in single quotes. It detects boundaries at test paths (/), .py files, long flags (--), and command separators (&&, ||). Single-word expressions (e.g., -m cpu_test) are passed through unquoted.
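The real function is considerably more involved; the following greatly simplified sketch shows only the core idea, with the boundary detection reduced to the cases listed above:

```shell
# Greatly simplified sketch of the re-quoting idea: after -m or -k, collect
# tokens until a boundary (a --flag, a path, a .py file, or a command
# separator), then wrap multi-word expressions in single quotes.
re_quote_sketch() {
  local out="" expr="" in_expr=0 tok
  for tok in $1; do
    if [ "$in_expr" = 1 ]; then
      case "$tok" in
        --*|*/*|*.py|"&&"|"||")
          # Boundary reached: emit the collected expression, quoted if multi-word.
          case "$expr" in
            *" "*) out="$out '$expr' $tok" ;;
            *)     out="$out $expr $tok" ;;
          esac
          in_expr=0 expr="" ;;
        *) expr="${expr:+$expr }$tok" ;;
      esac
    elif [ "$tok" = "-m" ] || [ "$tok" = "-k" ]; then
      out="$out $tok"; in_expr=1
    else
      out="$out $tok"
    fi
  done
  if [ "$in_expr" = 1 ]; then
    case "$expr" in
      *" "*) out="$out '$expr'" ;;
      *)     out="$out $expr" ;;
    esac
  fi
  echo "${out# }"
}
```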
The apply_rocm_test_overrides function .buildkite/scripts/hardware_ci/run-amd-test.sh247-331 performs pattern matching on the assembled command string:
| Pattern Matched | Transformation |
|---|---|
| pytest -v -s lora | Prepend VLLM_ROCM_CUSTOM_PAGED_ATTN=0 |
| kernels/core | --ignore for fused_quant_layernorm, permute_cols |
| kernels/attention | --ignore for flash_attn, flashinfer, prefix_prefill, cascade_flash_attn, mha_attn, lightning_attn, test_attention |
| kernels/quantization | --ignore for int8_quant, machete_mm, block_fp8, marlin_gemm, cutlass_scaled_mm, int8_kernel |
| kernels/mamba | --ignore for mamba_mixer2, causal_conv1d, mamba_ssm_ssd |
| kernels/moe | --ignore for test_moe, cutlass_moe, triton_moe_ptpc_fp8 |
| entrypoints/openai | --ignore for test_audio, test_shutdown, test_completion, test_models, test_lora_adapters, test_return_tokens_as_ids, test_root_path, test_tokenization, test_prompt_validation |
| entrypoints/llm | --ignore for test_chat, test_accuracy, test_init, test_prompt_validation |
| models/test_registry.py | Add -k 'not BambaForCausalLM and not GritLM and not Mamba2ForCausalLM and not Zamba2ForCausalLM' |
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh247-331
The script prefers commands passed via the VLLM_TEST_COMMANDS environment variable (which preserves inner quoting) over positional arguments ($*). The positional-argument path is kept for backward compatibility but strips inner double quotes.
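This selection can be sketched as follows (variable names match the ones described above; the function wrapper itself is illustrative):

```shell
# Sketch of the command-source selection described in the text.
select_commands() {
  if [ -n "${VLLM_TEST_COMMANDS:-}" ]; then
    # Preferred path: the environment variable preserves inner quoting.
    printf '%s\n' "$VLLM_TEST_COMMANDS"
  else
    # Legacy path: by the time "$*" is visible here, the shell has already
    # stripped inner double-quotes from the positional arguments.
    printf '%s\n' "$*"
  fi
}
```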
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh363-400
Multi-node tests simulate distributed deployments by launching multiple Docker containers on a single host, connected via a Docker network.
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh192-219
Multi-node tests use a special bracket syntax in their command string:
prefix ; [node0_cmd1, node0_cmd2] && [node1_cmd1, node1_cmd2]
The is_multi_node function .buildkite/scripts/hardware_ci/run-amd-test.sh88-100 detects multi-node jobs via:
- NUM_NODES environment variable set to > 1, or
- [...] && [...] bracket syntax in the command string.

When detected, the script parses the bracket arrays and calls run-multi-node-test.sh for each pair of node commands:
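The detection step can be sketched as follows (the real is_multi_node also validates the bracket contents before dispatching to run-multi-node-test.sh):

```shell
# Simplified sketch of multi-node detection: either NUM_NODES > 1, or the
# command string contains the [...] && [...] bracket syntax.
is_multi_node_sketch() {
  local cmd=$1
  [ "${NUM_NODES:-1}" -gt 1 ] && return 0
  case "$cmd" in
    *"["*"]"*"&&"*"["*"]"*) return 0 ;;
  esac
  return 1
}
```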
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh422-463
After multi-node tests, cleanup_network stops each nodeN container and removes the docker-net network. The number of nodes defaults to 2 if NUM_NODES is unset.
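The teardown can be sketched as below; echoes stand in for the docker commands, and the container/network names follow the conventions stated in the text (nodeN containers, docker-net network):

```shell
# Sketch of cleanup_network: remove each nodeN container, then the
# docker-net network. NUM_NODES defaults to 2 when unset.
cleanup_network_sketch() {
  local n=${NUM_NODES:-2} i=0
  while [ "$i" -lt "$n" ]; do
    echo "removing container node${i}"   # docker rm -f "node${i}"
    i=$((i + 1))
  done
  echo "removing network docker-net"     # docker network rm docker-net
}
```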
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh76-86
The script configures GPU access, memory, and environment for test execution:
Key configuration elements:
| Option | Purpose |
|---|---|
| --device /dev/kfd | AMD GPU compute device access |
| $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES | Render devices from agent metadata |
| --group-add "$render_gid" | Add container to render group for GPU access |
| --shm-size=16gb | Shared memory for multi-process workloads |
| --network=host | Host network for distributed tests |
| -v "${HF_CACHE}:${HF_MOUNT}" | Mount HuggingFace model cache |
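The options above can be assembled into one argument list, as in this sketch (how the real script obtains render_gid and the image name is omitted here):

```shell
# Sketch: collect the docker run options listed in the table. The function
# wrapper and parameter names are illustrative.
build_docker_args() {
  local hf_cache=$1 hf_mount=$2 render_gid=$3
  printf '%s\n' \
    "--device /dev/kfd" \
    "${BUILDKITE_AGENT_META_DATA_RENDER_DEVICES:-}" \
    "--group-add $render_gid" \
    "--shm-size=16gb" \
    "--network=host" \
    "-v ${hf_cache}:${hf_mount}" \
    "-e HF_HOME=${hf_mount}"
}
```

These arguments would then be passed to `docker run` together with the test image and command string.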
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh85-237
The script mounts a persistent HuggingFace cache to avoid re-downloading models:
This is mounted into the container with -v "${HF_CACHE}:${HF_MOUNT}" and exposed via the HF_HOME environment variable.
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh85-87
The .buildkite/release-pipeline.yaml file orchestrates publishing vLLM releases. It is triggered manually or on a schedule and proceeds through three groups of steps.
.buildkite/release-pipeline.yaml step groups
| Step | Agent Queue | Key Actions |
|---|---|---|
| Build wheel (x86/arm64, CUDA 12.9/13.0/CPU) | cpu_queue_postmerge / arm64_cpu_queue_postmerge | docker build --target build -f docker/Dockerfile, upload via upload-nightly-wheels.sh |
| Build release image (x86/arm64) | cpu_queue_postmerge / arm64_cpu_queue_postmerge | docker build --target vllm-openai, push to public.ecr.aws/q9t5s3a7/vllm-release-repo |
| Create multi-arch manifest | small_cpu_queue_postmerge | docker manifest create combining x86_64 + aarch64 images |
| Publish nightly to DockerHub | small_cpu_queue_postmerge | push-nightly-builds.sh, tagged nightly or cu130-nightly |
| Upload release wheels to PyPI | small_cpu_queue_postmerge | upload-release-wheels-pypi.sh, gated by block step |
| Build ROCm Base Wheels | cpu_queue_postmerge | Builds docker/Dockerfile.rocm_base, caches in S3 by config hash |
| Build vLLM ROCm Wheel | cpu_queue_postmerge | Builds docker/Dockerfile.rocm --target export_vllm_wheel_release |
| Upload ROCm Wheels to S3 | cpu_queue_postmerge | Controlled by ROCM_UPLOAD_WHEELS env var |
The ROCm wheel build uses S3 caching keyed on the Dockerfile.rocm_base configuration hash. The cached Docker image is stored at s3://vllm-wheels/rocm/cache/<key>/rocm-base-image.tar.gz and reused if the base image hasn't changed.
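The key derivation is not spelled out here; as a sketch, assuming the key is simply a digest of the Dockerfile.rocm_base contents:

```shell
# Sketch of the S3 cache-path scheme. Hashing the Dockerfile contents to
# form the key is an assumption about how the "config hash" is computed.
cache_path_for() {
  local file=$1 key
  key=$(sha256sum "$file" | cut -d' ' -f1)
  echo "s3://vllm-wheels/rocm/cache/${key}/rocm-base-image.tar.gz"
}
```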
Sources: .buildkite/release-pipeline.yaml1-270 .buildkite/scripts/push-nightly-builds.sh1-36 .buildkite/scripts/cleanup-nightly-builds.sh1-127
The source_file_dependencies field enables selective test execution based on file modifications. Tests only run if any of their listed file prefixes have changed in the current commit. When no files match, Buildkite skips the step entirely.
This is present in both test_areas/*.yaml and test-amd.yaml schemas. An empty source_file_dependencies list causes the step to always run.
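A sketch of the field in use (the label and paths are hypothetical):

```yaml
# Hypothetical step: runs only when files under these prefixes change.
- label: Metrics Test
  source_file_dependencies:
    - vllm/metrics
    - tests/metrics
  commands:
    - pytest -v -s metrics
```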
Sources: .buildkite/test-amd.yaml366-379 .buildkite/test_areas/engine.yaml7-12
Tests marked with torch_nightly: true run against PyTorch nightly builds to detect compatibility issues early:
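As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical step opted into the nightly PyTorch run.
- label: Compile Test
  torch_nightly: true
  commands:
    - pytest -v -s compile
```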
A dedicated dependency check validates nightly compatibility:
If this test fails, it indicates PyTorch nightly incompatibility, and packages may need to be added to a whitelist in /vllm/tools/pre_commit/generate_nightly_torch_test.py.
Sources: .buildkite/test-amd.yaml38-49 .buildkite/test-amd.yaml107-123
Tests marked optional: true require manual unblocking unless running in scheduled nightly builds:
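As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical extended suite gated behind a manual unblock.
- label: Weight Loading Extended Test
  optional: true
  timeout_in_minutes: 120
  commands:
    - pytest -v -s weight_loading
```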
These typically cover extended test suites that are too time-consuming for regular CI runs.
Sources: .buildkite/test-amd.yaml945-959
Tests marked soft_fail: true report failures but don't block the pipeline:
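As an illustrative sketch (the step body is hypothetical):

```yaml
# Hypothetical step whose failure is reported but non-blocking.
- label: Experimental Feature Test
  soft_fail: true
  commands:
    - pytest -v -s experimental
```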
This is useful for experimental features or flaky tests that shouldn't block development.
Sources: .buildkite/test-amd.yaml38-49