This page covers the CI test infrastructure for AMD/ROCm GPU hardware in vLLM, including the run-amd-test.sh execution script, the test-amd.yaml test definitions, multi-node test configuration, ROCm-specific test exclusions, and XPU test procedures.
For general test organization and directory structure, see 12.1. For Buildkite CI pipeline structure and the modular test_areas/ YAML system, see 12.2. For the ROCm platform implementation itself, see 10.3.
AMD/ROCm tests run inside Docker containers pulled from a per-commit image (rocm/vllm-ci:${BUILDKITE_COMMIT}) on bare-metal Buildkite agents equipped with AMD GPUs. The entry point for all AMD test execution is .buildkite/scripts/hardware_ci/run-amd-test.sh.
The test definition file, .buildkite/test-amd.yaml, defines every test step that runs on AMD hardware, using the same step schema as the CUDA pipeline (label, commands, timeouts, agent_pool, etc.).
The following diagram maps the overall flow from Buildkite to test execution:
Diagram: AMD CI Execution Flow
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh:1-484, .buildkite/test-amd.yaml:1-50
Each test step in test-amd.yaml specifies an agent_pool that determines how many AMD GPUs are allocated:
| Agent Pool | GPU Count | Typical Usage |
|---|---|---|
| mi325_1 | 1 GPU | Single-GPU unit tests, basic correctness, kernel tests |
| mi325_2 | 2 GPUs | Metrics/tracing tests, 2-GPU distributed tests |
| mi325_4 | 4 GPUs | 4-GPU distributed, EPLB execution, speculative decode |
| mi325_8 | 8 GPUs | 8-GPU distributed tests (TP=2, DP=4) |
The mirror_hardwares field in test-amd.yaml selects which AMD hardware class receives a copy of a given test step:
| Tier | Description |
|---|---|
| amdproduction | Stable production AMD machines; most tests mirror here |
| amdexperimental | Experimental AMD hardware; subset of tests |
| amdtentative | Staging/tentative hardware; typically only fast/blocking tests |
Example from test-amd.yaml:
In the newer modular test_areas/ YAML format (see 12.2), the equivalent is a mirror.amd block per step:
Sources: .buildkite/test-amd.yaml:35-50, .buildkite/test_areas/engine.yaml:32-39, .buildkite/test_areas/entrypoints.yaml:27-31
run-amd-test.sh Script

The script .buildkite/scripts/hardware_ci/run-amd-test.sh performs all setup and orchestration before invoking the Docker container.
Diagram: run-amd-test.sh Function Map
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh:38-86, .buildkite/scripts/hardware_ci/run-amd-test.sh:132-237, .buildkite/scripts/hardware_ci/run-amd-test.sh:247-331
Before pulling any image, the script waits up to 300 seconds for the AMD GPU state file at /opt/amdgpu/etc/gpu_state to read clean. After this check, it resets the GPUs and waits again:
.buildkite/scripts/hardware_ci/run-amd-test.sh:38-53
wait_for_clean_gpus → cleanup_docker → echo "reset" > /opt/amdgpu/etc/gpu_state → wait_for_clean_gpus
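The polling step in this sequence can be sketched in Python. This is only an illustration of the wait-with-timeout pattern, not the script's Bash code: the 300-second limit and the "clean" value come from the description above, while the 3-second poll interval and the injectable read_state, sleep, and clock parameters are assumptions added to keep the sketch testable.

```python
import time
from typing import Callable


def wait_for_clean_gpus(
    read_state: Callable[[], str],
    timeout_s: float = 300.0,   # limit stated in the text above
    poll_s: float = 3.0,        # assumed poll interval (hypothetical)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll read_state() until it reports 'clean' or the timeout elapses.

    Returns True if the GPUs became clean in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if read_state().strip() == "clean":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

On a real agent, read_state could be something like `lambda: Path("/opt/amdgpu/etc/gpu_state").read_text()`; the function would then be called once before the reset and once after it.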
Test commands may be passed in two ways:
| Method | Variable | Notes |
|---|---|---|
| Preferred | VLLM_TEST_COMMANDS | Single-quoted assignment preserves inner double quotes |
| Legacy | Positional args ($*) | Inner double quotes may be stripped by calling shell |
.buildkite/scripts/hardware_ci/run-amd-test.sh:370-390
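A minimal Python sketch of this selection logic follows. It assumes that VLLM_TEST_COMMANDS takes precedence whenever it is non-empty and that the legacy path behaves like the shell's $*, joining positional arguments with single spaces; the function name resolve_test_commands is hypothetical.

```python
from typing import Mapping, Sequence


def resolve_test_commands(env: Mapping[str, str], argv: Sequence[str]) -> str:
    """Pick the command string: env var first, positional args as fallback."""
    preferred = env.get("VLLM_TEST_COMMANDS", "")
    if preferred:
        # The variable is used verbatim, so inner quoting survives.
        return preferred
    # Mimics "$*": arguments are joined with spaces, so an argument that
    # was a single quoted word group is no longer distinguishable.
    return " ".join(argv)
```

For example, the argument list `["pytest", "-m", "not cpu_test", "v1/core"]` collapses to `pytest -m not cpu_test v1/core` on the legacy path, which is why the expression has to be re-quoted afterwards.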
re_quote_pytest_markers

When commands pass through $*, pytest -m and -k expressions lose their quotes. For example:
pytest -v -s -m 'not cpu_test' v1/core
becomes:
pytest -v -s -m not cpu_test v1/core
The re_quote_pytest_markers function reconstructs quotes by collecting tokens after -m or -k until it hits a boundary token (path containing /, .py file, --flag, &&, etc.).
.buildkite/scripts/hardware_ci/run-amd-test.sh:132-237
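The token-collection idea can be sketched in Python as follows. This is not the Bash implementation: the boundary rules for paths, .py files, --flags, and && follow the description above, while treating single-dash flags, ||, and ; as boundaries and leaving already-quoted expressions untouched are assumptions of this sketch.

```python
def _is_boundary(token: str) -> bool:
    """True if token ends a -m/-k expression."""
    return (
        token in ("&&", "||", ";")   # command separators (|| and ; assumed)
        or token.startswith("-")     # --flag, plus short flags (assumed)
        or "/" in token              # test path such as v1/core
        or token.endswith(".py")     # test file
    )


def re_quote_pytest_markers(command: str) -> str:
    """Wrap multi-word -m/-k expressions in single quotes."""
    tokens = command.split()
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        out.append(token)
        i += 1
        if token not in ("-m", "-k"):
            continue
        # Collect everything up to the next boundary token.
        expr = []
        while i < len(tokens) and not _is_boundary(tokens[i]):
            expr.append(tokens[i])
            i += 1
        if len(expr) > 1 and expr[0][0] not in "'\"":
            out.append("'" + " ".join(expr) + "'")
        else:
            # Single-word or already-quoted expressions pass through.
            out.extend(expr)
    return " ".join(out)
```

Because a single-token expression is never quoted, a prefix such as `python -m pytest` is left unchanged by this sketch.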
.buildkite/scripts/hardware_ci/run-amd-test.sh:464-483
Key flags:
- /dev/kfd: AMD kernel fusion driver device
- $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES: render device paths (e.g. /dev/dri/renderD128)
- --group-add render_gid: grants GPU access to the render group
- $RDMA_FLAGS: conditionally adds --device /dev/infiniband --cap-add=IPC_LOCK if the host has RDMA devices

Multi-node is detected via two signals, checked in is_multi_node():

- The NUM_NODES environment variable is greater than 1
- The command string uses the bracket syntax [node0_cmds] && [node1_cmds]

When multi-node is detected, the script parses bracket-delimited per-node command arrays and calls run-multi-node-test.sh for each command pair:
.buildkite/scripts/hardware_ci/run-amd-test.sh:422-463
The apply_rocm_test_overrides function (.buildkite/scripts/hardware_ci/run-amd-test.sh:247-331) appends --ignore flags and environment overrides for tests that are not yet supported or behave differently on ROCm. It modifies the command string in place before the container is launched.
Diagram: apply_rocm_test_overrides Coverage
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh:247-331
| Kernel Category | Excluded Tests |
|---|---|
| kernels/core | test_fused_quant_layernorm.py, test_permute_cols.py |
| kernels/attention | test_attention_selector.py, test_encoder_decoder_attn.py, test_flash_attn.py, test_flashinfer.py, test_prefix_prefill.py, test_cascade_flash_attn.py, test_mha_attn.py, test_lightning_attn.py, test_attention.py |
| kernels/quantization | test_int8_quant.py, test_machete_mm.py, test_block_fp8.py, test_block_int8.py, test_marlin_gemm.py, test_cutlass_scaled_mm.py, test_int8_kernel.py |
| kernels/mamba | test_mamba_mixer2.py, test_causal_conv1d.py, test_mamba_ssm_ssd.py |
| kernels/moe | test_moe.py, test_cutlass_moe.py, test_triton_moe_ptpc_fp8.py |
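The way such exclusions might be appended to a command string can be sketched in Python. The file names are taken from two rows of the table above; the matching rule (the test directory appearing as a whitespace-separated token) and the exact --ignore=path form are assumptions, since the Bash function's matching logic is not shown here.

```python
# Subset of the exclusion table, keyed by test directory.
ROCM_IGNORES = {
    "kernels/core": [
        "test_fused_quant_layernorm.py",
        "test_permute_cols.py",
    ],
    "kernels/mamba": [
        "test_mamba_mixer2.py",
        "test_causal_conv1d.py",
        "test_mamba_ssm_ssd.py",
    ],
}


def apply_rocm_test_overrides(command: str) -> str:
    """Append --ignore flags for each excluded file in a matched directory."""
    tokens = command.split()
    for test_dir, files in ROCM_IGNORES.items():
        if test_dir in tokens:  # assumed matching rule
            for name in files:
                command += f" --ignore={test_dir}/{name}"
    return command
```

Commands that do not reference a listed directory are returned unchanged, so the override is safe to apply to every step.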
Steps in test-amd.yaml may have a grade field:
| Grade | Behavior |
|---|---|
| Blocking | Pipeline fails if this step fails |
| (absent) | Non-blocking; failure is logged but does not block |
Steps with optional: true require manual unblocking unless the run is a scheduled nightly.
Several AMD-specific environment variables are set within test commands:
| Variable | Value | Reason |
|---|---|---|
| TORCH_NCCL_BLOCKING_WAIT | 1 | Workaround for HIP bug ROCm/hip#3876 in distributed tests |
| VLLM_ROCM_CUSTOM_PAGED_ATTN | 0 | Disables custom paged attention for LoRA tests |
| VLLM_WORKER_MULTIPROC_METHOD | spawn | Required for some entrypoint integration tests |
Sources: .buildkite/test-amd.yaml:248-312, .buildkite/scripts/hardware_ci/run-amd-test.sh:247-260
| Label | Agent Pool | Notes |
|---|---|---|
| Basic Correctness Test | mi325_1 | fast_check: true; runs test_cumem.py, test_basic_correctness.py, test_cpu_offload.py |
| Distributed Tests (4 GPUs) | mi325_4 | Sets TORCH_NCCL_BLOCKING_WAIT=1; includes torchrun tests with TP, PP, DP, EP |
| Distributed Tests (8 GPUs) | mi325_8 | torchrun with TP=2, DP=4, EP |
| Kernels Attention Test %N | mi325_1 | Parallel sharding with parallelism: 2 |
| OpenAI API correctness | mi325_1 | Runs tools/install_torchcodec_rocm.sh first |
| V1 Test others | mi325_1 | Installs kv_connectors.txt requirements; runs kv_offload, spec_decode, kv_connector tests |
Sources: .buildkite/test-amd.yaml:107-124, .buildkite/test-amd.yaml:227-290, .buildkite/test-amd.yaml:688-744, .buildkite/test-amd.yaml:442-468
Some test steps in test-amd.yaml use the num_nodes field to simulate multi-node setups by launching multiple containers on a single host connected through a docker-net Docker network.
The bracket command syntax used for multi-node steps has the form [node0_cmds] && [node1_cmds].
The is_multi_node function (.buildkite/scripts/hardware_ci/run-amd-test.sh:88-100) detects this via either NUM_NODES > 1 or the regex pattern \[.*\].*\&\&.*\[.*\].
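The detection rule can be sketched in Python using the same regular expression. The NUM_NODES check and the pattern come from the description above; the helper split_node_commands, and its assumption that commands inside each bracket group are comma-separated, are hypothetical and only illustrate how per-node command lists could be recovered.

```python
import os
import re
from typing import Mapping, Optional

# Same pattern as described above: two bracket groups joined by &&.
_BRACKET_PAIR = re.compile(r"\[.*\].*&&.*\[.*\]")


def is_multi_node(commands: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True if NUM_NODES > 1 or the commands use the bracket syntax."""
    env = os.environ if env is None else env
    try:
        num_nodes = int(env.get("NUM_NODES", "1"))
    except ValueError:
        num_nodes = 1  # treat a malformed value as single-node
    return num_nodes > 1 or bool(_BRACKET_PAIR.search(commands))


def split_node_commands(commands: str) -> list[list[str]]:
    """Return one command list per bracket group (comma delimiter assumed)."""
    groups = re.findall(r"\[(.*?)\]", commands)
    return [[c.strip() for c in g.split(",") if c.strip()] for g in groups]
```

Pairing the i-th command of each node, for instance with `zip(*split_node_commands(cmds))`, gives the command pairs that would be handed to the multi-node runner. The sketch does not handle commands that themselves contain square brackets.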
The cleanup_network function (.buildkite/scripts/hardware_ci/run-amd-test.sh:76-86) ensures that Docker containers named node0, node1, etc. and the docker-net network are removed after the test completes, on both success and failure.
Some test files contain explicit ROCm guards. For example, tests/samplers/test_beam_search.py applies additional engine kwargs on ROCm to enforce deterministic output:
tests/samplers/test_beam_search.py:22-31
This guards against run-to-run output differences caused by non-associative floating-point reductions in ROCm attention and GEMM kernels. The current_platform.is_rocm() check resolves through vllm/platforms/__init__.py via the rocm_platform_plugin function (vllm/platforms/__init__.py:110-128).
Similarly, monkeypatch.setenv("VLLM_ROCM_USE_SKINNY_GEMM", "0") is applied in beam search tests when ROCm is detected (tests/samplers/test_beam_search.py:57-59).
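The guard pattern can be sketched without importing vLLM. In this illustration the is_rocm flag stands in for current_platform.is_rocm(), and the context manager rocm_determinism_env is a hypothetical helper; the real tests use pytest's monkeypatch fixture rather than unittest.mock.

```python
import os
from contextlib import contextmanager
from unittest import mock


@contextmanager
def rocm_determinism_env(is_rocm: bool):
    """Temporarily disable skinny GEMM when running on ROCm.

    is_rocm is a stand-in for current_platform.is_rocm().
    """
    overrides = {"VLLM_ROCM_USE_SKINNY_GEMM": "0"} if is_rocm else {}
    # patch.dict restores the previous environment on exit.
    with mock.patch.dict(os.environ, overrides):
        yield
```

Scoping the override to the test body mirrors what monkeypatch.setenv does: the variable is reset once the test finishes, so other tests on the same agent are not affected.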
XPU testing uses a separate script (.buildkite/scripts/hardware_ci/run-xpu-test.sh) and a dedicated Dockerfile (docker/Dockerfile.xpu).
Diagram: XPU Test Execution
Sources: .buildkite/scripts/hardware_ci/run-xpu-test.sh:1-56
Unlike AMD testing, XPU testing builds the Docker image locally at test time rather than pulling a pre-built image. The Dockerfile.xpu base image is intel/deep-learning-essentials:2025.3.2-0-devel-ubuntu24.04 and installs PyTorch XPU (torch==2.10.0+xpu) along with the vllm_xpu_kernels wheel.
Key XPU-specific environment variables set in docker/Dockerfile.xpu:
- VLLM_TARGET_DEVICE=xpu
- VLLM_WORKER_MULTIPROC_METHOD=spawn

The XPUWorker class (vllm/v1/worker/xpu_worker.py:25-115) extends the base Worker and initializes XPUModelRunner with XPU-specific device setup via torch.xpu APIs.
The run-xpu-test.sh script explicitly ignores certain test files that are not yet supported on XPU:
| Test Directory | Ignored Files |
|---|---|
| v1/sample | test_logprobs.py, test_logprobs_e2e.py |
| v1/worker | test_gpu_model_runner.py |
| v1/spec_decode | test_max_len.py, test_tree_attention.py, test_speculators_eagle3.py, test_acceptance_length.py |
| v1/kv_connector/unit | test_multi_connector.py, test_nixl_connector.py, test_example_connector.py, test_lmcache_integration.py |
Sources: .buildkite/scripts/hardware_ci/run-xpu-test.sh:46-55
| File | Role |
|---|---|
| .buildkite/test-amd.yaml | Defines all AMD test steps, agent pools, commands, and mirror hardware configs |
| .buildkite/scripts/hardware_ci/run-amd-test.sh | Orchestrates GPU state, Docker operations, command processing, and routing for AMD tests |
| .buildkite/scripts/hardware_ci/run-xpu-test.sh | Builds and runs XPU Docker container for Intel GPU tests |
| docker/Dockerfile.xpu | XPU CI container image definition |
| vllm/platforms/__init__.py | Platform detection logic (rocm_platform_plugin, xpu_platform_plugin) used by test guards |
| vllm/v1/worker/xpu_worker.py | XPUWorker implementation for Intel XPU |