AMD-Specific Testing

This page covers the CI test infrastructure for AMD/ROCm GPU hardware in vLLM, including the run-amd-test.sh execution script, the test-amd.yaml test definitions, multi-node test configuration, ROCm-specific test exclusions, and XPU test procedures.

For general test organization and directory structure, see 12.1. For Buildkite CI pipeline structure and the modular test_areas/ YAML system, see 12.2. For the ROCm platform implementation itself, see 10.3.

Infrastructure Overview

AMD/ROCm tests run inside Docker containers started from a per-commit image (rocm/vllm-ci:${BUILDKITE_COMMIT}) on bare-metal Buildkite agents equipped with AMD GPUs. The entry point for all AMD test execution is .buildkite/scripts/hardware_ci/run-amd-test.sh.

The test definitions live in .buildkite/test-amd.yaml, which defines every test step that runs on AMD hardware using the same step schema as the CUDA pipeline (label, commands, timeouts, agent_pool, etc.).

The following diagram maps the overall flow from Buildkite to test execution:

Diagram: AMD CI Execution Flow

Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh:1-484, .buildkite/test-amd.yaml:1-50

Agent Pools and Hardware Tiers

GPU Agent Pools

Each test step in test-amd.yaml specifies an agent_pool that determines how many AMD GPUs are allocated:

Agent Pool	GPU Count	Typical Usage
`mi325_1`	1 GPU	Single-GPU unit tests, basic correctness, kernel tests
`mi325_2`	2 GPUs	Metrics/tracing tests, 2-GPU distributed tests
`mi325_4`	4 GPUs	4-GPU distributed, EPLB execution, speculative decode
`mi325_8`	8 GPUs	8-GPU distributed tests (TP=2, DP=4)

Hardware Tiers

The mirror_hardwares field in test-amd.yaml selects which AMD hardware class receives a copy of a given test step:

Tier	Description
`amdproduction`	Stable production AMD machines; most tests mirror here
`amdexperimental`	Experimental AMD hardware; subset of tests
`amdtentative`	Staging/tentative hardware; typically only fast/blocking tests

In test-amd.yaml, a step opts into one or more of these tiers by listing them under mirror_hardwares. In the newer modular test_areas/ YAML format (see 12.2), the equivalent is a mirror.amd block per step.

Sources: .buildkite/test-amd.yaml:35-50, .buildkite/test_areas/engine.yaml:32-39, .buildkite/test_areas/entrypoints.yaml:27-31

The `run-amd-test.sh` Script

The script .buildkite/scripts/hardware_ci/run-amd-test.sh performs all setup and orchestration before invoking the Docker container.

Key Functions

Diagram: run-amd-test.sh Function Map

Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh:38-86, .buildkite/scripts/hardware_ci/run-amd-test.sh:132-237, .buildkite/scripts/hardware_ci/run-amd-test.sh:247-331

GPU State Management

Before pulling any image, the script waits up to 300 seconds for the AMD GPU state file at /opt/amdgpu/etc/gpu_state to report a clean state. It then runs cleanup_docker, resets the GPUs by writing reset to the state file, and waits for a clean state again:

.buildkite/scripts/hardware_ci/run-amd-test.sh:38-53

wait_for_clean_gpus  →  cleanup_docker  →  echo "reset" > /opt/amdgpu/etc/gpu_state  →  wait_for_clean_gpus
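The wait, reset, wait sequence can be sketched in Python as below. This is an illustration rather than the script's code: it assumes the state file signals a clean GPU by containing the token clean, and the poll interval and the prepare_gpus helper are hypothetical. Only the file path, the 300-second limit, and the order of operations come from the description above.

```python
import time
from pathlib import Path
from typing import Callable

GPU_STATE_FILE = Path("/opt/amdgpu/etc/gpu_state")


def wait_for_clean_gpus(state_file: Path = GPU_STATE_FILE,
                        timeout_s: float = 300.0,
                        poll_s: float = 3.0) -> bool:
    """Poll state_file until it reports 'clean'; return False on timeout.

    Assumption: a clean state is signalled by the token 'clean' in the file.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if "clean" in state_file.read_text().split():
                return True
        except OSError:
            pass  # A missing or unreadable file counts as "not clean yet".
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def prepare_gpus(cleanup_docker: Callable[[], None],
                 state_file: Path = GPU_STATE_FILE,
                 timeout_s: float = 300.0,
                 poll_s: float = 3.0) -> bool:
    """Hypothetical helper: wait -> cleanup_docker -> write 'reset' -> wait."""
    wait_for_clean_gpus(state_file, timeout_s, poll_s)  # result unused in this sketch
    cleanup_docker()
    state_file.write_text("reset\n")  # same request as: echo "reset" > gpu_state
    return wait_for_clean_gpus(state_file, timeout_s, poll_s)
```

Passing the file path and cleanup callback in as parameters keeps the sketch testable against a temporary file instead of a real GPU host.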

Command Source Selection

Test commands may be passed in two ways:

Method	Source	Notes
Preferred	`VLLM_TEST_COMMANDS`	Single-quoted assignment preserves inner double quotes
Legacy	Positional args (`$*`)	Inner double quotes may be stripped by calling shell

.buildkite/scripts/hardware_ci/run-amd-test.sh:370-390
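Why the legacy path loses quoting is easy to reproduce with Python's shlex, which follows POSIX shell splitting rules: the calling shell consumes the quotes while splitting words, so rejoining the arguments the way "$*" does yields an unquoted expression. The select_test_commands helper is a hypothetical sketch of the preference order in the table, assuming a non-empty VLLM_TEST_COMMANDS always wins.

```python
import shlex
from typing import Mapping, Sequence


def select_test_commands(env: Mapping[str, str],
                         argv: Sequence[str]) -> tuple[str, bool]:
    """Return (commands, from_positional_args).

    Prefer VLLM_TEST_COMMANDS; fall back to the legacy positional arguments,
    joined with single spaces as "$*" joins them under the default IFS.
    """
    commands = env.get("VLLM_TEST_COMMANDS", "")
    if commands:
        return commands, False
    return " ".join(argv), True


# The caller's shell has already removed the quotes before the script runs.
legacy_argv = shlex.split("pytest -v -s -m 'not cpu_test' v1/core")
print(select_test_commands({}, legacy_argv))
# ('pytest -v -s -m not cpu_test v1/core', True)
```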

Pytest Marker Re-quoting (`re_quote_pytest_markers`)

When commands pass through $*, pytest -m and -k expressions lose their quotes. For example:

pytest -v -s -m 'not cpu_test' v1/core

becomes:

pytest -v -s -m not cpu_test v1/core

The re_quote_pytest_markers function reconstructs quotes by collecting tokens after -m or -k until it hits a boundary token (path containing /, .py file, --flag, &&, etc.).

.buildkite/scripts/hardware_ci/run-amd-test.sh:132-237
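A Python sketch of the same re-quoting idea follows. The boundary rules approximate the ones listed above (a token starting with -, containing / or .py, or a shell separator such as &&) and are not a copy of the shell function's logic; the sketch also leaves an expression alone if it already starts with a quote character.

```python
import shlex

# Shell separators that end a -m/-k expression.
SEPARATORS = {"&&", "||", ";", "|"}


def is_boundary(tok: str) -> bool:
    """True if tok starts something other than a marker/keyword expression."""
    return (
        tok in SEPARATORS
        or tok.startswith("-")  # another flag, e.g. -v or --timeout=60
        or "/" in tok           # a path such as v1/core
        or ".py" in tok         # a test file or node id such as test_x.py::test_y
    )


def re_quote_pytest_markers(command: str) -> str:
    """Re-insert quotes around pytest -m/-k expressions flattened by "$*"."""
    tokens = command.split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        out.append(tok)
        i += 1
        if tok not in ("-m", "-k"):
            continue
        if i < len(tokens) and tokens[i][:1] in ("'", '"'):
            continue  # already quoted: leave the original tokens untouched
        expr: list[str] = []
        while i < len(tokens) and not is_boundary(tokens[i]):
            expr.append(tokens[i])
            i += 1
        if expr:
            out.append(shlex.quote(" ".join(expr)))
    return " ".join(out)


print(re_quote_pytest_markers("pytest -v -s -m not cpu_test v1/core"))
# pytest -v -s -m 'not cpu_test' v1/core
```

Because it only inspects whitespace-separated tokens, an expression containing a word with / or a leading - would be cut short at that word; that is the trade-off of any boundary list of this kind.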

Single-Node Docker Invocation

.buildkite/scripts/hardware_ci/run-amd-test.sh:464-483

Key flags:

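--device /dev/kfd: exposes the AMD Kernel Fusion Driver (KFD) device, the kernel interface ROCm uses for GPU compute
$BUILDKITE_AGENT_META_DATA_RENDER_DEVICES: render device paths taken from the agent's metadata (e.g. /dev/dri/renderD128)
--group-add render_gid: adds the host's render group ID to the container process so it can open the render devices
$RDMA_FLAGS: conditionally adds --device /dev/infiniband --cap-add=IPC_LOCK if the host has RDMA devices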
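The flags above can be assembled into an argument list, sketched here in Python. Several details are assumptions: the render-device metadata is treated as a whitespace-separated list of device paths, the test commands run via /bin/bash -c, RDMA presence is approximated by checking for /dev/infiniband, the render GID value is made up, and the function names are hypothetical. The script may pass further options not shown here.

```python
import shlex
from pathlib import Path


def host_has_rdma() -> bool:
    """Hypothetical RDMA check: treat an existing /dev/infiniband as RDMA support."""
    return Path("/dev/infiniband").exists()


def build_docker_run_args(image: str, commands: str, render_devices: str,
                          render_gid: int, has_rdma: bool) -> list[str]:
    """Build a `docker run` argv with the AMD GPU access flags described above."""
    args = ["docker", "run", "--device", "/dev/kfd"]  # AMD KFD compute device
    # Assumption: render_devices looks like "/dev/dri/renderD128 /dev/dri/renderD136".
    for dev in render_devices.split():
        args += ["--device", dev]
    args += ["--group-add", str(render_gid)]  # join the host's render group
    if has_rdma:
        # The $RDMA_FLAGS case: InfiniBand devices plus permission to lock memory.
        args += ["--device", "/dev/infiniband", "--cap-add=IPC_LOCK"]
    args += [image, "/bin/bash", "-c", commands]  # assumed entry command
    return args


args = build_docker_run_args("rocm/vllm-ci:abc123", "pytest -v -s v1/core",
                             "/dev/dri/renderD128", render_gid=109,
                             has_rdma=host_has_rdma())
print(shlex.join(args))
```

Keeping the arguments as a list, rather than one interpolated string, means the test commands reach bash -c as a single argument with their quoting intact.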

Multi-Node Detection and Routing

Multi-node mode is detected by is_multi_node(), which treats the run as multi-node if either of these signals is present:

NUM_NODES environment variable is greater than 1
Command string contains the bracket syntax: [node0_cmds] && [node1_cmds]

When multi-node is detected, the script parses bracket-delimited per-node command arrays and calls run-multi-node-test.sh for each command pair:

.buildkite/scripts/hardware_ci/run-amd-test.sh:422-463
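The detection logic, plus a simple way to pull out the per-node groups, might look like this in Python. The regex is a Python form of the \[.*\].*\&\&.*\[.*\] pattern the script uses; split_node_commands is hypothetical and assumes no command inside a group contains square brackets. How the script further splits and pairs the commands is not shown.

```python
import re
from typing import Mapping

# Python form of the bracket pattern: [...] && [...]
BRACKET_PATTERN = re.compile(r"\[.*\].*&&.*\[.*\]")


def is_multi_node(commands: str, env: Mapping[str, str]) -> bool:
    """True if NUM_NODES > 1 or the commands use the bracket syntax."""
    try:
        if int(env.get("NUM_NODES", "1")) > 1:
            return True
    except ValueError:
        pass  # A non-numeric NUM_NODES falls through to the syntax check.
    return BRACKET_PATTERN.search(commands) is not None


def split_node_commands(commands: str) -> list[str]:
    """Hypothetical: return the text inside each [...] group, one per node."""
    return [group.strip() for group in re.findall(r"\[([^\[\]]*)\]", commands)]


cmd = "[pytest -v dist/test_a.py] && [pytest -v dist/test_b.py]"
print(is_multi_node(cmd, {}), split_node_commands(cmd))
# True ['pytest -v dist/test_a.py', 'pytest -v dist/test_b.py']
```

A loose pattern like this also matches a single-node command that joins two bracketed pytest node IDs (such as test_x.py::test[a]) with &&.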

ROCm-Specific Test Exclusions

The apply_rocm_test_overrides function (.buildkite/scripts/hardware_ci/run-amd-test.sh:247-331) appends --ignore flags and environment overrides for tests that are not yet supported, or that behave differently, on ROCm. This function modifies the command string in place before the container is launched.

Diagram: apply_rocm_test_overrides Coverage

Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh:247-331

Kernel Exclusions Summary

Kernel Category	Excluded Tests
`kernels/core`	`test_fused_quant_layernorm.py`, `test_permute_cols.py`
`kernels/attention`	`test_attention_selector.py`, `test_encoder_decoder_attn.py`, `test_flash_attn.py`, `test_flashinfer.py`, `test_prefix_prefill.py`, `test_cascade_flash_attn.py`, `test_mha_attn.py`, `test_lightning_attn.py`, `test_attention.py`
`kernels/quantization`	`test_int8_quant.py`, `test_machete_mm.py`, `test_block_fp8.py`, `test_block_int8.py`, `test_marlin_gemm.py`, `test_cutlass_scaled_mm.py`, `test_int8_kernel.py`
`kernels/mamba`	`test_mamba_mixer2.py`, `test_causal_conv1d.py`, `test_mamba_ssm_ssd.py`
`kernels/moe`	`test_moe.py`, `test_cutlass_moe.py`, `test_triton_moe_ptpc_fp8.py`
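The --ignore half of apply_rocm_test_overrides can be sketched as a lookup table from test directory to excluded files. This Python version covers only three rows of the table above and omits the environment overrides; matching on an exact directory token and appending the flags to the end of the command string are assumptions about how the shell function behaves.

```python
# A subset of the kernel exclusions from the table above.
ROCM_IGNORES: dict[str, list[str]] = {
    "kernels/core": ["test_fused_quant_layernorm.py", "test_permute_cols.py"],
    "kernels/mamba": ["test_mamba_mixer2.py", "test_causal_conv1d.py",
                      "test_mamba_ssm_ssd.py"],
    "kernels/moe": ["test_moe.py", "test_cutlass_moe.py",
                    "test_triton_moe_ptpc_fp8.py"],
}


def apply_rocm_test_overrides(commands: str) -> str:
    """Append --ignore flags for each excluded directory the command targets."""
    tokens = commands.split()
    extra: list[str] = []
    for directory, files in ROCM_IGNORES.items():
        if directory in tokens:  # exact token match, e.g. "pytest -v kernels/core"
            extra += [f"--ignore={directory}/{name}" for name in files]
    # Flags are appended to the end of the string, i.e. to the last command
    # in a && chain.
    return " ".join([commands, *extra]) if extra else commands


print(apply_rocm_test_overrides("pytest -v -s kernels/core"))
```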

AMD Test Steps

Step Grades

Steps in test-amd.yaml may have a grade field:

Grade	Behavior
`Blocking`	Pipeline fails if this step fails
(absent)	Non-blocking; failure is logged but does not block

Steps with optional: true require manual unblocking unless the run is a scheduled nightly.
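The two rules can be modeled as a small function over a step mapping. This is a hypothetical illustration of the semantics described here, not code from vLLM's pipeline tooling; the step dictionaries stand in for parsed test-amd.yaml entries.

```python
from typing import Any, Mapping


def step_policy(step: Mapping[str, Any], is_scheduled_nightly: bool) -> dict[str, bool]:
    """Summarize how the grade and optional fields affect one step."""
    return {
        # Only a "Blocking" grade makes a failure fail the pipeline.
        "failure_fails_pipeline": step.get("grade") == "Blocking",
        # optional: true waits for a manual unblock, except on scheduled nightlies.
        "needs_manual_unblock": bool(step.get("optional", False))
        and not is_scheduled_nightly,
    }


print(step_policy({"label": "example", "optional": True}, is_scheduled_nightly=False))
```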

Environment Workarounds

Several AMD-specific environment variables are set within test commands:

Variable	Value	Reason
`TORCH_NCCL_BLOCKING_WAIT`	`1`	Workaround for HIP bug ROCm/hip#3876 in distributed tests
`VLLM_ROCM_CUSTOM_PAGED_ATTN`	`0`	Disables custom paged attention for LoRA tests
`VLLM_WORKER_MULTIPROC_METHOD`	`spawn`	Required for some entrypoint integration tests

Sources: .buildkite/test-amd.yaml:248-312, .buildkite/scripts/hardware_ci/run-amd-test.sh:247-260

Selected Notable Steps

Label	Agent Pool	Notes
`Basic Correctness Test`	`mi325_1`	`fast_check: true`; runs `test_cumem.py`, `test_basic_correctness.py`, `test_cpu_offload.py`
`Distributed Tests (4 GPUs)`	`mi325_4`	Sets `TORCH_NCCL_BLOCKING_WAIT=1`; includes torchrun tests with TP, PP, DP, EP
`Distributed Tests (8 GPUs)`	`mi325_8`	torchrun with TP=2, DP=4, EP
`Kernels Attention Test %N`	`mi325_1`	Parallel sharding with `parallelism: 2`
`OpenAI API correctness`	`mi325_1`	Runs `tools/install_torchcodec_rocm.sh` first
`V1 Test others`	`mi325_1`	Installs `kv_connectors.txt` requirements; runs kv_offload, spec_decode, kv_connector tests

Sources: .buildkite/test-amd.yaml:107-124, .buildkite/test-amd.yaml:227-290, .buildkite/test-amd.yaml:688-744, .buildkite/test-amd.yaml:442-468
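With parallelism: 2, Buildkite runs two jobs from the one step and gives each job the environment variables BUILDKITE_PARALLEL_JOB (a 0-based index) and BUILDKITE_PARALLEL_JOB_COUNT. The sketch below shows one simple way a step could split its test files across those jobs. The round-robin scheme and the shard_for_job name are assumptions for illustration; the actual step may delegate sharding to a pytest option instead.

```python
from typing import Mapping, Sequence


def shard_for_job(items: Sequence[str], env: Mapping[str, str]) -> list[str]:
    """Round-robin split: item i goes to job i % job_count."""
    job = int(env.get("BUILDKITE_PARALLEL_JOB", "0"))
    count = int(env.get("BUILDKITE_PARALLEL_JOB_COUNT", "1"))
    if not 0 <= job < count:
        raise ValueError(f"job index {job} is out of range for {count} jobs")
    return [item for i, item in enumerate(items) if i % count == job]


files = ["test_a.py", "test_b.py", "test_c.py"]
for job in ("0", "1"):
    env = {"BUILDKITE_PARALLEL_JOB": job, "BUILDKITE_PARALLEL_JOB_COUNT": "2"}
    print(job, shard_for_job(files, env))
# 0 ['test_a.py', 'test_c.py']
# 1 ['test_b.py']
```

Every item lands in exactly one shard, so the union of all jobs covers the full test list.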

Multi-Node AMD Test Configuration

Some test steps in test-amd.yaml use the num_nodes field to simulate a multi-node setup: multiple containers are launched on a single host and connected through a Docker network named docker-net.

The bracket command syntax used for multi-node steps has the form [node0_cmds] && [node1_cmds], with one bracketed group of commands per node.

The is_multi_node function (.buildkite/scripts/hardware_ci/run-amd-test.sh:88-100) detects this via either NUM_NODES > 1 or the regex pattern \[.*\].*\&\&.*\[.*\].

The cleanup_network function (.buildkite/scripts/hardware_ci/run-amd-test.sh:76-86) removes the Docker containers named node0, node1, etc. and the docker-net network after the test completes, whether it succeeds or fails.
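The guarantee that cleanup runs on both success and failure maps onto try/finally in Python (a bash script would more typically use a trap on EXIT). The sketch below is hypothetical: the container and network names come from the text, while the function names, the injectable runner, and the use of docker rm -f and docker network rm are assumptions about how removal is done.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def run_quietly(cmd: Sequence[str]) -> None:
    """Default runner: ignore non-zero exit codes, e.g. if a container is already gone."""
    subprocess.run(list(cmd), check=False, capture_output=True)


def cleanup_network(num_nodes: int, run: Runner = run_quietly) -> None:
    """Remove containers node0..node{N-1}, then the docker-net network."""
    for i in range(num_nodes):
        run(["docker", "rm", "-f", f"node{i}"])
    run(["docker", "network", "rm", "docker-net"])


def run_multi_node_test(num_nodes: int, body: Callable[[], None],
                        run: Runner = run_quietly) -> None:
    """Run the test body and always clean up, whether it passes or raises."""
    try:
        body()
    finally:
        cleanup_network(num_nodes, run)
```

Injecting the runner lets the cleanup order be checked without Docker installed, by passing a function that records the commands instead of executing them.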

ROCm-Specific Adaptations in Test Code

Some test files contain explicit ROCm guards. For example, tests/samplers/test_beam_search.py applies additional engine kwargs on ROCm to enforce deterministic output:

tests/samplers/test_beam_search.py:22-31

This guards against run-to-run variation caused by non-associative floating-point reductions in ROCm attention and GEMM kernels. The current_platform.is_rocm() check resolves through vllm/platforms/__init__.py via the rocm_platform_plugin function (vllm/platforms/__init__.py:110-128).

Similarly, monkeypatch.setenv("VLLM_ROCM_USE_SKINNY_GEMM", "0") is applied in beam search tests when ROCm is detected (tests/samplers/test_beam_search.py:57-59).

XPU (Intel GPU) Testing

XPU testing uses a separate script, .buildkite/scripts/hardware_ci/run-xpu-test.sh, and a dedicated Dockerfile, docker/Dockerfile.xpu.

Diagram: XPU Test Execution

Sources: .buildkite/scripts/hardware_ci/run-xpu-test.sh:1-56

Unlike AMD testing, XPU testing builds the Docker image locally at test time rather than pulling a pre-built image. The Dockerfile.xpu base image is intel/deep-learning-essentials:2025.3.2-0-devel-ubuntu24.04 and installs PyTorch XPU (torch==2.10.0+xpu) along with the vllm_xpu_kernels wheel.

Key XPU-specific environment variables set in docker/Dockerfile.xpu:

VLLM_TARGET_DEVICE=xpu
VLLM_WORKER_MULTIPROC_METHOD=spawn

The XPUWorker class (vllm/v1/worker/xpu_worker.py:25-115) extends the base Worker and initializes XPUModelRunner with XPU-specific device setup via torch.xpu APIs.

XPU Test Exclusions

The run-xpu-test.sh script explicitly ignores certain test files that are not yet supported on XPU:

Test Directory	Ignored Files
`v1/sample`	`test_logprobs.py`, `test_logprobs_e2e.py`
`v1/worker`	`test_gpu_model_runner.py`
`v1/spec_decode`	`test_max_len.py`, `test_tree_attention.py`, `test_speculators_eagle3.py`, `test_acceptance_length.py`
`v1/kv_connector/unit`	`test_multi_connector.py`, `test_nixl_connector.py`, `test_example_connector.py`, `test_lmcache_integration.py`

Sources: .buildkite/scripts/hardware_ci/run-xpu-test.sh:46-55

Summary of Files

File	Role
.buildkite/test-amd.yaml	Defines all AMD test steps, agent pools, commands, and mirror hardware configs
.buildkite/scripts/hardware_ci/run-amd-test.sh	Orchestrates GPU state, Docker operations, command processing, and routing for AMD tests
.buildkite/scripts/hardware_ci/run-xpu-test.sh	Builds and runs XPU Docker container for Intel GPU tests
docker/Dockerfile.xpu	XPU CI container image definition
vllm/platforms/__init__.py	Platform detection logic (`rocm_platform_plugin`, `xpu_platform_plugin`) used by test guards
vllm/v1/worker/xpu_worker.py	`XPUWorker` implementation for Intel XPU
