Test Organization

Purpose and Scope

This document explains how tests are organized in the vLLM codebase: the directory structure, pytest configuration, test categories and markers, and the filtering mechanisms that select which tests run.

For information about the CI/CD pipeline configuration and test execution infrastructure, see Buildkite CI Pipelines. For AMD-specific testing details, see AMD-Specific Testing. For model correctness testing methodology, see Model Correctness Testing.

Test Directory Structure

vLLM organizes tests in a hierarchical directory structure under the tests/ directory. Each directory corresponds to a major functional area of the codebase.

Top-Level Test Organization

Sources: .buildkite/test-amd.yaml (lines 56-1004)

Detailed Directory Breakdown

Directory	Purpose	Key Tests
`basic_correctness/`	Fundamental inference correctness	`test_basic_correctness.py`, `test_cpu_offload.py`, `test_cumem.py`
`entrypoints/`	API and interface tests	`llm/`, `openai/`, `rpc/`, `pooling/`, `offline_mode/`
`distributed/`	Parallel execution tests	`test_utils.py`, `test_pynccl.py`, `test_eplb_*.py`
`kernels/`	Low-level kernel operations	`core/`, `attention/`, `quantization/`, `moe/`, `mamba/`, `helion/`
`models/`	Model loading and execution	`test_initialization.py`, `language/`, `multimodal/`
`engine/`	Engine core functionality	Engine scheduling, output processing tests
`compile/`	PyTorch compilation	`fullgraph/`, unit tests for torch.compile
`lora/`	LoRA adapter functionality	Multi-LoRA, LoRA correctness tests
`quantization/`	Quantization methods	FP8, INT4, MXFP4 tests
`v1/`	V1 engine architecture	`e2e/`, `engine/`, `entrypoints/`, `core/`, `attention/`

Sources: .buildkite/test-amd.yaml (lines 51-1004)

Test Categories and Pytest Markers

vLLM uses pytest markers to categorize tests and enable selective execution. Markers are defined in test files and used in CI configuration to filter test execution.
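As a rough illustration of how custom markers are usually made known to pytest, the sketch below registers the four marker names mentioned in this document from a `conftest.py` hook. The marker descriptions are illustrative wording, and this is not taken from vLLM's actual configuration, which may register markers elsewhere (for example in `pyproject.toml`).

```python
# conftest.py (sketch): register the marker names used in this document so
# that pytest recognises them. The descriptions are illustrative assumptions.
MARKERS = {
    "core_model": "essential model tests that run in every CI build",
    "slow_test": "resource-intensive tests that CI shards across jobs",
    "cpu_test": "tests that run without GPU access",
    "hybrid_model": "tests for hybrid architecture models",
}


def pytest_configure(config):
    """pytest hook, called once after command-line options are parsed."""
    for name, description in MARKERS.items():
        # addinivalue_line appends an entry to the "markers" ini option.
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test then opts in with a decorator such as `@pytest.mark.core_model`, and `pytest -m core_model` selects only the tests carrying that marker.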

Core Pytest Markers

Sources: .buildkite/test-amd.yaml (lines 901-943)

Marker Usage Patterns

Core Model Tests

Tests marked with core_model are essential model tests that run in every CI build. These tests verify basic model functionality.

Sources: .buildkite/test-amd.yaml (lines 889-901)

Slow Tests

Tests marked with slow_test are resource-intensive and are sharded across multiple parallel jobs to reduce total execution time.

Sources: .buildkite/test-amd.yaml (lines 903-922)

CPU Tests

Tests marked with cpu_test run without GPU access and can execute on CPU-only agents.

Sources: .buildkite/test-amd.yaml (lines 477-491)

Hybrid Model Tests

Tests for hybrid architecture models (e.g., models with Mamba layers) require special dependencies and are marked with hybrid_model.

Sources: .buildkite/test-amd.yaml (lines 924-943)

Test Configuration and Selection

Source File Dependencies

Tests are configured with source_file_dependencies to determine when they should run based on changed files. This optimizes CI execution by skipping irrelevant tests.
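The decision described above can be sketched in a few lines. This is a minimal model under stated assumptions, not the pipeline's real logic: it treats each dependency as a plain path prefix and treats an empty list as "always run". The function name `step_should_run` and the sample paths are hypothetical.

```python
from typing import Iterable, Sequence


def step_should_run(changed_files: Iterable[str],
                    source_file_dependencies: Sequence[str]) -> bool:
    """Return True if any changed file lies under a declared dependency.

    Assumptions: each dependency is a path prefix, and an empty
    dependency list means the step always runs.
    """
    if not source_file_dependencies:
        return True
    return any(
        path.startswith(dep)
        for path in changed_files
        for dep in source_file_dependencies
    )


# Hypothetical step definition and change sets, for illustration only.
deps = ["vllm/", "tests/basic_correctness/"]
print(step_should_run(["tests/basic_correctness/test_cumem.py"], deps))  # True
print(step_should_run(["docs/README.md"], deps))  # False
```

Prefix matching keeps the rule easy to read in a pipeline file; a real implementation might also support globs or exact file names.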

Example Dependency Configuration

From .buildkite/test-amd.yaml, tests specify which source files should trigger their execution:

Basic Correctness Test:

Kernel Tests:

Model Tests:

Sources: .buildkite/test-amd.yaml (lines 114-850)

Test Execution Patterns

Parallelism and Sharding

Large test suites are split into shards that run as separate parallel CI jobs, so each job executes only a subset of the tests and total wall-clock time drops.
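To make the idea of sharding concrete, here is a minimal sketch of one common assignment scheme: sort the test IDs and give every job the tests whose index modulo the shard count equals its shard ID. The function name and the test IDs are hypothetical, and the sharding mechanism actually used in CI may assign tests differently.

```python
from typing import List, Sequence


def select_shard(test_ids: Sequence[str], shard_id: int,
                 num_shards: int) -> List[str]:
    """Return the subset of tests that one parallel job should run."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # Sorting makes the split deterministic across jobs, so every job
    # computes the same partition independently.
    ordered = sorted(test_ids)
    return [t for i, t in enumerate(ordered) if i % num_shards == shard_id]


# Ten hypothetical test IDs split four ways.
tests = [f"tests/lora/test_example.py::test_case_{i}" for i in range(10)]
shards = [select_shard(tests, s, 4) for s in range(4)]
print([len(s) for s in shards])  # [3, 3, 2, 2]
```

Because the shards are disjoint and together cover every test, running all jobs is equivalent to running the full suite once.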

Sharding Configuration Example

LoRA Tests with 4-way Parallelism:

Attention Kernel Tests with 2-way Parallelism:

Sources: .buildkite/test-amd.yaml (lines 554-663)

Test Filtering and Ignoring

Tests can be filtered using pytest's selection mechanisms:

By Marker

By Test Name Pattern

By Ignored Files/Directories
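The three mechanisms above correspond to pytest's `-m` (marker expression), `-k` (name expression), and `--ignore` (path) options. The sketch below assembles such a command line; the helper name `build_pytest_args` and the example paths and expressions are hypothetical and are not copied from the CI configuration.

```python
import shlex
from typing import List, Optional, Sequence


def build_pytest_args(target: str,
                      marker_expr: Optional[str] = None,
                      name_expr: Optional[str] = None,
                      ignore: Sequence[str] = ()) -> List[str]:
    """Assemble a pytest command line using the three selection options."""
    args = ["pytest", "-v", "-s", target]
    if marker_expr:
        args += ["-m", marker_expr]  # select by marker expression
    if name_expr:
        args += ["-k", name_expr]    # select by test name expression
    # Each ignored file or directory gets its own --ignore option.
    args += [f"--ignore={path}" for path in ignore]
    return args


# Hypothetical invocation, for illustration only.
cmd = build_pytest_args(
    "models/language",
    marker_expr="core_model",
    name_expr="not quantized",
    ignore=["models/language/pooling"],
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the arguments as a list and quoting only when printing avoids shell-quoting mistakes when the expressions contain spaces.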

Sources: .buildkite/test-amd.yaml (lines 136-858)

Test Categories by Functional Area

Entrypoint Tests

Tests for user-facing interfaces are organized under tests/entrypoints/:

Entrypoint Test Organization Pattern

LLM Tests (30 minutes):

OpenAI API Tests (100+ minutes, split into multiple steps):

Sources: .buildkite/test-amd.yaml (lines 139-222)

Distributed Tests

Multi-GPU and multi-node tests are organized under tests/distributed/:

Distributed Test Execution Example

4-GPU Tests:

Sources: .buildkite/test-amd.yaml (lines 223-284)

Kernel Tests

Low-level kernel tests are organized under tests/kernels/:

Kernel Test Timing and Sharding

Test Category	Duration	Sharding	GPU Requirements
Core kernels	48 min	None	1 GPU
Attention kernels	23 min	2-way	1 GPU
Quantization kernels	64 min	2-way	1 GPU
MoE kernels	40 min	2-way	1 GPU
Mamba kernels	31 min	None	1 GPU

Sources: .buildkite/test-amd.yaml (lines 638-706)

Model Tests

Model-specific tests are organized under tests/models/:

Model Test Organization

Standard Language Models (25 minutes):

Extra Standard Models (45 minutes, sharded):

Hybrid Models (75 minutes, sharded):

Extended Tests (optional, not in fast check):

Sources: .buildkite/test-amd.yaml (lines 828-995)

Test Configuration Files

Pytest Configuration

vLLM uses pytest configuration to define markers, test collection rules, and execution parameters. The configuration is typically defined in pytest.ini or pyproject.toml.

Common Pytest Options

Tests are typically invoked with consistent options:

-v: Verbose output showing individual test names
-s: Disable output capturing so that print statements and other stdout/stderr output appear during test execution

Test Discovery Patterns

Pytest discovers tests based on naming conventions:

Test files: test_*.py or *_test.py
Test functions: test_*
Test classes: Test*
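The file-name conventions listed above can be checked with simple glob matching. The sketch below is a simplified model of that rule, not pytest's own collection code; the helper name `is_test_file` is hypothetical, and the sample paths are built from file names mentioned elsewhere in this document.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name globs from the list above.
FILE_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(path: str) -> bool:
    """Return True if the file name matches one of the discovery globs."""
    name = PurePosixPath(path).name
    # fnmatchcase is case-sensitive on every platform.
    return any(fnmatchcase(name, pattern) for pattern in FILE_PATTERNS)


print(is_test_file("tests/distributed/test_pynccl.py"))        # True
print(is_test_file("tests/standalone_tests/lazy_imports.py"))  # False
```

The second result illustrates why a script whose name does not follow the convention is not picked up by collection and has to be run directly.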

Sources: .buildkite/test-amd.yaml (lines 62-798)

Platform-Specific Test Filtering

AMD/ROCm Test Filtering

AMD tests apply additional filtering to exclude tests that are not compatible with ROCm. This is implemented in .buildkite/scripts/hardware_ci/run-amd-test.sh.

ROCm Test Filtering Examples

Kernel Core Tests:

Kernel Attention Tests:

Entrypoints/OpenAI Tests:

Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh (lines 104-170)

Special Test Patterns

Multi-GPU Test Decorator

Tests requiring multiple GPUs use the @multi_gpu_test decorator to specify GPU requirements.
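The general shape of such a guard can be sketched as follows. This is a hypothetical stand-in named `require_gpus`, not the implementation of `@multi_gpu_test`: the GPU count is injected as a callable so the sketch runs without any GPU library, and it signals a skip with the standard library's `unittest.SkipTest`.

```python
import functools
import unittest
from typing import Callable


def require_gpus(num_gpus: int, count_gpus: Callable[[], int]):
    """Hypothetical guard: skip the test when too few GPUs are visible.

    count_gpus is an injected callable (an assumption of this sketch) that
    returns how many devices are available.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            available = count_gpus()
            if available < num_gpus:
                raise unittest.SkipTest(
                    f"needs {num_gpus} GPUs, found {available}")
            return test_fn(*args, **kwargs)
        return wrapper
    return decorator


# Pretend only one GPU is visible, so this test is skipped when called.
@require_gpus(num_gpus=2, count_gpus=lambda: 1)
def test_needs_two_gpus():
    return "ran"
```

Checking the device count at call time rather than at import time keeps test collection cheap on machines that have no GPUs at all.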

Sources: tests/lora/test_llm_with_multi_loras.py (lines 55-67)

Test Fixtures and Conftest

Tests use pytest fixtures defined in conftest.py files for shared setup and teardown logic. Common fixtures include:

Model loading fixtures
Tokenizer fixtures
Server startup/shutdown fixtures
Temporary file/directory fixtures
LoRA adapter fixtures (e.g., qwen3_meowing_lora_files, qwen3_woofing_lora_files)

Sources: tests/lora/test_llm_with_multi_loras.py (lines 192-241)

Standalone Test Scripts

Some tests are implemented as standalone scripts in tests/standalone_tests/:

lazy_imports.py - Verifies lazy import behavior
python_only_compile.sh - Tests Python-only installation
pytorch_nightly_dependency.sh - Checks PyTorch nightly compatibility

These scripts are executed directly rather than through pytest:

Sources: .buildkite/test-amd.yaml (lines 38-105)

Test Execution Environment

Environment Variables

Tests commonly use environment variables to control behavior:

Variable	Purpose
`VLLM_WORKER_MULTIPROC_METHOD`	Set to `spawn` for clean worker processes
`VLLM_ROCM_CUSTOM_PAGED_ATTN`	Control custom paged attention (ROCm)
`TORCH_NCCL_BLOCKING_WAIT`	Workaround for HIP bug on ROCm
`VLLM_TEST_FORCE_LOAD_FORMAT`	Force specific weight loading format
`PYTHONPATH`	Set to include vLLM workspace
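When a test run is launched from Python rather than from a shell, variables like those in the table can be layered on top of the current environment. The sketch below shows one way to do that; the helper name `build_test_env` is hypothetical, and only the `spawn` value for `VLLM_WORKER_MULTIPROC_METHOD` comes from the table above.

```python
import os
from typing import Dict, Mapping, Optional


def build_test_env(overrides: Mapping[str, str],
                   base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the base environment with overrides applied."""
    # Copy first so the calling process's own environment is not mutated.
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


env = build_test_env({"VLLM_WORKER_MULTIPROC_METHOD": "spawn"})
print(env["VLLM_WORKER_MULTIPROC_METHOD"])  # spawn
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` so that only the child test process sees the overrides.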

Example Usage:

Sources: .buildkite/test-amd.yaml (lines 120-246)

Working Directory

Most tests execute from the /vllm-workspace/tests directory:

Example tests execute from /vllm-workspace/examples:

Sources: .buildkite/test-amd.yaml (lines 130-508)

Summary

The vLLM test organization provides a structured approach to testing:

Hierarchical directory structure organizes tests by functional area
Pytest markers enable selective test execution (core_model, slow_test, hybrid_model, cpu_test)
Source file dependencies optimize CI by running only affected tests
Parallelism and sharding reduce execution time for large test suites
Platform-specific filtering handles ROCm/AMD compatibility
Special decorators and fixtures support multi-GPU and complex test scenarios

This organization allows efficient test execution in CI while maintaining comprehensive coverage of vLLM's functionality.
