This document explains how tests are organized in the vLLM codebase: the directory structure, pytest configuration, test categories and markers, and the filtering mechanisms that select tests for execution.
For information about the CI/CD pipeline configuration and test execution infrastructure, see Buildkite CI Pipelines. For AMD-specific testing details, see AMD-Specific Testing. For model correctness testing methodology, see Model Correctness Testing.
vLLM organizes tests in a hierarchical directory structure under the tests/ directory. Each directory corresponds to a major functional area of the codebase.
Sources: .buildkite/test-amd.yaml56-1004
| Directory | Purpose | Key Tests |
|---|---|---|
| basic_correctness/ | Fundamental inference correctness | test_basic_correctness.py, test_cpu_offload.py, test_cumem.py |
| entrypoints/ | API and interface tests | llm/, openai/, rpc/, pooling/, offline_mode/ |
| distributed/ | Parallel execution tests | test_utils.py, test_pynccl.py, test_eplb_*.py |
| kernels/ | Low-level kernel operations | core/, attention/, quantization/, moe/, mamba/, helion/ |
| models/ | Model loading and execution | test_initialization.py, language/, multimodal/ |
| engine/ | Engine core functionality | Engine scheduling, output processing tests |
| compile/ | PyTorch compilation | fullgraph/, unit tests for torch.compile |
| lora/ | LoRA adapter functionality | Multi-LoRA, LoRA correctness tests |
| quantization/ | Quantization methods | FP8, INT4, MXFP4 tests |
| v1/ | V1 engine architecture | e2e/, engine/, entrypoints/, core/, attention/ |
Sources: .buildkite/test-amd.yaml51-1004
vLLM uses pytest markers to categorize tests and enable selective execution. Markers are defined in test files and used in CI configuration to filter test execution.
Sources: .buildkite/test-amd.yaml901-943
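As a rough illustration of how such markers can be registered so that pytest recognizes them, the sketch below uses the pytest_configure hook and config.addinivalue_line, both standard pytest APIs. The marker names come from this page, the descriptions are paraphrased, and placing the registration in a conftest.py is an assumption; vLLM may declare its markers elsewhere, for example in pyproject.toml.

```python
# conftest.py (hypothetical location; vLLM may register markers elsewhere)

# Marker names are taken from this page; the descriptions are paraphrased.
MARKERS = {
    "core_model": "essential model tests that run in every CI build",
    "slow_test": "resource-intensive tests that are sharded across jobs",
    "cpu_test": "tests that run without GPU access",
    "hybrid_model": "tests for hybrid architecture models (e.g. Mamba layers)",
}


def pytest_configure(config):
    """Register each marker so that `pytest --strict-markers` accepts it."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Registering markers this way keeps typos in marker names from silently deselecting tests when a run is filtered with -m.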
Tests marked with core_model are essential model tests that run in every CI build. These tests verify basic model functionality.
Sources: .buildkite/test-amd.yaml889-901
Tests marked as slow_test are resource-intensive tests that are sharded across multiple parallel jobs to reduce total execution time.
Sources: .buildkite/test-amd.yaml903-922
Tests marked with cpu_test run without GPU access and can execute on CPU-only agents.
Sources: .buildkite/test-amd.yaml477-491
Tests for hybrid architecture models (e.g., models with Mamba layers) require special dependencies and are marked accordingly.
Sources: .buildkite/test-amd.yaml924-943
Tests are configured with source_file_dependencies to determine when they should run based on changed files. This optimizes CI execution by skipping irrelevant tests.
From .buildkite/test-amd.yaml, tests specify which source files should trigger their execution:
Basic Correctness Test:
Kernel Tests:
Model Tests:
Sources: .buildkite/test-amd.yaml114-850
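The dependency check described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual logic: the should_run helper, the example step, and the prefix-matching rule are all assumptions made for the example.

```python
def should_run(changed_files, source_file_dependencies):
    """Return True if any changed file falls under a dependency prefix.

    Assumption: a dependency such as "csrc/" matches every path that starts
    with it. The real pipeline generator may use different matching rules.
    """
    return any(
        path.startswith(dep)
        for path in changed_files
        for dep in source_file_dependencies
    )


# Hypothetical step definition; the label and paths are placeholders.
step = {
    "label": "Example Kernel Test",
    "source_file_dependencies": ["csrc/", "tests/kernels/"],
}

# A change under csrc/ triggers the step; a docs-only change does not.
print(should_run(["csrc/attention/kernel.cu"], step["source_file_dependencies"]))
print(should_run(["docs/index.md"], step["source_file_dependencies"]))
```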
vLLM distributes tests across multiple parallel CI jobs by sharding the pytest run, which reduces total execution time. Sharding is not built into core pytest, so it relies on plugin or CI-level support.
LoRA Tests with 4-way Parallelism:
Attention Kernel Tests with 2-way Parallelism:
Sources: .buildkite/test-amd.yaml554-663
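To make the idea of N-way sharding concrete, here is a minimal sketch that assigns each test ID to exactly one shard using a stable hash. It is an illustration under assumptions: the function names are hypothetical, and this is not necessarily the assignment rule used by vLLM's CI or by any pytest plugin.

```python
import hashlib


def shard_for(test_id: str, num_shards: int) -> int:
    """Map a test ID to a shard index in [0, num_shards) deterministically."""
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def select_shard(test_ids, shard_id: int, num_shards: int):
    """Return the subset of test IDs that a given parallel job should run."""
    return [t for t in test_ids if shard_for(t, num_shards) == shard_id]


# Hypothetical test IDs, split across 4 parallel jobs.
tests = [f"tests/lora/test_example.py::test_case_{i}" for i in range(8)]
for job in range(4):
    print(job, select_shard(tests, job, 4))
```

A hash-based rule needs no coordination between jobs: each job computes its own subset from the shared test list, its shard index, and the shard count.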
Tests can be filtered using pytest's selection mechanisms:
Sources: .buildkite/test-amd.yaml136-858
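The sketch below assembles a pytest command line from the standard selection options: -m for marker expressions, -k for keyword expressions, and --ignore for excluded paths. The build_pytest_command helper and the example paths are hypothetical and only show how the flags combine.

```python
import shlex


def build_pytest_command(path, marker_expr=None, keyword_expr=None, ignore=()):
    """Build a pytest argument list using standard selection flags."""
    args = ["pytest", "-v", "-s", path]
    if marker_expr:
        args += ["-m", marker_expr]  # select by marker expression
    if keyword_expr:
        args += ["-k", keyword_expr]  # select by test-name keyword
    for ignored in ignore:
        args.append(f"--ignore={ignored}")  # skip a file or directory
    return args


# Hypothetical invocation: core model tests, excluding slow ones.
cmd = build_pytest_command(
    "models/language",
    marker_expr="core_model and not slow_test",
    ignore=["models/language/example_dir"],
)
print(shlex.join(cmd))
```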
Tests for user-facing interfaces are organized under tests/entrypoints/:
LLM Tests (30 minutes):
OpenAI API Tests (100+ minutes, split into multiple steps):
Sources: .buildkite/test-amd.yaml139-222
Multi-GPU and multi-node tests are organized under tests/distributed/:
4-GPU Tests:
Sources: .buildkite/test-amd.yaml223-284
Low-level kernel tests are organized under tests/kernels/:
| Test Category | Duration | Sharding | GPU Requirements |
|---|---|---|---|
| Core kernels | 48 min | None | 1 GPU |
| Attention kernels | 23 min | 2-way | 1 GPU |
| Quantization kernels | 64 min | 2-way | 1 GPU |
| MoE kernels | 40 min | 2-way | 1 GPU |
| Mamba kernels | 31 min | None | 1 GPU |
Sources: .buildkite/test-amd.yaml638-706
Model-specific tests are organized under tests/models/:
Standard Language Models (25 minutes):
Extra Standard Models (45 minutes, sharded):
Hybrid Models (75 minutes, sharded):
Extended Tests (optional, not in fast check):
Sources: .buildkite/test-amd.yaml828-995
vLLM uses pytest configuration to define markers, test collection rules, and execution parameters. The configuration is typically defined in pytest.ini or pyproject.toml.
Tests are typically invoked with consistent options:
- -v: Verbose output showing individual test names
- -s: Show print statements and stdout during test execution

Pytest discovers tests based on naming conventions:
- Test files: test_*.py or *_test.py
- Test functions: test_*
- Test classes: Test*

Sources: .buildkite/test-amd.yaml62-798
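The naming conventions listed above can be expressed as simple glob checks. The helpers below are a hypothetical sketch for illustration, not pytest's actual collector, which also honors configuration options such as python_files, python_classes, and python_functions.

```python
from fnmatch import fnmatchcase

# Patterns as listed on this page.
FILE_PATTERNS = ("test_*.py", "*_test.py")
FUNCTION_PATTERN = "test_*"
CLASS_PATTERN = "Test*"


def is_test_file(filename: str) -> bool:
    """True if the file name matches one of the discovery patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in FILE_PATTERNS)


def is_test_function(name: str) -> bool:
    """True if the function name matches the test function pattern."""
    return fnmatchcase(name, FUNCTION_PATTERN)


def is_test_class(name: str) -> bool:
    """True if the class name matches the test class pattern."""
    return fnmatchcase(name, CLASS_PATTERN)


print(is_test_file("test_pynccl.py"), is_test_file("conftest.py"))
```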
AMD tests apply additional filtering to exclude tests that are not compatible with ROCm. This is implemented in .buildkite/scripts/hardware_ci/run-amd-test.sh.
Kernel Core Tests:
Kernel Attention Tests:
Entrypoints/OpenAI Tests:
Sources: .buildkite/scripts/hardware_ci/run-amd-test.sh104-170
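One way to picture this platform-specific filtering is a lookup table that appends --ignore flags to a step's command. The sketch below is hypothetical: the step labels appear on this page, but the excluded paths are placeholders and the helper does not reproduce the logic of run-amd-test.sh.

```python
# Hypothetical exclusion table. The paths are placeholders, not the real
# list of tests that the AMD script excludes.
ROCM_EXCLUDES = {
    "Kernel Core Tests": ["kernels/core/example_unsupported_test.py"],
    "Kernel Attention Tests": ["kernels/attention/example_unsupported_test.py"],
    "Entrypoints/OpenAI Tests": ["entrypoints/openai/example_unsupported_test.py"],
}


def add_rocm_excludes(label: str, command: str) -> str:
    """Append --ignore flags for tests assumed to be incompatible with ROCm."""
    flags = [f"--ignore={path}" for path in ROCM_EXCLUDES.get(label, [])]
    return " ".join([command, *flags])


print(add_rocm_excludes("Kernel Core Tests", "pytest -v -s kernels/core"))
```

Steps without an entry in the table pass through unchanged, so the exclusions stay confined to the categories that need them.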
Tests requiring multiple GPUs use the @multi_gpu_test decorator to specify GPU requirements.
Sources: tests/lora/test_llm_with_multi_loras.py55-67
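To show the general shape of such a decorator, the sketch below skips a test when fewer GPUs are available than it requires. It is a simplified stand-in written for this page: require_gpus and its gpu_count_fn parameter are hypothetical, and the real @multi_gpu_test decorator in vLLM may have a different signature and behavior.

```python
import functools
import unittest


def require_gpus(*, num_gpus, gpu_count_fn):
    """Skip the wrapped test unless at least `num_gpus` GPUs are reported.

    `gpu_count_fn` is an injected callable returning the number of visible
    GPUs (an assumption, so the sketch runs without GPU libraries installed).
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            available = gpu_count_fn()
            if available < num_gpus:
                raise unittest.SkipTest(
                    f"Need {num_gpus} GPUs, found {available}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require_gpus(num_gpus=2, gpu_count_fn=lambda: 4)
def example_multi_gpu_case():
    return "ran"


print(example_multi_gpu_case())
```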
Tests use pytest fixtures defined in conftest.py files for shared setup and teardown logic. Common fixtures include:
- LoRA adapter file fixtures (qwen3_meowing_lora_files, qwen3_woofing_lora_files)

Sources: tests/lora/test_llm_with_multi_loras.py192-241
Some tests are implemented as standalone scripts in tests/standalone_tests/:
- lazy_imports.py - Verifies lazy import behavior
- python_only_compile.sh - Tests Python-only installation
- pytorch_nightly_dependency.sh - Checks PyTorch nightly compatibility

These scripts are executed directly rather than through pytest:
Sources: .buildkite/test-amd.yaml38-105
Tests commonly use environment variables to control behavior:
| Variable | Purpose |
|---|---|
| VLLM_WORKER_MULTIPROC_METHOD | Set to spawn for clean worker processes |
| VLLM_ROCM_CUSTOM_PAGED_ATTN | Control custom paged attention (ROCm) |
| TORCH_NCCL_BLOCKING_WAIT | Workaround for HIP bug on ROCm |
| VLLM_TEST_FORCE_LOAD_FORMAT | Force specific weight loading format |
| PYTHONPATH | Set to include vLLM workspace |
Example Usage:
Sources: .buildkite/test-amd.yaml120-246
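When a variable such as VLLM_WORKER_MULTIPROC_METHOD should apply to one test only, it can be set temporarily and restored afterwards. The sketch below uses unittest.mock.patch.dict from the standard library; the temporary_env helper is hypothetical and is not taken from the vLLM test suite.

```python
import contextlib
import os
from unittest import mock


@contextlib.contextmanager
def temporary_env(**overrides):
    """Set environment variables for the duration of a with-block.

    patch.dict restores os.environ to its previous state on exit, so the
    overrides do not leak into other tests.
    """
    with mock.patch.dict(os.environ, overrides):
        yield


with temporary_env(VLLM_WORKER_MULTIPROC_METHOD="spawn"):
    # Inside the block the override is visible to the code under test.
    print(os.environ["VLLM_WORKER_MULTIPROC_METHOD"])
```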
Most tests execute from the /vllm-workspace/tests directory:
Example tests execute from the /vllm-workspace/examples directory:
Sources: .buildkite/test-amd.yaml130-508
The vLLM test organization provides a structured approach to testing:
- Marker-based test categories (core_model, slow_test, hybrid_model, cpu_test)

This organization allows efficient test execution in CI while maintaining comprehensive coverage of vLLM's functionality.