This page covers the shared utilities, test mixin classes, docstring validation tools, and pytest conventions used throughout the transformers test suite. It focuses on the patterns developers need to understand to write or extend tests for models, processors, and training components.
For information about CI orchestration, smart test selection, and how individual test jobs are triggered, see CI/CD Pipeline and Test Orchestration. For Docker environments used by CI, see Docker Images and CI Environments.
Tests live under the top-level tests/ directory. The layout mirrors the source structure:
```
tests/
├── test_modeling_common.py          # ModelTesterMixin (shared model tests)
├── generation/
│   └── test_utils.py                # GenerationTesterMixin
├── models/
│   └── <model_name>/
│       └── test_modeling_<name>.py
├── trainer/
│   └── test_trainer.py
├── peft_integration/
│   └── test_peft_integration.py
├── utils/
│   └── test_modeling_utils.py
└── fixtures/                        # Sample data (text, audio, images)
```
The testing install extras are declared in setup.py (lines 211-243):
| Extra | Key Packages |
|---|---|
| testing | pytest, pytest-xdist, pytest-rerunfailures, pytest-random-order, pytest-timeout, parameterized, evaluate, rouge-score |
| deepspeed-testing | testing + deepspeed + optuna |
Sources: setup.py (lines 211-243), tests/test_modeling_common.py (lines 1-130)
testing_utils.py Module

src/transformers/testing_utils.py is the central module for all shared test utilities. It provides environment-variable gates, test-classification and requirement decorators, base test classes, and device-agnostic helpers, described below.
Several boolean environment variables gate entire categories of tests. They are parsed at module import time via parse_flag_from_env.
| Environment Variable | Default | Controls |
|---|---|---|
| RUN_SLOW | False | Tests marked @slow |
| RUN_FLAKY | True | Tests marked @is_flaky (set to False on PRs in CI) |
| RUN_PIPELINE_TESTS | True | Tests marked @is_pipeline_test (set to False in CI) |
| RUN_TRAINING_TESTS | True | Tests marked @is_training_test |
| RUN_AGENT_TESTS | False | Tests marked @is_agent_test |
| HUGGINGFACE_CO_STAGING | False | Tests marked @is_staging_test |
| RUN_CUSTOM_TOKENIZERS | False | Custom tokenizer tests |
| TRANSFORMERS_IS_CI | — | Set to True by CI; affects safetensors conversion path |
Sources: src/transformers/testing_utils.py (lines 268-274), .circleci/create_circleci_config.py (lines 25-35)
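The gating mechanism can be sketched as follows. This is a simplified re-implementation for illustration, not the exact code of parse_flag_from_env in src/transformers/testing_utils.py:

```python
import os

def parse_flag_from_env(key, default=False):
    """Read a boolean test gate from the environment (illustrative sketch).

    Accepts common truthy/falsy spellings and raises on anything else,
    so a typo like RUN_SLOW=ture fails loudly instead of silently
    skipping an entire test category.
    """
    value = os.environ.get(key)
    if value is None:
        return default
    if value.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if value.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise ValueError(f"If set, {key} must be yes or no, got {value!r}.")

# Parsed once at module import time, then consumed by the skip decorators.
_run_slow_tests = parse_flag_from_env("RUN_SLOW", default=False)
```

Because the flags are read once at import, changing the environment variable mid-session has no effect on already-imported test modules.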
Diagram: Skip Decorator Decision Flow
Test Classification Decorators
| Decorator | Controlling Flag | Behavior |
|---|---|---|
| slow | RUN_SLOW | Skip unless env var is truthy |
| tooslow | — | Always skipped (must not stay in the codebase) |
| is_staging_test | HUGGINGFACE_CO_STAGING | Uses staging Hub endpoint |
| is_pipeline_test | RUN_PIPELINE_TESTS | Pipeline-specific tests |
| is_training_test | RUN_TRAINING_TESTS | Trainer-specific tests |
| is_agent_test | RUN_AGENT_TESTS | Agent tests |
| is_flaky | RUN_FLAKY | Skipped on PRs, runs on main |
Hardware Requirement Decorators
| Decorator | Underlying Check |
|---|---|
| require_torch | is_torch_available() |
| require_torch_gpu | is_torch_cuda_available() |
| require_torch_mps | is_torch_mps_available() |
| require_torch_xpu | is_torch_xpu_available() |
| require_torch_npu | is_torch_npu_available() |
| require_torch_multi_gpu | GPU count ≥ 2 |
| require_torch_multi_accelerator | Accelerator count ≥ 2 |
| require_torch_bf16 | BF16 support on device |
| require_torch_fp16 | FP16 support on device |
| require_torch_tf32 | is_torch_tf32_available() |
| require_non_hpu | Not HPU device |
Library Requirement Decorators
| Decorator | Library Checked |
|---|---|
| require_accelerate | accelerate |
| require_bitsandbytes | bitsandbytes |
| require_flash_attn | Flash Attention 2 |
| require_flash_attn_3 | Flash Attention 3 |
| require_deepspeed | deepspeed |
| require_kernels | kernels |
| require_peft | peft |
| require_sentencepiece | sentencepiece |
| require_tokenizers | tokenizers (Rust fast tokenizers) |
| require_vision | vision libraries |
| require_tensorboard | tensorboard |
| require_wandb | wandb |
| require_liger_kernel | liger_kernel |
| require_galore_torch | galore_torch |
| require_lomo | lomo |
| require_schedulefree | schedulefree |
| require_optuna | optuna |
| require_ray | ray[tune] |
Execution Control Decorators
| Decorator | Effect |
|---|---|
| run_first | Executes test before others via pytest-order |
| run_test_using_subprocess | Forces test to run in a subprocess |
| skip_if_not_implemented | Skips if NotImplementedError is raised |
| hub_retry | Retries decorated test on Hub connection errors |
Sources: src/transformers/testing_utils.py (lines 277-400), tests/test_modeling_common.py (lines 81-106), tests/trainer/test_trainer.py (lines 56-106)
TestCasePlus — A unittest.TestCase subclass that provides:

- setUp() and tearDown() with automatic temp directory management
- get_auto_remove_tmp_dir() for test-scoped directories

CaptureLogger — Context manager that captures log output from a logging.Logger instance for inspection in assertions.
LoggingLevel — Context manager that temporarily sets the transformers logging verbosity level.
TemporaryHubRepo — Context manager that creates a temporary Hugging Face Hub repository and deletes it after the test. Used for staging tests that push/pull from the Hub.
torch_device — A module-level string constant ("cpu", "cuda", "xpu", etc.) that reflects the default accelerator available. Used throughout tests to avoid hardcoding device strings.
Backend memory utilities: backend_empty_cache, backend_memory_allocated, backend_max_memory_allocated, backend_reset_max_memory_allocated, backend_device_count — device-agnostic wrappers for CUDA/XPU memory APIs.
execute_subprocess_async — Runs an external command asynchronously; used in distributed training tests.
get_tests_dir — Returns the absolute path to the tests/ directory; used to locate fixtures.
Sources: src/transformers/testing_utils.py (lines 197-237), tests/trainer/test_trainer.py (lines 56-167)
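The temp-directory behavior of TestCasePlus can be approximated with the standard library. A sketch of the pattern (the real get_auto_remove_tmp_dir also supports fixed paths and opting out of deletion):

```python
import os
import shutil
import tempfile
import unittest

class TestCasePlusSketch(unittest.TestCase):
    """Illustrative stand-in for transformers' TestCasePlus."""

    def get_auto_remove_tmp_dir(self):
        # Create a scratch directory and register its removal so every
        # outcome (pass, failure, error) cleans up after the test.
        tmp_dir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, tmp_dir, ignore_errors=True)
        return tmp_dir

class SaveLoadTest(TestCasePlusSketch):
    def test_round_trip(self):
        tmp_dir = self.get_auto_remove_tmp_dir()
        # A real test would call model.save_pretrained(tmp_dir) here.
        self.assertTrue(os.path.isdir(tmp_dir))
```

Using addCleanup (rather than tearDown alone) guarantees removal even when setUp-era code fails partway.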
The library uses shared mixin classes to run a standard battery of tests against every model. Each model's test file combines a model-specific Tester configuration class with one or more of these mixins.
Diagram: Model Test Class Composition
Sources: tests/test_modeling_common.py (lines 1-130, 117)
ModelTesterMixin

Defined in tests/test_modeling_common.py, ModelTesterMixin provides approximately 50 shared test methods that run against every class listed in all_model_classes. Each test method typically:

1. calls self.model_tester.prepare_config_and_inputs_for_common() to get a config and sample inputs
2. instantiates and exercises each class in all_model_classes

Key test categories in ModelTesterMixin:
| Test Group | Representative Methods |
|---|---|
| Serialization | test_save_load, test_save_load_fast_init_from_base |
| Forward correctness | test_forward_signature, test_model_outputs_equivalence |
| Attention backends | test_eager_matches_sdpa_inference, test_eager_matches_flash_attention_2 |
| Gradients | test_retain_grad_hidden_states_attentions, test_training |
| Initialization | test_initialization, test_model_common_attributes |
| Weight manipulation | test_tied_model_weights_saving_loading |
| Tensor parallelism | test_tensor_parallel_* |
_test_eager_matches_sdpa_inference (tests/test_modeling_common.py, lines 154-512) is a standalone function (not a method) called by the parameterized test method. It checks that the eager and SDPA attention backends produce numerically close outputs, using per-device tolerance tables.
The parameterization is defined in TEST_EAGER_MATCHES_SDPA_INFERENCE_PARAMETERIZATION (tests/test_modeling_common.py, lines 138-151), which covers combinations of dtype (fp16, bf16, fp32), padding side, attention mask presence, and SDPA kernel flags.
Key attributes a test class must define:
| Attribute | Type | Purpose |
|---|---|---|
| all_model_classes | tuple | All model classes to iterate over in shared tests |
| all_generative_model_classes | tuple | Models tested by GenerationTesterMixin |
| model_tester | object | Config/input factory, set up in setUp() |
| has_attentions | bool | Whether model supports output_attentions |
| fx_compatible | bool | Whether model is torch.fx traceable |
| test_pruning | bool | Whether to run pruning tests |
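The contract between these attributes and the shared tests can be reduced to a toy example. Everything below is hypothetical (no torch); only the attribute names match the real mixin:

```python
import unittest

class ToyModel:
    """Stand-in for a real PreTrainedModel subclass."""
    def __init__(self, config):
        self.config = config

    def __call__(self, input_ids):
        # Fake forward pass: one logit vector per input token.
        return {"logits": [[0.0] * self.config["hidden_size"] for _ in input_ids]}

class ToyModelTester:
    """Plays the model_tester role: a factory for tiny configs and inputs."""
    def prepare_config_and_inputs_for_common(self):
        return {"hidden_size": 4}, {"input_ids": [1, 2, 3]}

class SharedTestsSketch:
    # Every shared test iterates this tuple, so each architecture gets
    # the full battery of checks for each of its head classes.
    all_model_classes = ()

    def test_forward_shapes(self):
        config, inputs = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            outputs = model_class(config)(**inputs)
            assert len(outputs["logits"]) == len(inputs["input_ids"])

class ToyModelTest(SharedTestsSketch, unittest.TestCase):
    all_model_classes = (ToyModel,)

    def setUp(self):
        self.model_tester = ToyModelTester()
```

The mixin never hardcodes a model: it only relies on the attributes the concrete test class declares.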
Helper functions for flaky test mitigation:

- set_config_for_less_flaky_test(config) — adjusts config to reduce numerical noise
- set_model_for_less_flaky_test(model) — sets model properties that improve test stability

Sources: tests/test_modeling_common.py (lines 138-512)
GenerationTesterMixin

Defined in tests/generation/test_utils.py and imported into tests/test_modeling_common.py (line 117), it provides tests for all decoding strategies supported by GenerationMixin (greedy decoding, sampling, beam search, and their variants).
A model test class should add GenerationTesterMixin to its bases and populate all_generative_model_classes. Generation tests also rely on model_tester.prepare_config_and_inputs_for_common().
ProcessorTesterMixin

Provides shared tests for processor classes (subclasses of ProcessorMixin). Tests cover:

- save_pretrained / from_pretrained round-trips

A processor test class sets processor_class and implements get_tokenizer(), get_image_processor(), etc.
pyproject.toml

pyproject.toml contains pytest settings applied globally:
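For illustration, such settings usually take the following shape; this fragment is a sketch, not a verbatim copy of the repository's pyproject.toml:

```toml
[tool.pytest.ini_options]
# Register custom markers so pytest does not warn when the
# decorator-applied marks from testing_utils.py are used.
markers = [
    "is_staging_test: tests that target the staging Hub endpoint",
    "is_pipeline_test: tests of the pipeline API",
]
doctest_optionflags = "NUMBER NORMALIZE_WHITESPACE ELLIPSIS"
```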
The pytest-env plugin (listed in the testing extras) allows injecting environment variables through the env setting in pyproject.toml.
conftest.py

The root-level conftest.py registers pytest markers corresponding to the decorators in testing_utils.py, ensures the transformers package is importable, and handles shared fixtures like torch_device.
In CI (via .circleci/create_circleci_config.py), pytest is invoked with:
| Flag | Value | Purpose |
|---|---|---|
| --max-worker-restart | 0 | Disable worker restart on crash |
| -vvv | — | Verbose output |
| --random-order-bucket | module | Randomize within module |
| --random-order-seed | $CIRCLE_BUILD_NUM | Reproducible seed per build |
| --reruns | 5 | Retry flaky test patterns |
| --reruns-delay | 2 | Wait 2 seconds between retries |
| --junitxml | test-results/junit.xml | JUnit XML for CI dashboards |
Flaky error patterns that trigger reruns include OSError, ConnectionError, HTTPError, Timeout, and "AssertionError: Tensor-likes are not close!" (.circleci/create_circleci_config.py, lines 40-55).
Sources: .circleci/create_circleci_config.py (lines 35-165), setup.py (lines 211-243)
auto_docstring

src/transformers/utils/auto_docstring.py provides the @auto_docstring decorator and supporting types. When applied to a model's forward() method or a class, it automatically constructs the docstring argument section by reading parameter annotations and cross-referencing argument docs from upstream classes.
Key exports from src/transformers/utils/auto_docstring.py:
| Symbol | Purpose |
|---|---|
| auto_docstring | Decorator that generates/validates a docstring |
| auto_class_docstring | Variant for class-level docstrings |
| ModelArgs | Container type for standard model forward arguments |
| ModelOutputArgs | Container type for model output arguments |
| ImageProcessorArgs | Container type for image processor arguments |
| ProcessorArgs | Container type for processor arguments |
| ClassAttrs | Container type for class-level attribute documentation |
| ClassDocstring | Full class docstring structure |
| parse_docstring | Parses an existing docstring into structured form |
| get_args_doc_from_source | Extracts argument docs from source code |
| set_min_indent | Normalizes indentation |
Sources: src/transformers/utils/__init__.py (lines 22-33), utils/check_docstrings.py (lines 53-61)
check_docstrings.py

utils/check_docstrings.py is a standalone script that validates docstring consistency across the codebase.
Usage:
What it checks: for every public function or class decorated with @auto_docstring, it verifies that the docstring's Args: section lists every parameter from the function signature other than self, *args, and **kwargs.
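The core of that check can be sketched with the inspect module. This is a simplified stand-in for the script's logic, and the helper name is hypothetical:

```python
import inspect
import re

def missing_docstring_args(obj):
    """Return signature parameters absent from obj's Args: section.

    Simplified sketch: skips self, *args, and **kwargs (as the real
    check does) and only recognizes `name (type): ...` style entries.
    """
    doc = inspect.getdoc(obj) or ""
    documented = set(re.findall(r"^\s*(\w+)\s*\(", doc, flags=re.MULTILINE))
    missing = []
    for name, param in inspect.signature(obj).parameters.items():
        if name == "self" or param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        if name not in documented:
            missing.append(name)
    return missing

def forward(input_ids, attention_mask=None):
    """Toy forward.

    Args:
        input_ids (list): token ids.
    """

# attention_mask appears in the signature but not in Args:, so the
# sketch flags it as missing.
```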
The DecoratedItem dataclass (utils/check_docstrings.py, lines 67-82) tracks each decorated item:

- its kind (function or class)
- any custom_args_text override passed to the decorator

Sources: utils/check_docstrings.py (lines 1-65, 67-82)
HfArgumentParserHfArgumentParser (in src/transformers/hf_argparser.py) is a subclass of argparse.ArgumentParser that accepts Python @dataclass classes instead of manually defined arguments. It introspects the dataclass fields and their type annotations to build the corresponding argparse arguments.
Its primary use case is making TrainingArguments and similar config dataclasses usable from the command line:
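The mechanism can be sketched with only the standard library; the dataclass and helper below are hypothetical stand-ins, not the real HfArgumentParser or TrainingArguments:

```python
import argparse
import dataclasses

@dataclasses.dataclass
class ToyTrainingArguments:
    # Hypothetical, trimmed-down stand-in for TrainingArguments.
    output_dir: str
    learning_rate: float = 5e-5
    num_train_epochs: int = 3

def build_parser(dc):
    """Register one CLI argument per dataclass field (sketch)."""
    parser = argparse.ArgumentParser()
    for field in dataclasses.fields(dc):
        required = field.default is dataclasses.MISSING
        parser.add_argument(
            f"--{field.name}",
            type=field.type,          # the annotation drives the argparse type
            required=required,        # fields without defaults become required
            default=None if required else field.default,
        )
    return parser

ns = build_parser(ToyTrainingArguments).parse_args(
    ["--output_dir", "out", "--learning_rate", "1e-4"]
)
args = ToyTrainingArguments(**vars(ns))
```

Unspecified fields keep their dataclass defaults, which is what makes large config dataclasses pleasant to drive from the CLI.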
It is also used in training scripts (examples/) to parse multiple dataclass types at once:
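Parsing several dataclasses at once works by routing each parsed value back to the dataclass that declared it. A hedged stdlib sketch of the idea (all names hypothetical; assumes every field has a default, for brevity):

```python
import argparse
import dataclasses

@dataclasses.dataclass
class ModelArguments:
    model_name_or_path: str = "bert-base-uncased"

@dataclasses.dataclass
class DataArguments:
    max_seq_length: int = 128

def parse_into_dataclasses(argv, *dataclass_types):
    """Parse argv once, then split values by owning dataclass (sketch)."""
    parser = argparse.ArgumentParser()
    for dc in dataclass_types:
        for field in dataclasses.fields(dc):
            parser.add_argument(f"--{field.name}", type=field.type, default=field.default)
    ns = vars(parser.parse_args(argv))
    outputs = []
    for dc in dataclass_types:
        keys = {f.name for f in dataclasses.fields(dc)}
        outputs.append(dc(**{k: v for k, v in ns.items() if k in keys}))
    return outputs

model_args, data_args = parse_into_dataclasses(
    ["--max_seq_length", "256"], ModelArguments, DataArguments
)
```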
TrainingArguments itself documents this relationship explicitly (src/transformers/training_args.py, lines 182-183):
HfArgumentParser can turn this class into argparse arguments that can be specified on the command line.
Sources: src/transformers/training_args.py (lines 177-190)
The standard pattern for a new model's test file follows a two-class layout:
Diagram: New Model Test File Structure
Step-by-step checklist:
1. Create MyModelTester — stores small config values (hidden size, number of layers, etc.) and implements:
   - prepare_config_and_inputs() → returns (config, inputs) specific to the architecture
   - prepare_config_and_inputs_for_common() → returns (config, inputs_dict) used by ModelTesterMixin
   - create_and_check_model(config, ...) → runs a single forward pass and checks output shapes
2. Create MyModelTest with the base classes:
   - ModelTesterMixin (always)
   - GenerationTesterMixin (if the model generates text)
   - unittest.TestCase
3. Set class attributes (all_model_classes, model_tester, and the feature flags listed above).
4. Apply decorators at the class or method level:
   - @require_torch on the class
   - @slow on tests requiring full model weights
   - @require_torch_gpu for GPU-only tests (e.g., Flash Attention tests)
   - @require_flash_attn for tests exercising FA2 paths
5. Run the test suite for the new model.
6. Validate docstrings after implementing the model.
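The checklist above yields a file shaped like the following skeleton. Torch and the real mixins are replaced by stand-ins so the two-class layout is visible in isolation; every MyModel* name is a placeholder:

```python
import unittest

class MyToyModel:
    """Stand-in for the real model class (a PreTrainedModel subclass)."""
    def __init__(self, config):
        self.config = config

    def __call__(self, input_ids):
        # Fake forward: one hidden vector per input token.
        return [[0.0] * self.config["hidden_size"] for _ in input_ids]

class MyModelTester:
    """Factory for tiny configs and inputs, shared by all test methods."""
    def __init__(self, parent, seq_length=7, hidden_size=8):
        self.parent = parent
        self.seq_length = seq_length
        self.hidden_size = hidden_size

    def prepare_config_and_inputs(self):
        config = {"hidden_size": self.hidden_size}
        input_ids = list(range(self.seq_length))
        return config, input_ids

    def prepare_config_and_inputs_for_common(self):
        config, input_ids = self.prepare_config_and_inputs()
        return config, {"input_ids": input_ids}

    def create_and_check_model(self, config, input_ids):
        hidden_states = MyToyModel(config)(input_ids)
        self.parent.assertEqual(len(hidden_states), self.seq_length)
        self.parent.assertEqual(len(hidden_states[0]), self.hidden_size)

class MyModelTest(unittest.TestCase):  # + ModelTesterMixin in the real file
    all_model_classes = (MyToyModel,)

    def setUp(self):
        self.model_tester = MyModelTester(self)

    def test_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)
```

In a real test file, inheriting ModelTesterMixin (and optionally GenerationTesterMixin) adds the full shared battery on top of the model-specific methods shown here.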
Sources: tests/test_modeling_common.py (lines 1-130, 138-512), src/transformers/testing_utils.py (lines 341-400), utils/check_docstrings.py (lines 1-35)