This page covers the shared utilities, test mixin classes, docstring validation tools, and pytest conventions used throughout the transformers test suite. It focuses on the patterns developers need to understand to write or extend tests for models, processors, and training components.
For information about CI orchestration, smart test selection, and how individual test jobs are triggered, see CI/CD Pipeline and Test Orchestration. For Docker environments used by CI, see Docker Images and CI Environments.
Tests live under the top-level tests/ directory. The layout mirrors the source structure:
```
tests/
├── test_modeling_common.py          # ModelTesterMixin (shared model tests)
├── generation/
│   └── test_utils.py                # GenerationTesterMixin
├── models/
│   └── <model_name>/
│       └── test_modeling_<name>.py
├── trainer/
│   └── test_trainer.py
├── peft_integration/
│   └── test_peft_integration.py
├── utils/
│   └── test_modeling_utils.py
└── fixtures/                        # Sample data (text, audio, images)
```
The testing install extras are declared in setup.py (lines 211-243):
| Extra | Key Packages |
|---|---|
| testing | pytest, pytest-xdist, pytest-rerunfailures, pytest-random-order, pytest-timeout, parameterized, evaluate, rouge-score |
| deepspeed-testing | testing + deepspeed + optuna |
Sources: setup.py (lines 211-243), tests/test_modeling_common.py (lines 1-130)
testing_utils.py Module

src/transformers/testing_utils.py is the central module for all shared test utilities. It provides environment-variable gates, test-classification and requirement decorators, base test classes, and device-agnostic helpers, described below.
Several boolean environment variables gate entire categories of tests. They are parsed at module import time via parse_flag_from_env.
| Environment Variable | Default | Controls |
|---|---|---|
| RUN_SLOW | False | Tests marked @slow |
| RUN_FLAKY | True | Tests marked @is_flaky (set to False on PRs in CI) |
| RUN_PIPELINE_TESTS | True | Tests marked @is_pipeline_test (set to False in CI) |
| RUN_TRAINING_TESTS | True | Tests marked @is_training_test |
| RUN_AGENT_TESTS | False | Tests marked @is_agent_test |
| HUGGINGFACE_CO_STAGING | False | Tests marked @is_staging_test |
| RUN_CUSTOM_TOKENIZERS | False | Custom tokenizer tests |
| TRANSFORMERS_IS_CI | — | Set to True by CI; affects safetensors conversion path |
Sources: src/transformers/testing_utils.py (lines 268-274), .circleci/create_circleci_config.py (lines 25-35)
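The gating mechanism can be sketched as follows. This is a simplified re-implementation for illustration, not the exact code of parse_flag_from_env in src/transformers/testing_utils.py:

```python
import os

def parse_flag_from_env(key, default=False):
    """Read a boolean test gate from the environment (illustrative sketch).

    Accepts common truthy/falsy spellings and raises on anything else,
    so a typo like RUN_SLOW=ture fails loudly instead of silently
    skipping an entire test category.
    """
    value = os.environ.get(key)
    if value is None:
        return default
    if value.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if value.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise ValueError(f"If set, {key} must be yes or no, got {value!r}.")

# Parsed once at module import time, then consumed by the skip decorators.
_run_slow_tests = parse_flag_from_env("RUN_SLOW", default=False)
```

Because the flags are read once at import, changing the environment variable mid-session has no effect on already-imported test modules.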
Diagram: Skip Decorator Decision Flow
Test Classification Decorators
| Decorator | Controlling Flag | Behavior |
|---|---|---|
| slow | RUN_SLOW | Skip unless env var is truthy |
| tooslow | — | Always skipped (must not stay in the codebase) |
| is_staging_test | HUGGINGFACE_CO_STAGING | Uses staging Hub endpoint |
| is_pipeline_test | RUN_PIPELINE_TESTS | Pipeline-specific tests |
| is_training_test | RUN_TRAINING_TESTS | Trainer-specific tests |
| is_agent_test | RUN_AGENT_TESTS | Agent tests |
| is_flaky | RUN_FLAKY | Skipped on PRs, runs on main |
Hardware Requirement Decorators
| Decorator | Underlying Check |
|---|---|
| require_torch | is_torch_available() |
| require_torch_gpu | is_torch_cuda_available() |
| require_torch_mps | is_torch_mps_available() |
| require_torch_xpu | is_torch_xpu_available() |
| require_torch_npu | is_torch_npu_available() |
| require_torch_multi_gpu | GPU count ≥ 2 |
| require_torch_multi_accelerator | Accelerator count ≥ 2 |
| require_torch_bf16 | BF16 support on device |
| require_torch_fp16 | FP16 support on device |
| require_torch_tf32 | is_torch_tf32_available() |
| require_non_hpu | Not HPU device |
Library Requirement Decorators
| Decorator | Library Checked |
|---|---|
| require_accelerate | accelerate |
| require_bitsandbytes | bitsandbytes |
| require_flash_attn | Flash Attention 2 |
| require_flash_attn_3 | Flash Attention 3 |
| require_deepspeed | deepspeed |
| require_kernels | kernels |
| require_peft | peft |
| require_sentencepiece | sentencepiece |
| require_tokenizers | tokenizers (Rust fast tokenizers) |
| require_vision | vision libraries |
| require_tensorboard | tensorboard |
| require_wandb | wandb |
| require_liger_kernel | liger_kernel |
| require_galore_torch | galore_torch |
| require_lomo | lomo |
| require_schedulefree | schedulefree |
| require_optuna | optuna |
| require_ray | ray[tune] |
Execution Control Decorators
| Decorator | Effect |
|---|---|
| run_first | Executes test before others via pytest-order |
| run_test_using_subprocess | Forces test to run in a subprocess |
| skip_if_not_implemented | Skips if NotImplementedError is raised |
| hub_retry | Retries decorated test on Hub connection errors |
Sources: src/transformers/testing_utils.py (lines 277-400), tests/test_modeling_common.py (lines 81-106), tests/trainer/test_trainer.py (lines 56-106)
TestCasePlus — A unittest.TestCase subclass that provides:

- setUp() and tearDown() with automatic temp directory management
- get_auto_remove_tmp_dir() for test-scoped directories

CaptureLogger — Context manager that captures log output from a logging.Logger instance for inspection in assertions.
LoggingLevel — Context manager that temporarily sets the transformers logging verbosity level.
TemporaryHubRepo — Context manager that creates a temporary Hugging Face Hub repository and deletes it after the test. Used for staging tests that push/pull from the Hub.
torch_device — A module-level string constant ("cpu", "cuda", "xpu", etc.) that reflects the default accelerator available. Used throughout tests to avoid hardcoding device strings.
Backend memory utilities: backend_empty_cache, backend_memory_allocated, backend_max_memory_allocated, backend_reset_max_memory_allocated, backend_device_count — device-agnostic wrappers for CUDA/XPU memory APIs.
execute_subprocess_async — Runs an external command asynchronously; used in distributed training tests.
get_tests_dir — Returns the absolute path to the tests/ directory; used to locate fixtures.
Sources: src/transformers/testing_utils.py (lines 197-237), tests/trainer/test_trainer.py (lines 56-167)
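The temp-directory behavior of TestCasePlus can be approximated with the standard library. A sketch of the pattern (the real get_auto_remove_tmp_dir also supports fixed paths and opting out of deletion):

```python
import os
import shutil
import tempfile
import unittest

class TestCasePlusSketch(unittest.TestCase):
    """Illustrative stand-in for transformers' TestCasePlus."""

    def get_auto_remove_tmp_dir(self):
        # Create a scratch directory and register its removal so every
        # outcome (pass, failure, error) cleans up after the test.
        tmp_dir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, tmp_dir, ignore_errors=True)
        return tmp_dir

class SaveLoadTest(TestCasePlusSketch):
    def test_round_trip(self):
        tmp_dir = self.get_auto_remove_tmp_dir()
        # A real test would call model.save_pretrained(tmp_dir) here.
        self.assertTrue(os.path.isdir(tmp_dir))
```

Using addCleanup (rather than tearDown alone) guarantees removal even when setUp-era code fails partway.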
The library uses shared mixin classes to run a standard battery of tests against every model. Each model's test file combines a model-specific Tester configuration class with one or more of these mixins.
Diagram: Model Test Class Composition
Sources: tests/test_modeling_common.py (lines 1-130, 117)
ModelTesterMixin

Defined in tests/test_modeling_common.py, ModelTesterMixin provides approximately 50 shared test methods that run against every class listed in all_model_classes. Each test method typically:

1. calls self.model_tester.prepare_config_and_inputs_for_common() to get a config and sample inputs
2. instantiates and exercises each class in all_model_classes

Key test categories in ModelTesterMixin:
| Test Group | Representative Methods |
|---|---|
| Serialization | test_save_load, test_save_load_fast_init_from_base |
| Forward correctness | test_forward_signature, test_model_outputs_equivalence |
| Attention backends | test_eager_matches_sdpa_inference, test_eager_matches_flash_attention_2 |
| Gradients | test_retain_grad_hidden_states_attentions, test_training |
| Initialization | test_initialization, test_model_common_attributes |
| Weight manipulation | test_tied_model_weights_saving_loading |
| Tensor parallelism | test_tensor_parallel_* |
_test_eager_matches_sdpa_inference (tests/test_modeling_common.py, lines 154-512) is a standalone function (not a method) called by the parameterized test method. It checks that the eager and SDPA attention backends produce numerically close outputs, using per-device tolerance tables.
The parameterization is defined in TEST_EAGER_MATCHES_SDPA_INFERENCE_PARAMETERIZATION (tests/test_modeling_common.py, lines 138-151), which covers combinations of dtype (fp16, bf16, fp32), padding side, attention mask presence, and SDPA kernel flags.
Key attributes a test class must define:
| Attribute | Type | Purpose |
|---|---|---|
| all_model_classes | tuple | All model classes to iterate over in shared tests |
| all_generative_model_classes | tuple | Models tested by GenerationTesterMixin |
| model_tester | object | Config/input factory, set up in setUp() |
| has_attentions | bool | Whether model supports output_attentions |
| fx_compatible | bool | Whether model is torch.fx traceable |
| test_pruning | bool | Whether to run pruning tests |
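The contract between these attributes and the shared tests can be reduced to a toy example. Everything below is hypothetical (no torch); only the attribute names match the real mixin:

```python
import unittest

class ToyModel:
    """Stand-in for a real PreTrainedModel subclass."""
    def __init__(self, config):
        self.config = config

    def __call__(self, input_ids):
        # Fake forward pass: one logit vector per input token.
        return {"logits": [[0.0] * self.config["hidden_size"] for _ in input_ids]}

class ToyModelTester:
    """Plays the model_tester role: a factory for tiny configs and inputs."""
    def prepare_config_and_inputs_for_common(self):
        return {"hidden_size": 4}, {"input_ids": [1, 2, 3]}

class SharedTestsSketch:
    # Every shared test iterates this tuple, so each architecture gets
    # the full battery of checks for each of its head classes.
    all_model_classes = ()

    def test_forward_shapes(self):
        config, inputs = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            outputs = model_class(config)(**inputs)
            assert len(outputs["logits"]) == len(inputs["input_ids"])

class ToyModelTest(SharedTestsSketch, unittest.TestCase):
    all_model_classes = (ToyModel,)

    def setUp(self):
        self.model_tester = ToyModelTester()
```

The mixin never hardcodes a model: it only relies on the attributes the concrete test class declares.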
Helper functions for flaky test mitigation:

- set_config_for_less_flaky_test(config) — adjusts config to reduce numerical noise
- set_model_for_less_flaky_test(model) — sets model properties that improve test stability

Sources: tests/test_modeling_common.py (lines 138-512)
GenerationTesterMixin

Defined in tests/generation/test_utils.py and imported into tests/test_modeling_common.py (line 117), it provides tests for all decoding strategies supported by GenerationMixin (greedy decoding, sampling, beam search, and their variants).
A model test class should add GenerationTesterMixin to its bases and populate all_generative_model_classes. Generation tests also rely on model_tester.prepare_config_and_inputs_for_common().
ProcessorTesterMixin

Provides shared tests for processor classes (subclasses of ProcessorMixin). Tests cover:

- save_pretrained / from_pretrained round-trips

A processor test class sets processor_class and implements get_tokenizer(), get_image_processor(), etc.
pyproject.toml

pyproject.toml contains pytest settings applied globally:
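For illustration, such settings usually take the following shape; this fragment is a sketch, not a verbatim copy of the repository's pyproject.toml:

```toml
[tool.pytest.ini_options]
# Register custom markers so pytest does not warn when the
# decorator-applied marks from testing_utils.py are used.
markers = [
    "is_staging_test: tests that target the staging Hub endpoint",
    "is_pipeline_test: tests of the pipeline API",
]
doctest_optionflags = "NUMBER NORMALIZE_WHITESPACE ELLIPSIS"
```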
The pytest-env plugin (listed in the testing extras) allows injecting environment variables through the env setting in pyproject.toml.
conftest.py

The root-level conftest.py registers pytest markers corresponding to the decorators in testing_utils.py, ensures the transformers package is importable, and handles shared fixtures like torch_device.
In CI (via .circleci/create_circleci_config.py), pytest is invoked with:
| Flag | Value | Purpose |
|---|---|---|
| --max-worker-restart | 0 | Disable worker restart on crash |
| -vvv | — | Verbose output |
| --random-order-bucket | module | Randomize within module |
| --random-order-seed | $CIRCLE_BUILD_NUM | Reproducible seed per build |
| --reruns | 5 | Retry flaky test patterns |
| --reruns-delay | 2 | Wait 2 seconds between retries |
| --junitxml | test-results/junit.xml | JUnit XML for CI dashboards |
Flaky error patterns that trigger reruns include OSError, ConnectionError, HTTPError, Timeout, and "AssertionError: Tensor-likes are not close!" (.circleci/create_circleci_config.py, lines 40-55).
Sources: .circleci/create_circleci_config.py (lines 35-165), setup.py (lines 211-243)
auto_docstring

src/transformers/utils/auto_docstring.py provides the @auto_docstring decorator and supporting types. When applied to a model's forward() method or a class, it automatically constructs the docstring argument section by reading parameter annotations and cross-referencing argument docs from upstream classes.
Key exports from src/transformers/utils/auto_docstring.py:
| Symbol | Purpose |
|---|---|
| auto_docstring | Decorator that generates/validates a docstring |
| auto_class_docstring | Variant for class-level docstrings |
| ModelArgs | Container type for standard model forward arguments |
| ModelOutputArgs | Container type for model output arguments |
| ImageProcessorArgs | Container type for image processor arguments |
| ProcessorArgs | Container type for processor arguments |
| ClassAttrs | Container type for class-level attribute documentation |
| ClassDocstring | Full class docstring structure |
| parse_docstring | Parses an existing docstring into structured form |
| get_args_doc_from_source | Extracts argument docs from source code |
| set_min_indent | Normalizes indentation |
Sources: src/transformers/utils/__init__.py (lines 22-33), utils/check_docstrings.py (lines 53-61)
check_docstrings.py

utils/check_docstrings.py is a standalone script that validates docstring consistency across the codebase.
Usage:
What it checks: for every public function or class decorated with @auto_docstring, it verifies that the docstring's Args: section lists every parameter from the function signature other than self, *args, and **kwargs.
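The core of that check can be sketched with the inspect module. This is a simplified stand-in for the script's logic, and the helper name is hypothetical:

```python
import inspect
import re

def missing_docstring_args(obj):
    """Return signature parameters absent from obj's Args: section.

    Simplified sketch: skips self, *args, and **kwargs (as the real
    check does) and only recognizes `name (type): ...` style entries.
    """
    doc = inspect.getdoc(obj) or ""
    documented = set(re.findall(r"^\s*(\w+)\s*\(", doc, flags=re.MULTILINE))
    missing = []
    for name, param in inspect.signature(obj).parameters.items():
        if name == "self" or param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        if name not in documented:
            missing.append(name)
    return missing

def forward(input_ids, attention_mask=None):
    """Toy forward.

    Args:
        input_ids (list): token ids.
    """

# attention_mask appears in the signature but not in Args:, so the
# sketch flags it as missing.
```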
The DecoratedItem dataclass (utils/check_docstrings.py, lines 67-82) tracks each decorated item:

- its kind (function or class)
- any custom_args_text override passed to the decorator

Sources: utils/check_docstrings.py (lines 1-65, 67-82)
HfArgumentParserHfArgumentParser (in src/transformers/hf_argparser.py) is a subclass of argparse.ArgumentParser that accepts Python @dataclass classes instead of manually defined arguments. It introspects the dataclass fields and their type annotations to build the corresponding argparse arguments.
Its primary use case is making TrainingArguments and similar config dataclasses usable from the command line:
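The mechanism can be sketched with only the standard library; the dataclass and helper below are hypothetical stand-ins, not the real HfArgumentParser or TrainingArguments:

```python
import argparse
import dataclasses

@dataclasses.dataclass
class ToyTrainingArguments:
    # Hypothetical, trimmed-down stand-in for TrainingArguments.
    output_dir: str
    learning_rate: float = 5e-5
    num_train_epochs: int = 3

def build_parser(dc):
    """Register one CLI argument per dataclass field (sketch)."""
    parser = argparse.ArgumentParser()
    for field in dataclasses.fields(dc):
        required = field.default is dataclasses.MISSING
        parser.add_argument(
            f"--{field.name}",
            type=field.type,          # the annotation drives the argparse type
            required=required,        # fields without defaults become required
            default=None if required else field.default,
        )
    return parser

ns = build_parser(ToyTrainingArguments).parse_args(
    ["--output_dir", "out", "--learning_rate", "1e-4"]
)
args = ToyTrainingArguments(**vars(ns))
```

Unspecified fields keep their dataclass defaults, which is what makes large config dataclasses pleasant to drive from the CLI.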
It is also used in training scripts (examples/) to parse multiple dataclass types at once:
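Parsing several dataclasses at once works by routing each parsed value back to the dataclass that declared it. A hedged stdlib sketch of the idea (all names hypothetical; assumes every field has a default, for brevity):

```python
import argparse
import dataclasses

@dataclasses.dataclass
class ModelArguments:
    model_name_or_path: str = "bert-base-uncased"

@dataclasses.dataclass
class DataArguments:
    max_seq_length: int = 128

def parse_into_dataclasses(argv, *dataclass_types):
    """Parse argv once, then split values by owning dataclass (sketch)."""
    parser = argparse.ArgumentParser()
    for dc in dataclass_types:
        for field in dataclasses.fields(dc):
            parser.add_argument(f"--{field.name}", type=field.type, default=field.default)
    ns = vars(parser.parse_args(argv))
    outputs = []
    for dc in dataclass_types:
        keys = {f.name for f in dataclasses.fields(dc)}
        outputs.append(dc(**{k: v for k, v in ns.items() if k in keys}))
    return outputs

model_args, data_args = parse_into_dataclasses(
    ["--max_seq_length", "256"], ModelArguments, DataArguments
)
```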
TrainingArguments itself documents this relationship explicitly (src/transformers/training_args.py, lines 182-183):
HfArgumentParser can turn this class into argparse arguments that can be specified on the command line.
Sources: src/transformers/training_args.py (lines 177-190)
The standard pattern for a new model's test file follows a two-class layout:
Diagram: New Model Test File Structure
Step-by-step checklist:
1. Create MyModelTester — stores small config values (hidden size, number of layers, etc.) and implements:
   - prepare_config_and_inputs() → returns (config, inputs) specific to the architecture
   - prepare_config_and_inputs_for_common() → returns (config, inputs_dict) used by ModelTesterMixin
   - create_and_check_model(config, ...) → runs a single forward pass and checks output shapes
2. Create MyModelTest with the base classes:
   - ModelTesterMixin (always)
   - GenerationTesterMixin (if the model generates text)
   - unittest.TestCase
3. Set class attributes (all_model_classes, model_tester, and the feature flags listed above).
4. Apply decorators at the class or method level:
   - @require_torch on the class
   - @slow on tests requiring full model weights
   - @require_torch_gpu for GPU-only tests (e.g., Flash Attention tests)
   - @require_flash_attn for tests exercising FA2 paths
5. Run the test suite for the new model.
6. Validate docstrings after implementing the model.
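The checklist above yields a file shaped like the following skeleton. Torch and the real mixins are replaced by stand-ins so the two-class layout is visible in isolation; every MyModel* name is a placeholder:

```python
import unittest

class MyToyModel:
    """Stand-in for the real model class (a PreTrainedModel subclass)."""
    def __init__(self, config):
        self.config = config

    def __call__(self, input_ids):
        # Fake forward: one hidden vector per input token.
        return [[0.0] * self.config["hidden_size"] for _ in input_ids]

class MyModelTester:
    """Factory for tiny configs and inputs, shared by all test methods."""
    def __init__(self, parent, seq_length=7, hidden_size=8):
        self.parent = parent
        self.seq_length = seq_length
        self.hidden_size = hidden_size

    def prepare_config_and_inputs(self):
        config = {"hidden_size": self.hidden_size}
        input_ids = list(range(self.seq_length))
        return config, input_ids

    def prepare_config_and_inputs_for_common(self):
        config, input_ids = self.prepare_config_and_inputs()
        return config, {"input_ids": input_ids}

    def create_and_check_model(self, config, input_ids):
        hidden_states = MyToyModel(config)(input_ids)
        self.parent.assertEqual(len(hidden_states), self.seq_length)
        self.parent.assertEqual(len(hidden_states[0]), self.hidden_size)

class MyModelTest(unittest.TestCase):  # + ModelTesterMixin in the real file
    all_model_classes = (MyToyModel,)

    def setUp(self):
        self.model_tester = MyModelTester(self)

    def test_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)
```

In a real test file, inheriting ModelTesterMixin (and optionally GenerationTesterMixin) adds the full shared battery on top of the model-specific methods shown here.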
Sources: tests/test_modeling_common.py (lines 1-130, 138-512), src/transformers/testing_utils.py (lines 341-400), utils/check_docstrings.py (lines 1-35)