This page documents the configuration system that controls text generation behavior in the Transformers library. It covers the GenerationConfig class, GenerationMode enum, and the logic that determines which generation strategy to use based on configuration parameters.
For information about the actual generation algorithms and strategies, see Logits Processing Pipeline. For details on assisted generation and speculative decoding, see Assisted and Speculative Decoding. For cache implementations used during generation, see Cache System.
The generation configuration system provides a unified interface for controlling all aspects of text generation through the GenerationConfig class. This configuration determines which generation mode (greedy search, sampling, beam search, or assisted generation) will be used, controls output length, manipulates token probabilities, and manages caching behavior.
Key Components:
Sources: src/transformers/generation/configuration_utils.py81-638 src/transformers/generation/configuration_utils.py63-79
The GenerationConfig class is defined in src/transformers/generation/configuration_utils.py81-638 and serves as the central configuration object for all generation operations. It is a PushToHubMixin subclass that can be serialized to JSON and saved/loaded from the Hub.
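For illustration, a serialized generation_config.json is a flat JSON object holding the configured fields. A sketch of what one might contain (the values here are made up for illustration, not any model's actual defaults):

```json
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "do_sample": true,
  "temperature": 0.7,
  "top_p": 0.9,
  "max_new_tokens": 256,
  "transformers_version": "4.0.0"
}
```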
Sources: src/transformers/generation/configuration_utils.py107-337
The configuration parameters are organized into logical categories:
| Category | Key Parameters | Purpose |
|---|---|---|
| Length Control | max_length, max_new_tokens, min_length, min_new_tokens, early_stopping, max_time, stop_strings | Control the length of generated sequences |
| Strategy Selection | do_sample, num_beams | Determine which generation mode to use |
| Cache Management | use_cache, cache_implementation, cache_config | Configure KV cache behavior |
| Logits Manipulation | temperature, top_k, top_p, min_p, top_h, typical_p, epsilon_cutoff, eta_cutoff, repetition_penalty, encoder_repetition_penalty, length_penalty, no_repeat_ngram_size, bad_words_ids, forced_bos_token_id, forced_eos_token_id | Modify token probability distributions |
| Output Control | output_scores, output_logits, output_attentions, output_hidden_states, return_dict_in_generate, num_return_sequences | Control what information is returned |
| Special Tokens | pad_token_id, bos_token_id, eos_token_id | Define special token IDs |
| Encoder-Decoder | encoder_no_repeat_ngram_size, decoder_start_token_id | Encoder-decoder specific parameters |
| Assisted Generation | num_assistant_tokens, num_assistant_tokens_schedule, assistant_confidence_threshold, prompt_lookup_num_tokens, max_matching_ngram_size, assistant_early_exit, assistant_lookbehind, target_lookbehind | Control speculative/assisted decoding |
| Performance | compile_config (CompileConfig), disable_compile | Control torch.compile behavior for static-cache decoding |
Sources: src/transformers/generation/configuration_utils.py341-437
The GenerationMode enum is defined in src/transformers/generation/configuration_utils.py63-79 and represents the available generation strategies:
CONTRASTIVE_SEARCH, DOLA_GENERATION, GROUP_BEAM_SEARCH, and CONSTRAINED_BEAM_SEARCH are deprecated modes. They are no longer implemented directly in GenerationMixin — their logic has been moved to external Hub repositories and must be loaded via the custom_generate mechanism.
The GENERATION_MODES_MAPPING dict in src/transformers/generation/utils.py132-143 maps each GenerationMode to the private method that implements it:
| GenerationMode | Implementation | Notes |
|---|---|---|
| GREEDY_SEARCH | _sample | Uses do_sample=False |
| SAMPLE | _sample | Uses do_sample=True |
| BEAM_SEARCH | _beam_search | Uses do_sample=False |
| BEAM_SAMPLE | _beam_search | Uses do_sample=True |
| ASSISTED_GENERATION | _assisted_decoding | Requires draft model or prompt lookup |
| DOLA_GENERATION | "transformers-community/dola" | Deprecated, loads from Hub |
| CONTRASTIVE_SEARCH | "transformers-community/contrastive-search" | Deprecated, loads from Hub |
| GROUP_BEAM_SEARCH | "transformers-community/group-beam-search" | Deprecated, loads from Hub |
| CONSTRAINED_BEAM_SEARCH | "transformers-community/constrained-beam-search" | Deprecated, loads from Hub |
Both GREEDY_SEARCH and SAMPLE are dispatched to _sample. The difference is that greedy search is equivalent to sampling with the argmax (i.e., do_sample=False).
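The relationship between the two modes can be sketched in plain Python (no transformers dependency; this is illustrative, not the library's actual code): greedy search takes the argmax of the next-token distribution, while sampling draws from it.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, do_sample, rng=random):
    """Pick the next token id, mirroring the shared _sample path."""
    probs = softmax(logits)
    if not do_sample:
        # Greedy search: deterministic argmax over the distribution.
        return max(range(len(probs)), key=probs.__getitem__)
    # Sampling: draw one token from the categorical distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [0.1, 2.5, 0.3]
print(next_token(logits, do_sample=False))  # always token 1
```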
Sources: src/transformers/generation/configuration_utils.py63-79 src/transformers/generation/utils.py132-143
The generation mode is determined by calling GenerationConfig.get_generation_mode(), which is also referenced in GenerationMixin.generate(). The selection follows a priority-based algorithm:
The mode selection follows this priority order:
1. assistant_model or prompt_lookup_num_tokens is set → ASSISTED_GENERATION
2. Otherwise, if num_beams > 1:
   - do_sample=True → BEAM_SAMPLE
   - do_sample=False → BEAM_SEARCH
3. Otherwise:
   - do_sample=True → SAMPLE
   - do_sample=False → GREEDY_SEARCH

Sources: src/transformers/generation/configuration_utils.py81-638 src/transformers/generation/utils.py132-143
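The priority order described above can be sketched as a small pure-Python function (a hedged illustration of the logic, not the library's actual get_generation_mode implementation):

```python
def get_generation_mode(num_beams=1, do_sample=False,
                        assistant_model=None, prompt_lookup_num_tokens=None):
    """Resolve a generation mode name from config fields, highest priority first."""
    if assistant_model is not None or prompt_lookup_num_tokens is not None:
        return "ASSISTED_GENERATION"  # assisted generation wins over everything
    if num_beams > 1:
        return "BEAM_SAMPLE" if do_sample else "BEAM_SEARCH"
    return "SAMPLE" if do_sample else "GREEDY_SEARCH"

print(get_generation_mode())                              # GREEDY_SEARCH
print(get_generation_mode(num_beams=4, do_sample=True))   # BEAM_SAMPLE
```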
Once the mode is determined, GenerationMixin.generate() calls the appropriate private implementation:
Dispatch from generate() in GenerationMixin
Sources: src/transformers/generation/utils.py132-143 src/transformers/generation/utils.py369-491
The GenerationConfig can be loaded from multiple sources with a defined precedence order:
Configuration values are resolved in this order (later overrides earlier):
1. Library defaults from _get_default_generation_params() src/transformers/generation/configuration_utils.py1093-1122
2. Legacy generation parameters stored in the model's config.json
3. The model's generation_config.json, if it exists
4. Keyword arguments passed directly to the generate() method

The GenerationConfig class provides several loading methods:
| Method | Purpose | Location |
|---|---|---|
| __init__(**kwargs) | Create config with explicit parameters | src/transformers/generation/configuration_utils.py341-437 |
| from_pretrained(pretrained_model_name) | Load from Hub or local directory | src/transformers/generation/configuration_utils.py439-548 |
| from_model_config(model_config) | Extract from model configuration | src/transformers/generation/configuration_utils.py550-617 |
| update(**kwargs) | Update existing config with new values | src/transformers/generation/configuration_utils.py848-877 |
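The "later overrides earlier" precedence can be sketched with plain dicts standing in for the real config objects (illustrative only; the function name is hypothetical):

```python
def resolve_generation_params(library_defaults, model_config,
                              generation_config_json, generate_kwargs):
    """Merge config layers; later layers override earlier ones."""
    resolved = {}
    for layer in (library_defaults, model_config,
                  generation_config_json, generate_kwargs):
        # Drop None values so an unset key never masks an earlier layer.
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved

params = resolve_generation_params(
    {"max_length": 20, "do_sample": False},  # library defaults
    {"max_length": 512},                     # legacy fields in config.json
    {"do_sample": True, "top_p": 0.9},       # generation_config.json
    {"max_new_tokens": 64},                  # kwargs passed to generate()
)
print(params["do_sample"], params["max_length"])  # True 512
```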
Sources: src/transformers/generation/configuration_utils.py439-617 src/transformers/generation/utils.py369-431
The GenerationConfig validates itself at initialization time through the validate method src/transformers/generation/configuration_utils.py879-1091. Validation includes:
- Incompatible Parameter Combinations: Detect mutually exclusive settings
- Sampling Parameters Without do_sample: Warn if sampling-only parameters are set when do_sample=False
  - temperature, top_k, top_p, min_p, typical_p, epsilon_cutoff, eta_cutoff
- Beam-Only Parameters Without Beam Search: Warn if beam-specific parameters are set when num_beams=1
  - length_penalty, early_stopping
- Output Dependencies: Check that output flags requiring return_dict_in_generate=True are valid
  - output_scores, output_logits, output_attentions, output_hidden_states
- Cache Requirements: Validate that assisted generation uses a rollback-compatible cache (currently only DynamicCache)
The validation uses different severity levels:
| Level | Action | Example |
|---|---|---|
| Error | Raise ValueError | Impossible parameter combinations (e.g., num_return_sequences > num_beams in beam search) |
| Warning | Log warning via logger.warning_once() | Inconsequential but technically valid configs (e.g., setting temperature with do_sample=False) |
| Info | No action | Valid configurations |
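The two active severity levels can be sketched as follows (a hedged illustration of the pattern, not the library's validate method; the checks shown are simplified versions of two examples from the table):

```python
import warnings

def validate(num_beams=1, num_return_sequences=1,
             do_sample=False, temperature=None):
    """Raise on impossible combinations, warn on inconsequential ones."""
    if num_beams > 1 and num_return_sequences > num_beams:
        # Error level: this configuration cannot produce valid output.
        raise ValueError(
            "num_return_sequences must be <= num_beams in beam search")
    if not do_sample and temperature is not None:
        # Warning level: valid, but temperature has no effect here.
        warnings.warn("temperature is ignored when do_sample=False")

validate(do_sample=True, temperature=0.7)  # valid: no output
try:
    validate(num_beams=2, num_return_sequences=5)
except ValueError as e:
    print("error:", e)
```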
Sources: src/transformers/generation/configuration_utils.py879-1091
When parameters are not explicitly set (None), the generation system applies defaults during the generation loop via _get_default_generation_params() src/transformers/generation/configuration_utils.py1093-1122:
| Parameter | Default Value | Condition |
|---|---|---|
| do_sample | False | Always |
| num_beams | 1 | Always |
| use_cache | True | If model supports caching |
| max_length | 20 | If neither max_length nor max_new_tokens is set |
| temperature | 1.0 | If do_sample=True |
| top_k | 50 | If do_sample=True |
| top_p | 1.0 | If do_sample=True |
| renormalize_logits | False | Always |
| output_scores | False | Always |
| output_logits | False | Always |
| output_attentions | False | Always |
| output_hidden_states | False | Always |
| return_dict_in_generate | False | Always |
| num_return_sequences | 1 | Always |
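The fallback behavior in the table above can be sketched like this (function and dict names are illustrative, not the library's): unset (None or missing) fields fall back to fixed defaults, and the sampling-only defaults apply only when do_sample resolves to True.

```python
DEFAULTS = {"do_sample": False, "num_beams": 1, "max_length": 20,
            "renormalize_logits": False, "return_dict_in_generate": False,
            "num_return_sequences": 1}
SAMPLING_DEFAULTS = {"temperature": 1.0, "top_k": 50, "top_p": 1.0}

def apply_defaults(config):
    """Fill in defaults for any field that is missing or None."""
    resolved = dict(config)
    for key, value in DEFAULTS.items():
        if resolved.get(key) is None:
            resolved[key] = value
    if resolved["do_sample"]:
        # Sampling defaults only matter when sampling is enabled.
        for key, value in SAMPLING_DEFAULTS.items():
            if resolved.get(key) is None:
                resolved[key] = value
    return resolved

print(apply_defaults({"do_sample": True})["top_k"])  # 50
print(apply_defaults({})["max_length"])              # 20
```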
Sources: src/transformers/generation/configuration_utils.py1093-1122
The GenerationConfig is tightly integrated with the GenerationMixin class src/transformers/generation/utils.py337-1492:
Throughout generation, the configuration is accessed via:
- Direct attribute access, e.g. generation_config.max_new_tokens
- generation_config.get_generation_mode() (method on GenerationConfig)
- generation_config.update(**kwargs)
- generation_config._get_default_generation_params()

Sources: src/transformers/generation/utils.py369-491 src/transformers/generation/configuration_utils.py81-638
The cache_implementation field in GenerationConfig accepts a string name that maps to a specific Cache subclass. The recognized values are defined in src/transformers/generation/configuration_utils.py45-56:
| String value | Cache class | Notes |
|---|---|---|
"dynamic" | DynamicCache | Default; grows as tokens are generated |
"dynamic_full" | DynamicCache (full history) | Required for assisted generation rollback |
"offloaded" | DynamicCache (CPU offloaded) | Saves GPU memory |
"quantized" | QuantizedCache | Quantizes KV cache to reduce memory |
"static" | StaticCache | Pre-allocated; required for torch.compile |
"offloaded_static" | StaticCache (CPU offloaded) | — |
"sliding_window" | — | Deprecated |
"hybrid" | — | Deprecated |
"hybrid_chunked" | — | Deprecated |
Additional parameters for a specific cache class can be passed via cache_config (a dict).
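The string-to-class dispatch can be sketched as a lookup table (class names come from the table above; the mapping and factory function here are illustrative stand-ins, not the library's actual code):

```python
# Stub classes standing in for the real Cache subclasses.
class DynamicCache: ...
class QuantizedCache: ...
class StaticCache: ...

CACHE_CLASSES = {
    "dynamic": DynamicCache,
    "dynamic_full": DynamicCache,
    "offloaded": DynamicCache,
    "quantized": QuantizedCache,
    "static": StaticCache,
    "offloaded_static": StaticCache,
}

def make_cache(cache_implementation, cache_config=None):
    """Instantiate the cache class named by the config string."""
    try:
        cls = CACHE_CLASSES[cache_implementation]
    except KeyError:
        raise ValueError(
            f"Unknown cache_implementation: {cache_implementation!r}") from None
    return cls(**(cache_config or {}))

print(type(make_cache("static")).__name__)  # StaticCache
```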
Sources: src/transformers/generation/configuration_utils.py45-56
Assisted generation has unique configuration requirements src/transformers/generation/candidate_generator.py103-191:
- cache_implementation="dynamic_full" for rollback support (set automatically)
- A separate GenerationConfig for the assistant model, derived from the main config
- num_assistant_tokens controls speculation depth (how many draft tokens to propose per step)
- num_assistant_tokens_schedule ("heuristic", "heuristic_transient", "constant") controls how that number adapts at runtime
- assistant_confidence_threshold enables dynamic speculation cutoff based on the draft model's confidence

Encoder-decoder models use specific configuration parameters:

- decoder_start_token_id: Initial token ID for the decoder
- encoder_no_repeat_ngram_size: Prevent n-grams from the encoder input from appearing in the decoder output
- decoder_start_token_id can be a list to allow different start tokens per batch element

When using static caches for torch.compile compatibility:

- cache_implementation must be "static" or "offloaded_static"
- max_cache_len or max_length determines the pre-allocated tensor size
- cache_config can pass extra parameters (e.g., max_batch_size)
- compile_config (a CompileConfig instance) controls how generate() invokes torch.compile on the forward pass
- disable_compile=True suppresses automatic compilation even when a static cache is used

Sources: src/transformers/generation/candidate_generator.py103-191 src/transformers/generation/configuration_utils.py45-56 src/transformers/generation/configuration_utils.py330-337
The generation output type is determined by the mode and return_dict_in_generate flag:
| Mode | return_dict_in_generate | Output Type |
|---|---|---|
| GREEDY_SEARCH / SAMPLE | False | torch.LongTensor (just sequences) |
| GREEDY_SEARCH / SAMPLE | True (decoder-only) | GenerateDecoderOnlyOutput |
| GREEDY_SEARCH / SAMPLE | True (encoder-decoder) | GenerateEncoderDecoderOutput |
| BEAM_SEARCH / BEAM_SAMPLE | False | torch.LongTensor (just sequences) |
| BEAM_SEARCH / BEAM_SAMPLE | True (decoder-only) | GenerateBeamDecoderOnlyOutput |
| BEAM_SEARCH / BEAM_SAMPLE | True (encoder-decoder) | GenerateBeamEncoderDecoderOutput |
All output classes defined in src/transformers/generation/utils.py146-329 share common fields:
sequences: Generated token IDsscores (optional): Processed prediction scores per steplogits (optional): Raw prediction logits per steppast_key_values (optional): Final cache stateBeam outputs additionally include:
sequences_scores: Final beam scoresbeam_indices: Beam indices for each generated tokenSources: src/transformers/generation/utils.py146-329
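The mode/flag selection above can be sketched as a small function (class names come from the table; the selection logic here is an illustrative stand-in for the library's dispatch):

```python
def output_class(mode, return_dict_in_generate, is_encoder_decoder):
    """Return the name of the output type for a mode/flag combination."""
    if not return_dict_in_generate:
        return "torch.LongTensor"  # just the sequences tensor
    beam = "Beam" if mode in ("BEAM_SEARCH", "BEAM_SAMPLE") else ""
    arch = "EncoderDecoder" if is_encoder_decoder else "DecoderOnly"
    return f"Generate{beam}{arch}Output"

print(output_class("SAMPLE", True, False))      # GenerateDecoderOnlyOutput
print(output_class("BEAM_SEARCH", True, True))  # GenerateBeamEncoderDecoderOutput
```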