This page documents how vLLM loads model configurations from various formats and sources. Model configuration loading is a critical step that occurs after model architecture detection (see 5.1) and before the model is instantiated. The configuration determines essential model properties like layer counts, attention mechanisms, vocabulary size, and architecture-specific parameters.
For information about how configurations are used to instantiate models, see Model Registry and Architecture Detection. For details on configuration objects like ModelConfig and VllmConfig, see Configuration Objects.
The ConfigFormat type alias defines the three supported formats:
ConfigFormat = Literal["auto", "hf", "mistral"]
| Format | Description | Config File | Primary Use Case |
|---|---|---|---|
| hf | Standard Transformers format | config.json | Most models from HF Hub or local paths |
| mistral | Mistral AI native format | params.json | Mistral-format checkpoints |
| auto | Auto-detect at runtime | — | Tries Mistral detection first, falls back to HF |
Note on GGUF: GGUF-format models are not a separate config_format. GGUF detection (is_gguf(), is_remote_gguf()) occurs at the ModelConfig level, and GGUF metadata is patched onto the loaded HF config via maybe_patch_hf_config_from_gguf().
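The "auto" resolution order can be sketched as follows. This is a minimal illustration assuming a local directory; detect_config_format is a hypothetical helper, not vLLM's API, and the real detection also handles remote repos, which is elided here:

```python
from pathlib import Path


def detect_config_format(model_dir: str) -> str:
    """Illustrative "auto" resolution: prefer Mistral's params.json
    when present, otherwise fall back to the HF format."""
    root = Path(model_dir)
    if (root / "params.json").is_file():
        return "mistral"
    return "hf"
```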
get_config_parser() dispatch and get_config() auto-detection
Sources: vllm/transformers_utils/config.py237-253 vllm/transformers_utils/config.py30-42 vllm/config/model.py36-43
The configuration loading system is built around the ConfigParserBase abstract class defined in vllm/transformers_utils/config_parser_base.py, with two concrete implementations.
Class hierarchy and relationships
Parser Factory Pattern
get_config_parser() looks up _CONFIG_FORMAT_TO_CONFIG_PARSER and returns a new parser instance. Custom parsers can be registered using the @register_config_parser() decorator, which validates subclassing and overwrites any existing registration with a warning.
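The factory pattern described above can be sketched as a dict lookup that returns a fresh parser instance. The class and dict names mirror the text, but the parse bodies are placeholders, not vLLM's implementation:

```python
from abc import ABC, abstractmethod


class ConfigParserBase(ABC):
    @abstractmethod
    def parse(self, model: str, **kwargs):
        ...


class HFConfigParser(ConfigParserBase):
    def parse(self, model: str, **kwargs):
        # Placeholder: the real parser resolves config.json.
        return {"model_type": "llama", "source": "hf"}


class MistralConfigParser(ConfigParserBase):
    def parse(self, model: str, **kwargs):
        # Placeholder: the real parser resolves params.json.
        return {"model_type": "mistral", "source": "mistral"}


_CONFIG_FORMAT_TO_CONFIG_PARSER = {
    "hf": HFConfigParser,
    "mistral": MistralConfigParser,
}


def get_config_parser(config_format: str) -> ConfigParserBase:
    """Look up the parser class and return a new instance."""
    try:
        parser_cls = _CONFIG_FORMAT_TO_CONFIG_PARSER[config_format]
    except KeyError:
        raise ValueError(f"Unknown config format: {config_format!r}")
    return parser_cls()
```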
Sources: vllm/transformers_utils/config.py237-253 vllm/transformers_utils/config.py255-303
The HFConfigParser is the default parser and handles the majority of models. It implements a multi-stage resolution process:
Custom Configuration Registry (_CONFIG_REGISTRY)
_CONFIG_REGISTRY is a LazyConfigDict that maps model_type strings (from config.json) to vLLM-specific PretrainedConfig subclasses. Values are string class names resolved lazily via vllm.transformers_utils.configs on first access, avoiding import overhead at startup.
| model_type key | Config Class | Notes |
|---|---|---|
| afmoe | AfmoeConfig | |
| chatglm | ChatGLMConfig | |
| deepseek_vl_v2 | DeepseekVLV2Config | |
| deepseek_v32 | DeepseekV3Config | Maps DeepSeek-V3.2 to V3 config |
| eagle | EAGLEConfig | Draft model speculator |
| flex_olmo | FlexOlmoConfig | |
| hunyuan_vl | HunYuanVLConfig | |
| jais | JAISConfig | |
| kimi_linear | KimiLinearConfig | |
| kimi_vl | KimiVLConfig | |
| medusa | MedusaConfig | Draft model speculator |
| mlp_speculator | MLPSpeculatorConfig | Draft model speculator |
| nemotron | NemotronConfig | |
| olmo3 | Olmo3Config | |
| RefinedWeb | RWConfig | For tiiuae/falcon-40b |
| RefinedWebModel | RWConfig | For tiiuae/falcon-7b |
| speculators | SpeculatorsConfig | Auto-detected via speculators_config key |
| ultravox | UltravoxConfig | |
All config classes are lazily imported from vllm/transformers_utils/configs/__init__.py17-68
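The lazy-resolution behavior can be sketched as a dict subclass that swaps string values for real classes on first access. vLLM's LazyConfigDict resolves against vllm.transformers_utils.configs; this standalone sketch takes the module name as a parameter so it can run without vLLM installed:

```python
import importlib


class LazyConfigDict(dict):
    """Registry whose values start as class-name strings and are
    resolved to the real classes on first access (sketch)."""

    def __init__(self, mapping, module_name):
        super().__init__(mapping)
        self._module_name = module_name

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, str):
            module = importlib.import_module(self._module_name)
            value = getattr(module, value)
            # Cache the resolved class so the import happens only once.
            super().__setitem__(key, value)
        return value
```

Because resolution happens inside `__getitem__`, merely constructing the registry at startup imports nothing; the cost is paid only for the model_type actually being loaded.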
Sources: vllm/transformers_utils/config.py63-111 vllm/transformers_utils/config.py131-196 vllm/transformers_utils/config.py112-114
By default, vLLM loads models from the HuggingFace Hub. The download path is controlled by the HF_HOME environment variable:
Configuration Lookup Process:
- Respects local_files_only=True (offline mode via HF_HUB_OFFLINE)
- Calls PretrainedConfig.get_config_dict() with the model identifier
- Downloads config.json to the cache if not already present

Sources: vllm/transformers_utils/config.py131-148 docs/models/supported_models.md189-221
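The lookup order can be sketched in plain Python. This only illustrates the precedence (local path, then cache, then hub); load_config_dict and its cache layout are assumptions, and the actual hub download is elided:

```python
import json
from pathlib import Path


def load_config_dict(model: str, cache_dir: str, offline: bool = False) -> dict:
    """Sketch of the lookup precedence: a local directory wins, then the
    download cache; a hub download would happen last (elided here)."""
    local = Path(model) / "config.json"
    if local.is_file():
        return json.loads(local.read_text())
    # Hypothetical cache layout, for illustration only.
    cached = Path(cache_dir) / model.replace("/", "--") / "config.json"
    if cached.is_file():
        return json.loads(cached.read_text())
    if offline:
        raise FileNotFoundError(
            f"{model}: config.json not cached and offline mode is on")
    raise NotImplementedError("hub download elided in this sketch")
```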
When VLLM_USE_MODELSCOPE=True, vLLM switches to loading from ModelScope instead of HuggingFace Hub:
This allows users in regions with better access to ModelScope to use that ecosystem while maintaining the same API.
Sources: vllm/transformers_utils/config.py53-56 docs/models/supported_models.md316-339
Models can be loaded from local directories containing a valid config.json file. The parser checks for file existence using helper functions from repo_utils:
- file_or_path_exists() - Check if a file exists locally or remotely
- try_get_local_file() - Attempt to get a file from a local path
- list_repo_files() - List files in the model directory

Sources: vllm/transformers_utils/repo_utils.py
Mistral AI uses a custom configuration format with params.json instead of HuggingFace's config.json. The MistralConfigParser handles this format:
Mistral Configuration Adaptation
The adapt_config_dict() function in vllm/transformers_utils/configs/mistral.py converts Mistral's native format to a format compatible with vLLM's model implementations. Key adaptations include:
- Filling in max_position_embeddings from the HF config if it is absent from params.json

Sources: vllm/transformers_utils/config.py198-234 vllm/transformers_utils/repo_utils.py
After loading the raw configuration, vLLM applies several post-processing steps to ensure compatibility:
RoPE (Rotary Position Embedding) parameters have evolved across Transformers versions. The patch_rope_parameters() function standardizes these:
Backwards Compatibility
The patching handles:
- Legacy attribute names such as rotary_emb_base, rotary_pct, and rotary_emb_fraction
- Deprecated rope_scaling type values: 'su' (→ 'longrope') and 'mrope' (→ 'default')

Sources: vllm/transformers_utils/config.py306-399 vllm/transformers_utils/config.py123-129
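The rope_scaling type normalization can be sketched as a small remap; the legacy-attribute handling is elided and patch_rope_scaling_type is a hypothetical name for this sketch:

```python
def patch_rope_scaling_type(config: dict) -> dict:
    """Normalize deprecated rope_scaling type values in place (sketch)."""
    remap = {"su": "longrope", "mrope": "default"}
    scaling = config.get("rope_scaling")
    if scaling and scaling.get("type") in remap:
        scaling["type"] = remap[scaling["type"]]
    return config
```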
Some configurations use non-standard attribute names. The _maybe_remap_hf_config_attrs() function standardizes these:
This ensures that vLLM's model code can consistently access attributes using standard names.
Sources: vllm/transformers_utils/config.py112-114 vllm/transformers_utils/config.py476-483
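The remapping can be sketched as a dict rewrite. The attribute pairs below are illustrative examples (GPT-2-style names), not vLLM's actual remap table:

```python
# Hypothetical remap table for illustration only.
_ATTR_REMAP = {
    "n_embd": "hidden_size",
    "n_layer": "num_hidden_layers",
    "n_head": "num_attention_heads",
}


def maybe_remap_attrs(config: dict) -> dict:
    """Rewrite non-standard attribute names to vLLM's expected names."""
    return {_ATTR_REMAP.get(k, k): v for k, v in config.items()}
```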
Some model architectures require special AutoConfig initialization parameters:
These overrides are applied in _maybe_update_auto_config_kwargs() before calling AutoConfig.from_pretrained().
Sources: vllm/transformers_utils/config.py116-121 vllm/transformers_utils/config.py467-474
GGUF is not a separate config_format value and does not have a ConfigParserBase implementation. Instead, GGUF detection and metadata extraction occur at the ModelConfig level, after get_config() loads the HF-format config.
GGUF handling flow
Key functions (all in vllm/transformers_utils/gguf_utils.py and vllm/config/model.py):
| Function | Location | Role |
|---|---|---|
| is_gguf(model) | gguf_utils.py | Detect local .gguf path |
| is_remote_gguf(model) | gguf_utils.py | Detect remote .gguf URL |
| check_gguf_file(model) | gguf_utils.py | Validate GGUF file header |
| split_remote_gguf(model) | gguf_utils.py | Parse URL into base + filename |
| maybe_patch_hf_config_from_gguf(config, path) | gguf_utils.py | Overwrite HF config fields from GGUF metadata |
The GGUF metadata embedded in the file contains model architecture, vocabulary size, layer count, and quantization parameters. maybe_patch_hf_config_from_gguf maps these to standard PretrainedConfig fields.
Sources: vllm/config/model.py36-43 vllm/transformers_utils/config.py30-35
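The metadata patching can be sketched as a field-by-field overwrite. The GGUF keys below follow common GGUF metadata naming (e.g. llama.block_count), but the exact mapping is an assumption, not vLLM's table:

```python
def patch_config_from_gguf_metadata(hf_config: dict,
                                    gguf_metadata: dict) -> dict:
    """Overwrite HF config fields from GGUF metadata keys (sketch)."""
    field_map = {
        "llama.block_count": "num_hidden_layers",
        "llama.embedding_length": "hidden_size",
        "llama.attention.head_count": "num_attention_heads",
    }
    for gguf_key, hf_key in field_map.items():
        if gguf_key in gguf_metadata:
            hf_config[hf_key] = gguf_metadata[gguf_key]
    return hf_config
```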
After a PretrainedConfig is loaded, some model architectures need additional modifications to the config object before the model can be instantiated. These are handled by VerifyAndUpdateConfig subclasses in vllm/model_executor/models/config.py.
Architecture of model-specific config fixups
These classes are referenced from a mapping (typically called MODELS_CONFIG_MAP or equivalent) that associates architecture class names with their VerifyAndUpdateConfig implementation. The verify_and_update_model_config() method is called during ModelConfig initialization after the HF config is loaded, allowing per-architecture patches such as:
- Setting is_causal from model-specific fields
- Adjusting RoPE parameters (e.g. rotary_kwargs)
- Populating pooler_config fields for embedding models
- Correcting hidden_act to match the model's actual activation function

Sources: vllm/model_executor/models/config.py21-170
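The fixup hook can be sketched as follows. The base class and mapping names follow the text, but the example subclass, its field, and apply_model_fixups are illustrative assumptions:

```python
class VerifyAndUpdateConfig:
    @staticmethod
    def verify_and_update_model_config(config) -> None:
        raise NotImplementedError


class ExampleEmbeddingModelConfig(VerifyAndUpdateConfig):
    """Hypothetical per-architecture fixup for an embedding model."""

    @staticmethod
    def verify_and_update_model_config(config) -> None:
        # Embedding models typically run bidirectional attention.
        config["is_causal"] = False


MODELS_CONFIG_MAP = {"ExampleEmbeddingModel": ExampleEmbeddingModelConfig}


def apply_model_fixups(architecture: str, config: dict) -> dict:
    """Apply the per-architecture fixup, if one is registered (sketch)."""
    fixup = MODELS_CONFIG_MAP.get(architecture)
    if fixup is not None:
        fixup.verify_and_update_model_config(config)
    return config
```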
vLLM allows registration of custom configuration parsers for new formats:
Example Usage:
The decorator:
- Validates that the class subclasses ConfigParserBase
- Registers it in _CONFIG_FORMAT_TO_CONFIG_PARSER, overwriting any existing entry with a warning

Sources: vllm/transformers_utils/config.py255-303
hf_overrides Mechanism

ModelConfig accepts an hf_overrides parameter (type HfOverrides = dict[str, Any] | Callable[[PretrainedConfig], PretrainedConfig]) to modify the loaded config before use:
- Dict form: each key/value pair is set as an attribute on the loaded PretrainedConfig object. Inside HFConfigParser.parse(), hf_overrides can also override model_type before the _CONFIG_REGISTRY lookup, enabling runtime architecture switching.
- Callable form: receives the PretrainedConfig and must return the (possibly modified) config.

The override is applied in HFConfigParser.parse() before the registry lookup, so it can direct the parser to use a different custom config class.
Sources: vllm/transformers_utils/config.py156-161 vllm/config/model.py252-255
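How the two forms are applied can be sketched as below; the config is modeled as a plain object rather than a transformers PretrainedConfig, and apply_hf_overrides is a hypothetical name for the step:

```python
class FakeConfig:
    """Stand-in for PretrainedConfig in this sketch."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def apply_hf_overrides(config, hf_overrides):
    """Apply dict-form overrides as attributes, or call the callable form
    and return whatever config it produces."""
    if callable(hf_overrides):
        return hf_overrides(config)
    for key, value in hf_overrides.items():
        setattr(config, key, value)
    return config
```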
The configuration loading system provides a flexible, extensible architecture for loading model configurations from multiple sources and formats:
The system prioritizes compatibility with the broader ML ecosystem while providing the flexibility needed for vLLM's specific requirements.
Sources: vllm/transformers_utils/config.py vllm/transformers_utils/configs/__init__.py vllm/transformers_utils/config_parser_base.py