This page documents how vLLM loads model configurations from various formats and sources. Model configuration loading is a critical step that occurs after model architecture detection (see 5.1) and before the model is instantiated. The configuration determines essential model properties like layer counts, attention mechanisms, vocabulary size, and architecture-specific parameters.
For information about how configurations are used to instantiate models, see Model Registry and Architecture Detection. For details on configuration objects like ModelConfig and VllmConfig, see Configuration Objects.
The ConfigFormat type alias defines the three supported formats:
ConfigFormat = Literal["auto", "hf", "mistral"]
| Format | Description | Config File | Primary Use Case |
|---|---|---|---|
| hf | Standard Transformers format | config.json | Most models from HF Hub or local paths |
| mistral | Mistral AI native format | params.json | Mistral-format checkpoints |
| auto | Auto-detect at runtime | — | Tries Mistral detection first, falls back to HF |
Note on GGUF: GGUF-format models are not a separate config_format. GGUF detection (is_gguf(), is_remote_gguf()) occurs at the ModelConfig level, and GGUF metadata is patched onto the loaded HF config via maybe_patch_hf_config_from_gguf().
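The "auto" resolution order can be sketched as follows. This is a minimal illustration assuming a local directory; detect_config_format is a hypothetical helper, not vLLM's API, and the real detection also handles remote repos, which is elided here:

```python
from pathlib import Path


def detect_config_format(model_dir: str) -> str:
    """Illustrative "auto" resolution: prefer Mistral's params.json
    when present, otherwise fall back to the HF format."""
    root = Path(model_dir)
    if (root / "params.json").is_file():
        return "mistral"
    return "hf"
```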
get_config_parser() dispatch and get_config() auto-detection
Sources: vllm/transformers_utils/config.py237-253 vllm/transformers_utils/config.py30-42 vllm/config/model.py36-43
The configuration loading system is built around the ConfigParserBase abstract class defined in vllm/transformers_utils/config_parser_base.py, with two concrete implementations.
Class hierarchy and relationships
Parser Factory Pattern
get_config_parser() looks up _CONFIG_FORMAT_TO_CONFIG_PARSER and returns a new parser instance. Custom parsers can be registered using the @register_config_parser() decorator, which validates subclassing and overwrites any existing registration with a warning.
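The factory pattern described above can be sketched as a dict lookup that returns a fresh parser instance. The class and dict names mirror the text, but the parse bodies are placeholders, not vLLM's implementation:

```python
from abc import ABC, abstractmethod


class ConfigParserBase(ABC):
    @abstractmethod
    def parse(self, model: str, **kwargs):
        ...


class HFConfigParser(ConfigParserBase):
    def parse(self, model: str, **kwargs):
        # Placeholder: the real parser resolves config.json.
        return {"model_type": "llama", "source": "hf"}


class MistralConfigParser(ConfigParserBase):
    def parse(self, model: str, **kwargs):
        # Placeholder: the real parser resolves params.json.
        return {"model_type": "mistral", "source": "mistral"}


_CONFIG_FORMAT_TO_CONFIG_PARSER = {
    "hf": HFConfigParser,
    "mistral": MistralConfigParser,
}


def get_config_parser(config_format: str) -> ConfigParserBase:
    """Look up the parser class and return a new instance."""
    try:
        parser_cls = _CONFIG_FORMAT_TO_CONFIG_PARSER[config_format]
    except KeyError:
        raise ValueError(f"Unknown config format: {config_format!r}")
    return parser_cls()
```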
Sources: vllm/transformers_utils/config.py237-253 vllm/transformers_utils/config.py255-303
The HFConfigParser is the default parser and handles the majority of models. It implements a multi-stage resolution process:
Custom Configuration Registry (_CONFIG_REGISTRY)
_CONFIG_REGISTRY is a LazyConfigDict that maps model_type strings (from config.json) to vLLM-specific PretrainedConfig subclasses. Values are string class names resolved lazily via vllm.transformers_utils.configs on first access, avoiding import overhead at startup.
| model_type key | Config Class | Notes |
|---|---|---|
| afmoe | AfmoeConfig | |
| chatglm | ChatGLMConfig | |
| deepseek_vl_v2 | DeepseekVLV2Config | |
| deepseek_v32 | DeepseekV3Config | Maps DeepSeek-V3.2 to V3 config |
| eagle | EAGLEConfig | Draft model speculator |
| flex_olmo | FlexOlmoConfig | |
| hunyuan_vl | HunYuanVLConfig | |
| jais | JAISConfig | |
| kimi_linear | KimiLinearConfig | |
| kimi_vl | KimiVLConfig | |
| medusa | MedusaConfig | Draft model speculator |
| mlp_speculator | MLPSpeculatorConfig | Draft model speculator |
| nemotron | NemotronConfig | |
| olmo3 | Olmo3Config | |
| RefinedWeb | RWConfig | For tiiuae/falcon-40b |
| RefinedWebModel | RWConfig | For tiiuae/falcon-7b |
| speculators | SpeculatorsConfig | Auto-detected via speculators_config key |
| ultravox | UltravoxConfig | |
All config classes are lazily imported from vllm/transformers_utils/configs/__init__.py17-68
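The lazy-resolution behavior can be sketched as a dict subclass that swaps string values for real classes on first access. vLLM's LazyConfigDict resolves against vllm.transformers_utils.configs; this standalone sketch takes the module name as a parameter so it can run without vLLM installed:

```python
import importlib


class LazyConfigDict(dict):
    """Registry whose values start as class-name strings and are
    resolved to the real classes on first access (sketch)."""

    def __init__(self, mapping, module_name):
        super().__init__(mapping)
        self._module_name = module_name

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, str):
            module = importlib.import_module(self._module_name)
            value = getattr(module, value)
            # Cache the resolved class so the import happens only once.
            super().__setitem__(key, value)
        return value
```

Because resolution happens inside `__getitem__`, merely constructing the registry at startup imports nothing; the cost is paid only for the model_type actually being loaded.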
Sources: vllm/transformers_utils/config.py63-111 vllm/transformers_utils/config.py131-196 vllm/transformers_utils/config.py112-114
By default, vLLM loads models from the HuggingFace Hub. The download path is controlled by the HF_HOME environment variable:
Configuration Lookup Process:
- Respects local_files_only=True (offline mode via HF_HUB_OFFLINE)
- Calls PretrainedConfig.get_config_dict() with the model identifier
- Downloads config.json to the cache if not already present

Sources: vllm/transformers_utils/config.py131-148 docs/models/supported_models.md189-221
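The lookup order can be sketched in plain Python. This only illustrates the precedence (local path, then cache, then hub); load_config_dict and its cache layout are assumptions, and the actual hub download is elided:

```python
import json
from pathlib import Path


def load_config_dict(model: str, cache_dir: str, offline: bool = False) -> dict:
    """Sketch of the lookup precedence: a local directory wins, then the
    download cache; a hub download would happen last (elided here)."""
    local = Path(model) / "config.json"
    if local.is_file():
        return json.loads(local.read_text())
    # Hypothetical cache layout, for illustration only.
    cached = Path(cache_dir) / model.replace("/", "--") / "config.json"
    if cached.is_file():
        return json.loads(cached.read_text())
    if offline:
        raise FileNotFoundError(
            f"{model}: config.json not cached and offline mode is on")
    raise NotImplementedError("hub download elided in this sketch")
```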
When VLLM_USE_MODELSCOPE=True, vLLM switches to loading from ModelScope instead of HuggingFace Hub:
This allows users in regions with better access to ModelScope to use that ecosystem while maintaining the same API.
Sources: vllm/transformers_utils/config.py53-56 docs/models/supported_models.md316-339
Models can be loaded from local directories containing a valid config.json file. The parser checks for file existence using helper functions from repo_utils:
- file_or_path_exists() - Check if a file exists locally or remotely
- try_get_local_file() - Attempt to get a file from a local path
- list_repo_files() - List files in the model directory

Sources: vllm/transformers_utils/repo_utils.py
Mistral AI uses a custom configuration format with params.json instead of HuggingFace's config.json. The MistralConfigParser handles this format:
Mistral Configuration Adaptation
The adapt_config_dict() function in vllm/transformers_utils/configs/mistral.py converts Mistral's native format to a format compatible with vLLM's model implementations. Key adaptations include:
- Filling in max_position_embeddings from the HF config if it is absent from params.json

Sources: vllm/transformers_utils/config.py198-234 vllm/transformers_utils/repo_utils.py
After loading the raw configuration, vLLM applies several post-processing steps to ensure compatibility:
RoPE (Rotary Position Embedding) parameters have evolved across Transformers versions. The patch_rope_parameters() function standardizes these:
Backwards Compatibility
The patching handles:
- Legacy attribute names such as rotary_emb_base, rotary_pct, and rotary_emb_fraction
- Deprecated rope_scaling type values: 'su' (→ 'longrope') and 'mrope' (→ 'default')

Sources: vllm/transformers_utils/config.py306-399 vllm/transformers_utils/config.py123-129
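The rope_scaling type normalization can be sketched as a small remap; the legacy-attribute handling is elided and patch_rope_scaling_type is a hypothetical name for this sketch:

```python
def patch_rope_scaling_type(config: dict) -> dict:
    """Normalize deprecated rope_scaling type values in place (sketch)."""
    remap = {"su": "longrope", "mrope": "default"}
    scaling = config.get("rope_scaling")
    if scaling and scaling.get("type") in remap:
        scaling["type"] = remap[scaling["type"]]
    return config
```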
Some configurations use non-standard attribute names. The _maybe_remap_hf_config_attrs() function standardizes these:
This ensures that vLLM's model code can consistently access attributes using standard names.
Sources: vllm/transformers_utils/config.py112-114 vllm/transformers_utils/config.py476-483
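The remapping can be sketched as a dict rewrite. The attribute pairs below are illustrative examples (GPT-2-style names), not vLLM's actual remap table:

```python
# Hypothetical remap table for illustration only.
_ATTR_REMAP = {
    "n_embd": "hidden_size",
    "n_layer": "num_hidden_layers",
    "n_head": "num_attention_heads",
}


def maybe_remap_attrs(config: dict) -> dict:
    """Rewrite non-standard attribute names to vLLM's expected names."""
    return {_ATTR_REMAP.get(k, k): v for k, v in config.items()}
```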
Some model architectures require special AutoConfig initialization parameters:
These overrides are applied in _maybe_update_auto_config_kwargs() before calling AutoConfig.from_pretrained().
Sources: vllm/transformers_utils/config.py116-121 vllm/transformers_utils/config.py467-474
GGUF is not a separate config_format value and does not have a ConfigParserBase implementation. Instead, GGUF detection and metadata extraction occur at the ModelConfig level, after get_config() loads the HF-format config.
GGUF handling flow
Key functions (all in vllm/transformers_utils/gguf_utils.py and vllm/config/model.py):
| Function | Location | Role |
|---|---|---|
| is_gguf(model) | gguf_utils.py | Detect local .gguf path |
| is_remote_gguf(model) | gguf_utils.py | Detect remote .gguf URL |
| check_gguf_file(model) | gguf_utils.py | Validate GGUF file header |
| split_remote_gguf(model) | gguf_utils.py | Parse URL into base + filename |
| maybe_patch_hf_config_from_gguf(config, path) | gguf_utils.py | Overwrite HF config fields from GGUF metadata |
The GGUF metadata embedded in the file contains model architecture, vocabulary size, layer count, and quantization parameters. maybe_patch_hf_config_from_gguf maps these to standard PretrainedConfig fields.
Sources: vllm/config/model.py36-43 vllm/transformers_utils/config.py30-35
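The metadata patching can be sketched as a field-by-field overwrite. The GGUF keys below follow common GGUF metadata naming (e.g. llama.block_count), but the exact mapping is an assumption, not vLLM's table:

```python
def patch_config_from_gguf_metadata(hf_config: dict,
                                    gguf_metadata: dict) -> dict:
    """Overwrite HF config fields from GGUF metadata keys (sketch)."""
    field_map = {
        "llama.block_count": "num_hidden_layers",
        "llama.embedding_length": "hidden_size",
        "llama.attention.head_count": "num_attention_heads",
    }
    for gguf_key, hf_key in field_map.items():
        if gguf_key in gguf_metadata:
            hf_config[hf_key] = gguf_metadata[gguf_key]
    return hf_config
```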
After a PretrainedConfig is loaded, some model architectures need additional modifications to the config object before the model can be instantiated. These are handled by VerifyAndUpdateConfig subclasses in vllm/model_executor/models/config.py.
Architecture of model-specific config fixups
These classes are referenced from a mapping (typically called MODELS_CONFIG_MAP or equivalent) that associates architecture class names with their VerifyAndUpdateConfig implementation. The verify_and_update_model_config() method is called during ModelConfig initialization after the HF config is loaded, allowing per-architecture patches such as:
- Setting is_causal from model-specific fields
- Adjusting RoPE parameters (e.g. rotary_kwargs)
- Populating pooler_config fields for embedding models
- Correcting hidden_act to match the model's actual activation function

Sources: vllm/model_executor/models/config.py21-170
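The fixup hook can be sketched as follows. The base class and mapping names follow the text, but the example subclass, its field, and apply_model_fixups are illustrative assumptions:

```python
class VerifyAndUpdateConfig:
    @staticmethod
    def verify_and_update_model_config(config) -> None:
        raise NotImplementedError


class ExampleEmbeddingModelConfig(VerifyAndUpdateConfig):
    """Hypothetical per-architecture fixup for an embedding model."""

    @staticmethod
    def verify_and_update_model_config(config) -> None:
        # Embedding models typically run bidirectional attention.
        config["is_causal"] = False


MODELS_CONFIG_MAP = {"ExampleEmbeddingModel": ExampleEmbeddingModelConfig}


def apply_model_fixups(architecture: str, config: dict) -> dict:
    """Apply the per-architecture fixup, if one is registered (sketch)."""
    fixup = MODELS_CONFIG_MAP.get(architecture)
    if fixup is not None:
        fixup.verify_and_update_model_config(config)
    return config
```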
vLLM allows registration of custom configuration parsers for new formats:
Example Usage:
The decorator:
- Validates that the class subclasses ConfigParserBase
- Registers it in _CONFIG_FORMAT_TO_CONFIG_PARSER, overwriting any existing entry with a warning

Sources: vllm/transformers_utils/config.py255-303
hf_overrides Mechanism

ModelConfig accepts an hf_overrides parameter (type HfOverrides = dict[str, Any] | Callable[[PretrainedConfig], PretrainedConfig]) to modify the loaded config before use:
- Dict form: each key/value pair is set as an attribute on the loaded PretrainedConfig object. Inside HFConfigParser.parse(), hf_overrides can also override model_type before the _CONFIG_REGISTRY lookup, enabling runtime architecture switching.
- Callable form: receives the PretrainedConfig and must return the (possibly modified) config.

The override is applied in HFConfigParser.parse() before the registry lookup, so it can direct the parser to use a different custom config class.
Sources: vllm/transformers_utils/config.py156-161 vllm/config/model.py252-255
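How the two forms are applied can be sketched as below; the config is modeled as a plain object rather than a transformers PretrainedConfig, and apply_hf_overrides is a hypothetical name for the step:

```python
class FakeConfig:
    """Stand-in for PretrainedConfig in this sketch."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def apply_hf_overrides(config, hf_overrides):
    """Apply dict-form overrides as attributes, or call the callable form
    and return whatever config it produces."""
    if callable(hf_overrides):
        return hf_overrides(config)
    for key, value in hf_overrides.items():
        setattr(config, key, value)
    return config
```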
The configuration loading system provides a flexible, extensible architecture for loading model configurations from multiple sources and formats:
The system prioritizes compatibility with the broader ML ecosystem while providing the flexibility needed for vLLM's specific requirements.
Sources: vllm/transformers_utils/config.py vllm/transformers_utils/configs/__init__.py vllm/transformers_utils/config_parser_base.py