This document covers the configuration and deployment of models in DB-GPT through .toml configuration files, the model registry system, and various deployment strategies. It explains how to configure both local models (HuggingFace Transformers, vLLM, llama.cpp, MLX) and proxy models (OpenAI, DeepSeek, Ollama, etc.), and how DB-GPT manages model instances through its Service-oriented Multi-model Management Framework (SMMF).
For information about model adapters and proxy implementation details, see Model Adapters and Proxy Models. For worker architecture and inference backends, see Model Workers and Inference Backends. For hardware acceleration options, see Hardware Acceleration and Performance.
DB-GPT uses TOML configuration files to define model deployments. These files specify both LLM (Large Language Model) and embedding model configurations, along with their deployment parameters.
Sources: docs/docs/quickstart.md96-104 docs/docs/quickstart.md186-194 docs/docs/quickstart.md228-244
DB-GPT provides pre-configured templates for common deployment scenarios:
| Configuration File | Purpose | Provider Type |
|---|---|---|
| configs/dbgpt-proxy-openai.toml | OpenAI API proxy | proxy/openai |
| configs/dbgpt-proxy-deepseek.toml | DeepSeek API proxy | proxy/deepseek |
| configs/dbgpt-proxy-ollama.toml | Ollama proxy | proxy/ollama |
| configs/dbgpt-local-glm.toml | Local GLM models | hf (HuggingFace) |
| configs/dbgpt-local-vllm.toml | vLLM inference | vllm |
| configs/dbgpt-local-llama-cpp.toml | llama.cpp inference | llama.cpp |
Sources: docs/docs/quickstart.md124-135 docs/docs/quickstart.md379-387
The model registry is the central component that manages model metadata and coordinates between configuration, deployment, and runtime access.
The model registry reads configuration files at startup and creates model metadata entries for each configured model. The ModelController manages model lifecycle (start, stop, restart), while the WorkerManager handles load balancing across multiple instances of the same model.
Sources: docs/docs/modules/smmf.md10-23 docs/docs/modules/smmf.md87-94
Local model deployment runs models on local infrastructure using various inference frameworks. This approach provides full control over model execution and data privacy.
Configuration for HuggingFace models using the Transformers library:
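A minimal sketch modeled on configs/dbgpt-local-glm.toml; the exact model names and keys may differ across DB-GPT versions:

```toml
[models]
[[models.llms]]
name = "THUDM/glm-4-9b-chat"
provider = "hf"
# path = "/data/models/glm-4-9b-chat"  # optional: point at a local copy instead of downloading

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
```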
The hf provider automatically downloads models from HuggingFace Hub if path is not specified. The model name follows HuggingFace's repository naming convention (organization/model-name).
Installation command:
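Assuming a uv-based source install as described in the quickstart; the set of extras shown is illustrative:

```shell
uv sync --all-packages --extra "base" --extra "hf" --extra "rag" --extra "storage_chromadb"
```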
Sources: docs/docs/quickstart.md209-244 docs/docs/installation/sourcecode.md186-202
Configuration for high-throughput inference using vLLM:
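A sketch modeled on configs/dbgpt-local-vllm.toml; the model name is illustrative:

```toml
[[models.llms]]
name = "Qwen/Qwen2.5-7B-Instruct"
provider = "vllm"
# max_gpu_memory = "24GB"  # optional GPU memory cap
```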
vLLM provides optimized inference with PagedAttention and continuous batching for higher throughput. The provider is specified as vllm in the configuration.
Installation command:
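Assuming the same uv-based workflow, with the vllm extra in place of hf:

```shell
uv sync --all-packages --extra "base" --extra "vllm" --extra "rag" --extra "storage_chromadb"
```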
Sources: docs/docs/quickstart.md254-297
Configuration for CPU and Metal (Apple Silicon) inference:
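A sketch modeled on configs/dbgpt-local-llama-cpp.toml; the GGUF file name and path are hypothetical:

```toml
[[models.llms]]
name = "DeepSeek-R1-Distill-Qwen-1.5B"
provider = "llama.cpp"
path = "models/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf"  # quantized GGUF weights
```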
llama.cpp enables efficient CPU inference and optimized inference on Apple Silicon (M1/M2/M3) chips. For CUDA support, set the environment variable CMAKE_ARGS="-DGGML_CUDA=ON" before installation.
Installation command (with CUDA):
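Assuming the uv-based workflow; CMAKE_ARGS is set inline so the llama.cpp bindings compile with CUDA support:

```shell
CMAKE_ARGS="-DGGML_CUDA=ON" uv sync --all-packages --extra "base" --extra "llama_cpp" --extra "rag" --extra "storage_chromadb"
```

On Apple Silicon the CMAKE_ARGS prefix is omitted; Metal support is typically enabled by default.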
Sources: docs/docs/quickstart.md300-360
Configuration for Apple Silicon optimized inference:
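A sketch assuming the mlx provider follows the same configuration shape as the other local providers; the model name is illustrative:

```toml
[[models.llms]]
name = "mlx-community/Qwen2.5-7B-Instruct-4bit"
provider = "mlx"
```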
MLX is Apple's machine learning framework optimized for Apple Silicon. It provides efficient inference on M-series chips with unified memory architecture.
Sources: Inferred from system architecture diagrams
Proxy model deployment connects to external API services without running models locally. This reduces infrastructure requirements while accessing powerful commercial models.
The proxy/openai provider connects to OpenAI's API. The api_key can be provided in the configuration file or through the OPENAI_API_KEY environment variable.
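A sketch modeled on configs/dbgpt-proxy-openai.toml; the ${env:...} interpolation syntax is how the bundled templates reference environment variables, but verify it against your DB-GPT version:

```toml
[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"
api_key = "${env:OPENAI_API_KEY}"  # or set the key literally here
# api_base = "https://api.openai.com/v1"  # override for OpenAI-compatible endpoints
```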
Installation command:
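Assuming the uv-based quickstart workflow:

```shell
uv sync --all-packages --extra "base" --extra "proxy_openai" --extra "rag" --extra "storage_chromadb" --extra "dbgpts"
```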
Sources: docs/docs/quickstart.md110-148 docs/docs/installation/sourcecode.md92-116
DeepSeek proxy configuration demonstrates mixing proxy LLMs with local embedding models. The embedding model uses the hf provider while the LLM uses proxy/deepseek.
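A sketch of this mixed setup, modeled on configs/dbgpt-proxy-deepseek.toml; model names are illustrative:

```toml
# Proxy LLM: requests go to the DeepSeek API
[[models.llms]]
name = "deepseek-chat"
provider = "proxy/deepseek"
api_key = "your-deepseek-api-key"

# Local embedding model: documents never leave the machine
[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
```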
Sources: docs/docs/quickstart.md150-207 docs/docs/installation/sourcecode.md133-164
Ollama proxy connects to a local or remote Ollama server. The api_base parameter specifies the Ollama API endpoint.
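A sketch modeled on configs/dbgpt-proxy-ollama.toml; 11434 is Ollama's default port, and the model name is illustrative:

```toml
[[models.llms]]
name = "qwen2.5:latest"
provider = "proxy/ollama"
api_base = "http://localhost:11434"  # local or remote Ollama server
```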
Installation command:
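Assuming the uv-based workflow, with the Ollama proxy extra:

```shell
uv sync --all-packages --extra "base" --extra "proxy_ollama" --extra "rag" --extra "storage_chromadb"
```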
Sources: docs/docs/quickstart.md362-400
The DB-GPT webserver is started using the dbgpt start webserver command with a configuration file:
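For example, using one of the bundled templates (run via uv in a source install):

```shell
uv run dbgpt start webserver --config configs/dbgpt-proxy-openai.toml
```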
Alternative command using Python directly:
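Assuming a source checkout; the server module path below reflects the current package layout and may differ between releases:

```shell
python packages/dbgpt-app/src/dbgpt_app/dbgpt_server.py --config configs/dbgpt-proxy-openai.toml
```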
The --config parameter specifies which .toml configuration file to use for model deployment.
Sources: docs/docs/quickstart.md138-147 docs/docs/quickstart.md196-206
Sources: docs/docs/quickstart.md138-148 docs/docs/installation/sourcecode.md106-116
| Provider Type | Description | Inference Backend | Use Case |
|---|---|---|---|
| hf | HuggingFace Transformers | Transformers library | Standard local inference |
| vllm | vLLM inference engine | vLLM | High-throughput inference |
| llama.cpp | llama.cpp engine | llama.cpp | CPU/Metal inference |
| mlx | Apple MLX framework | MLX | Apple Silicon optimization |
| proxy/openai | OpenAI API proxy | OpenAI API | Commercial API access |
| proxy/deepseek | DeepSeek API proxy | DeepSeek API | Commercial API access |
| proxy/qwen | Qwen API proxy | Qwen API | Commercial API access |
| proxy/ollama | Ollama proxy | Ollama server | Local/remote Ollama |
Sources: docs/docs/modules/smmf.md36-42 docs/docs/modules/smmf.md44-76
Sources: docs/docs/modules/smmf.md10-23
DB-GPT supports multiple configured models simultaneously. Applications can select models at runtime through the model name specified in API requests.
Sources: docs/docs/modules/smmf.md82-94
A single configuration file can define multiple models:
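A sketch showing two LLMs and one embedding model in one file; TOML's [[...]] array-of-tables syntax is what allows repeating the models.llms entry (model names are illustrative):

```toml
[models]
[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"
api_key = "${env:OPENAI_API_KEY}"

[[models.llms]]
name = "deepseek-chat"
provider = "proxy/deepseek"
api_key = "your-deepseek-api-key"

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
```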
The model registry maintains all configured models, allowing applications to switch between them based on requirements like performance, cost, or privacy.
Sources: docs/docs/quickstart.md126-135 docs/docs/quickstart.md179-194
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Model identifier (HuggingFace repo or API model name) |
| provider | string | Yes | Provider type (hf, vllm, llama.cpp, mlx, proxy/*) |
| path | string | No | Local file system path to model files |
| api_key | string | Conditional | API key for proxy providers |
| api_base | string | No | Custom API base URL for proxy providers |
| device | string | No | Device placement (cuda, cpu, mps, metal) |
| model_type | string | No | Model architecture type |
| max_gpu_memory | string | No | Maximum GPU memory allocation (e.g., "24GB") |
Sources: docs/docs/quickstart.md186-244 docs/docs/quickstart.md274-290
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Model identifier |
| provider | string | Yes | Provider type (hf, proxy/openai, proxy/ollama) |
| path | string | No | Local file system path to model files |
| api_key | string | Conditional | API key for proxy providers |
| device | string | No | Device placement (cuda, cpu, mps) |
Sources: docs/docs/quickstart.md238-243 docs/docs/installation/sourcecode.md196-201
Configuration files can also specify RAG storage backends:
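A sketch modeled on the Milvus integration guide; section names and keys may vary between storage backends and DB-GPT versions:

```toml
[rag.storage]
[rag.storage.vector]
type = "milvus"       # e.g. "chroma", "milvus", or a graph store
uri = "127.0.0.1"
port = "19530"        # Milvus default port
```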
This integrates model configuration with storage configuration in a single file, enabling complete application deployment from one configuration source.
Sources: docs/docs/installation/integrations/milvus_rag_install.md25-37 docs/docs/installation/integrations/graph_rag_install.md47-60
Configuration values can be overridden or supplemented by environment variables:
| Environment Variable | Purpose | Example |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | sk-... |
| UV_INDEX_URL | PyPI index URL | https://pypi.tuna.tsinghua.edu.cn/simple |
| CMAKE_ARGS | CMake build arguments | "-DGGML_CUDA=ON" |
Environment variables take precedence over configuration file values for sensitive data like API keys.
Sources: docs/docs/quickstart.md87-92 docs/docs/quickstart.md124-125 docs/docs/quickstart.md302-303
It's common to use different providers for LLM and embedding models:
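A sketch of the pattern: a commercial API for generation, a locally run embedding model for indexing (model names are illustrative):

```toml
[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"        # generation via external API
api_key = "${env:OPENAI_API_KEY}"

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"                  # embeddings computed locally
```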
This approach balances access to powerful LLMs with data privacy for document embeddings.
Sources: docs/docs/quickstart.md164-176 docs/docs/installation/sourcecode.md133-152
When path is not specified, the hf provider automatically downloads models from HuggingFace Hub to a cache directory. Specify path to use a pre-downloaded model checkpoint, run in an offline environment, or keep model files in a custom location.
Sources: docs/docs/quickstart.md234-243 docs/docs/installation/sourcecode.md186-202
The configuration file also specifies the application database:
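A sketch modeled on the bundled templates, which default to SQLite for application metadata; the section name and keys may differ by version:

```toml
[service.web.database]
type = "sqlite"
path = "pilot/meta_data/dbgpt.db"
# For production, a MySQL-style configuration (host, port, user, database) can be used instead.
```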