This document covers the configuration and deployment of models in DB-GPT through .toml configuration files, the model registry system, and various deployment strategies. It explains how to configure both local models (HuggingFace Transformers, vLLM, llama.cpp, MLX) and proxy models (OpenAI, DeepSeek, Ollama, etc.), and how DB-GPT manages model instances through its Service-oriented Multi-model Management Framework (SMMF).
For information about model adapters and proxy implementation details, see Model Adapters and Proxy Models. For worker architecture and inference backends, see Model Workers and Inference Backends. For hardware acceleration options, see Hardware Acceleration and Performance.
DB-GPT uses TOML configuration files to define model deployments. These files specify both LLM (Large Language Model) and embedding model configurations, along with their deployment parameters.
Sources: docs/docs/quickstart.md96-104 docs/docs/quickstart.md186-194 docs/docs/quickstart.md228-244
DB-GPT provides pre-configured templates for common deployment scenarios:
| Configuration File | Purpose | Provider Type |
|---|---|---|
| configs/dbgpt-proxy-openai.toml | OpenAI API proxy | proxy/openai |
| configs/dbgpt-proxy-deepseek.toml | DeepSeek API proxy | proxy/deepseek |
| configs/dbgpt-proxy-ollama.toml | Ollama proxy | proxy/ollama |
| configs/dbgpt-local-glm.toml | Local GLM models | hf (HuggingFace) |
| configs/dbgpt-local-vllm.toml | vLLM inference | vllm |
| configs/dbgpt-local-llama-cpp.toml | llama.cpp inference | llama.cpp |
Sources: docs/docs/quickstart.md124-135 docs/docs/quickstart.md379-387
The model registry is the central component that manages model metadata and coordinates between configuration, deployment, and runtime access.
The model registry reads configuration files at startup and creates model metadata entries for each configured model. The ModelController manages model lifecycle (start, stop, restart), while the WorkerManager handles load balancing across multiple instances of the same model.
Sources: docs/docs/modules/smmf.md10-23 docs/docs/modules/smmf.md87-94
Local model deployment runs models on local infrastructure using various inference frameworks. This approach provides full control over model execution and data privacy.
Configuration for HuggingFace models using the Transformers library:
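A minimal sketch modeled on configs/dbgpt-local-glm.toml; the exact model names and keys may differ across DB-GPT versions:

```toml
[models]
[[models.llms]]
name = "THUDM/glm-4-9b-chat"
provider = "hf"
# path = "/data/models/glm-4-9b-chat"  # optional: point at a local copy instead of downloading

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
```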
The hf provider automatically downloads models from HuggingFace Hub if path is not specified. The model name follows HuggingFace's repository naming convention (organization/model-name).
Installation command:
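Assuming a uv-based source install as described in the quickstart; the set of extras shown is illustrative:

```shell
uv sync --all-packages --extra "base" --extra "hf" --extra "rag" --extra "storage_chromadb"
```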
Sources: docs/docs/quickstart.md209-244 docs/docs/installation/sourcecode.md186-202
Configuration for high-throughput inference using vLLM:
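A sketch modeled on configs/dbgpt-local-vllm.toml; the model name is illustrative:

```toml
[[models.llms]]
name = "Qwen/Qwen2.5-7B-Instruct"
provider = "vllm"
# max_gpu_memory = "24GB"  # optional GPU memory cap
```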
vLLM provides optimized inference with PagedAttention and continuous batching for higher throughput. The provider is specified as vllm in the configuration.
Installation command:
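Assuming the same uv-based workflow, with the vllm extra in place of hf:

```shell
uv sync --all-packages --extra "base" --extra "vllm" --extra "rag" --extra "storage_chromadb"
```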
Sources: docs/docs/quickstart.md254-297
Configuration for CPU and Metal (Apple Silicon) inference:
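A sketch modeled on configs/dbgpt-local-llama-cpp.toml; the GGUF file name and path are hypothetical:

```toml
[[models.llms]]
name = "DeepSeek-R1-Distill-Qwen-1.5B"
provider = "llama.cpp"
path = "models/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf"  # quantized GGUF weights
```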
llama.cpp enables efficient CPU inference and optimized inference on Apple Silicon (M1/M2/M3) chips. For CUDA support, set the environment variable CMAKE_ARGS="-DGGML_CUDA=ON" before installation.
Installation command (with CUDA):
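Assuming the uv-based workflow; CMAKE_ARGS is set inline so the llama.cpp bindings compile with CUDA support:

```shell
CMAKE_ARGS="-DGGML_CUDA=ON" uv sync --all-packages --extra "base" --extra "llama_cpp" --extra "rag" --extra "storage_chromadb"
```

On Apple Silicon the CMAKE_ARGS prefix is omitted; Metal support is typically enabled by default.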
Sources: docs/docs/quickstart.md300-360
Configuration for Apple Silicon optimized inference:
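A sketch assuming the mlx provider follows the same configuration shape as the other local providers; the model name is illustrative:

```toml
[[models.llms]]
name = "mlx-community/Qwen2.5-7B-Instruct-4bit"
provider = "mlx"
```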
MLX is Apple's machine learning framework optimized for Apple Silicon. It provides efficient inference on M-series chips with unified memory architecture.
Sources: Inferred from system architecture diagrams
Proxy model deployment connects to external API services without running models locally. This reduces infrastructure requirements while accessing powerful commercial models.
The proxy/openai provider connects to OpenAI's API. The api_key can be provided in the configuration file or through the OPENAI_API_KEY environment variable.
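A sketch modeled on configs/dbgpt-proxy-openai.toml; the ${env:...} interpolation syntax is how the bundled templates reference environment variables, but verify it against your DB-GPT version:

```toml
[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"
api_key = "${env:OPENAI_API_KEY}"  # or set the key literally here
# api_base = "https://api.openai.com/v1"  # override for OpenAI-compatible endpoints
```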
Installation command:
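Assuming the uv-based quickstart workflow:

```shell
uv sync --all-packages --extra "base" --extra "proxy_openai" --extra "rag" --extra "storage_chromadb" --extra "dbgpts"
```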
Sources: docs/docs/quickstart.md110-148 docs/docs/installation/sourcecode.md92-116
DeepSeek proxy configuration demonstrates mixing proxy LLMs with local embedding models. The embedding model uses the hf provider while the LLM uses proxy/deepseek.
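A sketch of this mixed setup, modeled on configs/dbgpt-proxy-deepseek.toml; model names are illustrative:

```toml
# Proxy LLM: requests go to the DeepSeek API
[[models.llms]]
name = "deepseek-chat"
provider = "proxy/deepseek"
api_key = "your-deepseek-api-key"

# Local embedding model: documents never leave the machine
[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
```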
Sources: docs/docs/quickstart.md150-207 docs/docs/installation/sourcecode.md133-164
Ollama proxy connects to a local or remote Ollama server. The api_base parameter specifies the Ollama API endpoint.
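A sketch modeled on configs/dbgpt-proxy-ollama.toml; 11434 is Ollama's default port, and the model name is illustrative:

```toml
[[models.llms]]
name = "qwen2.5:latest"
provider = "proxy/ollama"
api_base = "http://localhost:11434"  # local or remote Ollama server
```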
Installation command:
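Assuming the uv-based workflow, with the Ollama proxy extra:

```shell
uv sync --all-packages --extra "base" --extra "proxy_ollama" --extra "rag" --extra "storage_chromadb"
```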
Sources: docs/docs/quickstart.md362-400
The DB-GPT webserver is started using the dbgpt start webserver command with a configuration file:
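For example, using one of the bundled templates (run via uv in a source install):

```shell
uv run dbgpt start webserver --config configs/dbgpt-proxy-openai.toml
```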
Alternative command using Python directly:
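Assuming a source checkout; the server module path below reflects the current package layout and may differ between releases:

```shell
python packages/dbgpt-app/src/dbgpt_app/dbgpt_server.py --config configs/dbgpt-proxy-openai.toml
```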
The --config parameter specifies which .toml configuration file to use for model deployment.
Sources: docs/docs/quickstart.md138-147 docs/docs/quickstart.md196-206
Sources: docs/docs/quickstart.md138-148 docs/docs/installation/sourcecode.md106-116
| Provider Type | Description | Inference Backend | Use Case |
|---|---|---|---|
| hf | HuggingFace Transformers | Transformers library | Standard local inference |
| vllm | vLLM inference engine | vLLM | High-throughput inference |
| llama.cpp | llama.cpp engine | llama.cpp | CPU/Metal inference |
| mlx | Apple MLX framework | MLX | Apple Silicon optimization |
| proxy/openai | OpenAI API proxy | OpenAI API | Commercial API access |
| proxy/deepseek | DeepSeek API proxy | DeepSeek API | Commercial API access |
| proxy/qwen | Qwen API proxy | Qwen API | Commercial API access |
| proxy/ollama | Ollama proxy | Ollama server | Local/remote Ollama |
Sources: docs/docs/modules/smmf.md36-42 docs/docs/modules/smmf.md44-76
Sources: docs/docs/modules/smmf.md10-23
DB-GPT supports multiple configured models simultaneously. Applications can select models at runtime through the model name specified in API requests.
Sources: docs/docs/modules/smmf.md82-94
A single configuration file can define multiple models:
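A sketch showing two LLMs and one embedding model in one file; TOML's [[...]] array-of-tables syntax is what allows repeating the models.llms entry (model names are illustrative):

```toml
[models]
[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"
api_key = "${env:OPENAI_API_KEY}"

[[models.llms]]
name = "deepseek-chat"
provider = "proxy/deepseek"
api_key = "your-deepseek-api-key"

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"
```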
The model registry maintains all configured models, allowing applications to switch between them based on requirements like performance, cost, or privacy.
Sources: docs/docs/quickstart.md126-135 docs/docs/quickstart.md179-194
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Model identifier (HuggingFace repo or API model name) |
| provider | string | Yes | Provider type (hf, vllm, llama.cpp, mlx, proxy/*) |
| path | string | No | Local file system path to model files |
| api_key | string | Conditional | API key for proxy providers |
| api_base | string | No | Custom API base URL for proxy providers |
| device | string | No | Device placement (cuda, cpu, mps, metal) |
| model_type | string | No | Model architecture type |
| max_gpu_memory | string | No | Maximum GPU memory allocation (e.g., "24GB") |
Sources: docs/docs/quickstart.md186-244 docs/docs/quickstart.md274-290
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Model identifier |
| provider | string | Yes | Provider type (hf, proxy/openai, proxy/ollama) |
| path | string | No | Local file system path to model files |
| api_key | string | Conditional | API key for proxy providers |
| device | string | No | Device placement (cuda, cpu, mps) |
Sources: docs/docs/quickstart.md238-243 docs/docs/installation/sourcecode.md196-201
Configuration files can also specify RAG storage backends:
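A sketch modeled on the Milvus integration guide; section names and keys may vary between storage backends and DB-GPT versions:

```toml
[rag.storage]
[rag.storage.vector]
type = "milvus"       # e.g. "chroma", "milvus", or a graph store
uri = "127.0.0.1"
port = "19530"        # Milvus default port
```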
This integrates model configuration with storage configuration in a single file, enabling complete application deployment from one configuration source.
Sources: docs/docs/installation/integrations/milvus_rag_install.md25-37 docs/docs/installation/integrations/graph_rag_install.md47-60
Configuration values can be overridden or supplemented by environment variables:
| Environment Variable | Purpose | Example |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | sk-... |
| UV_INDEX_URL | PyPI index URL | https://pypi.tuna.tsinghua.edu.cn/simple |
| CMAKE_ARGS | CMake build arguments | "-DGGML_CUDA=ON" |
Environment variables take precedence over configuration file values for sensitive data like API keys.
Sources: docs/docs/quickstart.md87-92 docs/docs/quickstart.md124-125 docs/docs/quickstart.md302-303
It's common to use different providers for LLM and embedding models:
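A sketch of the pattern: a commercial API for generation, a locally run embedding model for indexing (model names are illustrative):

```toml
[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"        # generation via external API
api_key = "${env:OPENAI_API_KEY}"

[[models.embeddings]]
name = "BAAI/bge-large-zh-v1.5"
provider = "hf"                  # embeddings computed locally
```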
This approach balances access to powerful LLMs with data privacy for document embeddings.
Sources: docs/docs/quickstart.md164-176 docs/docs/installation/sourcecode.md133-152
When path is not specified, the hf provider automatically downloads models from HuggingFace Hub to a cache directory. Specify path to use a pre-downloaded model checkpoint, run in an offline environment, or keep model files in a custom location.
Sources: docs/docs/quickstart.md234-243 docs/docs/installation/sourcecode.md186-202
The configuration file also specifies the application database:
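A sketch modeled on the bundled templates, which default to SQLite for application metadata; the section name and keys may differ by version:

```toml
[service.web.database]
type = "sqlite"
path = "pilot/meta_data/dbgpt.db"
# For production, a MySQL-style configuration (host, port, user, database) can be used instead.
```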