This page provides an overview of DB-GPT's model integration and inference capabilities. It covers the Service-oriented Multi-model Management Framework (SMMF), the two primary deployment strategies (local and API proxy), and how models are registered, configured, and used for inference.
For information about how models are used within RAG pipelines, see RAG Pipeline and Knowledge Management. For information about multi-agent workflows that use models, see Multi-Agents and AWEL Workflows.
The SMMF is DB-GPT's core abstraction for managing and serving large language models. It provides a unified interface for interacting with 50+ different LLMs, regardless of whether they are deployed locally or accessed through API proxies. The framework decouples application logic from model deployment details, enabling developers to switch between models without code changes.
The key design principles of SMMF are a unified interface across heterogeneous backends, pluggable model adapters, and a clean separation between application logic and model serving.
Sources: README.md:1-363, README.zh.md:1-409, docs/docs/modules/smmf.md:1-12
DB-GPT supports two primary deployment strategies for language models, each optimized for different use cases and operational requirements.
Local deployment runs models on infrastructure controlled by the user, providing maximum data privacy and customization. This strategy is implemented through multiple inference backends:
| Backend | Description | Use Case |
|---|---|---|
| HuggingFace Transformers | Direct model loading using the Transformers library | Development, smaller models, CPU inference |
| vLLM | High-throughput inference engine with PagedAttention | Production deployments, high concurrency |
| llama.cpp | Optimized C/C++ implementation | CPU/Metal inference, edge devices |
| MLX | Apple Silicon optimized framework | macOS deployment, M-series chips |
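As an illustration of the trade-offs in the table above, the following sketch picks a backend from a hardware and workload description. This is a hypothetical heuristic, not DB-GPT's actual selection logic; the type and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class DeploymentTarget:
    """Hypothetical description of where a model will run."""
    has_cuda_gpu: bool
    is_apple_silicon: bool
    expected_concurrency: int  # concurrent requests to serve

def choose_backend(target: DeploymentTarget) -> str:
    """Pick a local inference backend following the trade-offs in the
    table above (illustrative heuristic only)."""
    if target.has_cuda_gpu and target.expected_concurrency > 1:
        return "vllm"          # high-throughput GPU serving (PagedAttention)
    if target.is_apple_silicon:
        return "mlx"           # optimized for Apple M-series chips
    if not target.has_cuda_gpu:
        return "llama.cpp"     # CPU/Metal inference, quantized models
    return "transformers"      # simple default for development

print(choose_backend(DeploymentTarget(True, False, 32)))  # -> vllm
```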
Local deployment keeps all data on user-controlled infrastructure and gives full control over the serving stack, at the cost of provisioning and operating the required hardware.
API proxy deployment accesses models through external API providers, enabling rapid deployment without infrastructure requirements. Supported providers include OpenAI, DeepSeek, Alibaba (Qwen), Zhipu (GLM), 01.AI (Yi), and Baichuan (see the model support matrix below). This strategy requires no GPU infrastructure, scales with the provider, and makes new models available as soon as the provider exposes them, at the cost of sending data to a third party.
Sources: README.md:180-293, README.zh.md:191-318
Figure 1: SMMF Architecture - Model Integration Flow
This diagram illustrates the complete flow from application code through the SMMF to model inference. The ModelRegistry serves as the central hub for model discovery and routing, while the Worker Manager handles lifecycle management and load balancing across multiple model instances.
Sources: README.md:180-293, docs/docs/modules/smmf.md:1-12
The Model Registry is the central component that maintains information about available models and routes requests to appropriate workers. Models are configured through `.toml` configuration files that define the model name, its provider or backend, and the parameters needed to reach it.
Configuration file naming conventions:
- `dbgpt-proxy-*.toml`: configuration for API proxy models
- `dbgpt-local-*.toml`: configuration for locally deployed models

The registry provides model discovery, request routing, and status tracking for registered workers.
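The naming convention can be sketched as a small classifier. This is only an illustration of the convention described above, not DB-GPT's actual configuration loader:

```python
from pathlib import PurePath

def deployment_kind(config_path: str) -> str:
    """Classify a config file by the dbgpt-proxy-* / dbgpt-local-*
    naming convention (sketch of the convention only)."""
    name = PurePath(config_path).name
    if name.startswith("dbgpt-proxy-") and name.endswith(".toml"):
        return "proxy"
    if name.startswith("dbgpt-local-") and name.endswith(".toml"):
        return "local"
    return "unknown"

print(deployment_kind("configs/dbgpt-proxy-openai.toml"))  # -> proxy
```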
Sources: README.md:180-293
Figure 2: Model Inference Request Sequence
This sequence diagram shows the complete lifecycle of an inference request from application code to model execution. The flow supports both synchronous and streaming responses, with the worker handling protocol translation between the application and backend.
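The two response modes can be sketched as follows. The worker and token stream here are stand-ins invented for the example; a real worker would call the configured model backend:

```python
import asyncio
from typing import AsyncIterator

async def stream_tokens(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a worker producing tokens incrementally
    (hypothetical; a real worker would invoke the model backend)."""
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # yield control, as real I/O would
        yield token

async def generate(prompt: str, stream: bool):
    """Illustrates the two response modes in Figure 2: a streaming
    request yields chunks as they arrive; a synchronous request
    collects them into one response."""
    if stream:
        return [chunk async for chunk in stream_tokens(prompt)]
    parts = []
    async for chunk in stream_tokens(prompt):
        parts.append(chunk)
    return "".join(parts)

print(asyncio.run(generate("hi", stream=False)))  # -> Hello, world!
```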
Sources: README.md:55-81
DB-GPT supports extensive model families across both local and proxy deployments. The following table summarizes major model families and their deployment options:
| Model Family | Provider | Local Support | API Proxy | Notable Models |
|---|---|---|---|---|
| DeepSeek | DeepSeek AI | ✅ | ✅ | DeepSeek-R1, DeepSeek-V3, DeepSeek-Coder |
| Qwen | Alibaba | ✅ | ✅ | Qwen3-235B, QwQ-32B, Qwen2.5-Coder |
| GLM | Tsinghua (Zhipu) | ✅ | ✅ | GLM-Z1-32B, GLM-4, ChatGLM |
| Llama | Meta | ✅ | ❌ | Llama-3.1-405B, Llama-3.1-70B |
| Gemma | Google | ✅ | ❌ | Gemma-2-27B, Gemma-7B |
| Yi | 01.AI | ✅ | ✅ | Yi-1.5-34B, Yi-34B |
| GPT | OpenAI | ❌ | ✅ | GPT-4, GPT-3.5-Turbo |
| Baichuan | Baichuan | ✅ | ✅ | Baichuan2-13B |
| InternLM | Shanghai AI Lab | ✅ | ❌ | InternLM2 |
| Mixtral | Mistral AI | ✅ | ❌ | Mixtral-8x7B |
| Phi | Microsoft | ✅ | ❌ | Phi-3 |
The complete list of supported models is maintained in the model configuration directory and can be extended through custom adapters.
Sources: README.md:184-293, README.zh.md:195-318
Figure 3: Worker Pool Management and Load Balancing
The Worker Manager maintains a pool of model worker instances, each capable of handling inference requests. The Model Controller routes incoming requests to available workers based on factors such as worker health and current load. Worker lifecycle management covers registration, health checking, and deregistration of worker instances.
For detailed information on worker implementation, see Model Workers and Inference Backends.
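A toy model of this routing, assuming a least-loaded policy over healthy workers (the class and policy are illustrative, not DB-GPT's actual Worker Manager):

```python
class WorkerPool:
    """Toy worker pool: workers report in-flight request counts, and
    routing prefers the least-loaded healthy worker (illustrative)."""
    def __init__(self):
        self._load = {}      # worker id -> in-flight requests
        self._healthy = set()

    def add(self, worker_id: str) -> None:
        self._load[worker_id] = 0
        self._healthy.add(worker_id)

    def mark_unhealthy(self, worker_id: str) -> None:
        self._healthy.discard(worker_id)  # excluded from routing

    def route(self) -> str:
        candidates = list(self._healthy)
        if not candidates:
            raise RuntimeError("no healthy workers")
        chosen = min(candidates, key=lambda w: self._load[w])
        self._load[chosen] += 1  # track in-flight work
        return chosen

pool = WorkerPool()
pool.add("worker-a")
pool.add("worker-b")
pool.mark_unhealthy("worker-b")
print(pool.route())  # -> worker-a
```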
Sources: README.md:55-81
The SMMF integrates with higher-level DB-GPT components through standardized interfaces:
AWEL (Agentic Workflow Expression Language) workflows can reference models by name in their DAG definitions. The workflow engine automatically resolves model names to worker instances through the Model Registry.
Multi-agent systems use models for reasoning, planning, and tool selection. Agents specify model requirements (e.g., reasoning capability, context length) and the SMMF selects appropriate models.
RAG pipelines use models for both embedding generation and text generation. The SMMF supports separate model configurations for embedding models and generation models, enabling optimized model selection for each task.
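The per-task split between embedding and generation models can be sketched as a simple mapping. The model names here are examples, not a prescribed configuration:

```python
# Hypothetical per-task model selection mirroring the RAG description:
# embedding and generation use independently configured models.
TASK_MODELS = {
    "embedding": "bge-large-zh",   # example embedding model name
    "generation": "deepseek-v3",   # example generation model name
}

def model_for(task: str) -> str:
    """Resolve the configured model for a pipeline task."""
    try:
        return TASK_MODELS[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}") from None

print(model_for("embedding"))  # -> bge-large-zh
```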
For more details on these integrations, see the dedicated pages on AWEL workflows, multi-agents, and RAG.
Sources: README.md:55-81
Model configuration files follow a standardized TOML format:
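A hypothetical sketch of such a file, following the `dbgpt-proxy-*.toml` naming convention mentioned above; the key names and structure are illustrative, so consult the shipped example configurations for the exact schema:

```toml
# Hypothetical dbgpt-proxy-example.toml (key names are illustrative)
[models]

[[models.llms]]
name = "deepseek-chat"            # model name applications reference
provider = "proxy/deepseek"       # proxy backend for this provider
api_key = "${env:DEEPSEEK_API_KEY}"

[[models.embeddings]]
name = "bge-large-zh"             # separate embedding model for RAG
provider = "proxy/openai"
```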
Configuration files are loaded by the Model Registry during startup or when models are dynamically registered.
Sources: README.md:180-293
The SMMF is designed for high-throughput inference, drawing on optimization techniques such as the vLLM backend's PagedAttention (see the backend table above) and hardware acceleration.
For detailed information on hardware acceleration and performance optimization, see Hardware Acceleration and Performance.
Sources: README.md:180-293, docs/docs/modules/smmf.md:1-12
The SMMF supports dynamic model switching and fallback mechanisms: if a model or its workers become unavailable, requests can be redirected to an alternative. These capabilities enable deployment strategies such as falling back from a locally deployed model to an API proxy, or routing different workloads to different models.
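A minimal fallback pattern along these lines might look like the following; the model names and the `call_model` callable are hypothetical stand-ins for the real inference path:

```python
def generate_with_fallback(prompt, model_chain, call_model):
    """Try each model in order until one succeeds (illustrative
    fallback pattern; all names are hypothetical)."""
    errors = []
    for model_name in model_chain:
        try:
            return call_model(model_name, prompt)
        except Exception as exc:
            errors.append((model_name, exc))  # remember and move on
    raise RuntimeError(f"all models failed: {errors}")

# Usage: fall back from a primary local model to an API proxy.
def fake_call(model, prompt):
    if model == "local-glm-4":
        raise ConnectionError("worker down")
    return f"[{model}] ok"

print(generate_with_fallback("hi", ["local-glm-4", "proxy-gpt-4"], fake_call))
```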
Sources: README.md:55-81
Model integration carries security considerations, particularly around where data is processed: API proxy deployment sends prompts to external providers, while local deployment keeps all data on user infrastructure.
For privacy-focused deployments, local deployment is recommended. For more information, see Model Configuration and Deployment.
Sources: README.md:294-297
The SMMF is designed for extensibility. Adding support for a new model involves:

1. Implementing the `BaseModelAdapter` interface
2. Providing a `.toml` configuration file

Custom adapters can also be developed for backends or providers that are not supported out of the box.
For detailed implementation guidance, see Model Adapters and Proxy Models.
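The extension flow can be sketched as follows; the method names on the adapter interface and the resolution helper are assumptions for illustration, not DB-GPT's exact API:

```python
from abc import ABC, abstractmethod

class BaseModelAdapter(ABC):
    """Sketch of the adapter contract described above
    (method names are illustrative)."""
    @abstractmethod
    def match(self, model_name: str) -> bool:
        """Return True if this adapter can serve the given model."""

    @abstractmethod
    def generate(self, prompt: str, **params) -> str:
        """Run inference and return the completion."""

class EchoAdapter(BaseModelAdapter):
    """Trivial adapter used here only to show the registration flow."""
    def match(self, model_name: str) -> bool:
        return model_name.startswith("echo-")

    def generate(self, prompt: str, **params) -> str:
        return prompt.upper()

ADAPTERS = [EchoAdapter()]  # registered adapters, checked in order

def resolve_adapter(model_name: str) -> BaseModelAdapter:
    for adapter in ADAPTERS:
        if adapter.match(model_name):
            return adapter
    raise LookupError(f"no adapter for {model_name!r}")

print(resolve_adapter("echo-test").generate("hello"))  # -> HELLO
```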
Sources: README.md:180-293
Next Steps: For detailed information on specific aspects of model integration, see the related pages referenced throughout this document.