DB-GPT is an open-source AI-native data application development framework designed to build infrastructure for large language model applications. This document provides a high-level overview of the system's architecture, core capabilities, and design philosophy.
Scope: This page covers the fundamental concepts, architecture, and capabilities of DB-GPT. For detailed installation instructions, see Installation. For deep dives into specific subsystems like RAG, AWEL, or Multi-Agents, refer to their respective sections (RAG, AWEL and Core Framework, Multi-Agents).
DB-GPT is a comprehensive framework for developing data-driven AI applications with minimal code. Built as a modular monorepo, it provides production-ready components for RAG, generative business intelligence (Text2SQL), multi-agent collaboration, AWEL workflow orchestration, and multi-model management.
The framework enables enterprises and developers to build bespoke applications in the "Data 3.0" era, combining large models with structured and unstructured data sources.
Sources: README.md53-59 README.md68-81
DB-GPT follows a layered architecture organized as a monorepo with seven specialized packages.
Architecture Principles:
- Interfaces defined in dbgpt-core allow pluggable implementations in dbgpt-ext and service implementations in dbgpt-serve.

Sources: README.md62-66 Diagram 1 from provided context
DB-GPT v0.7.0 introduced a monorepo structure with seven packages, managed using the uv package manager for faster dependency resolution.
| Package | Purpose | Key Components |
|---|---|---|
| packages/dbgpt-core | Foundational abstractions and interfaces | AWEL engine, Agent framework, Storage interfaces (StorageInterface, SQLAlchemyStorage) |
| packages/dbgpt-ext | Pluggable storage and connector implementations | Vector stores (Milvus, Chroma, etc.), Knowledge graphs (TuGraph, Neo4j), Data sources (MySQL, PostgreSQL, etc.) |
| packages/dbgpt-serve | Service layer for business logic | RAG service, Evaluation service, Storage manager |
| packages/dbgpt-app | Main application entry point | dbgpt_server.py, Knowledge service, Application routing |
| packages/dbgpt-client | Client SDK for API access | Python client for REST API |
| packages/dbgpt-accelerator | Hardware acceleration modules | vLLM, Flash Attention, Quantization (BitsAndBytes, GPTQ) |
| packages/dbgpt-sandbox | Isolated execution environment | Code execution sandbox |
DB-GPT uses uv for fast dependency management with optional extras. Installation follows the pattern:
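A typical install from source might look like the following (the exact extras depend on which model backend and storage you want; names here follow the quickstart, so verify them against your version):

```bash
# Clone the repository and enter it
git clone https://github.com/eosphoros-ai/DB-GPT.git
cd DB-GPT

# Install all workspace packages with uv, selecting optional extras
uv sync --all-packages \
  --extra "base" \
  --extra "proxy_openai" \
  --extra "rag" \
  --extra "storage_chromadb"
```

Each `--extra` flag pulls in one optional dependency group, so a proxy-only deployment can skip the heavy local-inference dependencies entirely.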
Configuration is managed through TOML files in the configs/ directory:
- configs/dbgpt-proxy-openai.toml - OpenAI proxy model configuration
- configs/dbgpt-proxy-deepseek.toml - DeepSeek proxy configuration
- configs/dbgpt-local-glm.toml - Local GLM model configuration

Sources: README.md117-128 docs/docs/quickstart.md29-30 docs/docs/quickstart.md77-95
DB-GPT provides a complete RAG framework for building knowledge-based applications. The pipeline supports:
- KnowledgeFactory for loading documents from files and other knowledge sources
- ChunkManager for intelligent text splitting
- IndexStoreBase as the pluggable index backend abstraction (vector stores, knowledge graphs)
- EmbeddingAssembler for embedding chunks into an index store

For detailed RAG implementation, see RAG Pipeline and Knowledge Management.
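The pipeline stages can be illustrated with a small, framework-independent sketch. The names below (split_into_chunks, InMemoryIndex) are illustrative only, not DB-GPT APIs; in DB-GPT these roles map to ChunkManager and IndexStoreBase implementations:

```python
# Conceptual sketch of a RAG ingestion + retrieval pipeline.
# Names are illustrative, NOT the DB-GPT API.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str


def split_into_chunks(text: str, chunk_size: int = 40) -> list:
    """Fixed-size splitting; real splitters respect sentence boundaries."""
    return [Chunk(text[i:i + chunk_size]) for i in range(0, len(text), chunk_size)]


class InMemoryIndex:
    """Stands in for an IndexStoreBase-style store (real stores use embeddings)."""

    def __init__(self):
        self.chunks = []

    def load(self, chunks):
        self.chunks.extend(chunks)

    def search(self, query: str, top_k: int = 2):
        # Toy relevance score: number of shared lowercase words.
        def score(c):
            return len(set(query.lower().split()) & set(c.text.lower().split()))
        return sorted(self.chunks, key=score, reverse=True)[:top_k]


doc = "DB-GPT builds data applications. AWEL composes workflows. Agents automate tasks."
index = InMemoryIndex()
index.load(split_into_chunks(doc))
print(index.search("agents automate tasks")[0].text)
```

The same load/split/index/search shape holds when the in-memory store is replaced by Milvus, Chroma, or a knowledge graph backend.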
Sources: README.md70 Diagram 2 from provided context
The GBI system bridges natural language queries with structured data sources through an intelligent Text2SQL pipeline:
Supported data sources include MySQL, PostgreSQL, Oracle, MSSQL, ClickHouse, DuckDB, Hive, and Spark.
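The shape of the Text2SQL loop can be sketched without the framework. Here nl_to_sql is a hardcoded stand-in for the LLM call; in DB-GPT, a model is prompted with the database schema and the user's question to generate the SQL:

```python
# Conceptual Text2SQL loop: question -> SQL -> execute -> answer.
import sqlite3


def nl_to_sql(question: str) -> str:
    # Stand-in for the LLM-backed Text2SQL step (hardcoded for illustration).
    templates = {
        "total users": "SELECT COUNT(*) FROM users",
        "users by city": "SELECT city, COUNT(*) FROM users GROUP BY city",
    }
    return templates[question]


# A toy structured data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Berlin"), (2, "Berlin"), (3, "Paris")],
)

sql = nl_to_sql("total users")
count = conn.execute(sql).fetchone()[0]
print(count)  # 3
```

Swapping the sqlite3 connection for one of the supported connectors (MySQL, ClickHouse, Spark, etc.) changes the execution target, not the loop.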
For Text2SQL implementation details, see Generative Business Intelligence (GBI).
Sources: README.md72-74 README.md168-171 Diagram 6 from provided context
DB-GPT implements a data-driven multi-agent system in which specialized agents collaborate on data analysis and automation tasks.
Agent types include Data Agents, Plugin Agents, Code Agents, Chat Agents, and Custom Agents.
For multi-agent implementation, see Multi-Agents and AWEL Workflows.
Sources: README.md76 README.md172-174 Diagram 4 from provided context
AWEL (Agentic Workflow Expression Language) is a declarative workflow language for composing complex data processing pipelines as Directed Acyclic Graphs (DAGs).
AWEL enables developers to compose sophisticated workflows that integrate RAG, agents, models, and data sources.
For AWEL details, see Core Framework and AWEL.
Sources: README.md55 README.md200-203 Diagram 4 from provided context
SMMF provides unified model management supporting 50+ LLMs through two deployment strategies:
Local Deployment: open-source models (such as GLM-4) served on local hardware, for example via Hugging Face or vLLM.

API Proxies: hosted model APIs such as OpenAI and DeepSeek.
The framework includes hardware acceleration through Flash Attention, quantization (BitsAndBytes 8-bit/4-bit, GPTQ), and CUDA support (11.8, 12.1, 12.4).
For model configuration, see Service-oriented Multi-model Management Framework (SMMF).
Sources: README.md180-292 README.md-zh.md191-318 Diagram 3 from provided context
Key dependencies, grouped roughly by package:

- dbgpt-core: sqlalchemy, pydantic, fastapi
- dbgpt-ext: optional storage extras (storage_milvus, storage_chromadb, graph_rag)
- dbgpt-accelerator: vllm, flash-attn, bitsandbytes, gptq
- dbgpt-serve: fastapi, uvicorn, sqlalchemy

The default installation uses SQLite for metadata storage (no external database required). Production deployments can switch to MySQL or PostgreSQL via configuration.
Sources: packages/dbgpt-core/src/dbgpt/storage/metadata/db_storage.py1-49 docs/docs/quickstart.md27-28 Diagram 5 from provided context
The primary entry point is dbgpt_server.py, located at packages/dbgpt-app/src/dbgpt_app/dbgpt_server.py.
Start the server using:
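```bash
uv run dbgpt start webserver --config configs/dbgpt-proxy-openai.toml
```

(Substitute the configuration file that matches your model backend.)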
Or directly:
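Assuming the same config-file convention, the server module can also be invoked with Python directly:

```bash
python packages/dbgpt-app/src/dbgpt_app/dbgpt_server.py --config configs/dbgpt-proxy-openai.toml
```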
Sources: docs/docs/quickstart.md137-147 docs/docs/installation/sourcecode.md106-116
Configuration follows a TOML-based system in the configs/ directory:
| Configuration File | Purpose | Key Sections |
|---|---|---|
| configs/dbgpt-proxy-openai.toml | OpenAI proxy setup | [models.llms], [models.embeddings] with api_key |
| configs/dbgpt-proxy-deepseek.toml | DeepSeek proxy setup | LLM and embedding configuration |
| configs/dbgpt-local-glm.toml | Local GLM-4 model | HuggingFace model paths with provider = "hf" |
Example configuration structure:
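A proxy-model configuration follows this general shape (field names are based on the shipped dbgpt-proxy-openai.toml; check your version's file for the exact keys):

```toml
[system]
language = "en"

[service.web]
host = "0.0.0.0"
port = 5670

[service.web.database]
type = "sqlite"
path = "pilot/meta_data/dbgpt.db"

[[models.llms]]
name = "gpt-4o"
provider = "proxy/openai"
api_key = "your-openai-api-key"

[[models.embeddings]]
name = "text-embedding-3-small"
provider = "proxy/openai"
api_key = "your-openai-api-key"
```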
For storage configuration:
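For example, switching the RAG vector store to Milvus is a configuration change of roughly this form (per the Milvus integration guide; field names may vary by version):

```toml
[rag.storage]
[rag.storage.vector]
type = "Milvus"
uri = "127.0.0.1"
port = "19530"
```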
Sources: docs/docs/quickstart.md124-135 docs/docs/installation/integrations/milvus_rag_install.md27-37 docs/docs/installation/integrations/oceanbase_rag_install.md27-37
DB-GPT implements a flexible storage abstraction through StorageInterface in dbgpt-core:
The SQLAlchemyStorage class provides the core metadata persistence:
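The pattern can be shown with a simplified, framework-independent sketch. The interface and in-memory backend below are illustrative stand-ins, not DB-GPT classes; the real abstractions live in dbgpt-core:

```python
# Conceptual sketch of a pluggable storage interface (illustrative names).
from abc import ABC, abstractmethod
from typing import Optional


class StorageBackend(ABC):
    """Minimal stand-in for a StorageInterface-style abstraction."""

    @abstractmethod
    def save(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def load(self, key: str) -> Optional[dict]: ...


class InMemoryStorage(StorageBackend):
    """Dict-backed backend; a SQLAlchemy-backed one would persist to a DB."""

    def __init__(self):
        self._data = {}

    def save(self, key: str, value: dict) -> None:
        self._data[key] = value

    def load(self, key: str) -> Optional[dict]:
        return self._data.get(key)


def remember(store: StorageBackend) -> Optional[dict]:
    # Callers depend only on the interface, so backends swap freely.
    store.save("session:1", {"user": "alice"})
    return store.load("session:1")


print(remember(InMemoryStorage()))  # {'user': 'alice'}
```

Because callers program against the interface, moving from SQLite to MySQL or PostgreSQL is a backend swap driven by configuration.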
This abstraction allows seamless switching between storage backends through configuration without code changes.
For detailed storage implementations, see Storage Architecture and Databases.
Sources: packages/dbgpt-core/src/dbgpt/storage/metadata/db_storage.py21-49 Diagram 2 from provided context
DB-GPT supports multiple deployment modes: source installation, Docker, Docker Compose, and pip packages.
Sources: docs/docs/installation/sourcecode.md14-21 docs/docs/installation/sourcecode.md69-89
Multi-stage Dockerfiles support different installation modes:
Build and run:
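A typical build-and-run sequence looks like this (the image tag and exposed port are examples; the project also publishes prebuilt images on Docker Hub):

```bash
# Build the image from the repository root
docker build -t dbgpt:latest .

# Run it, exposing the default web port
docker run -d -p 5670:5670 --name dbgpt dbgpt:latest
```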
Sources: Diagram 5 from provided context, docs/sidebars.js127-140
Docker Compose orchestrates multiple services (database, webserver) with volume management.
The docker-compose.yml file defines service dependencies and networking.
Sources: docs/sidebars.js131-134
For production deployments, install via pip:
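For example (assuming the package is published to PyPI under the name dbgpt):

```bash
pip install dbgpt
```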
Individual packages can be installed separately:
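The individual package names mirror the monorepo layout; verify availability on PyPI for your target version:

```bash
pip install dbgpt-core
pip install dbgpt-client
```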
Sources: Diagram 5 from provided context
For comprehensive deployment instructions, see Deployment and Configuration.
To begin using DB-GPT:
- Start the server: uv run dbgpt start webserver --config <config-file>
- Open http://localhost:5670 in your browser

For detailed quickstart instructions, see Quickstart.
For development setup and contributing, see Development Guide.
Sources: docs/docs/quickstart.md1-217 README.md139-159
Sources: README.md49 README.md323-349