The Storage Manager and Connectors subsystem provides a unified factory pattern and abstraction layer for managing multiple storage backends in DB-GPT's RAG pipeline. This system enables configuration-driven selection and instantiation of vector stores, knowledge graphs, and full-text search engines without requiring application-level code changes.
Related Pages: For details on specific vector store implementations, see Vector Stores and Embedding Systems. For knowledge graph storage, see Knowledge Graphs and GraphRAG. For the overall RAG pipeline integration, see RAG Pipeline and Knowledge Management.
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py1-297 packages/dbgpt-serve/src/dbgpt_serve/rag/connector.py1-330 packages/dbgpt-core/src/dbgpt/storage/base.py19-28
The StorageManager class serves as the central factory for all storage backends in DB-GPT. It implements the Singleton pattern through the BaseComponent framework and manages the lifecycle of storage instances.
| Responsibility | Description | Implementation |
|---|---|---|
| Factory Pattern | Creates appropriate storage instances based on type | get_storage_connector() method |
| Configuration Management | Reads and validates storage configuration | storage_config() property |
| Instance Caching | Maintains cache of created stores with thread-safety | _store_cache + _cache_lock |
| Type Resolution | Maps storage types to concrete implementations | get_vector_supported_types property |
| Validation | Ensures required configuration is present | Error messages with setup instructions |
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py18-42
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py39-87
The StorageManager is registered as a system component with ComponentType.RAG_STORAGE_MANAGER:
Key implementation details:
- Uses threading.Lock() to synchronize access to the _store_cache dictionary
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py18-32
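The registration described above can be pictured with a small, self-contained sketch. It does not use the DB-GPT classes: `MiniSystemApp`, `MiniStorageManager`, and the enum value string are hypothetical stand-ins, and only the names `ComponentType.RAG_STORAGE_MANAGER`, `_store_cache`, and `_cache_lock` come from the text.

```python
import threading
from enum import Enum


class ComponentType(str, Enum):
    # Only the member named in the text; the value string is an assumption.
    RAG_STORAGE_MANAGER = "rag_storage_manager"


class MiniSystemApp:
    """Hypothetical stand-in for an application-level component registry."""

    def __init__(self):
        self._components = {}

    def register(self, component_cls):
        # One instance per component name, created at registration time.
        instance = component_cls(self)
        self._components[component_cls.name] = instance
        return instance

    def get_component(self, name):
        return self._components[name]


class MiniStorageManager:
    """Hypothetical manager registered under a fixed component name."""

    name = ComponentType.RAG_STORAGE_MANAGER.value

    def __init__(self, system_app):
        self.system_app = system_app
        self._store_cache = {}               # index_name -> store instance
        self._cache_lock = threading.Lock()  # guards _store_cache


app = MiniSystemApp()
registered = app.register(MiniStorageManager)
manager = app.get_component(ComponentType.RAG_STORAGE_MANAGER.value)
```

Because every lookup goes through the registry, all callers share one manager and therefore one cache.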
The VectorStoreConnector provides a consistent interface that wraps different vector store implementations. This abstraction allows the rest of the system to work with a uniform API regardless of the underlying storage technology.
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/connector.py48-132
The connector follows a lazy initialization pattern to optimize resource usage:
| Phase | Description | Trigger |
|---|---|---|
| Construction | Stores configuration parameters but doesn't create the actual store | VectorStoreConnector.__init__() |
| Lazy Creation | Creates the underlying vector store on first method call | First access to _vector_store property |
| Configuration Resolution | Converts configuration dictionary to typed config object | VectorStoreConfig.from_dict() |
| Store Instantiation | Calls create_store() on the config to get concrete implementation | config.create_store(**kwargs) |
| Method Delegation | All subsequent operations delegate to the created store | Store-specific methods |
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/connector.py73-132
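A minimal sketch of the lazy initialization phases in the table above. `LazyConnector` and `InMemoryStore` are hypothetical names invented for illustration; only the `_vector_store` property name is taken from the text, and the real connector may differ in detail.

```python
class InMemoryStore:
    """Hypothetical backend used only for this illustration."""

    created = 0  # counts how many stores have been instantiated

    def __init__(self, index_name):
        InMemoryStore.created += 1
        self.index_name = index_name
        self.docs = []

    def add(self, text):
        self.docs.append(text)
        return len(self.docs)


class LazyConnector:
    """Stores parameters at construction; builds the store on first use."""

    def __init__(self, index_name, store_factory):
        self._index_name = index_name
        self._store_factory = store_factory
        self._store = None  # nothing is created yet

    @property
    def _vector_store(self):
        # Lazy creation: instantiate the backend on first access only.
        if self._store is None:
            self._store = self._store_factory(self._index_name)
        return self._store

    def add(self, text):
        # Method delegation to the underlying store.
        return self._vector_store.add(text)


connector = LazyConnector("demo_space", InMemoryStore)
created_before = InMemoryStore.created  # still 0: construction is cheap
connector.add("hello")
connector.add("world")
created_after = InMemoryStore.created   # 1: created once, then reused
```

The benefit is that constructing a connector never opens a connection; the cost is paid only by code paths that actually touch the store.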
The storage configuration is organized hierarchically in the application configuration:
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py34-37 packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py89-120
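To make the hierarchy concrete, the sketch below models the configuration as a nested dictionary and resolves dotted paths such as rag.storage.vector.type. The section names follow the configuration paths used on this page; the `lookup` helper and the `persist_path` value are assumptions made for illustration, not part of DB-GPT.

```python
from typing import Any, Optional

# Assumed layout mirroring the [rag.storage.vector] and [rag.storage.graph]
# sections of a .toml file. The full_text section is deliberately left out.
config = {
    "rag": {
        "storage": {
            "vector": {"type": "chroma", "persist_path": "pilot/data"},
            "graph": {"type": "TuGraph"},
        }
    }
}


def lookup(cfg: dict, dotted_path: str) -> Optional[Any]:
    """Hypothetical helper: walk a dotted path, returning None if absent."""
    node: Any = cfg
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


vector_type = lookup(config, "rag.storage.vector.type")
graph_type = lookup(config, "rag.storage.graph.type")
full_text = lookup(config, "rag.storage.full_text")  # section not configured
```

A missing section resolves to `None` here, which is the condition a factory would check before deciding whether a backend can be created.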
The system determines which concrete store implementation to use based on configuration:
| Store Category | Configuration Path | Type Values | Implementation |
|---|---|---|---|
| Vector Store | rag.storage.vector.type | milvus, chroma, elasticsearch, oceanbase, pgvector, weaviate | Corresponding *Store class |
| Knowledge Graph | rag.storage.graph.type | TuGraph, Neo4j | BuiltinKnowledgeGraph, CommunitySummaryKG |
| Full-Text Search | rag.storage.full_text.type | elasticsearch | ElasticDocumentStore |
The resolution process:
1. The caller invokes get_storage_connector(index_name, storage_type, llm_model)
2. StorageManager reads configuration using storage_config()
3. Based on the storage_type parameter, it routes to the appropriate factory method
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py39-87
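The routing step can be sketched as a dispatch table. This is an illustration under stated assumptions, not the actual implementation: the three factory functions return placeholder strings, the `llm_model` argument is omitted, and the storage type strings (`"vector"`, `"graph"`, `"full_text"`) are assumed values borrowed from the configuration path segments.

```python
from typing import Callable, Dict


def create_vector_store(index_name: str) -> str:
    return f"vector:{index_name}"      # placeholder for a real store


def create_kg_store(index_name: str) -> str:
    return f"graph:{index_name}"       # placeholder for a real store


def create_full_text_store(index_name: str) -> str:
    return f"full_text:{index_name}"   # placeholder for a real store


# Assumed storage_type values; the real strings may differ.
FACTORIES: Dict[str, Callable[[str], str]] = {
    "vector": create_vector_store,
    "graph": create_kg_store,
    "full_text": create_full_text_store,
}


def get_storage_connector(index_name: str, storage_type: str) -> str:
    """Route to the factory registered for storage_type."""
    try:
        factory = FACTORIES[storage_type]
    except KeyError:
        raise ValueError(f"Unsupported storage type: {storage_type!r}") from None
    return factory(index_name)


graph_store = get_storage_connector("docs", "graph")
```

Keeping the mapping in one table means a new backend category needs a single new entry rather than another branch in the caller.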
The create_vector_store() method handles vector database instantiation:
Key aspects:
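- Cache reads and writes are guarded by _cache_lock
- The embedding function is resolved through EmbeddingFactory
- Batch-loading parameters are forwarded to the connector (max_chunks_once_load, max_threads)
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py89-120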
The create_kg_store() method creates knowledge graph storage:
Configuration requirements:
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py122-185
The create_full_text_store() method creates Elasticsearch-based full-text search:
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py187-209
All storage configurations extend VectorStoreConfig or similar base classes:
Each configuration class implements the factory method pattern:
- Defines a __type__ class variable for identification
- Implements create_store() to instantiate the corresponding store
Sources: packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/milvus_store.py102-177 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/chroma_store.py43-64 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/elastic_store.py77-133
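The factory method pattern described above might look like the following sketch. The class names (`ChromaLikeConfig`, `MilvusLikeConfig`, and their stores), the `config_from_dict` helper, and the field defaults are hypothetical; only the `__type__` variable and the `create_store()` method name come from the text.

```python
from dataclasses import dataclass
from typing import ClassVar


class ChromaLikeStore:
    def __init__(self, config, index_name):
        self.config = config
        self.index_name = index_name


class MilvusLikeStore:
    def __init__(self, config, index_name):
        self.config = config
        self.index_name = index_name


@dataclass
class ChromaLikeConfig:
    __type__: ClassVar[str] = "chroma_like"  # identifier used for lookup
    persist_path: str = "./data"

    def create_store(self, **kwargs):
        return ChromaLikeStore(config=self, **kwargs)


@dataclass
class MilvusLikeConfig:
    __type__: ClassVar[str] = "milvus_like"
    uri: str = "localhost"
    port: str = "19530"

    def create_store(self, **kwargs):
        return MilvusLikeStore(config=self, **kwargs)


# Map each type string to its configuration class.
CONFIG_TYPES = {cls.__type__: cls for cls in (ChromaLikeConfig, MilvusLikeConfig)}


def config_from_dict(raw: dict):
    """Hypothetical helper: pick the config class by its 'type' key."""
    params = dict(raw)
    config_cls = CONFIG_TYPES[params.pop("type")]
    return config_cls(**params)


config = config_from_dict({"type": "chroma_like", "persist_path": "/tmp/demo"})
store = config.create_store(index_name="space_a")
```

The caller never names a store class directly: the type string selects the configuration, and the configuration knows which store to build.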
Store configurations are registered with AWEL (Agentic Workflow Expression Language) for UI integration:
| Configuration | Resource Name | Category | Parameters |
|---|---|---|---|
| MilvusVectorConfig | "milvus_vector_config" | VECTOR_STORE | uri, port, alias, primary_field, text_field, embedding_field |
| ChromaVectorConfig | "chroma_vector_config" | VECTOR_STORE | persist_path, collection_metadata |
| ElasticsearchStoreConfig | "elasticsearch_vector_config" | VECTOR_STORE | uri, port, alias, index_name |
| OceanBaseConfig | "oceanbase_vector_config" | VECTOR_STORE | ob_host, ob_port, ob_user, ob_password, ob_database |
| PGVectorConfig | "pg_vector_config" | VECTOR_STORE | connection_string |
| WeaviateConfig | "weaviate_vector_config" | VECTOR_STORE | weaviate_url, api_key |
The @register_resource decorator enables these configurations to be used in AWEL workflows and exposed in the UI.
Sources: packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/milvus_store.py30-101 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/chroma_store.py26-42 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/elastic_store.py25-76
Services interact with storage through the StorageManager:
Sources: packages/dbgpt-app/src/dbgpt_app/knowledge/service.py84-85 packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py89-120
Applications can work with multiple storage types simultaneously:
This enables hybrid retrieval strategies combining:
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py39-87
| Store | Type String | Key Features | Configuration Keys |
|---|---|---|---|
| Milvus | milvus | HNSW index, distributed, high-performance | uri, port, primary_field, text_field, embedding_field |
| Chroma | chroma | Embedded, local persistence, simple setup | persist_path, collection_metadata |
| Elasticsearch | elasticsearch | Full-text + vector, BM25 hybrid search | uri, port, index_name |
| OceanBase | oceanbase | SQL + vector, HNSW index, normalized vectors | ob_host, ob_port, ob_user, ob_password, ob_database |
| PGVector | pgvector | PostgreSQL extension, SQL integration | connection_string |
| Weaviate | weaviate | Cloud-native, GraphQL API, multi-modal | weaviate_url, api_key |
Sources: packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/milvus_store.py196-296 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/chroma_store.py84-145 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/elastic_store.py152-194
Sources: packages/dbgpt-ext/src/dbgpt_ext/storage/knowledge_graph/knowledge_graph.py78-169 packages/dbgpt-ext/src/dbgpt_ext/storage/knowledge_graph/community_summary.py46-167
The StorageManager validates that required configuration is present before attempting to create stores:
Example error messages provided:
For missing Knowledge Graph configuration:
Graph storage is not configured. To use Knowledge Graph:
1. Make sure TuGraph database is running
(docker run -d -p 7687:7687 --name tugraph_demo tugraph/tugraph-runtime-centos7:latest)
2. Start the webserver with graph storage config:
python dbgpt_server.py --config configs/dbgpt-graphrag.toml
3. Or add [rag.storage.graph] section to your current config file
For missing Full-Text configuration:
FullText storage is not configured. To use Full-Text Search (BM25):
1. Make sure Elasticsearch is running
(docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:8.7.0)
2. Start the webserver with full-text config:
python dbgpt_server.py --config configs/dbgpt-bm25-rag.toml
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py48-85
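A validation step of this kind can be sketched as a guard that raises an error carrying the setup instructions. The `require_section` helper and the dictionary-based configuration are assumptions made for illustration; the message text is abridged from the example above.

```python
# Abridged from the example message shown above.
GRAPH_HELP = (
    "Graph storage is not configured. To use Knowledge Graph:\n"
    "1. Make sure TuGraph database is running\n"
    "2. Start the webserver with graph storage config\n"
    "3. Or add [rag.storage.graph] section to your current config file"
)


def require_section(storage_config: dict, section: str, help_text: str) -> dict:
    """Hypothetical guard: return the section or fail with instructions."""
    value = storage_config.get(section)
    if not value:
        raise ValueError(help_text)
    return value


storage_config = {"vector": {"type": "chroma"}}  # no graph section present

vector_section = require_section(storage_config, "vector", "Vector storage is not configured.")

try:
    require_section(storage_config, "graph", GRAPH_HELP)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Failing before any connection attempt keeps the error actionable: the user sees what to configure rather than a low-level connection failure.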
Individual store implementations handle connection errors:
| Store | Error Handling | Fallback Behavior |
|---|---|---|
| Milvus | Validates connection parameters, checks collection existence | Creates collection if not exists |
| Chroma | Creates persist directory if needed | Initializes in-memory if persist fails |
| Elasticsearch | Tests connection on initialization | Raises descriptive error with connection string |
| OceanBase | Validates MySQL connection, checks vector extension | Creates table schema if missing |
| TuGraph | Tests HTTP/Bolt connection, validates credentials | Provides detailed error with endpoint info |
Sources: packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/milvus_store.py298-393 packages/dbgpt-ext/src/dbgpt_ext/storage/vector_store/chroma_store.py112-145
The StorageManager implements store-level caching to avoid recreating connections:
Cache key: index_name (typically the knowledge space name)
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py26-27 packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py89-120
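The caching behavior can be illustrated with a small get-or-create cache keyed by index_name. `StoreCache` and `make_store` are hypothetical names; the sketch reuses the `_store_cache` and `_cache_lock` attribute names from the text but is not the actual implementation.

```python
import threading


class StoreCache:
    """Hypothetical get-or-create cache keyed by index_name."""

    def __init__(self, factory):
        self._factory = factory
        self._store_cache = {}
        self._cache_lock = threading.Lock()

    def get_or_create(self, index_name):
        # Holding the lock across check-and-create prevents two threads
        # from building a store for the same index at the same time.
        with self._cache_lock:
            store = self._store_cache.get(index_name)
            if store is None:
                store = self._factory(index_name)
                self._store_cache[index_name] = store
            return store


calls = []


def make_store(index_name):
    calls.append(index_name)  # record each real creation
    return object()


cache = StoreCache(make_store)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get_or_create("space_a")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight threads receive the same store object and the factory runs once. A simple design like this serializes creation of different indexes as well, which is a trade-off a per-key lock would avoid.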
The connector supports configurable batch loading for large document sets:
| Parameter | Purpose | Default | Configuration |
|---|---|---|---|
| max_chunks_once_load | Number of chunks to load per batch | 10 | Passed to VectorStoreConnector.__init__() |
| max_threads | Parallel loading threads | 1 | Passed to VectorStoreConnector.__init__() |
These parameters control the load_document_with_limit() method behavior, which:
- Splits the chunks into batches of max_chunks_once_load
- Loads the batches in parallel using up to max_threads workers
Sources: packages/dbgpt-core/src/dbgpt/storage/base.py115-157 packages/dbgpt-serve/src/dbgpt_serve/rag/connector.py73-98
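The batching behavior can be approximated with a thread pool, as in the sketch below. `load_with_limit` and `fake_load_batch` are hypothetical functions written for this page; only the parameter names and their defaults (10 and 1) come from the table above, and the real load_document_with_limit() may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def load_with_limit(
    chunks: Sequence[str],
    load_batch: Callable[[Sequence[str]], List[str]],
    max_chunks_once_load: int = 10,
    max_threads: int = 1,
) -> List[str]:
    """Split chunks into batches and load them with a bounded worker pool."""
    batches = [
        chunks[i : i + max_chunks_once_load]
        for i in range(0, len(chunks), max_chunks_once_load)
    ]
    ids: List[str] = []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() yields results in submission order, so ids stay aligned
        # with the input chunks even when batches finish out of order.
        for batch_ids in pool.map(load_batch, batches):
            ids.extend(batch_ids)
    return ids


batch_sizes = []


def fake_load_batch(batch):
    batch_sizes.append(len(batch))  # stand-in for a real insert call
    return [f"id-{chunk}" for chunk in batch]


chunks = [f"chunk{i}" for i in range(25)]
ids = load_with_limit(chunks, fake_load_batch, max_chunks_once_load=10, max_threads=3)
```

With 25 chunks and a batch size of 10, three batches are submitted (10, 10, and 5 chunks). Raising max_threads helps only when the backend tolerates concurrent writes.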
The Storage Manager and Connectors subsystem provides:
- A single factory entry point (StorageManager.get_storage_connector()) for all storage types
- Configuration-driven backend selection through .toml configuration files
- VectorStoreConnector and similar classes provide consistent APIs across backends
This design enables DB-GPT applications to switch storage backends without code changes, supporting deployment flexibility from embedded databases (Chroma) to distributed systems (Milvus, OceanBase) based on scale requirements.
Sources: packages/dbgpt-serve/src/dbgpt_serve/rag/storage_manager.py18-297 packages/dbgpt-serve/src/dbgpt_serve/rag/connector.py48-330