This document describes RAGFlow's pluggable document storage architecture, which supports three backend engines for storing document chunks and their vector embeddings: Elasticsearch, Infinity, and OpenSearch. It covers how to select an engine via configuration, the architecture that enables this flexibility, and the process for switching between engines.
For information about the broader storage architecture including MySQL, MinIO, and Redis, see Data Storage Architecture. For object storage configuration (MinIO/S3/OSS/Azure), see Configuration Management.
RAGFlow uses a document store to persist parsed document chunks along with their full-text indexes and vector embeddings. The system supports three pluggable backend engines that can be selected via a single environment variable without code changes.
| Engine | Version | Default | Primary Strengths | Limitations |
|---|---|---|---|---|
| Elasticsearch | 8.11.3 | ✓ Yes | Mature full-text search, proven at scale, extensive tooling | Higher memory usage (~8GB), complex configuration |
| Infinity | 0.7.0 | ✗ No | Native vector storage, SQL interface, lower memory footprint | Newer project, no official ARM64 support |
| OpenSearch | 2.19.1 | ✗ No | AWS compatibility, Apache 2.0 license, Elasticsearch-compatible API | Similar memory requirements to ES |
All three engines implement the DocStoreConnection interface defined in common/doc_store/doc_store_base.py38-187 ensuring API compatibility. The application code remains unchanged regardless of which engine is selected.
Architecture diagram: Pluggable Document Engine
Sources: docker/.env13-20 README.md273-291 docker/README.md1-19 common/doc_store/doc_store_base.py38-187
The document engine is selected via the DOC_ENGINE variable in docker/.env:
Valid values:
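- elasticsearch (default)
- infinity
- opensearch
- oceanbase (experimental)
- seekdb (experimental)

This variable controls two aspects: which engine container Docker Compose starts (through COMPOSE_PROFILES) and which connection class the application instantiates (through settings.DOC_ENGINE).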
Sources: docker/.env13-20 README.md273-291
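As an illustration of how the value could be validated before use, here is a minimal sketch. The helper name `resolve_doc_engine` and the fallback behavior are assumptions made for this example, not RAGFlow's actual settings code; only the engine names come from the list above.

```python
import os

# Engine names as listed above; the last two are marked experimental.
VALID_DOC_ENGINES = ("elasticsearch", "infinity", "opensearch", "oceanbase", "seekdb")
DEFAULT_DOC_ENGINE = "elasticsearch"


def resolve_doc_engine(environ=None) -> str:
    """Return the configured engine name, falling back to the default.

    Hypothetical helper: it reads DOC_ENGINE from a mapping (os.environ by
    default), normalizes case, and rejects unknown values.
    """
    env = os.environ if environ is None else environ
    value = (env.get("DOC_ENGINE") or DEFAULT_DOC_ENGINE).strip().lower()
    if value not in VALID_DOC_ENGINES:
        raise ValueError(f"Unsupported DOC_ENGINE: {value!r}")
    return value
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.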
The COMPOSE_PROFILES variable in docker/.env28 is constructed as:
This ensures only the selected engine's container is started when running docker compose up.
Sources: docker/.env28 docker/docker-compose.yml
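The profile mechanism can be sketched in a few lines. Docker Compose treats COMPOSE_PROFILES as a comma-separated list and starts a profiled service only when one of its profiles is active. The function names and the idea of extra profiles next to the engine name are assumptions for illustration; the exact expression used in docker/.env is not reproduced here.

```python
def compose_profiles(doc_engine: str, extra_profiles: tuple = ()) -> str:
    """Build a comma-separated COMPOSE_PROFILES value (hypothetical helper)."""
    return ",".join((doc_engine, *extra_profiles))


def is_service_enabled(service_profiles: list, active_profiles: str) -> bool:
    """Mimic Compose's rule: services without profiles always start;
    profiled services start only if one of their profiles is active."""
    if not service_profiles:
        return True
    active = {p.strip() for p in active_profiles.split(",") if p.strip()}
    return any(profile in active for profile in service_profiles)
```

With `compose_profiles("infinity")` as the active value, a service tagged with the `infinity` profile is enabled while one tagged `elasticsearch` is skipped.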
Each engine has its own set of configuration variables in docker/.env:
Elasticsearch:
Infinity:
OpenSearch:
These variables are interpolated into service_conf.yaml.template when containers start, creating the runtime configuration.
Sources: docker/.env30-52 docker/README.md23-46 docs/configurations.md44-70
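The interpolation step can be pictured with Python's `string.Template`, which uses the same `${NAME}` placeholder form. This is only a simplified sketch: the template fragment, the variable names ES_HOST and ES_USER, and the sample values are assumptions, and the real template may use shell-style defaults that `string.Template` does not support.

```python
from string import Template

# Hypothetical fragment of a service configuration template.
SERVICE_CONF_TEMPLATE = Template(
    "es:\n"
    "  hosts: 'http://${ES_HOST}:${ES_PORT}'\n"
    "  username: '${ES_USER}'\n"
)


def render_service_conf(variables: dict) -> str:
    """Substitute environment-style variables into the template.

    substitute() raises KeyError when a placeholder has no value, which
    surfaces missing settings early instead of writing a broken config.
    """
    return SERVICE_CONF_TEMPLATE.substitute(variables)
```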
All document engines implement the DocStoreConnection abstract base class defined in common/doc_store/doc_store_base.py38-187:
The base class defines the contract that all engines must fulfill:
| Method | Purpose | Return Type |
|---|---|---|
| search() | Hybrid search combining text and vector | (DataFrame, int) |
| get() | Retrieve single document by ID | dict \| None |
| insert() | Bulk insert documents | list[str] |
| update() | Update documents matching condition | bool |
| delete() | Delete documents matching condition | bool |
| create_idx() | Initialize index/table with schema | bool |
| delete_idx() | Drop index/table | bool |
Sources: common/doc_store/doc_store_base.py38-187
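To make the contract concrete, the sketch below defines a reduced abstract base class and a toy in-memory implementation. It is not the real DocStoreConnection: the class names, the parameter lists, and the choice to return stored IDs from `insert()` are assumptions (the table above gives only the return type), and `search()` and `update()` are omitted for brevity.

```python
from abc import ABC, abstractmethod
from typing import Optional


class DocStoreSketch(ABC):
    """Simplified stand-in for the document store contract."""

    @abstractmethod
    def create_idx(self, index_name: str) -> bool: ...

    @abstractmethod
    def delete_idx(self, index_name: str) -> bool: ...

    @abstractmethod
    def insert(self, documents: list, index_name: str) -> list: ...

    @abstractmethod
    def get(self, doc_id: str, index_name: str) -> Optional[dict]: ...

    @abstractmethod
    def delete(self, condition: dict, index_name: str) -> bool: ...


class InMemoryStore(DocStoreSketch):
    """Toy engine that keeps every index in a dict keyed by document ID."""

    def __init__(self):
        self._indexes = {}

    def create_idx(self, index_name):
        self._indexes.setdefault(index_name, {})
        return True

    def delete_idx(self, index_name):
        return self._indexes.pop(index_name, None) is not None

    def insert(self, documents, index_name):
        index = self._indexes.setdefault(index_name, {})
        for doc in documents:
            index[doc["id"]] = dict(doc)
        # Assumption: return the stored IDs; the real semantics may differ.
        return [doc["id"] for doc in documents]

    def get(self, doc_id, index_name):
        return self._indexes.get(index_name, {}).get(doc_id)

    def delete(self, condition, index_name):
        index = self._indexes.get(index_name, {})
        doomed = [
            doc_id for doc_id, doc in index.items()
            if all(doc.get(k) == v for k, v in condition.items())
        ]
        for doc_id in doomed:
            del index[doc_id]
        return True
```

Because callers depend only on the abstract methods, swapping `InMemoryStore` for another subclass requires no change on the calling side, which is the property the pluggable design relies on.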
All engines support a unified query expression language defined by common/doc_store/doc_store_base.py21-35:
Each engine translates these abstract expressions into engine-specific query syntax. For example, a MatchTextExpr becomes:
- Elasticsearch: query_string query rag/utils/es_conn.py95-98
- Infinity: match_text() builder method rag/utils/infinity_conn.py241-250
- OpenSearch: query_string query rag/utils/opensearch_conn.py

Sources: common/doc_store/doc_store_base.py21-35 rag/utils/es_conn.py92-98 rag/utils/infinity_conn.py241-250
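The translation step can be sketched for the Elasticsearch-style case. The dataclass below is a stand-in whose attribute names are assumptions rather than the actual MatchTextExpr definition; the output shape follows the standard `query_string` clause of the Elasticsearch query DSL.

```python
from dataclasses import dataclass, field


@dataclass
class MatchTextExprSketch:
    """Hypothetical engine-neutral description of a full-text match."""
    fields: list
    matching_text: str
    topn: int
    extra_options: dict = field(default_factory=dict)


def to_query_string(expr: MatchTextExprSketch) -> dict:
    """Translate the abstract expression into a query_string clause."""
    body = {"query": expr.matching_text, "fields": list(expr.fields)}
    # Engine-specific tuning (for example minimum_should_match) passes through.
    body.update(expr.extra_options)
    return {"query_string": body}
```

An Infinity translator would consume the same object but call a builder method instead of producing JSON, so the calling code does not need to know which form is used.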
The application instantiates the correct connection class based on settings.DOC_ENGINE:
All three connection classes use the @singleton decorator common/decorator.py to ensure only one instance exists per process.
Sources: rag/utils/es_conn.py33 rag/utils/infinity_conn.py29 rag/utils/opensearch_conn.py40
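A minimal sketch of the selection and singleton behavior described above. The decorator body, the placeholder connection classes, and the `get_doc_store` helper are all assumptions for illustration; they are not copied from common/decorator.py or the connection modules.

```python
import os


def singleton(cls):
    """Cache one instance of cls per process ID (illustrative version)."""
    instances = {}

    def get_instance(*args, **kwargs):
        key = (cls, os.getpid())
        if key not in instances:
            instances[key] = cls(*args, **kwargs)
        return instances[key]

    return get_instance


@singleton
class SketchESConnection:
    engine = "elasticsearch"


@singleton
class SketchInfinityConnection:
    engine = "infinity"


@singleton
class SketchOSConnection:
    engine = "opensearch"


# Hypothetical registry keyed by the DOC_ENGINE value.
_CONNECTIONS = {
    "elasticsearch": SketchESConnection,
    "infinity": SketchInfinityConnection,
    "opensearch": SketchOSConnection,
}


def get_doc_store(doc_engine: str):
    """Return the shared connection object for the configured engine."""
    try:
        factory = _CONNECTIONS[doc_engine]
    except KeyError:
        raise ValueError(f"Unsupported DOC_ENGINE: {doc_engine!r}") from None
    return factory()
```

Repeated calls with the same engine name return the same object, so connection setup happens at most once per process.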
Connection parameters (rag/utils/es_conn.py common/doc_store/es_conn_base.py27-66):
Key characteristics:
- vm.max_map_count >= 262144 README.md158-178
- query_string rag/utils/es_conn.py95-98
- knn field type rag/utils/es_conn.py106-112
- MEM_LIMIT docker/.env65

Index mapping: Elasticsearch uses dynamic mappings defined at index creation time common/doc_store/es_conn_base.py68-133. Key fields include:
- docnm_kwd: Keyword field for document names
- content_ltks: Text field for full-text content
- q_*_vec: Dense vector fields for embeddings (size determined at runtime)

Sources: docker/.env30-42 rag/utils/es_conn.py33-173 common/doc_store/es_conn_base.py27-133
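The kernel setting listed above can be checked before starting the containers. The sketch below reads the standard Linux procfs entry for `vm.max_map_count`; the helper names are hypothetical, and on non-Linux hosts the file is absent, so the reader returns None.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # threshold quoted in the characteristics above


def max_map_count_ok(raw_value: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the kernel value meets the required minimum."""
    return int(raw_value.strip()) >= required


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count"):
    """Return the raw setting as text, or None if the file does not exist."""
    proc_file = Path(path)
    return proc_file.read_text() if proc_file.exists() else None
```

A deployment script could call `read_max_map_count()` and, when a value is returned, pass it to `max_map_count_ok()` to decide whether to warn the operator.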
Connection parameters (rag/utils/infinity_conn.py29-30 common/doc_store/infinity_conn_base.py35-84):
Key characteristics:
- InfinityConnectionPool common/doc_store/infinity_conn_pool.py15-119
- {index_name}_{kb_id} pattern rag/utils/infinity_conn.py233-239

Field mapping: Infinity uses a different field naming scheme than Elasticsearch:
| ES Field | Infinity Field | Type | Analyzer |
|---|---|---|---|
| docnm_kwd | docnm | varchar | rag-coarse, rag-fine |
| title_tks | docnm | varchar | rag-coarse, rag-fine |
| important_kwd | important_keywords | varchar | rag-coarse, rag-fine |
| content_with_weight | content | varchar | rag-coarse, rag-fine |
The conversion logic is in rag/utils/infinity_conn.py44-86 and rag/utils/infinity_conn.py362-389.
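The renaming in the table can be expressed as a lookup. This is a sketch built only from the four rows shown; the function names are hypothetical and the real conversion code handles more fields and value transformations.

```python
# Mapping taken from the field table; unknown fields pass through unchanged.
ES_TO_INFINITY = {
    "docnm_kwd": "docnm",
    "title_tks": "docnm",
    "important_kwd": "important_keywords",
    "content_with_weight": "content",
}


def to_infinity_field(es_field: str) -> str:
    """Translate one Elasticsearch-style field name."""
    return ES_TO_INFINITY.get(es_field, es_field)


def to_infinity_fields(es_fields: list) -> list:
    """Translate a field list, dropping duplicates while keeping order.

    Two source fields (docnm_kwd and title_tks) map to the same Infinity
    column, so the translated list can be shorter than the input.
    """
    return list(dict.fromkeys(to_infinity_field(name) for name in es_fields))
```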
Table creation: When a table doesn't exist, Infinity automatically creates it with the appropriate schema rag/utils/infinity_conn.py326-347. The schema includes:
Sources: docker/.env67-73 rag/utils/infinity_conn.py29-440 common/doc_store/infinity_conn_base.py35-427 conf/infinity_mapping.json1-41
Connection parameters (rag/utils/opensearch_conn.py41-97):
Key characteristics:
- knn queries for vector similarity rag/utils/opensearch_conn.py

Sources: docker/.env44-52 rag/utils/opensearch_conn.py41-770
Before switching engines, ensure:
- vm.max_map_count (for ES/Infinity)

Warning: Switching engines requires clearing existing data volumes. The down -v command deletes all document chunks and vectors README.md283.
Sources: README.md273-294
Step-by-step process:
Stop all containers and clear volumes:
The -v flag is critical: it removes the Docker volumes that contain existing data README.md283.
Update the engine selection in docker/.env20:
Restart containers:
Docker Compose reads the updated COMPOSE_PROFILES and starts only the infinity container.
Verify the switch:
Sources: README.md273-294 README_zh.md273-293 docs/configurations.md1-42
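The procedure above can be outlined as a small script. This is a hedged sketch: `set_doc_engine` and `switch_commands` are hypothetical helpers, the compose file path assumes commands run from the repository root, and the `-d` flag is an assumption. The commands are only built as argument lists here, not executed, because `down -v` is destructive.

```python
import re

COMPOSE_FILE = "docker/docker-compose.yml"  # assumed path relative to repo root


def set_doc_engine(env_text: str, engine: str) -> str:
    """Return env_text with the DOC_ENGINE line set to the given engine."""
    pattern = re.compile(r"^DOC_ENGINE=.*$", flags=re.MULTILINE)
    new_line = f"DOC_ENGINE={engine}"
    if pattern.search(env_text):
        # A function replacement avoids backslash interpretation in new_line.
        return pattern.sub(lambda _match: new_line, env_text, count=1)
    separator = "" if env_text.endswith("\n") or not env_text else "\n"
    return f"{env_text}{separator}{new_line}\n"


def switch_commands() -> list:
    """Commands to run around the .env edit: first stop and clear, then restart."""
    return [
        ["docker", "compose", "-f", COMPOSE_FILE, "down", "-v"],  # deletes volumes
        ["docker", "compose", "-f", COMPOSE_FILE, "up", "-d"],
    ]
```

The intended order is: run the first command, write the output of `set_doc_engine` back to docker/.env, then run the second command.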
Critical warning: The docker compose down -v command used during engine switching permanently deletes all document chunks and vector embeddings. Only MySQL metadata (dataset configurations, user data) is preserved README.md283
Impact of switching engines:
| Data Type | Preserved | Lost | Recovery Method |
|---|---|---|---|
| User accounts | ✓ Yes (MySQL) | - | N/A |
| Dataset configurations | ✓ Yes (MySQL) | - | N/A |
| Document metadata | ✓ Yes (MySQL) | - | N/A |
| Parsed chunks | ✗ No (ES/Infinity/OS) | All chunks | Re-parse documents |
| Vector embeddings | ✗ No (ES/Infinity/OS) | All vectors | Re-parse documents |
| Original files | ✓ Yes (MinIO/S3) | - | N/A |
Migration workflow for production systems:
Step-by-step migration procedure:
Pre-migration backup:
Document current state:
Perform engine switch (see Switching Between Engines)
Post-migration recovery:
Re-parse documents:
Downtime estimation:
Alternative: Test before migration (recommended for production):
Field name compatibility issues:
Infinity uses different field names than Elasticsearch rag/utils/infinity_conn.py362-389. The application handles conversion automatically, but manual exports/imports are not supported:
| Elasticsearch Field | Infinity Field | Notes |
|---|---|---|
| docnm_kwd | docnm | Automatic conversion |
| important_kwd | important_keywords | Array handling differs |
| content_with_weight | content | Weight calculation differs |
Sources: README.md283 docker/.env13-20 rag/utils/infinity_conn.py362-389
ARM64 platforms:
Sources: README.md188-189 README.md294
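A deployment script could detect the host architecture before choosing an engine. The sketch below relies on `platform.machine()` from the standard library; the helper names and the warning text are assumptions, and the only fact taken from this page is that Infinity has no official ARM64 support.

```python
import platform

ARM64_MACHINES = {"arm64", "aarch64"}


def is_arm64(machine=None) -> bool:
    """Return True when the (given or detected) machine type is ARM64."""
    name = platform.machine() if machine is None else machine
    return name.lower() in ARM64_MACHINES


def arm64_warning(doc_engine: str, machine=None):
    """Return a warning string for an unsupported combination, else None."""
    if is_arm64(machine) and doc_engine == "infinity":
        return "Infinity has no official ARM64 support; consider another engine."
    return None
```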
The document store connection is accessed throughout the service layer:
Services never directly instantiate engine-specific classes. Instead, they import from a common module that returns the appropriate singleton based on configuration.
Sources: rag/utils/es_conn.py33 rag/utils/infinity_conn.py29
Each engine uses slightly different naming patterns:
Elasticsearch/OpenSearch:
- {index_name} (e.g., ragflow_chunk)
- ragflow_doc_meta_{tenant_id}

Infinity:
- {index_name}_{kb_id} (e.g., ragflow_chunk_abc123)
- ragflow_doc_meta_{tenant_id}

This difference is transparent to the application layer due to the abstraction interface.
Sources: rag/utils/infinity_conn.py229-292 rag/utils/es_conn.py175-211
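The two chunk naming patterns can be captured in one helper. This is a sketch under the patterns listed above; the function name is hypothetical and the experimental engines are deliberately left out because their naming is not described here.

```python
def chunk_store_name(doc_engine: str, index_name: str, kb_id: str) -> str:
    """Return the index or table name that holds chunks for a knowledge base."""
    if doc_engine == "infinity":
        # Infinity keeps one table per knowledge base.
        return f"{index_name}_{kb_id}"
    if doc_engine in ("elasticsearch", "opensearch"):
        # ES and OpenSearch share one index across knowledge bases.
        return index_name
    raise ValueError(f"No naming pattern known for engine {doc_engine!r}")
```

Callers pass the same arguments for every engine, which mirrors how the abstraction hides the naming difference from the application layer.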
| Aspect | Elasticsearch | Infinity | OpenSearch |
|---|---|---|---|
| Maturity | Very mature (10+ years) | Newer project (~2 years) | Mature (ES fork, 3+ years) |
| Vector Search Performance | Good (knn plugin) | Excellent (native HNSW) | Good (knn plugin) |
| Full-Text Search Quality | Excellent (BM25) | Good (custom analyzers) | Excellent (BM25) |
| Memory Usage | High (~8GB min) | Moderate (~4-6GB) | High (~8GB min) |
| Disk I/O | Moderate | Lower (optimized storage) | Moderate |
| Scalability | Excellent (sharding/clustering) | Good (single-node optimized) | Excellent (sharding/clustering) |
| Query Language | JSON DSL | SQL + Builder API | JSON DSL |
| Multi-Tenancy | Index per tenant | Table per KB | Index per tenant |
| Monitoring Tools | Kibana, extensive ecosystem | Limited | OpenSearch Dashboards |
| ARM64 Support | Unofficial (custom build) | Not supported | Unofficial (custom build) |
| Production Adoption | Very high | Growing | High (especially AWS) |
Sources: docker/.env65 README.md294 docker/.env30-73
Minimum system requirements (from README.md147-151):
CPU: 4+ cores (x86_64 recommended)
RAM: 16+ GB (system total)
Disk: 50+ GB SSD (for index storage)
Per-container memory limits docker/.env65:
Storage growth estimates:
Sources: README.md147-151 docker/.env65
Choose Elasticsearch if:
Choose Infinity if:
Choose OpenSearch if:
Sources: README.md294 docker/.env13-20 Analysis based on implementation characteristics
| File | Purpose | Key Settings |
|---|---|---|
| docker/.env13-73 | Engine selection and connection parameters | DOC_ENGINE, ES_PORT, INFINITY_HOST |
| docker/service_conf.yaml.template | Runtime service configuration (auto-generated) | Interpolated from .env |
| conf/infinity_mapping.json1-41 | Infinity table schema definition | Field types and analyzers |
| docker/docker-compose.yml | Container orchestration | Service profiles and dependencies |
| docker/docker-compose-base.yml | Dependency-only compose file | ES, Infinity, MySQL, Redis, MinIO |
Sources: docker/.env docker/README.md1-165 docs/configurations.md1-207