Vector database providers implement a common abstraction for storing and retrieving document embeddings in AnythingLLM. This page documents the 10+ supported vector database integrations, their common interface, and provider-specific features. For the document vectorization pipeline that feeds these providers, see Document Vectorization Pipeline. For embedding engine selection, see LLM Provider Integration.
Each vector database provider in AnythingLLM extends the VectorDatabase base class and implements a standardized interface for vector storage operations. Providers are selected via the VECTOR_DB environment variable and instantiated through a factory pattern. All providers support multi-tenancy through namespace isolation, where each workspace receives its own namespace/collection in the vector database.
The system supports both local options (LanceDB, Chroma) and self-hosted or cloud services (Pinecone, Qdrant, Weaviate, AstraDB, Milvus/Zilliz), so the same application code can run in development and in production deployments.
Sources: server/utils/vectorDbProviders/lance/index.js1-507 server/utils/vectorDbProviders/chroma/index.js1-485 server/utils/vectorDbProviders/base.js1-50
All vector database providers implement a common interface defined by the VectorDatabase base class. Each provider handles vendor-specific API calls, data formats, and operational quirks while exposing consistent methods to the rest of the application.
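The relationship between the base class and the factory can be sketched as follows. This is a minimal illustration, not the project's code: the method names come from this page, while `getVectorDb`, the `PROVIDERS` registry, and the empty stand-in provider classes are assumptions made for the example.

```javascript
// Hypothetical base class: method names come from this page, bodies are placeholders.
class VectorDatabase {
  get name() {
    return this.constructor.name;
  }
  async connect() {
    throw new Error(`${this.name} must implement connect()`);
  }
  async hasNamespace(_namespace) {
    throw new Error(`${this.name} must implement hasNamespace()`);
  }
  async performSimilaritySearch(_params) {
    throw new Error(`${this.name} must implement performSimilaritySearch()`);
  }
}

// Stand-in providers; real ones would wrap a vendor SDK.
class LanceDb extends VectorDatabase {}
class Chroma extends VectorDatabase {}

// Registry keyed by the VECTOR_DB value (assumed shape).
const PROVIDERS = new Map([
  ["lance", LanceDb],
  ["chroma", Chroma],
]);

// Factory: resolve the provider from the environment, defaulting to LanceDB.
function getVectorDb(env = process.env) {
  const key = env.VECTOR_DB || "lance";
  const Provider = PROVIDERS.get(key);
  if (!Provider) throw new Error(`Unsupported VECTOR_DB value: ${key}`);
  return new Provider();
}
```

Keeping the lookup in one place means the rest of the application only ever sees the shared interface, whichever provider is configured.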
Sources: server/utils/vectorDbProviders/lance/index.js17-507 server/utils/vectorDbProviders/chroma/index.js14-485 server/utils/vectorDbProviders/qdrant/index.js10-443 server/utils/vectorDbProviders/pinecone/index.js10-318 server/utils/vectorDbProviders/milvus/index.js15-435 server/utils/vectorDbProviders/zilliz/index.js8-37 server/utils/vectorDbProviders/weaviate/index.js11-512 server/utils/vectorDbProviders/astra/index.js31-475
AnythingLLM supports 10+ vector database providers, each with specific configuration requirements and operational characteristics.
| Provider | Class Name | Environment Variable | Default/Local | Notable Features |
|---|---|---|---|---|
| LanceDB | LanceDb | VECTOR_DB=lance | ✓ Default | File-based, reranking support, no external service required |
| Chroma | Chroma | VECTOR_DB=chroma | ✓ Local option | Collection name normalization, header-based auth |
| Qdrant | QDrant | VECTOR_DB=qdrant | ✗ | Cosine distance, batch upsert operations |
| Pinecone | Pinecone | VECTOR_DB=pinecone | ✗ | Cloud-hosted, namespace-based multi-tenancy |
| Milvus | Milvus | VECTOR_DB=milvus | ✗ | Self-hosted, AUTOINDEX with cosine metric |
| Zilliz | Zilliz | VECTOR_DB=zilliz | ✗ | Cloud Milvus, token-based auth |
| Weaviate | Weaviate | VECTOR_DB=weaviate | ✗ | Schema-based, camelCase class names, object flattening |
| AstraDB | AstraDB | VECTOR_DB=astra | ✗ | Serverless Cassandra, 20-record batch limit |
Sources: server/utils/vectorDbProviders/lance/index.js17-31 server/utils/vectorDbProviders/chroma/index.js14-22 server/utils/vectorDbProviders/qdrant/index.js10-17 server/utils/vectorDbProviders/pinecone/index.js10-17 server/utils/vectorDbProviders/milvus/index.js15-22 server/utils/vectorDbProviders/zilliz/index.js8-15 server/utils/vectorDbProviders/weaviate/index.js11-18 server/utils/vectorDbProviders/astra/index.js31-38
Each provider implements a connect() method that establishes a client connection and validates the service is operational through a heartbeat check.
Connection Example - LanceDB:
Connection Example - Chroma:
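The connect-then-heartbeat pattern can be sketched with a stand-in client. `FakeClient`, its `heartbeat()` method, and the `createClient` callback are hypothetical names used only for this example; a real provider would construct its vendor SDK client at that point instead.

```javascript
// Stand-in client; a real provider would wrap its vendor SDK here.
class FakeClient {
  constructor(alive = true) {
    this.alive = alive;
  }
  async heartbeat() {
    return this.alive;
  }
}

// Create a client, then confirm the service responds before handing it back.
async function connect(createClient) {
  const client = createClient();
  const isAlive = await client.heartbeat();
  if (!isAlive) {
    throw new Error("connect failed: service did not respond to heartbeat");
  }
  return { client };
}
```

Failing fast in `connect()` surfaces a misconfigured endpoint at the first call rather than in the middle of an insert or search.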
Sources: server/utils/vectorDbProviders/lance/index.js34-37 server/utils/vectorDbProviders/chroma/index.js67-91 server/utils/vectorDbProviders/qdrant/index.js19-37 server/utils/vectorDbProviders/pinecone/index.js19-32
Namespaces (also called collections or classes depending on the provider) provide workspace-level isolation. Each workspace in AnythingLLM receives a unique namespace in the vector database.
Namespace Operations:
| Method | Purpose | Return Type |
|---|---|---|
| `hasNamespace(namespace)` | Check if namespace exists | `Promise<Boolean>` |
| `namespaceExists(client, namespace)` | Check with existing client | `Promise<Boolean>` |
| `namespace(client, namespace)` | Get namespace metadata | `Promise<Object>` |
| `namespaceCount(namespace)` | Count vectors in namespace | `Promise<Number>` |
| `deleteVectorsInNamespace(client, namespace)` | Delete entire namespace | `Promise<Boolean>` |
Provider-Specific Namespace Handling:
Some providers require namespace normalization due to naming restrictions:
- Chroma: Collection names must match the regex `/^(?!\d+\.\d+\.\d+\.\d+$)(?!.*\.\.)(?=^[a-zA-Z0-9][a-zA-Z0-9_-]{1,61}[a-zA-Z0-9]$).{3,63}$/`
- Milvus/Zilliz: Collections must start with a letter or underscore and contain only letters, numbers, and underscores.
- Weaviate: Class names must be in camelCase format.
- AstraDB: Collections must be prefixed with `ns_` and contain only valid characters.
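As an illustration of what normalization involves, the sketch below coerces arbitrary workspace names into the Chroma and Milvus rules listed above. The regex is the one quoted on this page; the function names, the `ns-` padding prefix, and the replacement characters are assumptions for the example and may differ from what the providers actually do.

```javascript
// Collection-name rule for Chroma, as quoted on this page.
const COLLECTION_REGEX =
  /^(?!\d+\.\d+\.\d+\.\d+$)(?!.*\.\.)(?=^[a-zA-Z0-9][a-zA-Z0-9_-]{1,61}[a-zA-Z0-9]$).{3,63}$/;

// Hypothetical helper: coerce an arbitrary name into a valid Chroma collection name.
function normalizeChromaName(input) {
  if (COLLECTION_REGEX.test(input)) return input;
  // Replace every disallowed character (including dots) with a hyphen.
  let name = String(input).replace(/[^a-zA-Z0-9_-]/g, "-");
  // Enforce the 63-character maximum.
  name = name.slice(0, 63);
  // The first and last characters must be alphanumeric.
  name = name.replace(/^[^a-zA-Z0-9]+/, "").replace(/[^a-zA-Z0-9]+$/, "");
  // Pad names that end up shorter than the 3-character minimum.
  if (name.length < 3) name = `ns-${name}`.padEnd(5, "0");
  return name;
}

// Hypothetical helper for the Milvus/Zilliz rule: letters, digits, and
// underscores only, starting with a letter or underscore.
function normalizeMilvusName(input) {
  let name = String(input).replace(/[^a-zA-Z0-9_]/g, "_");
  if (!/^[a-zA-Z_]/.test(name)) name = `_${name}`;
  return name;
}
```

For example, `normalizeChromaName("my workspace!")` yields `my-workspace`, and `normalizeMilvusName("9 lives")` yields `_9_lives`.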
Sources: server/utils/vectorDbProviders/chroma/index.js31-64 server/utils/vectorDbProviders/milvus/index.js28-33 server/utils/vectorDbProviders/weaviate/index.js148-161 server/utils/vectorDbProviders/astra/index.js10-16
The addDocumentToNamespace() method is the primary entry point for adding vectorized documents to a namespace. It checks a vector cache first, so a document that has already been embedded is not embedded again.
Key Implementation Details:
- Vector Caching: Vectors are cached in the vector-cache/ directory using file path UUIDs as keys. The cache format preserves chunk structure for batch operations.
- Batch Sizing: Batch sizes differ by provider. AstraDB, for example, has a 20-record batch limit.
- Text Splitting Configuration: Documents are split using TextSplitter with the system-configured chunk size and overlap, respecting embedder model limitations.
- Metadata Injection: Each vector includes metadata fields (title, source, published date) formatted as XML headers, plus the original document metadata.
- Document-Vector Mapping: The DocumentVectors table maintains docId to vectorId relationships, enabling efficient document deletion and updates.
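The cache-then-embed flow described above can be sketched roughly as follows. This is a simplified, synchronous stand-in: `addDocument`, `toChunks`, the `Map`-based cache, and the record fields are assumptions for illustration, whereas the real method is asynchronous and talks to an embedder and a provider client.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toChunks(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical stand-in for the add-document flow.
// `cache` is a Map keyed by document id; `embed` turns text into a vector.
function addDocument({ docId, textChunks, cache, embed, batchSize }) {
  let batches = cache.get(docId);
  const cacheHit = batches !== undefined;
  if (!cacheHit) {
    // Cache miss: embed every chunk once, then store the batched result.
    const records = textChunks.map((text, i) => ({
      id: `${docId}-${i}`,
      values: embed(text),
      metadata: { text },
    }));
    batches = toChunks(records, batchSize);
    cache.set(docId, batches);
  }
  // Either way, the caller receives batches ready for insertion.
  return { cacheHit, batches };
}
```

Storing the already-batched form means a cache hit can be replayed into the provider without re-splitting or re-embedding.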
Sources: server/utils/vectorDbProviders/lance/index.js301-404 server/utils/vectorDbProviders/chroma/index.js203-342 server/utils/vectorDbProviders/qdrant/index.js155-330 server/utils/vectorDbProviders/pinecone/index.js115-216
The performSimilaritySearch() method queries the vector database for relevant document chunks based on semantic similarity to the input query.
Similarity Search Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `namespace` | String | - | Workspace namespace to search |
| `input` | String | - | Query text to embed and search |
| `LLMConnector` | Object | - | LLM instance for query embedding |
| `similarityThreshold` | Number | 0.25 | Minimum similarity score (0-1) |
| `topN` | Number | 4 | Maximum number of results |
| `filterIdentifiers` | `Array<String>` | [] | Source identifiers to exclude (pinned docs) |
| `rerank` | Boolean | false | Enable reranking (LanceDB only) |
Distance to Similarity Conversion:
All providers implement distanceToSimilarity() to normalize distance metrics to 0-1 similarity scores:
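A minimal sketch of such a conversion, assuming the provider reports a cosine-style distance where 0 means identical, is shown below together with the post-filtering implied by the parameter table. The clamping behavior, the `selectResults` helper, and the `sourceIdentifier` field are assumptions for the example; individual providers may handle edge cases differently.

```javascript
// Convert a distance to a similarity score clamped to [0, 1].
// Assumes 0 means identical and larger values mean less similar.
function distanceToSimilarity(distance = null) {
  if (typeof distance !== "number" || Number.isNaN(distance)) return 0;
  return Math.min(1, Math.max(0, 1 - distance));
}

// Hypothetical post-processing: score, apply the threshold, drop excluded
// sources, and keep the best topN. Defaults mirror the parameter table.
function selectResults(
  candidates,
  { similarityThreshold = 0.25, topN = 4, filterIdentifiers = [] } = {}
) {
  return candidates
    .map((c) => ({ ...c, score: distanceToSimilarity(c.distance) }))
    .filter((c) => c.score >= similarityThreshold)
    .filter((c) => !filterIdentifiers.includes(c.sourceIdentifier))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```

Normalizing to one score range lets the same `similarityThreshold` setting apply regardless of which provider produced the results.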
Sources: server/utils/vectorDbProviders/lance/index.js406-456 server/utils/vectorDbProviders/chroma/index.js363-407 server/utils/vectorDbProviders/qdrant/index.js351-389 server/utils/vectorDbProviders/pinecone/index.js263-298
LanceDB supports native reranking using the NativeEmbeddingReranker to improve result quality by re-scoring an expanded candidate set.
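Two-stage retrieval of this kind can be sketched in a few lines. The `search` and `score` callbacks, the `expandBy` multiplier, and the `rerankScore` field are hypothetical; the sketch only shows the over-fetch, re-score, and truncate shape, not how NativeEmbeddingReranker computes its scores.

```javascript
// Hypothetical two-stage retrieval: over-fetch, re-score, keep the best topN.
function rerankedSearch({ query, search, score, topN = 4, expandBy = 5 }) {
  // Stage 1: ask the vector store for an expanded candidate set.
  const candidates = search(query, topN * expandBy);
  // Stage 2: re-score each candidate against the query and keep the top results.
  return candidates
    .map((doc) => ({ ...doc, rerankScore: score(query, doc.text) }))
    .sort((a, b) => b.rerankScore - a.rerankScore)
    .slice(0, topN);
}
```

The expanded first stage trades extra scoring work for a better chance that the most relevant chunks are among the candidates.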
Reranking Performance:
Sources: server/utils/vectorDbProviders/lance/index.js88-159 server/utils/vectorDbProviders/lance/index.js428-437
File Path: server/utils/vectorDbProviders/lance/index.js17-507
Storage Location:
- `STORAGE_DIR/lancedb/` or `server/storage/lancedb/`

Key Features:
- File-based, no external service required
- Reranking support

Unique Methods:
- `rerankedSimilarityResponse()` - Implements two-stage retrieval with reranking

File Path: server/utils/vectorDbProviders/chroma/index.js14-485
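Configuration:
- `CHROMA_ENDPOINT` - Chroma server endpoint (default: http://localhost:8000)
- `CHROMA_API_HEADER` - Optional auth header name (default: X-Api-Key)
- `CHROMA_API_KEY` - Optional API key for authentication

Key Features:
- Similarity conversion that assumes `[-1, 1]` unit vectors
- `smartAdd()` and `smartDelete()` wrappers for batch operations

Collection Name Rules (as encoded in the collection name regex):
- 3 to 63 characters long
- Starts and ends with an alphanumeric character
- Contains only alphanumeric characters, underscores, and hyphens
- No consecutive periods (`..`)
- Not a valid IPv4 address

File Path: server/utils/vectorDbProviders/qdrant/index.js10-443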
Configuration:
- `QDRANT_ENDPOINT` - Qdrant server endpoint
- `QDRANT_API_KEY` - Optional API key for authentication

Key Features:
- Cosine distance metric
- Batch `upsert()` with `wait: true` for synchronous operations
- Metadata stored in the `payload` field, vectors in the `vector` field

Collection Creation:
File Path: server/utils/vectorDbProviders/pinecone/index.js10-318
Configuration:
- `PINECONE_API_KEY` - Pinecone API key
- `PINECONE_INDEX` - Pinecone index name

Key Features:
- Cloud-hosted, with namespace-based multi-tenancy
- Checks that the index has `ready` status before operations

Index Statistics:
File Path: server/utils/vectorDbProviders/milvus/index.js15-435
Configuration:
- `MILVUS_ADDRESS` - Milvus server address
- `MILVUS_USERNAME` - Authentication username
- `MILVUS_PASSWORD` - Authentication password

Key Features:
- `DataType.VarChar` for ID, `DataType.FloatVector` for vectors, `DataType.JSON` for metadata
- `IndexType.AUTOINDEX` with `MetricType.COSINE`
- `flushSync()` after insertions for immediate consistency

Schema Structure:
File Path: server/utils/vectorDbProviders/zilliz/index.js8-37
Configuration:
- `ZILLIZ_ENDPOINT` - Zilliz Cloud endpoint
- `ZILLIZ_API_TOKEN` - Zilliz Cloud API token

Key Features:
- Cloud Milvus with token-based authentication
- Implements its own `connect()` method

File Path: server/utils/vectorDbProviders/weaviate/index.js11-512
Configuration:
- `WEAVIATE_ENDPOINT` - Weaviate server endpoint URL
- `WEAVIATE_API_KEY` - Optional API key for authentication

Key Features:
- Class names converted with the `camelCase()` helper
- `withNearVector()` search
- Returns `certainty` scores (0-1) instead of distances

Class Creation:
Object Flattening for Weaviate:
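Flattening nested metadata into single-level properties can be sketched as below. The function name, the underscore-joined key format, and the choice to leave arrays untouched are assumptions for illustration and may not match the provider's actual flattening rules.

```javascript
// Hypothetical sketch: flatten nested plain objects into one level,
// joining key paths with underscores. Arrays and primitives are kept as-is.
function flattenObject(obj = {}, prefix = "") {
  const flat = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}_${key}` : key;
    const isPlainObject =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isPlainObject) {
      // Recurse and merge the nested keys under the extended path.
      Object.assign(flat, flattenObject(value, path));
    } else {
      flat[path] = value;
    }
  }
  return flat;
}
```

For example, `{ loc: { lines: { from: 1 } } }` becomes `{ loc_lines_from: 1 }`, which fits a schema with flat class properties.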
File Path: server/utils/vectorDbProviders/astra/index.js31-475
Configuration:
- `ASTRA_DB_APPLICATION_TOKEN` - DataStax Astra DB application token
- `ASTRA_DB_ENDPOINT` - Astra DB API endpoint

Key Features:
- Collection names prefixed with `ns_` automatically
- Records use `_id`, `$vector`, and `metadata` fields
- 20-record batch limit

Collection Validation:
Sanitization:
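A rough sketch of validation and sanitization for `ns_`-prefixed collection names follows. The exact character rule is not stated on this page, so the regex below (letters, digits, and underscores after the prefix) and both function names are assumptions.

```javascript
// Assumed rule: "ns_" followed by letters, digits, or underscores.
const ASTRA_NAME_REGEX = /^ns_[a-zA-Z0-9_]+$/;

// Check whether a name already satisfies the assumed rule.
function isValidCollectionName(name) {
  return ASTRA_NAME_REGEX.test(name);
}

// Replace disallowed characters, then add the prefix if it is missing.
function sanitizeNamespace(namespace) {
  const cleaned = String(namespace).replace(/[^a-zA-Z0-9_]/g, "_");
  return cleaned.startsWith("ns_") ? cleaned : `ns_${cleaned}`;
}
```

With these definitions, `sanitizeNamespace("my-workspace")` returns `ns_my_workspace`, and a name that already starts with `ns_` is returned unchanged.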
Sources: server/utils/vectorDbProviders/astra/index.js10-475
All providers leverage a shared vector caching mechanism to avoid redundant embedding operations. Cached vectors are stored in the vector-cache/ directory with file-path-based UUIDs as keys.
Cache Structure: Each cached file contains an array of chunk arrays, preserving the batch structure for efficient insertion:
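An illustrative example of this shape is shown below. The field names (`vectorDbId`, `values`, `metadata`) are assumptions for the example rather than the project's confirmed schema; the point is the outer array of batches, each holding an array of chunk records.

```javascript
// Illustrative cache contents: an outer array of batches,
// each batch being an array of chunk records (field names assumed).
const cachedBatches = [
  [
    { vectorDbId: "id-1", values: [0.1, 0.2], metadata: { text: "chunk one" } },
    { vectorDbId: "id-2", values: [0.3, 0.4], metadata: { text: "chunk two" } },
  ],
  [{ vectorDbId: "id-3", values: [0.5, 0.6], metadata: { text: "chunk three" } }],
];

// The structure survives a JSON round trip, so it can be stored as a file.
const restored = JSON.parse(JSON.stringify(cachedBatches));

// Each inner array can be sent to the provider as one insert call.
const batchSizes = restored.map((batch) => batch.length);
const totalVectors = restored.flat().length;
```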
Cache Benefits:
Sources: server/utils/vectorDbProviders/lance/index.js313-334 server/utils/vectorDbProviders/chroma/index.js215-253 server/utils/files/index.js1-100
The DocumentVectors database table maintains bidirectional relationships between documents and their vector IDs, enabling efficient document deletion and updates.
Table Schema:
Mapping Flow:
Deletion Implementation Example:
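The lookup-then-delete flow can be sketched with in-memory stand-ins. The array playing the role of the DocumentVectors table, the `Map` playing the role of a provider collection, and the `deleteDocument` helper are all hypothetical; real providers perform these steps against the database and the vendor client.

```javascript
// In-memory stand-ins for the DocumentVectors table and a provider collection.
const documentVectors = [
  { docId: "doc-1", vectorId: "v1" },
  { docId: "doc-1", vectorId: "v2" },
  { docId: "doc-2", vectorId: "v3" },
];
const collection = new Map([
  ["v1", [0.1]],
  ["v2", [0.2]],
  ["v3", [0.3]],
]);

// Look up a document's vector ids, delete them from the store,
// and return the mapping rows that remain.
function deleteDocument(docId, table, store) {
  const vectorIds = table
    .filter((row) => row.docId === docId)
    .map((row) => row.vectorId);
  for (const id of vectorIds) store.delete(id);
  const remaining = table.filter((row) => row.docId !== docId);
  return { deleted: vectorIds, remaining };
}
```

Because the mapping records exactly which vectors belong to a document, deletion never needs to scan or filter the vector store by metadata.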
Sources: server/utils/vectorDbProviders/lance/index.js280-299 server/utils/vectorDbProviders/chroma/index.js344-361 server/utils/vectorDbProviders/qdrant/index.js332-349 server/models/vectors.js1-100
Each provider requires specific environment variables for configuration. The system validates these during startup and when changing vector database providers.
Common Configuration Pattern:
| Provider | Required Variables | Optional Variables |
|---|---|---|
| LanceDB | VECTOR_DB=lance | STORAGE_DIR (custom storage path) |
| Chroma | VECTOR_DB=chroma | CHROMA_ENDPOINT, CHROMA_API_HEADER, CHROMA_API_KEY |
| Qdrant | VECTOR_DB=qdrant, QDRANT_ENDPOINT | QDRANT_API_KEY |
| Pinecone | VECTOR_DB=pinecone, PINECONE_API_KEY, PINECONE_INDEX | - |
| Milvus | VECTOR_DB=milvus, MILVUS_ADDRESS, MILVUS_USERNAME, MILVUS_PASSWORD | - |
| Zilliz | VECTOR_DB=zilliz, ZILLIZ_ENDPOINT, ZILLIZ_API_TOKEN | - |
| Weaviate | VECTOR_DB=weaviate, WEAVIATE_ENDPOINT | WEAVIATE_API_KEY |
| AstraDB | VECTOR_DB=astra, ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_ENDPOINT | - |
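A startup check over the required variables in the table above might look like the following sketch. The `REQUIRED_ENV` map is transcribed from the table; the `missingEnv` helper and its behavior are assumptions for illustration, not the project's validation code.

```javascript
// Required variables per provider, transcribed from the table above.
const REQUIRED_ENV = {
  lance: [],
  chroma: [],
  qdrant: ["QDRANT_ENDPOINT"],
  pinecone: ["PINECONE_API_KEY", "PINECONE_INDEX"],
  milvus: ["MILVUS_ADDRESS", "MILVUS_USERNAME", "MILVUS_PASSWORD"],
  zilliz: ["ZILLIZ_ENDPOINT", "ZILLIZ_API_TOKEN"],
  weaviate: ["WEAVIATE_ENDPOINT"],
  astra: ["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ENDPOINT"],
};

// Hypothetical helper: list the required variables that are unset or empty
// for the selected provider (LanceDB when VECTOR_DB is not set).
function missingEnv(env = process.env) {
  const provider = env.VECTOR_DB || "lance";
  const required = REQUIRED_ENV[provider];
  if (!Array.isArray(required)) {
    throw new Error(`Unsupported VECTOR_DB value: ${provider}`);
  }
  return required.filter((name) => !env[name]);
}
```

For instance, `missingEnv({ VECTOR_DB: "pinecone", PINECONE_API_KEY: "k" })` returns `["PINECONE_INDEX"]`.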
Vector Database Reset:
When switching vector database providers via the configuration system, the handleVectorStoreReset() post-update hook purges all existing vectors to prevent data corruption. See Settings Validation and Lifecycle Hooks for details.
Sources: server/utils/vectorDbProviders/lance/index.js22-27 server/utils/vectorDbProviders/chroma/index.js67-91 server/utils/vectorDbProviders/qdrant/index.js19-37 server/utils/vectorDbProviders/pinecone/index.js19-32
All providers implement administrative operations for namespace management and database statistics.
Method: "namespace-stats"(reqBody)
Returns comprehensive statistics about a namespace including vector count and provider-specific metadata.
Method: "delete-namespace"(reqBody)
Deletes an entire namespace and all its vectors. Returns a confirmation message with the vector count.
Method: reset()
Completely wipes all data from the vector database. Used during testing and when switching providers.
LanceDB Implementation:
Chroma Implementation:
Sources: server/utils/vectorDbProviders/lance/index.js458-487 server/utils/vectorDbProviders/chroma/index.js409-438 server/utils/vectorDbProviders/qdrant/index.js391-423
All vector database providers integrate with the TextSplitter class for consistent document chunking before vectorization. The text splitter respects both system configuration and embedder model limitations.
Text Splitter Initialization:
Chunk Size Determination:
The determineMaxChunkSize() static method ensures the preferred chunk size does not exceed the embedder model's maximum:
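The capping logic can be sketched as a standalone function. The default limit of 1000 and the fallback to the embedder limit when no usable preference is given are assumptions for this example; the actual method lives on the TextSplitter class as a static.

```javascript
// Sketch: cap the preferred chunk size at the embedder's maximum.
// Defaults and fallback behavior are assumptions, not confirmed values.
function determineMaxChunkSize(preferred = null, embedderLimit = 1000) {
  const limit = Number(embedderLimit);
  const wanted = Number(preferred);
  // Fall back to the embedder limit when no usable preference is given.
  if (preferred === null || Number.isNaN(wanted) || wanted <= 0) return limit;
  return Math.min(wanted, limit);
}
```

A preference of 8000 with an embedder limit of 1000 therefore resolves to 1000, while a preference of 500 is left unchanged.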
Metadata Header Injection: Document metadata is formatted as XML and prepended to each chunk:
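A header of this kind could be built as in the sketch below. The `<document_metadata>` wrapper, the per-field tag names, and the helper names are illustrative assumptions; only the three fields (title, source, published date) come from this page.

```javascript
// Escape characters that would break the XML-style header.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the header from whichever fields are present (tag names assumed).
function buildChunkHeader({ title, source, published } = {}) {
  const lines = Object.entries({ title, source, published })
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .map(([key, value]) => `  <${key}>${escapeXml(value)}</${key}>`);
  if (lines.length === 0) return "";
  return `<document_metadata>\n${lines.join("\n")}\n</document_metadata>\n\n`;
}

// Prepend the header to a chunk before it is embedded.
function withHeader(chunk, metadata) {
  return buildChunkHeader(metadata) + chunk;
}
```

Since the header is part of the text that gets embedded, its length counts against the chunk size budget.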
Sources: server/utils/vectorDbProviders/lance/index.js341-355 server/utils/TextSplitter/index.js21-207