This document explains how AnythingLLM performs similarity search against vector databases to retrieve relevant document chunks for RAG (Retrieval-Augmented Generation) operations. It covers the common performSimilaritySearch interface implemented by all vector database providers, the parameters that control search behavior (similarity threshold, topN, filter identifiers), and LanceDB's unique reranking capability that improves retrieval quality.
For information about document vectorization and insertion into vector databases, see Document Vectorization Pipeline. For details about text splitting and chunking strategies, see Text Splitting and Chunking.
All vector database providers in AnythingLLM implement a common interface for similarity search operations. The system uses cosine similarity as the distance metric across all providers, ensuring consistent behavior regardless of which vector database is configured.
Sources: server/utils/vectorDbProviders/lance/index.js406-456 server/utils/vectorDbProviders/chroma/index.js363-407 server/utils/vectorDbProviders/qdrant/index.js351-389
The performSimilaritySearch method accepts standardized parameters across all vector database providers:
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| namespace | string | required | The workspace/collection to search in |
| input | string | required | The user's query text (plain text) |
| LLMConnector | object | required | Used to generate query embeddings via embedTextInput() |
| similarityThreshold | number | 0.25 | Minimum similarity score (0-1) required to include a result |
| topN | number | 4 | Maximum number of results to return |
| filterIdentifiers | string[] | [] | Source identifiers to exclude (pinned documents) |
| rerank | boolean | false | Enable reranking (LanceDB only) |
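As a minimal sketch (not the actual provider code), the interaction of similarityThreshold and topN can be illustrated like this: results scoring below the threshold are dropped first, then the survivors are trimmed to the best topN. Scores and chunk texts here are made up for the example.

```javascript
// Illustrative sketch only: how similarityThreshold and topN shape the
// final result set after scores have been normalized to [0, 1].
function selectResults(scored, { similarityThreshold = 0.25, topN = 4 } = {}) {
  return scored
    .filter((r) => r.score >= similarityThreshold) // drop low-quality matches
    .sort((a, b) => b.score - a.score)             // best matches first
    .slice(0, topN);                               // cap at topN results
}

const scored = [
  { text: "chunk A", score: 0.91 },
  { text: "chunk B", score: 0.42 },
  { text: "chunk C", score: 0.12 }, // below the 0.25 default threshold
  { text: "chunk D", score: 0.33 },
];
console.log(selectResults(scored).map((r) => r.text));
// [ 'chunk A', 'chunk B', 'chunk D' ]
```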
The similarityThreshold filters out low-quality matches. Each provider converts its native distance metric to a normalized similarity score on a 0-1 scale, where higher values indicate better matches.
Sources: server/utils/vectorDbProviders/lance/index.js39-44 server/utils/vectorDbProviders/chroma/index.js112-117
Each vector database provider implements the similarity search interface differently based on the underlying database's query API, but all follow the same logical flow:
Sources: server/utils/vectorDbProviders/lance/index.js172-213 server/utils/vectorDbProviders/chroma/index.js125-166 server/utils/vectorDbProviders/qdrant/index.js62-101
Sources: server/utils/vectorDbProviders/lance/index.js187-191
Sources: server/utils/vectorDbProviders/chroma/index.js133-145
Sources: server/utils/vectorDbProviders/qdrant/index.js76-80
Sources: server/utils/vectorDbProviders/pinecone/index.js64-69
The filterIdentifiers parameter prevents duplicate context when documents are both pinned and retrieved via similarity search. The system uses the sourceIdentifier() helper to generate consistent identifiers for documents.
The filtering prevents situations where a pinned document appears twice in the final context, wasting tokens and potentially confusing the LLM.
Example Filter Logic:
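A hedged sketch of that filter follows. The sourceIdentifier() shape below is an assumption for illustration (a stable string derived from chunk metadata); the real helper and its exact format live in the server utils.

```javascript
// Assumed identifier format for illustration only; the real helper may
// combine different metadata fields.
function sourceIdentifier({ title = "", published = "" } = {}) {
  return `title:${title}-timestamp:${published}`;
}

// Exclude any result whose source identifier matches a pinned document so
// pinned content is not duplicated in the final context window.
function excludePinned(results, filterIdentifiers) {
  return results.filter(
    (item) => !filterIdentifiers.includes(sourceIdentifier(item.metadata))
  );
}

const results = [
  { text: "chunk A", metadata: { title: "guide.pdf", published: "2024-01-01" } },
  { text: "chunk B", metadata: { title: "notes.md", published: "2024-02-02" } },
];
const pinned = [sourceIdentifier({ title: "guide.pdf", published: "2024-01-01" })];
console.log(excludePinned(results, pinned).map((r) => r.text)); // [ 'chunk B' ]
```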
Sources: server/utils/vectorDbProviders/lance/index.js193-210 server/utils/vectorDbProviders/chroma/index.js147-163 server/utils/vectorDbProviders/qdrant/index.js82-98
LanceDB is the only vector database provider that supports native reranking, which improves retrieval quality by re-scoring initial results using a more sophisticated scoring mechanism.
The reranking process deliberately retrieves more results than requested to give the reranker a larger candidate pool. A wider pool lets the reranker surface relevant chunks that scored lower on raw vector similarity, at the cost of additional retrieval and scoring latency.
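A minimal sketch of that candidate over-fetch follows. The 10x multiplier and the 10/50 bounds are assumptions for illustration, chosen to be consistent with the 10-50 candidate range and the 50-document cap mentioned elsewhere on this page.

```javascript
// Hedged sketch: fetch more candidates than topN for reranking, bounded so
// that reranking cost stays manageable on large workspaces.
function rerankCandidateLimit(topN, { floor = 10, cap = 50 } = {}) {
  return Math.min(cap, Math.max(floor, topN * 10));
}

console.log(rerankCandidateLimit(4)); // 40 candidates fetched for topN = 4
console.log(rerankCandidateLimit(1)); // 10 (never fewer than the floor)
console.log(rerankCandidateLimit(8)); // 50 (capped to bound rerank cost)
```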
Sources: server/utils/vectorDbProviders/lance/index.js106-122
The NativeEmbeddingReranker class (imported from server/utils/EmbeddingRerankers/native) applies cross-encoder scoring to re-rank the initial results based on semantic similarity between the query and each document.
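The re-scoring step can be sketched as follows. The toy scorePair() below (word overlap) stands in for the NativeEmbeddingReranker's cross-encoder and is purely illustrative; the rerank_score field name is also an assumption.

```javascript
// Hedged sketch: re-score each candidate against the query, re-sort by the
// new score, then trim to topN.
function rerankCandidates(query, candidates, scorePair, topN = 4) {
  return candidates
    .map((doc) => ({ ...doc, rerank_score: scorePair(query, doc.text) }))
    .sort((a, b) => b.rerank_score - a.rerank_score)
    .slice(0, topN);
}

// Toy scorer: fraction of query words present in the candidate text.
// A real reranker would run a cross-encoder model over (query, text) pairs.
const scorePair = (q, t) => {
  const words = q.toLowerCase().split(/\s+/);
  const text = t.toLowerCase();
  return words.filter((w) => text.includes(w)).length / words.length;
};

const top = rerankCandidates(
  "reset password",
  [
    { text: "Billing and invoices" },
    { text: "How to reset your password" },
    { text: "Password requirements" },
  ],
  scorePair,
  2
);
console.log(top.map((d) => d.text));
// [ 'How to reset your password', 'Password requirements' ]
```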
Sources: server/utils/vectorDbProviders/lance/index.js129-156 server/utils/vectorDbProviders/lance/index.js8
Reranking is enabled by passing rerank: true to performSimilaritySearch:
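A sketch of such a call site is shown below. LanceDb here is a stub standing in for the configured provider; only the parameter names follow the interface described on this page, and the stub's behavior is not the real implementation.

```javascript
// Stub provider for illustration; a real provider would embed the query,
// search the vector store, rerank the candidates, then curate sources.
const LanceDb = {
  async performSimilaritySearch({ topN = 4, rerank = false }) {
    return { contextTexts: [], sources: [], message: null, rerank, topN };
  },
};

LanceDb.performSimilaritySearch({
  namespace: "my-workspace",
  input: "What is the refund policy?",
  similarityThreshold: 0.25,
  topN: 4,
  rerank: true, // LanceDB-only: widen the candidate pool and re-score it
}).then((res) => console.log(res.rerank)); // prints: true
```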
Sources: server/utils/vectorDbProviders/lance/index.js428-445
After retrieving similarity search results, the curateSources() method formats the raw database responses into a consistent structure for consumption by the chat engine.
The curation step removes internal database fields (vector, _distance, etc.) while preserving user-facing metadata like title, url, published, and score.
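A hedged sketch of that curation step, assuming a LanceDB-style row with a _distance field; the exact return shape and field handling of the real curateSources() may differ.

```javascript
// Strip internal database fields (vector, _distance) and keep user-facing
// metadata plus the chunk text and a normalized score.
function curateSources(rows) {
  const contextTexts = [];
  const sources = [];
  for (const row of rows) {
    const { vector: _v, _distance, ...metadata } = row;
    contextTexts.push(metadata.text);
    sources.push({ ...metadata, score: 1 - _distance });
  }
  return { contextTexts, sources };
}

const { contextTexts, sources } = curateSources([
  {
    text: "Refunds are issued within 14 days.",
    title: "policy.md",
    url: "file://policy.md",
    published: "2024-03-01",
    vector: [0.1, 0.2], // internal field, removed during curation
    _distance: 0.2,     // internal field, converted to a similarity score
  },
]);
console.log(contextTexts);        // [ 'Refunds are issued within 14 days.' ]
console.log(sources[0].vector);   // undefined (internal fields removed)
```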
Sources: server/utils/vectorDbProviders/lance/index.js489-503 server/utils/vectorDbProviders/chroma/index.js440-455
The following diagram shows the complete end-to-end flow of a similarity search operation from the chat engine to returning formatted sources:
Key Observations:
Query Vector Generation: The LLM connector's embedTextInput() method must use the same embedding model that was used during document ingestion, so that query vectors and document vectors share the same semantic space.
Namespace Validation: All providers check namespace existence before attempting queries to return early with clear error messages.
Reranking Trade-off: LanceDB retrieves 10-50 candidates initially (vs. the requested topN=4), performs reranking, then filters down to the final topN. This increases latency but improves result quality.
Consistent Output Format: Despite different database APIs, all providers return the same structure: {contextTexts, sources, message}.
Sources: server/utils/vectorDbProviders/lance/index.js406-456 server/utils/vectorDbProviders/chroma/index.js363-407 server/utils/vectorDbProviders/qdrant/index.js351-389 server/utils/vectorDbProviders/pinecone/index.js263-298
All vector database providers ultimately measure cosine similarity, but the native metric they report and the way scores are normalized varies:
| Provider | Native Metric | Range | Normalization |
|---|---|---|---|
| LanceDB | Cosine distance | [0, 2] | 1 - distance |
| Chroma | Cosine distance | [0, 2] | 1 - distance |
| Qdrant | Cosine similarity | [0, 1] | Direct (no conversion) |
| Pinecone | Cosine similarity | [0, 1] | Direct (no conversion) |
| Milvus | Cosine similarity | [0, 1] | Direct (no conversion) |
| Weaviate | Certainty | [0, 1] | Direct (no conversion) |
| AstraDB | Cosine similarity | [0, 1] | Direct (via $similarity) |
The distanceToSimilarity() method ensures all scores are normalized to [0, 1] where 1 represents perfect similarity:
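A hedged sketch of that normalization follows. The real helper's edge-case handling may differ; the core mapping for cosine-distance providers is similarity = 1 - distance, clamped into [0, 1].

```javascript
// Map a cosine distance in [0, 2] to a similarity score in [0, 1].
// Non-numeric or missing scores fall back to 0 (worst match).
function distanceToSimilarity(distance) {
  if (typeof distance !== "number" || Number.isNaN(distance)) return 0.0;
  return Math.min(1, Math.max(0, 1 - distance));
}

console.log(distanceToSimilarity(0.0));  // 1 (identical direction)
console.log(distanceToSimilarity(0.25)); // 0.75
console.log(distanceToSimilarity(2.0));  // 0 (opposite direction)
```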
This normalization allows the similarityThreshold parameter to work consistently across all providers.
Sources: server/utils/vectorDbProviders/lance/index.js39-44 server/utils/vectorDbProviders/chroma/index.js112-117 server/utils/vectorDbProviders/qdrant/index.js62-101 server/utils/vectorDbProviders/pinecone/index.js50-89
Each provider limits its vector query to topN results (4 by default), using its own parameter name:

| Provider | Query limit parameter |
|---|---|
| Chroma | nResults: topN |
| LanceDB | limit: topN |
| Pinecone | topK: topN |
| Qdrant | limit: topN |

Reranking adds significant computational overhead:
The system caps reranking at 50 documents maximum to balance quality vs. performance on large workspaces.
Vector search results are not cached; each chat request performs a fresh similarity search, so results always reflect recently added, updated, or removed documents.
However, the query embeddings may be cached by the LLM connector if the same query is repeated within a session.
Sources: server/utils/vectorDbProviders/lance/index.js106-122
All similarity search implementations follow consistent error handling patterns:
Every provider first verifies that the target namespace exists and returns early with an empty, well-formed result when it does not. This early return prevents database query errors when workspaces have no documents.
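The guard can be sketched as follows. namespaceExists() and the message text are stand-ins for each provider's own existence check and wording.

```javascript
// Hedged sketch of the namespace-existence guard: return an empty,
// well-formed result instead of querying a collection that does not exist.
async function searchWithGuard({ namespace, client }) {
  if (!(await client.namespaceExists(namespace))) {
    return {
      contextTexts: [],
      sources: [],
      message: `No namespace found for ${namespace} - has it been populated?`,
    };
  }
  return client.query(namespace);
}

// Stub client with no namespaces, to exercise the early-return path.
const client = {
  namespaceExists: async () => false,
  query: async () => ({ contextTexts: ["..."], sources: [], message: null }),
};

searchWithGuard({ namespace: "empty-workspace", client }).then((res) =>
  console.log(res.message)
);
```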
Parameter validation ensures all required inputs are present before proceeding with expensive operations.
Database-specific errors (connection failures, timeout, etc.) propagate up to the chat engine, which handles them gracefully by returning error messages to the user.
Sources: server/utils/vectorDbProviders/lance/index.js415-425 server/utils/vectorDbProviders/chroma/index.js371-382 server/utils/vectorDbProviders/qdrant/index.js359-369