This document explains how AnythingLLM performs similarity search against vector databases to retrieve relevant document chunks for RAG (Retrieval-Augmented Generation) operations. It covers the common performSimilaritySearch interface implemented by all vector database providers, the parameters that control search behavior (similarity threshold, topN, filter identifiers), and LanceDB's unique reranking capability that improves retrieval quality.
For information about document vectorization and insertion into vector databases, see Document Vectorization Pipeline. For details about text splitting and chunking strategies, see Text Splitting and Chunking.
All vector database providers in AnythingLLM implement a common interface for similarity search operations. The system uses cosine similarity as the distance metric across all providers, ensuring consistent behavior regardless of which vector database is configured.
Sources: server/utils/vectorDbProviders/lance/index.js406-456 server/utils/vectorDbProviders/chroma/index.js363-407 server/utils/vectorDbProviders/qdrant/index.js351-389
The performSimilaritySearch method accepts standardized parameters across all vector database providers:
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| namespace | string | required | The workspace/collection to search in |
| input | string | required | The user's query text (plain text) |
| LLMConnector | object | required | Used to generate query embeddings via embedTextInput() |
| similarityThreshold | number | 0.25 | Minimum similarity score (0-1) required to include a result |
| topN | number | 4 | Maximum number of results to return |
| filterIdentifiers | string[] | [] | Source identifiers to exclude (pinned documents) |
| rerank | boolean | false | Enable reranking (LanceDB only) |
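As a minimal sketch (not the actual provider code), the interaction of similarityThreshold and topN can be illustrated like this: results scoring below the threshold are dropped first, then the survivors are trimmed to the best topN. Scores and chunk texts here are made up for the example.

```javascript
// Illustrative sketch only: how similarityThreshold and topN shape the
// final result set after scores have been normalized to [0, 1].
function selectResults(scored, { similarityThreshold = 0.25, topN = 4 } = {}) {
  return scored
    .filter((r) => r.score >= similarityThreshold) // drop low-quality matches
    .sort((a, b) => b.score - a.score)             // best matches first
    .slice(0, topN);                               // cap at topN results
}

const scored = [
  { text: "chunk A", score: 0.91 },
  { text: "chunk B", score: 0.42 },
  { text: "chunk C", score: 0.12 }, // below the 0.25 default threshold
  { text: "chunk D", score: 0.33 },
];
console.log(selectResults(scored).map((r) => r.text));
// [ 'chunk A', 'chunk B', 'chunk D' ]
```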
The similarityThreshold filters out low-quality matches. Each provider converts its native distance metric to a normalized similarity score on a 0-1 scale, where higher values indicate better matches.
Sources: server/utils/vectorDbProviders/lance/index.js39-44 server/utils/vectorDbProviders/chroma/index.js112-117
Each vector database provider implements the similarity search interface differently based on the underlying database's query API, but all follow the same logical flow:
Sources: server/utils/vectorDbProviders/lance/index.js172-213 server/utils/vectorDbProviders/chroma/index.js125-166 server/utils/vectorDbProviders/qdrant/index.js62-101
Sources: server/utils/vectorDbProviders/lance/index.js187-191
Sources: server/utils/vectorDbProviders/chroma/index.js133-145
Sources: server/utils/vectorDbProviders/qdrant/index.js76-80
Sources: server/utils/vectorDbProviders/pinecone/index.js64-69
The filterIdentifiers parameter prevents duplicate context when documents are both pinned and retrieved via similarity search. The system uses the sourceIdentifier() helper to generate consistent identifiers for documents.
The filtering prevents situations where a pinned document appears twice in the final context, wasting tokens and potentially confusing the LLM.
Example Filter Logic:
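A hedged sketch of that filter follows. The sourceIdentifier() shape below is an assumption for illustration (a stable string derived from chunk metadata); the real helper and its exact format live in the server utils.

```javascript
// Assumed identifier format for illustration only; the real helper may
// combine different metadata fields.
function sourceIdentifier({ title = "", published = "" } = {}) {
  return `title:${title}-timestamp:${published}`;
}

// Exclude any result whose source identifier matches a pinned document so
// pinned content is not duplicated in the final context window.
function excludePinned(results, filterIdentifiers) {
  return results.filter(
    (item) => !filterIdentifiers.includes(sourceIdentifier(item.metadata))
  );
}

const results = [
  { text: "chunk A", metadata: { title: "guide.pdf", published: "2024-01-01" } },
  { text: "chunk B", metadata: { title: "notes.md", published: "2024-02-02" } },
];
const pinned = [sourceIdentifier({ title: "guide.pdf", published: "2024-01-01" })];
console.log(excludePinned(results, pinned).map((r) => r.text)); // [ 'chunk B' ]
```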
Sources: server/utils/vectorDbProviders/lance/index.js193-210 server/utils/vectorDbProviders/chroma/index.js147-163 server/utils/vectorDbProviders/qdrant/index.js82-98
LanceDB is the only vector database provider that supports native reranking, which improves retrieval quality by re-scoring initial results using a more sophisticated scoring mechanism.
The reranking process deliberately retrieves more results than requested to give the reranker a larger candidate pool. A wider pool lets the reranker surface relevant chunks that scored lower on raw vector similarity, at the cost of additional retrieval and scoring latency.
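A minimal sketch of that candidate over-fetch follows. The 10x multiplier and the 10/50 bounds are assumptions for illustration, chosen to be consistent with the 10-50 candidate range and the 50-document cap mentioned elsewhere on this page.

```javascript
// Hedged sketch: fetch more candidates than topN for reranking, bounded so
// that reranking cost stays manageable on large workspaces.
function rerankCandidateLimit(topN, { floor = 10, cap = 50 } = {}) {
  return Math.min(cap, Math.max(floor, topN * 10));
}

console.log(rerankCandidateLimit(4)); // 40 candidates fetched for topN = 4
console.log(rerankCandidateLimit(1)); // 10 (never fewer than the floor)
console.log(rerankCandidateLimit(8)); // 50 (capped to bound rerank cost)
```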
Sources: server/utils/vectorDbProviders/lance/index.js106-122
The NativeEmbeddingReranker class (imported from server/utils/EmbeddingRerankers/native) applies cross-encoder scoring to re-rank the initial results based on semantic similarity between the query and each document.
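The re-scoring step can be sketched as follows. The toy scorePair() below (word overlap) stands in for the NativeEmbeddingReranker's cross-encoder and is purely illustrative; the rerank_score field name is also an assumption.

```javascript
// Hedged sketch: re-score each candidate against the query, re-sort by the
// new score, then trim to topN.
function rerankCandidates(query, candidates, scorePair, topN = 4) {
  return candidates
    .map((doc) => ({ ...doc, rerank_score: scorePair(query, doc.text) }))
    .sort((a, b) => b.rerank_score - a.rerank_score)
    .slice(0, topN);
}

// Toy scorer: fraction of query words present in the candidate text.
// A real reranker would run a cross-encoder model over (query, text) pairs.
const scorePair = (q, t) => {
  const words = q.toLowerCase().split(/\s+/);
  const text = t.toLowerCase();
  return words.filter((w) => text.includes(w)).length / words.length;
};

const top = rerankCandidates(
  "reset password",
  [
    { text: "Billing and invoices" },
    { text: "How to reset your password" },
    { text: "Password requirements" },
  ],
  scorePair,
  2
);
console.log(top.map((d) => d.text));
// [ 'How to reset your password', 'Password requirements' ]
```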
Sources: server/utils/vectorDbProviders/lance/index.js129-156 server/utils/vectorDbProviders/lance/index.js8
Reranking is enabled by passing rerank: true to performSimilaritySearch:
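A sketch of such a call site is shown below. LanceDb here is a stub standing in for the configured provider; only the parameter names follow the interface described on this page, and the stub's behavior is not the real implementation.

```javascript
// Stub provider for illustration; a real provider would embed the query,
// search the vector store, rerank the candidates, then curate sources.
const LanceDb = {
  async performSimilaritySearch({ topN = 4, rerank = false }) {
    return { contextTexts: [], sources: [], message: null, rerank, topN };
  },
};

LanceDb.performSimilaritySearch({
  namespace: "my-workspace",
  input: "What is the refund policy?",
  similarityThreshold: 0.25,
  topN: 4,
  rerank: true, // LanceDB-only: widen the candidate pool and re-score it
}).then((res) => console.log(res.rerank)); // prints: true
```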
Sources: server/utils/vectorDbProviders/lance/index.js428-445
After retrieving similarity search results, the curateSources() method formats the raw database responses into a consistent structure for consumption by the chat engine.
The curation step removes internal database fields (vector, _distance, etc.) while preserving user-facing metadata like title, url, published, and score.
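A hedged sketch of that curation step, assuming a LanceDB-style row with a _distance field; the exact return shape and field handling of the real curateSources() may differ.

```javascript
// Strip internal database fields (vector, _distance) and keep user-facing
// metadata plus the chunk text and a normalized score.
function curateSources(rows) {
  const contextTexts = [];
  const sources = [];
  for (const row of rows) {
    const { vector: _v, _distance, ...metadata } = row;
    contextTexts.push(metadata.text);
    sources.push({ ...metadata, score: 1 - _distance });
  }
  return { contextTexts, sources };
}

const { contextTexts, sources } = curateSources([
  {
    text: "Refunds are issued within 14 days.",
    title: "policy.md",
    url: "file://policy.md",
    published: "2024-03-01",
    vector: [0.1, 0.2], // internal field, removed during curation
    _distance: 0.2,     // internal field, converted to a similarity score
  },
]);
console.log(contextTexts);        // [ 'Refunds are issued within 14 days.' ]
console.log(sources[0].vector);   // undefined (internal fields removed)
```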
Sources: server/utils/vectorDbProviders/lance/index.js489-503 server/utils/vectorDbProviders/chroma/index.js440-455
The following diagram shows the complete end-to-end flow of a similarity search operation from the chat engine to returning formatted sources:
Key Observations:
Query Vector Generation: The LLM connector's embedTextInput() method must use the same embedding model that was used during document ingestion, so that query vectors and document vectors share the same semantic space.
Namespace Validation: All providers check namespace existence before attempting queries to return early with clear error messages.
Reranking Trade-off: LanceDB retrieves 10-50 candidates initially (vs. the requested topN=4), performs reranking, then filters down to the final topN. This increases latency but improves result quality.
Consistent Output Format: Despite different database APIs, all providers return the same structure: {contextTexts, sources, message}.
Sources: server/utils/vectorDbProviders/lance/index.js406-456 server/utils/vectorDbProviders/chroma/index.js363-407 server/utils/vectorDbProviders/qdrant/index.js351-389 server/utils/vectorDbProviders/pinecone/index.js263-298
All vector database providers ultimately measure cosine similarity, but the native metric they report and the way scores are normalized varies:
| Provider | Native Metric | Range | Normalization |
|---|---|---|---|
| LanceDB | Cosine distance | [0, 2] | 1 - distance |
| Chroma | Cosine distance | [0, 2] | 1 - distance |
| Qdrant | Cosine similarity | [0, 1] | Direct (no conversion) |
| Pinecone | Cosine similarity | [0, 1] | Direct (no conversion) |
| Milvus | Cosine similarity | [0, 1] | Direct (no conversion) |
| Weaviate | Certainty | [0, 1] | Direct (no conversion) |
| AstraDB | Cosine similarity | [0, 1] | Direct (via $similarity) |
The distanceToSimilarity() method ensures all scores are normalized to [0, 1] where 1 represents perfect similarity:
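A hedged sketch of that normalization follows. The real helper's edge-case handling may differ; the core mapping for cosine-distance providers is similarity = 1 - distance, clamped into [0, 1].

```javascript
// Map a cosine distance in [0, 2] to a similarity score in [0, 1].
// Non-numeric or missing scores fall back to 0 (worst match).
function distanceToSimilarity(distance) {
  if (typeof distance !== "number" || Number.isNaN(distance)) return 0.0;
  return Math.min(1, Math.max(0, 1 - distance));
}

console.log(distanceToSimilarity(0.0));  // 1 (identical direction)
console.log(distanceToSimilarity(0.25)); // 0.75
console.log(distanceToSimilarity(2.0));  // 0 (opposite direction)
```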
This normalization allows the similarityThreshold parameter to work consistently across all providers.
Sources: server/utils/vectorDbProviders/lance/index.js39-44 server/utils/vectorDbProviders/chroma/index.js112-117 server/utils/vectorDbProviders/qdrant/index.js62-101 server/utils/vectorDbProviders/pinecone/index.js50-89
Each provider limits its vector query to topN results (4 by default), using its own parameter name:

| Provider | Query limit parameter |
|---|---|
| Chroma | nResults: topN |
| LanceDB | limit: topN |
| Pinecone | topK: topN |
| Qdrant | limit: topN |

Reranking adds significant computational overhead:
The system caps reranking at 50 documents maximum to balance quality vs. performance on large workspaces.
Vector search results are not cached; each chat request performs a fresh similarity search, so results always reflect recently added, updated, or removed documents.
However, the query embeddings may be cached by the LLM connector if the same query is repeated within a session.
Sources: server/utils/vectorDbProviders/lance/index.js106-122
All similarity search implementations follow consistent error handling patterns:
Every provider first verifies that the target namespace exists and returns early with an empty, well-formed result when it does not. This early return prevents database query errors when workspaces have no documents.
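The guard can be sketched as follows. namespaceExists() and the message text are stand-ins for each provider's own existence check and wording.

```javascript
// Hedged sketch of the namespace-existence guard: return an empty,
// well-formed result instead of querying a collection that does not exist.
async function searchWithGuard({ namespace, client }) {
  if (!(await client.namespaceExists(namespace))) {
    return {
      contextTexts: [],
      sources: [],
      message: `No namespace found for ${namespace} - has it been populated?`,
    };
  }
  return client.query(namespace);
}

// Stub client with no namespaces, to exercise the early-return path.
const client = {
  namespaceExists: async () => false,
  query: async () => ({ contextTexts: ["..."], sources: [], message: null }),
};

searchWithGuard({ namespace: "empty-workspace", client }).then((res) =>
  console.log(res.message)
);
```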
Parameter validation ensures all required inputs are present before proceeding with expensive operations.
Database-specific errors (connection failures, timeout, etc.) propagate up to the chat engine, which handles them gracefully by returning error messages to the user.
Sources: server/utils/vectorDbProviders/lance/index.js415-425 server/utils/vectorDbProviders/chroma/index.js371-382 server/utils/vectorDbProviders/qdrant/index.js359-369