Supported Integrations

Relevant source files

AnythingLLM supports 30+ LLM providers, 10+ vector databases, multiple embedding engines, and 10+ data connectors through standardized interfaces. Each integration implements a common interface pattern while handling provider-specific configurations.

LLM Providers (30+)

AnythingLLM supports 30+ LLM providers through a unified provider interface. Each provider class implements core methods: getChatCompletion(), streamGetChatCompletion(), handleStream(), constructPrompt(), embedTextInput(), embedChunks(), promptWindowLimit(), and compressMessages().

LLM Provider Architecture

Sources: server/utils/AiProviders/openAi/index.js13-292 server/utils/AiProviders/gemini/index.js27-447 server/utils/AiProviders/anthropic/index.js13-275 server/utils/AiProviders/ollama/index.js13-408

Complete LLM Provider List

Provider	Class Name	Location	Model Discovery	Special Features
OpenAI	`OpenAiLLM`	server/utils/AiProviders/openAi/	Dynamic via API	O-type model support, native response API
Anthropic	`AnthropicLLM`	server/utils/AiProviders/anthropic/	Static list	Prompt caching (5m/1h TTL)
Google Gemini	`GeminiLLM`	server/utils/AiProviders/gemini/	Dynamic (v1 + v1beta APIs)	Experimental model support, no system prompt for Gemma models
Azure OpenAI	`AzureOpenAiLLM`	server/utils/AiProviders/azureOpenAi/	User-defined deployments	Reasoning model detection via ENV flag
Ollama	`OllamaAILLM`	server/utils/AiProviders/ollama/	Local model list	Reasoning token wrapping, custom timeout support
LM Studio	`LMStudioLLM`	server/utils/AiProviders/lmStudio/	Local API discovery	Context window caching
LocalAI	`LocalAiLLM`	server/utils/AiProviders/localAi/	User-configured	OpenAI-compatible interface
Together AI	`TogetherAiLLM`	server/utils/AiProviders/togetherAi/	Cached model list (1 week)	Organization-grouped models
Mistral	`MistralLLM`	server/utils/AiProviders/mistral/	Static defaults	OpenAI-compatible API
Groq	`GroqLLM`	server/utils/AiProviders/groq/	API-based	Fast inference
Cohere	`CohereLLM`	server/utils/AiProviders/cohere/	Static: command-r, command-r-plus, etc.	Native Cohere SDK
AWS Bedrock	`AWSBedrockLLM`	server/utils/AiProviders/bedrock/	AWS model catalog	Multiple model families
NVIDIA NIM	`NvidiaIMLLM`	server/utils/AiProviders/nvidia-nim/	API-based	NVIDIA inference
Fireworks AI	`FireworksAILLM`	server/utils/AiProviders/fireworksai/	Organization-grouped	Fast model serving
HuggingFace	`HuggingFaceLLM`	server/utils/AiProviders/huggingface/	Inference endpoint	TGI-compatible
xAI	`xAILLM`	server/utils/AiProviders/xai/	Static: grok-beta	Grok models
Generic OpenAI	`GenericOpenAILLM`	server/utils/AiProviders/generic-openai/	User-configured	Custom OpenAI-compatible endpoints
Text Generation WebUI	`TextGenWebUILLM`	server/utils/AiProviders/textgenwebui/	Local endpoint	Oobabooga interface
OpenRouter	`OpenRouterLLM`	server/utils/AiProviders/openrouter/	Organization-grouped	Multi-provider routing
Perplexity	`PerplexityLLM`	server/utils/AiProviders/perplexity/	API-based	Search-augmented models
Novita	`NovitaLLM`	server/utils/AiProviders/novita/	Organization-grouped	Cloud inference
PPIO	`PPIOLLLM`	server/utils/AiProviders/ppio/	Organization-grouped	Distributed inference
SambaNova	`SambaNovaLLM`	server/utils/AiProviders/sambanova/	API-based	High-performance inference
Docker Model Runner	`DockerModelRunnerLLM`	server/utils/AiProviders/docker-model-runner/	Local container	Containerized models

Frontend Integration Constants:

DISABLED_PROVIDERS: ["azure", "textgenwebui", "generic-openai", "bedrock"] - No model selection UI
groupedProviders: ["togetherai", "fireworksai", "openai", "novita", "openrouter", "ppio", "docker-model-runner", "sambanova"] - Models organized by creator organization
PROVIDER_DEFAULT_MODELS: Hardcoded model lists for providers without dynamic discovery

Sources: frontend/src/hooks/useGetProvidersModels.js5-10 frontend/src/hooks/useGetProvidersModels.js11-35 frontend/src/hooks/useGetProvidersModels.js48-57 server/utils/AiProviders/ollama/index.js1-489 server/utils/AiProviders/gemini/index.js20-25 server/utils/AiProviders/anthropic/index.js63-86

Model Selection Pattern

Model discovery uses the useGetProvidersModels hook and System.customModels() API:

Sources: frontend/src/hooks/useGetProvidersModels.js56-84 frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx64-85 server/utils/AiProviders/gemini/index.js81-89 server/utils/AiProviders/gemini/index.js171-301

Vector Databases (10+)

AnythingLLM supports 10+ vector database providers through a unified VectorDatabase base class interface. Each provider implements connection, namespace management, document operations, and similarity search methods.

Vector Database Interface

Sources: server/utils/vectorDbProviders/lance/index.js15-477 server/utils/vectorDbProviders/chroma/index.js13-433 server/utils/vectorDbProviders/pinecone/index.js9-294 server/utils/vectorDbProviders/qdrant/index.js9-405

Vector Database Provider Details

Provider	Class Name	Connection Method	Client SDK	Namespace Pattern	Storage Type
LanceDB	`LanceDb`	`lancedb.connect(uri)`	`@lancedb/lancedb`	Table names	Local file-based (`${STORAGE_DIR}/lancedb`)
ChromaDB	`Chroma`	`new ChromaClient({path, headers})`	`chromadb`	`normalize()` (3-63 chars, alphanumeric+`-_`)	HTTP client (local/remote)
Pinecone	`PineconeDB`	`new Pinecone({apiKey})`	`@pinecone-database/pinecone`	Built-in namespaces	Cloud-hosted index
Qdrant	`QDrant`	`new QdrantClient({url, apiKey})`	`@qdrant/js-client-rest`	Collections	REST API (local/cloud)
Milvus	`Milvus`	`new MilvusClient({address, username, password})`	`@zilliz/milvus2-sdk-node`	`normalize()` (alphanumeric+`_`)	gRPC (self-hosted)
Zilliz	`Zilliz`	`new MilvusClient({address, token})`	`@zilliz/milvus2-sdk-node`	`normalize()` (alphanumeric+`_`)	Cloud Milvus
Weaviate	`Weaviate`	`weaviate.client(options)`	`weaviate-ts-client`	`camelCase()` classes	GraphQL API (local/cloud)
AstraDB	`AstraDB`	`new AstraClient(token, endpoint)`	`@datastax/astra-db-ts`	`sanitizeNamespace()` (`ns_` prefix)	Cloud Cassandra

Special Features by Provider:

LanceDB: Native reranking support via NativeEmbeddingReranker, local file storage with no external service required
ChromaDB: Collection name validation regex /^(?!\d+\.\d+\.\d+\.\d+$)(?!.*\.\.)(?=^[a-zA-Z0-9][a-zA-Z0-9_-]{1,61}[a-zA-Z0-9]$).{3,63}$/, batch operations via smartAdd()/smartDelete()
Pinecone: Serverless indexes, automatic scaling, namespace-based multi-tenancy
Qdrant: Collection creation with dimension inference, VarChar ID support
Milvus/Zilliz: Auto-indexing (IndexType.AUTOINDEX), dimension-based collection creation, requires explicit flushing
Weaviate: Class-based schema with automatic property detection, GraphQL query interface, batch operations
AstraDB: isRealCollection() validation check, DataStax-specific authentication

Normalization Methods:

Chroma.normalize(): server/utils/vectorDbProviders/chroma/index.js31-64
Milvus.normalize(): server/utils/vectorDbProviders/milvus/index.js28-33
sanitizeNamespace(): server/utils/vectorDbProviders/astra/index.js10-16

Sources: server/utils/vectorDbProviders/lance/index.js17-20 server/utils/vectorDbProviders/chroma/index.js14-21 server/utils/vectorDbProviders/pinecone/index.js10-17 server/utils/vectorDbProviders/qdrant/index.js10-17 server/utils/vectorDbProviders/milvus/index.js15-22 server/utils/vectorDbProviders/zilliz/index.js8-15 server/utils/vectorDbProviders/weaviate/index.js11-18 server/utils/vectorDbProviders/astra/index.js31-38

Document Storage Flow

Sources: server/utils/vectorDbProviders/lance/index.js279-382 server/utils/vectorDbProviders/chroma/index.js184-325 server/utils/TextSplitter/index.js21-170

Embedding Engines

Embedding engines convert text into vector representations. Each LLM provider includes an embedder (defaults to NativeEmbedder). Embedders are selected via getEmbeddingEngineSelection() and expose embeddingMaxChunkLength and embeddingPrefix properties used by TextSplitter.

Supported Embedding Engines

Engine	Class/Module	Provider	Key Methods	Configuration
Native Embedder	`NativeEmbedder`	Built-in (Xenova/transformers.js)	`embedTextInput()`, `embedChunks()`	`embeddingMaxChunkLength`, `embeddingPrefix`
OpenAI Embeddings	Integrated in `OpenAiLLM`	OpenAI API	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
Azure Embeddings	Integrated in `AzureOpenAiLLM`	Azure OpenAI	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
Ollama Embeddings	Integrated in `OllamaAILLM`	Local Ollama	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
Cohere Embeddings	Integrated in `CohereLLM`	Cohere API	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
LMStudio Embeddings	Integrated in `LMStudioLLM`	Local LMStudio	`embedTextInput()`, `embedChunks()`	Paired with LLM provider

Embedder Selection: Accessed via getEmbeddingEngineSelection() which reads EMBEDDING_ENGINE and EMBEDDING_MODEL_PREF environment variables. Defaults to NativeEmbedder if not specified.

Integration with TextSplitter:

Sources: server/utils/AiProviders/openAi/index.js29 server/utils/AiProviders/ollama/index.js38 server/utils/AiProviders/gemini/index.js52 server/utils/vectorDbProviders/lance/index.js340-354 server/utils/TextSplitter/index.js47-57

Data Connectors (10+)

AnythingLLM supports 10+ data source integrations through the Collector service. Each connector fetches content and normalizes it into documentData format with pageContent, docId, and metadata fields.

Supported Data Connectors

Connector	Frontend Name	API Endpoint	Key Features	Resync Support
GitHub	`github`	`/ext/github-repo`	Branch selection, `.gitignore` filtering, issue/PR fetching	Yes
GitLab	`gitlab`	`/ext/gitlab-repo`	Branch selection, issue/wiki fetching, access token auth	Yes
YouTube	`youtube`	`/ext/youtube-transcript`	Transcript extraction, video metadata	Yes
Website Depth	`websiteDepth`	`/ext/website-depth`	Configurable depth crawling, max link limits	No
Confluence	`confluence`	`/ext/confluence`	Cloud/Server support, space-based fetching, SSL bypass	Yes
DrupalWiki	`drupalwiki`	`/ext/drupalwiki`	Space ID filtering, custom auth	No
Obsidian Vault	`obsidian`	`/ext/obsidian-vault`	Vault structure preservation, markdown parsing	No
PaperlessNgx	`paperlessNgx`	`/ext/paperless-ngx`	Document management system integration	No
File Upload	N/A	`/system/upload-document`	20+ file formats (PDF, DOCX, images, audio)	No
Link Scraping	N/A	`/system/process-link`	Single URL scraping with Puppeteer	Yes

Frontend Integration:

Connector metadata: frontend/src/components/Modals/ManageWorkspace/DataConnectors/index.jsx15-43
Connector images: frontend/src/components/DataConnectorOption/media/index.js1-21
API client methods: frontend/src/models/dataConnector.js1-223

Backend Endpoints:

Extension router: collector/extensions/index.js9-135
Resync methods: collector/extensions/resync/index.js1-238
Main API endpoints: server/endpoints/extensions/index.js12-102

Sources: frontend/src/components/Modals/ManageWorkspace/DataConnectors/index.jsx15-43 frontend/src/components/DataConnectorOption/media/index.js1-21 frontend/src/models/dataConnector.js6-167 collector/extensions/index.js9-135 collector/extensions/resync/index.js1-238

Citation Support by Connector

Citations in the chat UI display different icons and links based on the chunkSource metadata field:

Source Identifier	Display Icon	Link Behavior	Example `chunkSource`
`link://`	`LinkSimple`	Opens URL in new tab	`link://https://example.com/page`
`youtube://`	`YoutubeLogo`	Opens YouTube video	`youtube://https://youtube.com/watch?v=xyz`
`github://`	`GithubLogo`	Opens repository/file	`github://https://github.com/user/repo/blob/main/file.md`
`gitlab://`	`GitlabLogo`	Opens GitLab resource	`gitlab://https://gitlab.com/user/project/-/blob/main/file.md`
`confluence://`	Custom image	Opens Confluence page	`confluence://https://company.atlassian.net/wiki/spaces/...`
`drupalwiki://`	Custom image	Opens DrupalWiki page	`drupalwiki://https://wiki.example.com/...`
`obsidian://`	Custom image	Displays title only	`obsidian://vault-name/path/to/note`
`paperless-ngx://`	Custom image	Opens document URL	`paperless-ngx://https://paperless.example.com/documents/123`
Default (file)	`FileText`	No external link	Local file path

Citation Parsing Logic: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx212-316

Supported Source Identifiers: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx212-221

Sources: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx212-316 frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx337-347

Connector-Specific Configuration

GitHub Connector:

Branch Selection: Fetches available branches via /ext/github-repo/branches endpoint
Ignore Patterns: Supports .gitignore-style path filtering
Authentication: Personal access token for private repositories

Confluence Connector:

Cloud vs Server: Separate authentication methods (PAT vs username/password)
SSL Bypass: Optional SSL certificate validation bypass for self-hosted instances
Space-Based: Fetches pages by Confluence space key

YouTube Connector:

Transcript Extraction: Uses YouTube transcript API
Metadata: Captures video title, author, and publication date
Source Identifier: Sets chunkSource to youtube://VIDEO_URL

Website Depth Connector:

Configurable Depth: Controls crawl depth (default: 1)
Max Links: Limits total pages scraped (default: 20)
Link Extraction: Uses Puppeteer for JavaScript-rendered content

Obsidian Vault Connector:

Vault Structure: Preserves folder hierarchy
Markdown Parsing: Native markdown file processing
No External Link: Citations show title only (no URL)

Sources: frontend/src/models/dataConnector.js7-43 frontend/src/models/dataConnector.js135-167 collector/extensions/index.js90-113 collector/extensions/index.js115-132

Document Resync Capabilities

Several connectors support document re-synchronization to update content without re-uploading:

Resync Methods

Connector	Resync Method	Implementation	Use Case
Link	`resyncLink()`	Re-scrapes URL content	Update webpage content
YouTube	`resyncYoutube()`	Re-fetches transcript	Updated transcripts (rare)
GitHub	`resyncGithubRepo()`	Re-clones specific file	Update repository file
GitLab	`resyncGitlabRepo()`	Re-clones specific file	Update repository file
Confluence	`resyncConfluence()`	Re-fetches page by ID	Update wiki page content

Resync Flow:

Resync Type Mapping: collector/extensions/resync/index.js1-238

Supported Types:

link: collector/extensions/resync/index.js8-21
youtube: collector/extensions/resync/index.js31-45
github: collector/extensions/resync/index.js55-96
gitlab: collector/extensions/resync/index.js106-147
confluence: collector/extensions/resync/index.js157-196

Sources: collector/extensions/resync/index.js1-238 collector/extensions/index.js11-30 server/endpoints/extensions/index.js12-102

Provider Status and Capabilities

Model Selection Capabilities

Different providers have varying levels of model selection support:

Sources: frontend/src/hooks/useGetProvidersModels.js4-10 frontend/src/hooks/useGetProvidersModels.js40-55 frontend/src/hooks/useGetProvidersModels.js48-55

Provider Integration Status

Provider Type	Implementation Status	Configuration Method	Model Discovery
OpenAI	Full support	API key + model selection	Dynamic API
Anthropic	Full support	API key + model selection	Static list
Gemini	Full support with experimental models	API key + model selection	Dynamic API (v1 + v1beta)
Azure OpenAI	Basic support	Endpoint + deployment	Disabled model selection
Ollama	Full support	Local connection	Model discovery
AWS Bedrock	Basic support	AWS credentials	Disabled model selection
Local providers	Full support	Endpoint configuration	Local discovery

Sources: server/utils/AiProviders/gemini/index.js171-301 frontend/src/components/LLMSelection/AnthropicAiOptions/index.jsx42-79 frontend/src/hooks/useGetProvidersModels.js5-10

Supported Integrations

Relevant source files

LLM Providers (30+)

LLM Provider Architecture

Sources: server/utils/AiProviders/openAi/index.js13-292 server/utils/AiProviders/gemini/index.js27-447 server/utils/AiProviders/anthropic/index.js13-275 server/utils/AiProviders/ollama/index.js13-408

Complete LLM Provider List

Provider	Class Name	Location	Model Discovery	Special Features
OpenAI	`OpenAiLLM`	server/utils/AiProviders/openAi/	Dynamic via API	O-type model support, native response API
Anthropic	`AnthropicLLM`	server/utils/AiProviders/anthropic/	Static list	Prompt caching (5m/1h TTL)
Google Gemini	`GeminiLLM`	server/utils/AiProviders/gemini/	Dynamic (v1 + v1beta APIs)	Experimental model support, no system prompt for Gemma models
Azure OpenAI	`AzureOpenAiLLM`	server/utils/AiProviders/azureOpenAi/	User-defined deployments	Reasoning model detection via ENV flag
Ollama	`OllamaAILLM`	server/utils/AiProviders/ollama/	Local model list	Reasoning token wrapping, custom timeout support
LM Studio	`LMStudioLLM`	server/utils/AiProviders/lmStudio/	Local API discovery	Context window caching
LocalAI	`LocalAiLLM`	server/utils/AiProviders/localAi/	User-configured	OpenAI-compatible interface
Together AI	`TogetherAiLLM`	server/utils/AiProviders/togetherAi/	Cached model list (1 week)	Organization-grouped models
Mistral	`MistralLLM`	server/utils/AiProviders/mistral/	Static defaults	OpenAI-compatible API
Groq	`GroqLLM`	server/utils/AiProviders/groq/	API-based	Fast inference
Cohere	`CohereLLM`	server/utils/AiProviders/cohere/	Static: command-r, command-r-plus, etc.	Native Cohere SDK
AWS Bedrock	`AWSBedrockLLM`	server/utils/AiProviders/bedrock/	AWS model catalog	Multiple model families
NVIDIA NIM	`NvidiaIMLLM`	server/utils/AiProviders/nvidia-nim/	API-based	NVIDIA inference
Fireworks AI	`FireworksAILLM`	server/utils/AiProviders/fireworksai/	Organization-grouped	Fast model serving
HuggingFace	`HuggingFaceLLM`	server/utils/AiProviders/huggingface/	Inference endpoint	TGI-compatible
xAI	`xAILLM`	server/utils/AiProviders/xai/	Static: grok-beta	Grok models
Generic OpenAI	`GenericOpenAILLM`	server/utils/AiProviders/generic-openai/	User-configured	Custom OpenAI-compatible endpoints
Text Generation WebUI	`TextGenWebUILLM`	server/utils/AiProviders/textgenwebui/	Local endpoint	Oobabooga interface
OpenRouter	`OpenRouterLLM`	server/utils/AiProviders/openrouter/	Organization-grouped	Multi-provider routing
Perplexity	`PerplexityLLM`	server/utils/AiProviders/perplexity/	API-based	Search-augmented models
Novita	`NovitaLLM`	server/utils/AiProviders/novita/	Organization-grouped	Cloud inference
PPIO	`PPIOLLLM`	server/utils/AiProviders/ppio/	Organization-grouped	Distributed inference
SambaNova	`SambaNovaLLM`	server/utils/AiProviders/sambanova/	API-based	High-performance inference
Docker Model Runner	`DockerModelRunnerLLM`	server/utils/AiProviders/docker-model-runner/	Local container	Containerized models

Frontend Integration Constants:

DISABLED_PROVIDERS: ["azure", "textgenwebui", "generic-openai", "bedrock"] - No model selection UI
groupedProviders: ["togetherai", "fireworksai", "openai", "novita", "openrouter", "ppio", "docker-model-runner", "sambanova"] - Models organized by creator organization
PROVIDER_DEFAULT_MODELS: Hardcoded model lists for providers without dynamic discovery

Model Selection Pattern

Model discovery uses the useGetProvidersModels hook and System.customModels() API:

Vector Databases (10+)

Vector Database Interface

Vector Database Provider Details

Provider	Class Name	Connection Method	Client SDK	Namespace Pattern	Storage Type
LanceDB	`LanceDb`	`lancedb.connect(uri)`	`@lancedb/lancedb`	Table names	Local file-based (`${STORAGE_DIR}/lancedb`)
ChromaDB	`Chroma`	`new ChromaClient({path, headers})`	`chromadb`	`normalize()` (3-63 chars, alphanumeric+`-_`)	HTTP client (local/remote)
Pinecone	`PineconeDB`	`new Pinecone({apiKey})`	`@pinecone-database/pinecone`	Built-in namespaces	Cloud-hosted index
Qdrant	`QDrant`	`new QdrantClient({url, apiKey})`	`@qdrant/js-client-rest`	Collections	REST API (local/cloud)
Milvus	`Milvus`	`new MilvusClient({address, username, password})`	`@zilliz/milvus2-sdk-node`	`normalize()` (alphanumeric+`_`)	gRPC (self-hosted)
Zilliz	`Zilliz`	`new MilvusClient({address, token})`	`@zilliz/milvus2-sdk-node`	`normalize()` (alphanumeric+`_`)	Cloud Milvus
Weaviate	`Weaviate`	`weaviate.client(options)`	`weaviate-ts-client`	`camelCase()` classes	GraphQL API (local/cloud)
AstraDB	`AstraDB`	`new AstraClient(token, endpoint)`	`@datastax/astra-db-ts`	`sanitizeNamespace()` (`ns_` prefix)	Cloud Cassandra

Special Features by Provider:

LanceDB: Native reranking support via NativeEmbeddingReranker, local file storage with no external service required
ChromaDB: Collection name validation regex /^(?!\d+\.\d+\.\d+\.\d+$)(?!.*\.\.)(?=^[a-zA-Z0-9][a-zA-Z0-9_-]{1,61}[a-zA-Z0-9]$).{3,63}$/, batch operations via smartAdd()/smartDelete()
Pinecone: Serverless indexes, automatic scaling, namespace-based multi-tenancy
Qdrant: Collection creation with dimension inference, VarChar ID support
Milvus/Zilliz: Auto-indexing (IndexType.AUTOINDEX), dimension-based collection creation, requires explicit flushing
Weaviate: Class-based schema with automatic property detection, GraphQL query interface, batch operations
AstraDB: isRealCollection() validation check, DataStax-specific authentication

Normalization Methods:

Chroma.normalize(): server/utils/vectorDbProviders/chroma/index.js31-64
Milvus.normalize(): server/utils/vectorDbProviders/milvus/index.js28-33
sanitizeNamespace(): server/utils/vectorDbProviders/astra/index.js10-16

Document Storage Flow

Sources: server/utils/vectorDbProviders/lance/index.js279-382 server/utils/vectorDbProviders/chroma/index.js184-325 server/utils/TextSplitter/index.js21-170

Embedding Engines

Supported Embedding Engines

Engine	Class/Module	Provider	Key Methods	Configuration
Native Embedder	`NativeEmbedder`	Built-in (Xenova/transformers.js)	`embedTextInput()`, `embedChunks()`	`embeddingMaxChunkLength`, `embeddingPrefix`
OpenAI Embeddings	Integrated in `OpenAiLLM`	OpenAI API	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
Azure Embeddings	Integrated in `AzureOpenAiLLM`	Azure OpenAI	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
Ollama Embeddings	Integrated in `OllamaAILLM`	Local Ollama	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
Cohere Embeddings	Integrated in `CohereLLM`	Cohere API	`embedTextInput()`, `embedChunks()`	Paired with LLM provider
LMStudio Embeddings	Integrated in `LMStudioLLM`	Local LMStudio	`embedTextInput()`, `embedChunks()`	Paired with LLM provider

Embedder Selection: Accessed via getEmbeddingEngineSelection() which reads EMBEDDING_ENGINE and EMBEDDING_MODEL_PREF environment variables. Defaults to NativeEmbedder if not specified.

Integration with TextSplitter:

Data Connectors (10+)

Supported Data Connectors

Connector	Frontend Name	API Endpoint	Key Features	Resync Support
GitHub	`github`	`/ext/github-repo`	Branch selection, `.gitignore` filtering, issue/PR fetching	Yes
GitLab	`gitlab`	`/ext/gitlab-repo`	Branch selection, issue/wiki fetching, access token auth	Yes
YouTube	`youtube`	`/ext/youtube-transcript`	Transcript extraction, video metadata	Yes
Website Depth	`websiteDepth`	`/ext/website-depth`	Configurable depth crawling, max link limits	No
Confluence	`confluence`	`/ext/confluence`	Cloud/Server support, space-based fetching, SSL bypass	Yes
DrupalWiki	`drupalwiki`	`/ext/drupalwiki`	Space ID filtering, custom auth	No
Obsidian Vault	`obsidian`	`/ext/obsidian-vault`	Vault structure preservation, markdown parsing	No
PaperlessNgx	`paperlessNgx`	`/ext/paperless-ngx`	Document management system integration	No
File Upload	N/A	`/system/upload-document`	20+ file formats (PDF, DOCX, images, audio)	No
Link Scraping	N/A	`/system/process-link`	Single URL scraping with Puppeteer	Yes

Frontend Integration:

Connector metadata: frontend/src/components/Modals/ManageWorkspace/DataConnectors/index.jsx15-43
Connector images: frontend/src/components/DataConnectorOption/media/index.js1-21
API client methods: frontend/src/models/dataConnector.js1-223

Backend Endpoints:

Extension router: collector/extensions/index.js9-135
Resync methods: collector/extensions/resync/index.js1-238
Main API endpoints: server/endpoints/extensions/index.js12-102

Citation Support by Connector

Citations in the chat UI display different icons and links based on the chunkSource metadata field:

Source Identifier	Display Icon	Link Behavior	Example `chunkSource`
`link://`	`LinkSimple`	Opens URL in new tab	`link://https://example.com/page`
`youtube://`	`YoutubeLogo`	Opens YouTube video	`youtube://https://youtube.com/watch?v=xyz`
`github://`	`GithubLogo`	Opens repository/file	`github://https://github.com/user/repo/blob/main/file.md`
`gitlab://`	`GitlabLogo`	Opens GitLab resource	`gitlab://https://gitlab.com/user/project/-/blob/main/file.md`
`confluence://`	Custom image	Opens Confluence page	`confluence://https://company.atlassian.net/wiki/spaces/...`
`drupalwiki://`	Custom image	Opens DrupalWiki page	`drupalwiki://https://wiki.example.com/...`
`obsidian://`	Custom image	Displays title only	`obsidian://vault-name/path/to/note`
`paperless-ngx://`	Custom image	Opens document URL	`paperless-ngx://https://paperless.example.com/documents/123`
Default (file)	`FileText`	No external link	Local file path

Citation Parsing Logic: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx212-316

Supported Source Identifiers: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx212-221

Sources: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx212-316 frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/Citation/index.jsx337-347

Connector-Specific Configuration

GitHub Connector:

Branch Selection: Fetches available branches via /ext/github-repo/branches endpoint
Ignore Patterns: Supports .gitignore-style path filtering
Authentication: Personal access token for private repositories

Confluence Connector:

Cloud vs Server: Separate authentication methods (PAT vs username/password)
SSL Bypass: Optional SSL certificate validation bypass for self-hosted instances
Space-Based: Fetches pages by Confluence space key

YouTube Connector:

Transcript Extraction: Uses YouTube transcript API
Metadata: Captures video title, author, and publication date
Source Identifier: Sets chunkSource to youtube://VIDEO_URL

Website Depth Connector:

Configurable Depth: Controls crawl depth (default: 1)
Max Links: Limits total pages scraped (default: 20)
Link Extraction: Uses Puppeteer for JavaScript-rendered content

Obsidian Vault Connector:

Vault Structure: Preserves folder hierarchy
Markdown Parsing: Native markdown file processing
No External Link: Citations show title only (no URL)

Sources: frontend/src/models/dataConnector.js7-43 frontend/src/models/dataConnector.js135-167 collector/extensions/index.js90-113 collector/extensions/index.js115-132

Document Resync Capabilities

Several connectors support document re-synchronization to update content without re-uploading:

Resync Methods

Connector	Resync Method	Implementation	Use Case
Link	`resyncLink()`	Re-scrapes URL content	Update webpage content
YouTube	`resyncYoutube()`	Re-fetches transcript	Updated transcripts (rare)
GitHub	`resyncGithubRepo()`	Re-clones specific file	Update repository file
GitLab	`resyncGitlabRepo()`	Re-clones specific file	Update repository file
Confluence	`resyncConfluence()`	Re-fetches page by ID	Update wiki page content

Resync Flow:

Resync Type Mapping: collector/extensions/resync/index.js1-238

Supported Types:

link: collector/extensions/resync/index.js8-21
youtube: collector/extensions/resync/index.js31-45
github: collector/extensions/resync/index.js55-96
gitlab: collector/extensions/resync/index.js106-147
confluence: collector/extensions/resync/index.js157-196

Sources: collector/extensions/resync/index.js1-238 collector/extensions/index.js11-30 server/endpoints/extensions/index.js12-102

Provider Status and Capabilities

Model Selection Capabilities

Different providers have varying levels of model selection support:

Sources: frontend/src/hooks/useGetProvidersModels.js4-10 frontend/src/hooks/useGetProvidersModels.js40-55 frontend/src/hooks/useGetProvidersModels.js48-55

Provider Integration Status

Provider Type	Implementation Status	Configuration Method	Model Discovery
OpenAI	Full support	API key + model selection	Dynamic API
Anthropic	Full support	API key + model selection	Static list
Gemini	Full support with experimental models	API key + model selection	Dynamic API (v1 + v1beta)
Azure OpenAI	Basic support	Endpoint + deployment	Disabled model selection
Ollama	Full support	Local connection	Model discovery
AWS Bedrock	Basic support	AWS credentials	Disabled model selection
Local providers	Full support	Endpoint configuration	Local discovery

Sources: server/utils/AiProviders/gemini/index.js171-301 frontend/src/components/LLMSelection/AnthropicAiOptions/index.jsx42-79 frontend/src/hooks/useGetProvidersModels.js5-10

Supported Integrations

LLM Providers (30+)

LLM Provider Architecture

Complete LLM Provider List

Model Selection Pattern

Vector Databases (10+)

Vector Database Interface

Vector Database Provider Details

Document Storage Flow

Embedding Engines

Supported Embedding Engines

Data Connectors (10+)

Supported Data Connectors

Citation Support by Connector

Connector-Specific Configuration

Document Resync Capabilities

Resync Methods

Provider Status and Capabilities

Model Selection Capabilities

Provider Integration Status

On this page

Supported Integrations

LLM Providers (30+)

LLM Provider Architecture

Complete LLM Provider List

Model Selection Pattern

Vector Databases (10+)

Vector Database Interface

Vector Database Provider Details

Document Storage Flow

Embedding Engines

Supported Embedding Engines

Data Connectors (10+)

Supported Data Connectors

Citation Support by Connector

Connector-Specific Configuration

Document Resync Capabilities

Resync Methods

Provider Status and Capabilities

Model Selection Capabilities

Provider Integration Status

On this page