This document describes the common architecture shared by all LLM providers in AnythingLLM. It covers the interface contract, factory pattern, initialization lifecycle, context window management, and embedding engine pairing. For information about implementing new providers, see Adding New LLM Providers. For model discovery and caching strategies, see Model Management and Discovery.
AnythingLLM supports 30+ LLM providers through a polymorphic architecture where each provider implements a common interface while handling vendor-specific API interactions internally. All providers follow the same initialization pattern, expose the same core methods, and integrate with the same performance monitoring and embedding systems.
The architecture enforces:

- A common interface of core methods (`getChatCompletion`, `streamGetChatCompletion`, `constructPrompt`, etc.)
- A shared default embedding engine (`NativeEmbedder`)
- Unified performance tracking via `LLMPerformanceMonitor`

Example Providers: `OllamaAILLM`, `OpenAiLLM`, `GeminiLLM`, `AnthropicLLM`, `AzureOpenAiLLM`, `LMStudioLLM`, `TogetherAiLLM`, `MistralLLM`, `HuggingFaceLLM`, `LocalAiLLM`
Sources: server/utils/AiProviders/ollama/index.js14-46 server/utils/AiProviders/openAi/index.js13-34 server/utils/AiProviders/anthropic/index.js13-40
All provider classes implement a standard set of methods and properties. This interface enables the chat engine to interact with any provider without knowing implementation details.
Sources: server/utils/AiProviders/ollama/index.js14-489 server/utils/AiProviders/openAi/index.js13-301 server/utils/AiProviders/anthropic/index.js13-331 server/utils/AiProviders/gemini/index.js27-457
| Method | Purpose | Return Type | Required |
|---|---|---|---|
| `constructor(embedder, modelPreference)` | Initialize provider with optional custom embedder and model | N/A | Yes |
| `streamingEnabled()` | Check if streaming is supported | `boolean` | Yes |
| `promptWindowLimit()` | Get maximum context window in tokens | `number` | Yes |
| `isValidChatCompletionModel(modelName)` | Validate model name | `Promise<boolean>` | Yes |
| `constructPrompt(promptArgs)` | Build message array from components | `Array<Message>` | Yes |
| `getChatCompletion(messages, options)` | Synchronous completion request | `Promise<{textResponse, metrics}>` | Yes |
| `streamGetChatCompletion(messages, options)` | Streaming completion request | `Promise<Stream>` | Yes |
| `handleStream(response, stream, responseProps)` | Process streaming response chunks | `Promise<string>` | Yes |
| `embedTextInput(textInput)` | Generate embeddings for text | `Promise<Array<number>>` | Yes |
| `embedChunks(textChunks)` | Generate embeddings for multiple chunks | `Promise<Array<Array<number>>>` | Yes |
| `compressMessages(promptArgs, rawHistory)` | Compress prompt to fit context window | `Promise<Array<Message>>` | Yes |
Sources: server/utils/AiProviders/ollama/index.js166-484 server/utils/AiProviders/openAi/index.js52-296
Every provider instance exposes these properties:
The limits object allocates context window space proportionally: system prompt (15%), history (15%), and user input (70%). These values are recalculated when the model changes.
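The proportional split described above can be sketched as follows (function and property names are illustrative, not the actual AnythingLLM source):

```javascript
// Hypothetical sketch of the 15/15/70 context-window allocation.
function buildLimits(promptWindowLimit) {
  return {
    system: Math.floor(promptWindowLimit * 0.15), // 15% for system prompt
    history: Math.floor(promptWindowLimit * 0.15), // 15% for chat history
    user: Math.floor(promptWindowLimit * 0.7), // 70% for user input
  };
}

const limits = buildLimits(4096);
console.log(limits); // { system: 614, history: 614, user: 2867 }
```

Recomputing these values whenever the model changes keeps the allocation correct when a workspace switches to a model with a different context window.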
Sources: server/utils/AiProviders/ollama/index.js18-46 server/utils/AiProviders/openAi/index.js22-34
Provider instantiation uses a factory function (not shown in provided files but referenced) that selects the correct class based on configuration. The factory checks both system-level settings (process.env.LLM_PROVIDER) and workspace-level overrides (workspace.chatProvider).
The factory also handles embedding engine selection. If no custom embedder is provided, it defaults to NativeEmbedder, which uses local Xenova/transformers models for offline operation.
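Since the factory itself is not shown in the provided files, here is a hedged sketch of the selection logic using stub classes; the real factory's name, paths, and signatures may differ:

```javascript
// Stub provider classes standing in for the real implementations.
class OpenAiLLM { constructor(embedder, model) { this.name = "openai"; this.model = model; this.embedder = embedder; } }
class OllamaAILLM { constructor(embedder, model) { this.name = "ollama"; this.model = model; this.embedder = embedder; } }
class NativeEmbedder {} // default embedder when none is supplied

const PROVIDERS = { openai: OpenAiLLM, ollama: OllamaAILLM };

function getLLMProvider({ provider = null, model = null } = {}) {
  // Workspace-level override wins; otherwise fall back to the system setting.
  const selected = provider ?? process.env.LLM_PROVIDER ?? "openai";
  const ProviderClass = PROVIDERS[selected];
  if (!ProviderClass) throw new Error(`Unknown LLM provider: ${selected}`);
  // No custom embedder supplied -> default to NativeEmbedder.
  return new ProviderClass(new NativeEmbedder(), model);
}

const llm = getLLMProvider({ provider: "ollama", model: "llama3" });
console.log(llm.name); // "ollama"
```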
Sources: Reference to factory pattern in server/utils/AiProviders/ollama/index.js18 server/utils/AiProviders/openAi/index.js14
Provider constructors follow a consistent initialization pattern:
Sources: server/utils/AiProviders/ollama/index.js18-46 server/utils/AiProviders/gemini/index.js28-62 server/utils/AiProviders/anthropic/index.js14-40
Each provider validates required configuration in the constructor:
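A hedged sketch of that check, modeled on the OpenAI provider (the real providers throw their own per-provider messages, and the exact env var handling is an assumption):

```javascript
// Fail-fast: the constructor throws immediately when required config is missing.
class OpenAiLLMSketch {
  constructor(embedder = null, modelPreference = null) {
    if (!process.env.OPEN_AI_KEY) throw new Error("No OpenAI API key was set.");
    this.model = modelPreference ?? process.env.OPEN_MODEL_PREF ?? "gpt-4o";
    this.embedder = embedder;
  }
}

try {
  delete process.env.OPEN_AI_KEY;
  new OpenAiLLMSketch();
} catch (e) {
  console.log(e.message); // "No OpenAI API key was set."
}
```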
This fail-fast approach prevents runtime errors when providers are selected but not properly configured.
Sources: server/utils/AiProviders/ollama/index.js19-20 server/utils/AiProviders/openAi/index.js15 server/utils/AiProviders/anthropic/index.js15-16
Providers wrap vendor SDKs with consistent interfaces. Most use the OpenAI SDK for compatibility:
| Provider | SDK/Client | Base URL |
|---|---|---|
| `OllamaAILLM` | `ollama` NPM package | `process.env.OLLAMA_BASE_PATH` |
| `OpenAiLLM` | `openai` SDK | OpenAI default endpoint |
| `GeminiLLM` | `openai` SDK | `https://generativelanguage.googleapis.com/v1beta/openai/` |
| `AnthropicLLM` | `@anthropic-ai/sdk` | Anthropic default endpoint |
| `AzureOpenAiLLM` | `openai` SDK | Formatted from `AZURE_OPENAI_ENDPOINT` |
| `LMStudioLLM` | `openai` SDK | `process.env.LMSTUDIO_BASE_PATH/v1` |
| `TogetherAiLLM` | `openai` SDK | `https://api.together.xyz/v1` |
| `MistralLLM` | `openai` SDK | `https://api.mistral.ai/v1` |
The OpenAI SDK is used widely because many providers implement OpenAI-compatible APIs, simplifying integration.
Sources: server/utils/AiProviders/ollama/index.js33-37 server/utils/AiProviders/gemini/index.js40-44 server/utils/AiProviders/anthropic/index.js20-24 server/utils/AiProviders/togetherAi/index.js84-89
Context window management is critical for preventing token limit errors. Each provider reports its maximum context window and dynamically allocates space between system prompt, chat history, and user input.
Sources: server/utils/AiProviders/ollama/index.js170-207 server/utils/AiProviders/gemini/index.js107-141 server/utils/AiProviders/openAi/index.js56-62
Static Providers (OpenAI, Anthropic, Mistral): Context windows are hardcoded in MODEL_MAP:
Dynamic Providers (Ollama, LMStudio, Gemini): Context windows are fetched from provider APIs and cached:
Caching prevents API calls on every chat request. Ollama and LMStudio cache context windows in static properties; Gemini writes to filesystem (storage/models/gemini/models.json).
Sources: server/utils/AiProviders/ollama/index.js78-115 server/utils/AiProviders/lmStudio/index.js73-112 server/utils/AiProviders/gemini/index.js172-302
Providers respect user-defined token limits via environment variables:
- `OLLAMA_MODEL_TOKEN_LIMIT`
- `LMSTUDIO_MODEL_TOKEN_LIMIT`
- `AZURE_OPENAI_TOKEN_LIMIT`
- `LOCAL_AI_MODEL_TOKEN_LIMIT`
- `HUGGING_FACE_LLM_TOKEN_LIMIT`

These limits are enforced as the minimum of the user-defined and system-detected values to prevent exceeding actual model capabilities:
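A minimal sketch of that clamping, assuming illustrative names (the per-provider code differs in detail):

```javascript
// Clamp a user-defined token limit against the detected model maximum.
function effectiveTokenLimit(userLimitEnv, detectedLimit) {
  const userLimit = Number(userLimitEnv);
  // Ignore missing or invalid user settings and trust the detected value.
  if (!userLimitEnv || Number.isNaN(userLimit) || userLimit <= 0) return detectedLimit;
  return Math.min(userLimit, detectedLimit);
}

console.log(effectiveTokenLimit("4096", 8192)); // 4096 (user limit wins)
console.log(effectiveTokenLimit(undefined, 8192)); // 8192 (no user limit set)
console.log(effectiveTokenLimit("999999", 8192)); // 8192 (capped at model max)
```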
Sources: server/utils/AiProviders/ollama/index.js178-191 server/utils/AiProviders/lmStudio/index.js138-153
When prompts exceed context limits, the compressMessages method truncates history intelligently:
The messageArrayCompressor (or messageStringCompressor for Anthropic) uses the provider's limits object to determine how many tokens to allocate for each component (system prompt, history, user input).
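The history-truncation idea can be sketched as below; this is a simplified stand-in, not the actual `messageArrayCompressor` (which also handles system prompt and user input budgets), and the word-count tokenizer is a crude placeholder:

```javascript
// Crude token estimate: word count stands in for a real tokenizer.
const countTokens = (msg) => msg.content.split(/\s+/).length;

// Drop the oldest messages until history fits its token budget,
// walking newest-to-oldest so the most recent turns survive.
function compressHistory(history, historyTokenBudget) {
  const kept = [];
  let used = 0;
  for (const msg of [...history].reverse()) {
    const cost = countTokens(msg);
    if (used + cost > historyTokenBudget) break;
    kept.unshift(msg); // restore chronological order
    used += cost;
  }
  return kept;
}

const history = [
  { role: "user", content: "first question about setup" },
  { role: "assistant", content: "a long detailed answer with many words here" },
  { role: "user", content: "latest question" },
];
console.log(compressHistory(history, 10).length); // 2 (oldest message dropped)
```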
Sources: server/utils/AiProviders/ollama/index.js479-484 server/utils/AiProviders/anthropic/index.js310-318
All providers are paired with an embedding engine for generating query embeddings during RAG similarity search. The pairing happens in the constructor:
The provider's embedTextInput and embedChunks methods delegate to the paired embedder:
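The delegation can be sketched with a stub embedder (illustrative names; the real embedders return model-generated vectors):

```javascript
// Stub embedder returning a fake vector so the pattern is runnable offline.
class StubEmbedder {
  async embedTextInput(text) { return [text.length, 0, 0]; }
  async embedChunks(chunks) { return Promise.all(chunks.map((c) => this.embedTextInput(c))); }
}

// The provider simply forwards embedding calls to whatever it was paired with.
class ProviderSketch {
  constructor(embedder) { this.embedder = embedder; }
  async embedTextInput(textInput) { return this.embedder.embedTextInput(textInput); }
  async embedChunks(textChunks) { return this.embedder.embedChunks(textChunks); }
}

(async () => {
  const provider = new ProviderSketch(new StubEmbedder());
  console.log(await provider.embedTextInput("hello")); // [ 5, 0, 0 ]
})();
```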
This delegation pattern allows the chat engine to call provider.embedTextInput() without knowing which embedding engine is active. The system-level embedding provider configuration (process.env.EMBEDDING_ENGINE) determines which embedder is instantiated and passed to LLM providers.
Sources: server/utils/AiProviders/ollama/index.js38 server/utils/AiProviders/ollama/index.js472-477 server/utils/AiProviders/openAi/index.js29
NativeEmbedder is the default for offline operation. It uses Xenova/transformers to run embedding models locally (e.g., Xenova/all-MiniLM-L6-v2). This enables AnythingLLM to function without external API dependencies for embeddings.
Sources: server/utils/EmbeddingEngines/native (referenced but not in provided files)
All provider API calls are wrapped in LLMPerformanceMonitor to track metrics:
Sources: server/utils/AiProviders/ollama/index.js268-319 server/utils/AiProviders/openAi/index.js152-183
Every completion returns a standardized metrics object:
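The field names below follow what the performance monitor reports (token counts, duration, output tokens-per-second), but treat the exact shape as an assumption:

```javascript
// Illustrative construction of the standardized metrics object.
function buildMetrics({ promptTokens, completionTokens, durationSeconds }) {
  return {
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    total_tokens: promptTokens + completionTokens,
    outputTps: completionTokens / durationSeconds, // output tokens per second
    duration: durationSeconds,
  };
}

console.log(buildMetrics({ promptTokens: 120, completionTokens: 60, durationSeconds: 2 }));
// { prompt_tokens: 120, completion_tokens: 60, total_tokens: 180, outputTps: 30, duration: 2 }
```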
Sources: server/utils/AiProviders/ollama/index.js305-318 server/utils/AiProviders/openAi/index.js169-182
Streaming responses use LLMPerformanceMonitor.measureStream():
The monitor wraps the stream and tracks metrics in real-time. Providers call stream.endMeasurement(usage) when streaming completes to finalize metrics.
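A toy stand-in for that wrapping, assuming illustrative behavior (the real `measureStream` does proper token accounting rather than counting chunks):

```javascript
// Wrap an async iterable, count chunks as they pass through, and expose
// endMeasurement() for the provider to call when streaming completes.
function measureStream(stream) {
  const start = Date.now();
  let chunkCount = 0;
  const wrapped = (async function* () {
    for await (const chunk of stream) {
      chunkCount += 1; // crude stand-in for real token counting
      yield chunk;
    }
  })();
  wrapped.endMeasurement = (usage = {}) => ({
    completion_tokens: chunkCount,
    ...usage, // vendor-reported usage overrides the local count
    duration: (Date.now() - start) / 1000,
  });
  return wrapped;
}

(async () => {
  async function* fakeStream() { yield "Hel"; yield "lo"; }
  const stream = measureStream(fakeStream());
  let text = "";
  for await (const chunk of stream) text += chunk;
  console.log(text, stream.endMeasurement().completion_tokens); // Hello 2
})();
```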
Sources: server/utils/AiProviders/ollama/index.js322-342 server/utils/AiProviders/gemini/index.js416-433
All providers implement constructPrompt() to transform RAG components into the message format their API expects:
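A minimal OpenAI-style sketch of that transformation; the argument names mirror this page, but the exact context delimiters and message shape vary per provider:

```javascript
// Build a message array from RAG components: system prompt + formatted
// context, then chat history, then the current user input.
function constructPrompt({ systemPrompt = "", contextTexts = [], chatHistory = [], userPrompt = "" }) {
  const context = contextTexts
    .map((text, i) => `[CONTEXT ${i}]:\n${text}\n[END CONTEXT ${i}]\n\n`)
    .join("");
  const system = { role: "system", content: `${systemPrompt}${context}` };
  return [system, ...chatHistory, { role: "user", content: userPrompt }];
}

const messages = constructPrompt({
  systemPrompt: "Answer using the context. ",
  contextTexts: ["AnythingLLM supports 30+ providers."],
  chatHistory: [{ role: "user", content: "hi" }, { role: "assistant", content: "hello" }],
  userPrompt: "How many providers are supported?",
});
console.log(messages.length); // 4
```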
Sources: server/utils/AiProviders/ollama/index.js246-265 server/utils/AiProviders/openAi/index.js107-126
Context from similarity search is formatted and appended to the system prompt:
This ensures context is clearly delimited for the model to reference during response generation.
Sources: server/utils/AiProviders/ollama/index.js117-127 server/utils/AiProviders/openAi/index.js40-50
For providers supporting vision, attachments are converted to content arrays:
This allows models like GPT-4 Vision, Claude 3, and Gemini to analyze images alongside text prompts.
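A sketch of that conversion using OpenAI-style `image_url` content parts (the attachment field names here are assumptions):

```javascript
// Convert attachments into a multimodal content array; plain text passes
// through unchanged when there are no attachments.
function attachmentsToContent(userPrompt, attachments = []) {
  if (!attachments.length) return userPrompt;
  return [
    { type: "text", text: userPrompt },
    ...attachments.map((att) => ({
      type: "image_url",
      image_url: { url: att.contentString }, // base64 data URL from the upload
    })),
  ];
}

const content = attachmentsToContent("What is in this image?", [
  { contentString: "data:image/png;base64,iVBORw0KGgo=" },
]);
console.log(content.length); // 2
```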
Sources: server/utils/AiProviders/openAi/index.js87-100 server/utils/AiProviders/gemini/index.js320-334
While all providers implement the common interface, many have unique capabilities exposed through additional methods or properties.
| Provider | Feature | Implementation |
|---|---|---|
| `OllamaAILLM` | Reasoning token support | Wraps `message.thinking` in `<think>` tags |
| `OllamaAILLM` | Custom fetch timeout | `#applyFetch()` uses undici `Agent` for extended timeouts |
| `AnthropicLLM` | Prompt caching | `cacheControl` getter with 5m/1h TTL options |
| `GeminiLLM` | Experimental models | `isExperimentalModel()` checks v1beta API access |
| `GeminiLLM` | System prompt emulation | Models without system support use user/assistant workaround |
| `AzureOpenAiLLM` | O-type model handling | `isOTypeModel` flag changes message format (system → user) |
| `LMStudioLLM` | Dynamic model discovery | Fetches available models via `/api/v0/models` |
| `TogetherAiLLM` | Model caching | Caches model list to disk for 1 week |
Sources: server/utils/AiProviders/ollama/index.js398-425 server/utils/AiProviders/anthropic/index.js73-86 server/utils/AiProviders/gemini/index.js149-163 server/utils/AiProviders/azureOpenAi/index.js30-31
Ollama models can return reasoning content separately from main output:
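A sketch of the `<think>` wrapping described above (the `thinking`/`content` field names follow Ollama's chat response; the exact handling in AnythingLLM may differ):

```javascript
// Prefix the answer with the model's reasoning, wrapped in <think> tags,
// so the UI can render it separately from the main response.
function formatWithReasoning(message) {
  if (!message.thinking) return message.content;
  return `<think>${message.thinking}</think>${message.content}`;
}

console.log(formatWithReasoning({ thinking: "Check units first.", content: "42 km" }));
// <think>Check units first.</think>42 km
```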
This makes internal reasoning visible to users without polluting the main response.
Sources: server/utils/AiProviders/ollama/index.js398-425
Anthropic supports ephemeral caching of system prompts to reduce costs:
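An illustrative shape of a cached system prompt: the prompt is sent as a content block carrying `cache_control` with an ephemeral TTL (`"5m"` or `"1h"`). The wiring of the `cacheControl` getter around this is an assumption:

```javascript
// System prompt as an Anthropic content block with prompt caching enabled.
const systemBlocks = [
  {
    type: "text",
    text: "You are a helpful assistant for this workspace.",
    cache_control: { type: "ephemeral", ttl: "5m" }, // or "1h"
  },
];
console.log(systemBlocks[0].cache_control.type); // ephemeral
```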
Repeated system prompts are cached by Anthropic for 5 minutes or 1 hour, reducing token costs significantly for multi-turn conversations.
Sources: server/utils/AiProviders/anthropic/index.js73-102
Gemini distinguishes between stable (v1) and experimental (v1beta) models:
All models use the same OpenAI-compatible endpoint (v1beta/openai/), regardless of stability status.
Sources: server/utils/AiProviders/gemini/index.js149-163
Azure deployments using reasoning models (o1, o3-mini) require special handling:
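A minimal sketch of that adjustment, assuming illustrative names (the real logic lives in `AzureOpenAiLLM`):

```javascript
// For O-type models: remap system messages to the user role and strip
// the unsupported temperature option before sending the request.
function adaptForOTypeModel(messages, options = {}) {
  const adapted = messages.map((m) =>
    m.role === "system" ? { ...m, role: "user" } : m
  );
  const { temperature, ...rest } = options; // O-type models reject temperature
  return { messages: adapted, options: rest };
}

const result = adaptForOTypeModel(
  [{ role: "system", content: "Be concise." }, { role: "user", content: "Hi" }],
  { temperature: 0.7, model: "o1" }
);
console.log(result.messages[0].role); // user
```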
O-type models don't accept system role messages and don't support temperature parameters.
Sources: server/utils/AiProviders/azureOpenAi/index.js24-148
The frontend uses hooks to fetch available models dynamically and display them in settings forms.
Sources: frontend/src/hooks/useGetProvidersModels.js1-86 frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx64-138
Grouped Providers (TogetherAI, OpenAI, Novita, OpenRouter, etc.): Models are organized by creator organization:
This creates <optgroup> elements in the model selector for better UX when hundreds of models are available.
Ungrouped Providers: Models are displayed as a flat list.
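The grouping described above can be sketched as follows (field names are assumptions; the frontend hook's actual shape may differ):

```javascript
// Group a flat model list by owning organization so the UI can render
// one <optgroup> per organization.
function groupModels(models) {
  return models.reduce((groups, model) => {
    const org = model.organization ?? "Other";
    (groups[org] ??= []).push(model);
    return groups;
  }, {});
}

const grouped = groupModels([
  { id: "meta-llama/Llama-3-8b", organization: "Meta" },
  { id: "mistralai/Mixtral-8x7B", organization: "MistralAI" },
  { id: "meta-llama/Llama-3-70b", organization: "Meta" },
]);
console.log(Object.keys(grouped)); // [ 'Meta', 'MistralAI' ]
```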
Sources: frontend/src/hooks/useGetProvidersModels.js40-86 frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx116-135
Each provider has a dedicated settings component that renders API key fields, model selectors, and provider-specific options:
- `AnthropicAiOptions`: API key + model selector + prompt caching toggle
- `GeminiLLMOptions`: API key + dynamic model selector (Stable/Experimental grouping)

These components use `System.customModels()` to fetch available models when the API key changes.
Sources: frontend/src/components/LLMSelection/AnthropicAiOptions/index.jsx1-191 frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx1-139
The provider architecture in AnythingLLM achieves flexibility through a shared interface contract, factory-based instantiation with workspace-level overrides, proportional context window allocation, pluggable embedding engines, and unified performance monitoring.
This architecture allows AnythingLLM to support 30+ providers while maintaining a consistent API surface for the chat engine.