This guide provides a step-by-step process for implementing a new LLM provider class in AnythingLLM. All providers implement a standardized interface while accommodating provider-specific features such as streaming protocols, authentication mechanisms, and model discovery.
New LLM providers are added to the codebase by creating a class that implements the standard provider interface. The class must be placed in server/utils/AiProviders/<provider-name>/index.js and registered in the provider factory at server/utils/helpers/chat/index.js.
Diagram: Provider Integration Architecture
Sources: server/utils/AiProviders/openAi/index.js:13-297 server/utils/AiProviders/anthropic/index.js:13-331 server/utils/AiProviders/ollama/index.js:14-489
Create a new directory under server/utils/AiProviders/<provider-name>/ with an index.js file. The basic class structure follows this template:
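A minimal sketch of such a class, using hypothetical `MyProviderLLM` and `MY_PROVIDER_*` names (the real providers also resolve an embedder instance and per-model context limits):

```javascript
// Hypothetical provider skeleton — the class name, env variable names, and
// the fixed 8192 window are illustrative, not taken from the codebase.
class MyProviderLLM {
  constructor(embedder = null, modelPreference = null) {
    if (!process.env.MY_PROVIDER_API_KEY)
      throw new Error("No MyProvider API key was set.");

    this.model =
      modelPreference || process.env.MY_PROVIDER_MODEL_PREF || "my-default-model";
    // Providers default to NativeEmbedder when none is injected; the sketch
    // simply stores whatever is passed in.
    this.embedder = embedder;
    // 15% history / 15% system / 70% user split, as described below.
    this.limits = {
      history: this.promptWindowLimit() * 0.15,
      system: this.promptWindowLimit() * 0.15,
      user: this.promptWindowLimit() * 0.7,
    };
    this.defaultTemp = 0.7;
  }

  streamingEnabled() {
    // Streaming is available only if the class implements the stream method.
    return "streamGetChatCompletion" in this;
  }

  promptWindowLimit() {
    return 8192; // a real provider looks this up per model
  }
}
```

The remaining interface methods from the table below are then filled in one by one.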
Sources: server/utils/AiProviders/openAi/index.js:13-34 server/utils/AiProviders/ollama/index.js:14-46 server/utils/AiProviders/gemini/index.js:27-62
All provider classes must implement these methods:
| Method | Return Type | Purpose |
|---|---|---|
| `constructPrompt()` | `Array<Message>` | Build message array from prompt components |
| `getChatCompletion()` | `Promise<{textResponse, metrics}>` | Non-streaming completion |
| `streamGetChatCompletion()` | `Promise<Stream>` | Initialize streaming completion |
| `handleStream()` | `Promise<string>` | Process stream chunks and write responses |
| `compressMessages()` | `Promise<Array<Message>>` | Fit messages within context window |
| `embedTextInput()` | `Promise<Array<number>>` | Delegate to embedder |
| `embedChunks()` | `Promise<Array<Array<number>>>` | Delegate to embedder |
| `promptWindowLimit()` | `number` | Return context window size |
| `streamingEnabled()` | `boolean` | Check if streaming is available |
| `isValidChatCompletionModel()` | `Promise<boolean>` | Validate model name |
Sources: server/utils/AiProviders/openAi/index.js:52-296 server/utils/AiProviders/anthropic/index.js:46-327
This method assembles the message array from component parts:
Note: Some providers (e.g., Anthropic, Azure reasoning models) require different message structures. See provider-specific patterns below.
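A `constructPrompt()` sketch in the common OpenAI-compatible shape — the `[CONTEXT n]` delimiters follow the pattern most providers use, but attachment handling and exact parameter names are simplified here:

```javascript
// Assemble [system, ...history, user] from the prompt components.
// Context snippets are appended to the system prompt.
function constructPrompt({
  systemPrompt = "",
  contextTexts = [],
  chatHistory = [],
  userPrompt = "",
}) {
  const context = contextTexts
    .map((text, i) => `[CONTEXT ${i}]:\n${text}\n[END CONTEXT ${i}]\n\n`)
    .join("");
  const prompt = { role: "system", content: `${systemPrompt}${context}` };
  return [prompt, ...chatHistory, { role: "user", content: userPrompt }];
}
```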
Sources: server/utils/AiProviders/openAi/index.js:107-126 server/utils/AiProviders/ollama/index.js:246-265 server/utils/AiProviders/anthropic/index.js:128-148
This method executes a non-streaming completion request:
Key Points:

- Wrap the API call in LLMPerformanceMonitor.measureAsyncFunction() so duration and token metrics are captured.

Sources: server/utils/AiProviders/openAi/index.js:146-183 server/utils/AiProviders/lmStudio/index.js:215-248
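A hedged sketch of the flow — the inline `measureAsyncFunction` is a stand-in for `LLMPerformanceMonitor.measureAsyncFunction()`, and the SDK client is injected so the example stays self-contained:

```javascript
// Stand-in for LLMPerformanceMonitor.measureAsyncFunction(): awaits the
// promise and reports the elapsed time in seconds.
async function measureAsyncFunction(promise) {
  const start = Date.now();
  const output = await promise;
  return { output, duration: (Date.now() - start) / 1000 };
}

// Sketch of getChatCompletion(): returns { textResponse, metrics } or null.
async function getChatCompletion(client, model, messages, { temperature = 0.7 } = {}) {
  const result = await measureAsyncFunction(
    client.chat.completions.create({ model, messages, temperature })
  );
  if (!result.output?.choices?.length) return null;

  const usage = result.output.usage ?? {};
  return {
    textResponse: result.output.choices[0].message.content,
    metrics: {
      prompt_tokens: usage.prompt_tokens ?? 0,
      completion_tokens: usage.completion_tokens ?? 0,
      total_tokens: usage.total_tokens ?? 0,
      outputTps: (usage.completion_tokens ?? 0) / result.duration,
      duration: result.duration,
    },
  };
}
```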
Initialize the streaming request with performance monitoring:
Sources: server/utils/AiProviders/openAi/index.js:185-206 server/utils/AiProviders/gemini/index.js:415-433
Process stream chunks and write to response:
For providers with standard OpenAI-compatible streaming, use the helper:
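What such a helper does can be sketched as follows — the chunk shape assumes the OpenAI delta format, and `writeResponseChunk` is passed in rather than imported so the example runs standalone (the real helper also handles client aborts and final metrics):

```javascript
// Accumulate delta chunks from an OpenAI-compatible stream, emitting each
// token to the client, then a final closing chunk.
async function handleOpenAiCompatibleStream(stream, writeResponseChunk, uuid) {
  let fullText = "";
  for await (const chunk of stream) {
    const token = chunk?.choices?.[0]?.delta?.content ?? "";
    if (!token) continue;
    fullText += token;
    writeResponseChunk({
      uuid,
      type: "textResponseChunk",
      textResponse: token,
      close: false,
      error: false,
    });
  }
  // Signal completion to the client.
  writeResponseChunk({
    uuid,
    type: "textResponseChunk",
    textResponse: "",
    close: true,
    error: false,
  });
  return fullText;
}
```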
Sources: server/utils/AiProviders/ollama/index.js:351-469 server/utils/AiProviders/gemini/index.js:435-437 server/utils/AiProviders/lmStudio/index.js:271-273
Sources: server/utils/AiProviders/openAi/index.js:56-62 server/utils/AiProviders/ollama/index.js:170-207 server/utils/AiProviders/gemini/index.js:107-141
Sources: server/utils/AiProviders/openAi/index.js:285-290 server/utils/AiProviders/ollama/index.js:472-477
For providers with unique message formats (e.g., Anthropic):
Sources: server/utils/AiProviders/openAi/index.js:292-296 server/utils/AiProviders/anthropic/index.js:310-318
Sources: server/utils/AiProviders/openAi/index.js:69-80 server/utils/AiProviders/localAi/index.js:64-66 server/utils/AiProviders/lmStudio/index.js:160-164
Some providers require the system prompt outside the messages array:
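A sketch of the transformation, assuming Anthropic's top-level `system` parameter (the helper name is illustrative):

```javascript
// Split the system message out of an OpenAI-style array and build the
// request body shape Anthropic's messages API expects.
function toAnthropicRequest(model, messages, maxTokens = 4096) {
  const system = messages.find((m) => m.role === "system")?.content ?? "";
  const chat = messages.filter((m) => m.role !== "system");
  return { model, max_tokens: maxTokens, system, messages: chat };
}
```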
Sources: server/utils/AiProviders/anthropic/index.js:150-162
Reasoning models (o1, o3) require special handling:
Sources: server/utils/AiProviders/azureOpenAi/index.js:30-31 server/utils/AiProviders/azureOpenAi/index.js:137-139 server/utils/AiProviders/azureOpenAi/index.js:160-161
Ollama exposes reasoning tokens separately, and they should be wrapped in `<think></think>` tags before being streamed to the client:
Sources: server/utils/AiProviders/ollama/index.js:398-450
Some models don't support system prompts and require emulation:
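A sketch of the emulation, assuming a leading user/assistant exchange carries the instructions (the exact acknowledgment text is illustrative):

```javascript
// When a model rejects the "system" role, prepend a user turn with the
// instructions and a short assistant acknowledgment.
function emulateSystemPrompt(systemPrompt, chatHistory) {
  if (!systemPrompt) return chatHistory;
  return [
    { role: "user", content: systemPrompt },
    { role: "assistant", content: "Okay, I will follow those instructions." },
    ...chatHistory,
  ];
}
```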
Sources: server/utils/AiProviders/gemini/index.js:70-72 server/utils/AiProviders/gemini/index.js:349-368
For providers with model listing APIs:
Sources: server/utils/AiProviders/gemini/index.js:172-302 server/utils/AiProviders/togetherAi/index.js:18-78
For self-hosted providers that expose model metadata:
Sources: server/utils/AiProviders/ollama/index.js:78-115 server/utils/AiProviders/lmStudio/index.js:73-112
Add the provider to getLLMProvider() in server/utils/helpers/chat/index.js:
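The registration can be sketched as a switch case; `"my-provider"` is a hypothetical identifier, and the two classes here are stubs standing in for the real imports:

```javascript
// Stubs standing in for the real provider imports.
class OpenAiLLM {
  constructor(embedder, model) {
    this.embedder = embedder;
    this.model = model;
  }
}
class MyProviderLLM {
  constructor(embedder, model) {
    this.embedder = embedder;
    this.model = model;
  }
}

// Sketch of the dispatch inside getLLMProvider(): match the provider
// identifier and return a constructed instance.
function getLLMProvider({ provider = null, model = null } = {}) {
  const embedder = null; // the real factory resolves the embedding engine here
  switch (provider) {
    case "openai":
      return new OpenAiLLM(embedder, model);
    case "my-provider": // hypothetical new provider registration
      return new MyProviderLLM(embedder, model);
    default:
      throw new Error(`ENV: No valid LLM provider found for ${provider}`);
  }
}
```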
Add to server/utils/helpers/updateENV.js KEY_MAPPING:
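A sketch of the corresponding KEY_MAPPING entries — the key names are hypothetical, and `isNotEmpty` is reimplemented inline as a stand-in for the validator functions in updateENV.js:

```javascript
// Stand-in validator: returns an error string on failure, null on success.
function isNotEmpty(input = "") {
  return !input || !input.length ? "Value cannot be empty" : null;
}

// Sketch of KEY_MAPPING entries: envKey names the process env variable,
// checks lists validation functions run on update.
const KEY_MAPPING = {
  MyProviderApiKey: {
    envKey: "MY_PROVIDER_API_KEY",
    checks: [isNotEmpty],
  },
  MyProviderModelPref: {
    envKey: "MY_PROVIDER_MODEL_PREF",
    checks: [],
  },
};
```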
Create frontend/src/components/LLMSelection/YourProviderOptions/index.jsx:
Sources: frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:1-138 frontend/src/components/LLMSelection/AnthropicAiOptions/index.jsx:1-190
Test these scenarios for your provider:
| Test Case | Validation |
|---|---|
| Constructor initialization | Environment variables validated, client initialized |
| Non-streaming completion | Returns {textResponse, metrics} with correct format |
| Streaming completion | Writes chunks via writeResponseChunk() |
| Context window limits | Returns correct token limit for models |
| Message compression | Fits within context window |
| Image attachment handling | Multimodal content formatted correctly |
| Error handling | Graceful failures with user-friendly messages |
| Abort handling | Clean disconnect on client abort |
| Token counting | Accurate metrics returned |
Diagrams: Configuration Flow, Chat Flow, Edge Cases.
Sources: server/utils/AiProviders/ollama/index.js:230-239 server/utils/AiProviders/openAi/index.js:146-183
For providers with OpenAI-compatible APIs (LMStudio, LocalAI, Mistral, etc.):
- Use the OpenAI SDK from the openai package
- Set baseURL in the constructor
- Use handleDefaultStreamResponseV2() for streaming

Examples: server/utils/AiProviders/lmStudio/index.js:12-289 server/utils/AiProviders/localAi/index.js:10-198 server/utils/AiProviders/mistral/index.js:10-191
For providers with dedicated SDKs (Anthropic, Ollama):
Examples: server/utils/AiProviders/anthropic/index.js:13-331 server/utils/AiProviders/ollama/index.js:14-489
For providers with multiple API versions (Gemini):
Examples: server/utils/AiProviders/gemini/index.js:27-457
| Issue | Cause | Solution |
|---|---|---|
| "Provider not found" | Not registered in factory | Add to getLLMProvider() |
| Streaming doesn't work | handleStream() not implemented | Implement or use handleDefaultStreamResponseV2() |
| Context window errors | Incorrect promptWindowLimit() | Verify model context limits |
| Missing metrics | Not using LLMPerformanceMonitor | Wrap API calls correctly |
| Embeddings fail | Wrong embedder instance | Check embedder initialization |
| Frontend model list empty | Model fetch failing | Check API endpoint and caching |
Sources: server/utils/AiProviders/openAi/index.js:146-183 server/utils/AiProviders/ollama/index.js:351-469
The OpenAiLLM class at server/utils/AiProviders/openAi/index.js:13-297 provides integration with OpenAI's API, supporting GPT models including GPT-4, GPT-4o, and o-series reasoning models.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `OPEN_AI_KEY` | - | Yes | OpenAI API authentication |
| `OPEN_MODEL_PREF` | `gpt-4o` | No | Default model selection |
Sources: server/utils/AiProviders/openAi/index.js:14-22
OpenAI provider implements fast-path validation for GPT models to avoid unnecessary API calls:
This optimization prevents API latency on every chat request since GPT model names are well-known patterns.
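The fast path can be sketched as a prefix check (the exact patterns in the source may differ):

```javascript
// Well-known GPT / o-series model names skip the remote model-list lookup.
async function isValidChatCompletionModel(modelName = "") {
  const name = modelName.toLowerCase();
  const isPreset = name.includes("gpt") || name.startsWith("o");
  if (isPreset) return true;
  // Otherwise fall back to querying /v1/models (omitted in this sketch).
  return false;
}
```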
Sources: server/utils/AiProviders/openAi/index.js:64-80
The OpenAI provider implements special temperature handling for reasoning models:
Sources: server/utils/AiProviders/openAi/index.js:134-144
The OpenAI provider implements custom stream handling distinct from the standard OpenAI-compatible streaming helper:
Sources: server/utils/AiProviders/openAi/index.js:182-201 server/utils/AiProviders/openAi/index.js:203-277
Context limits are calculated as percentages of the model's prompt window from MODEL_MAP:
| Limit Type | Percentage | Purpose |
|---|---|---|
| `history` | 15% | Chat history allocation |
| `system` | 15% | System prompt allocation |
| `user` | 70% | User prompt allocation |
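As a worked example, for an 8,192-token window the split yields roughly 1,228 / 1,228 / 5,734 tokens (rounding down is a choice made here for illustration):

```javascript
// Compute the 15/15/70 allocation for a given prompt window.
function contextLimits(promptWindowLimit) {
  return {
    history: Math.floor(promptWindowLimit * 0.15),
    system: Math.floor(promptWindowLimit * 0.15),
    user: Math.floor(promptWindowLimit * 0.7),
  };
}
```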
Sources: server/utils/AiProviders/openAi/index.js:23-27 server/utils/AiProviders/openAi/index.js:56-62
The AnthropicLLM class at server/utils/AiProviders/anthropic/index.js:13-280 provides integration with Anthropic's Claude models using the official @anthropic-ai/sdk.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | - | Yes | Anthropic API key |
| `ANTHROPIC_MODEL_PREF` | `claude-3-5-sonnet-20241022` | No | Default model |
Sources: server/utils/AiProviders/anthropic/index.js:14-28
Unlike OpenAI-compatible providers, Anthropic requires the system message to be separated from the messages array:
Sources: server/utils/AiProviders/anthropic/index.js:84-104 server/utils/AiProviders/anthropic/index.js:109-116
Anthropic uses a different image format from OpenAI:
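A sketch of converting an OpenAI-style data-URL attachment into Anthropic's image content block (base64 source, no `data:` prefix); the `contentString` field name is an assumption:

```javascript
// Convert "data:image/png;base64,AAAA..." into Anthropic's image block.
function toAnthropicImageBlock(attachment) {
  const [header, base64Data] = attachment.contentString.split(",");
  const mediaType = header.replace("data:", "").replace(";base64", "");
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: base64Data },
  };
}
```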
Sources: server/utils/AiProviders/anthropic/index.js:65-82
Anthropic uses the native SDK's streaming interface with event-based chunk handling:
Sources: server/utils/AiProviders/anthropic/index.js:136-150 server/utils/AiProviders/anthropic/index.js:159-244
Anthropic uses messageStringCompressor instead of the standard messageArrayCompressor due to its unique API structure:
Sources: server/utils/AiProviders/anthropic/index.js:258-266
The AzureOpenAiLLM class at server/utils/AiProviders/azureOpenAi/index.js:10-206 extends OpenAI functionality for Microsoft Azure's OpenAI service with enterprise authentication and reasoning model support.
| Environment Variable | Required | Purpose |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Yes | Azure service endpoint URL |
| `AZURE_OPENAI_KEY` | Yes | Azure API key |
| `OPEN_MODEL_PREF` | Yes | Deployment name (user-defined) |
| `AZURE_OPENAI_MODEL_TYPE` | No | Set to "reasoning" for o-models |
| `AZURE_OPENAI_TOKEN_LIMIT` | No | Custom context window (default 4096) |
Sources: server/utils/AiProviders/azureOpenAi/index.js:11-24
Azure OpenAI uses deployment names instead of standard model names. The deployment name can be any user-defined string:
Sources: server/utils/AiProviders/azureOpenAi/index.js:82-87
When AZURE_OPENAI_MODEL_TYPE=reasoning, the provider adjusts behavior:
| Aspect | Standard Models | Reasoning Models |
|---|---|---|
| Streaming | Enabled | Disabled |
| System Role | "system" | "user" |
| Temperature | Configurable | Omitted from request |
Sources: server/utils/AiProviders/azureOpenAi/index.js:25-26 server/utils/AiProviders/azureOpenAi/index.js:56-65 server/utils/AiProviders/azureOpenAi/index.js:119 server/utils/AiProviders/azureOpenAi/index.js:142
The provider uses a specific API version for compatibility:
Sources: server/utils/AiProviders/azureOpenAi/index.js:18
The GeminiLLM class at server/utils/AiProviders/gemini/index.js:27-453 integrates Google's Gemini models through the OpenAI-compatible v1beta API endpoint.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `GEMINI_API_KEY` | - | Yes | Google AI API key |
| `GEMINI_LLM_MODEL_PREF` | `gemini-2.0-flash-lite` | No | Default model |
Sources: server/utils/AiProviders/gemini/index.js:28-37
Gemini implements a sophisticated model caching system to avoid frequent API calls:
Cache staleness is checked using a 24-hour threshold: MAX_STALE = 8.64e7 (1 day in milliseconds).
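The check itself reduces to a millisecond comparison:

```javascript
// A cached model list older than MAX_STALE (24h in ms) must be refetched.
const MAX_STALE = 8.64e7; // 1 day in milliseconds

function cacheIsStale(cachedAtMs, nowMs = Date.now()) {
  return nowMs - cachedAtMs > MAX_STALE;
}
```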
Sources: server/utils/AiProviders/gemini/index.js:81-89 server/utils/AiProviders/gemini/index.js:172-302
Gemini distinguishes between stable (v1) and experimental (v1beta) models:
All models use the v1beta/openai/ endpoint regardless of their version, as this endpoint supports both stable and experimental models.
Sources: server/utils/AiProviders/gemini/index.js:39-40 server/utils/AiProviders/gemini/index.js:149-163
Certain Gemini models do not support system prompts:
For these models, the system prompt is emulated through a user-assistant exchange:
Sources: server/utils/AiProviders/gemini/index.js:20-25 server/utils/AiProviders/gemini/index.js:70-72 server/utils/AiProviders/gemini/index.js:349-368
Context windows are resolved from the cached model data:
Sources: server/utils/AiProviders/gemini/index.js:126-141
The Gemini frontend component implements dynamic model fetching and groups models by stability:
Diagram: GeminiLLMOptions Component Flow
The component groups models using this logic:
Sources: frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:64-138 frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:74-79
The OllamaAILLM class at server/utils/AiProviders/ollama/index.js:13-413 provides integration with locally-hosted Ollama instances, supporting both local and remote deployments.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `OLLAMA_BASE_PATH` | - | Yes | Ollama API endpoint URL |
| `OLLAMA_MODEL_PREF` | - | Yes | Model name (e.g., llama2) |
| `OLLAMA_AUTH_TOKEN` | - | No | Bearer token for auth |
| `OLLAMA_KEEP_ALIVE_TIMEOUT` | 300 | No | Model load timeout (seconds) |
| `OLLAMA_PERFORMANCE_MODE` | base | No | `base` or `max` context |
| `OLLAMA_RESPONSE_TIMEOUT` | - | No | Custom response timeout (ms) |
| `OLLAMA_MODEL_TOKEN_LIMIT` | - | No | User-defined context override |
Sources: server/utils/AiProviders/ollama/index.js:17-49
Ollama automatically caches context windows for all available models on initialization:
The cache is stored in the static property OllamaAILLM.modelContextWindows and persists for the server lifetime.
Sources: server/utils/AiProviders/ollama/index.js:14-15 server/utils/AiProviders/ollama/index.js:68-104
Ollama supports two performance modes that control context usage:
| Mode | Behavior |
|---|---|
| `base` | Uses default context handling |
| `max` | Sets `num_ctx` to full `promptWindowLimit()` |
Sources: server/utils/AiProviders/ollama/index.js:24 server/utils/AiProviders/ollama/index.js:248-255
For slow hardware, Ollama supports custom response timeouts using undici Agent:
Sources: server/utils/AiProviders/ollama/index.js:124-153
Ollama uses base64-only format for images (no data URL prefix):
Messages with images use spread format: { role: "user", content: "text", images: [...] }
Sources: server/utils/AiProviders/ollama/index.js:191-197 server/utils/AiProviders/ollama/index.js:232
Ollama includes error handlers that provide more user-friendly messages:
Sources: server/utils/AiProviders/ollama/index.js:203-212
AnythingLLM supports multiple open-source and self-hosted provider options, all using OpenAI-compatible APIs.
The LMStudioLLM class at server/utils/AiProviders/lmStudio/index.js:12-284 integrates with LMStudio's local OpenAI-compatible server.
| Environment Variable | Default | Required |
|---|---|---|
| `LMSTUDIO_BASE_PATH` | - | Yes |
| `LMSTUDIO_MODEL_PREF` | "Loaded from Chat UI" | No |
| `LMSTUDIO_MODEL_TOKEN_LIMIT` | - | No |
The default model name "Loaded from Chat UI" is required for LMStudio versions ≥0.2.17 due to a bug in multi-model chat support.
Sources: server/utils/AiProviders/lmStudio/index.js:16-34
LMStudio automatically normalizes the base path to ensure it ends in /v1:
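The normalization reduces to a small string transform (the helper name here is illustrative):

```javascript
// Strip trailing slashes, then append /v1 unless it is already present.
function parseLMStudioBasePath(basePath = "") {
  const url = basePath.replace(/\/+$/, "");
  return url.endsWith("/v1") ? url : `${url}/v1`;
}
```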
Sources: server/utils/AiProviders/lmStudio/index.js:270-278
Similar to Ollama, LMStudio caches context windows by fetching from /api/v0/models:
Sources: server/utils/AiProviders/lmStudio/index.js:68-99
The LocalAiLLM class at server/utils/AiProviders/localAi/index.js:10-192 provides integration with LocalAI deployments.
| Environment Variable | Default | Required |
|---|---|---|
| `LOCAL_AI_BASE_PATH` | - | Yes |
| `LOCAL_AI_MODEL_PREF` | - | Yes |
| `LOCAL_AI_API_KEY` | null | No |
| `LOCAL_AI_MODEL_TOKEN_LIMIT` | 4096 | No |
LocalAI uses manual token counting for metrics since the API may not return usage data:
Sources: server/utils/AiProviders/localAi/index.js:11-29 server/utils/AiProviders/localAi/index.js:135-146
The TogetherAiLLM class at server/utils/AiProviders/togetherAi/index.js:80-258 integrates with Together.ai's model hosting platform.
TogetherAI implements 1-week model caching:
Models are filtered to only include chat-capable models and include organization metadata.
Sources: server/utils/AiProviders/togetherAi/index.js:18-78
The HuggingFaceLLM class at server/utils/AiProviders/huggingface/index.js:9-159 connects to HuggingFace Inference Endpoints.
| Environment Variable | Required |
|---|---|
| `HUGGING_FACE_LLM_ENDPOINT` | Yes |
| `HUGGING_FACE_LLM_API_KEY` | Yes |
| `HUGGING_FACE_LLM_TOKEN_LIMIT` | Yes |
The model parameter is stubbed to "tgi" (Text Generation Inference) since HuggingFace endpoints run a single model:
Sources: server/utils/AiProviders/huggingface/index.js:10-24
HuggingFace does not support system prompts natively, so they are emulated:
Sources: server/utils/AiProviders/huggingface/index.js:76-83
The MistralLLM class at server/utils/AiProviders/mistral/index.js:10-187 provides integration with Mistral AI's API.
| Environment Variable | Default | Required |
|---|---|---|
| `MISTRAL_API_KEY` | - | Yes |
| `MISTRAL_MODEL_PREF` | `mistral-tiny` | No |
Mistral uses a fixed 32K context window and defaults to zero temperature:
Sources: server/utils/AiProviders/mistral/index.js:11-30 server/utils/AiProviders/mistral/index.js:54-60
All LLM provider classes delegate embedding functionality to a separate embedder instance through standardized wrapper methods. By default, providers use NativeEmbedder, but this can be overridden:
Diagram: Embedding Delegation Pattern in Provider Classes
Example implementation from OpenAiLLM:
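A hedged sketch of the two wrapper methods (the class name is illustrative; both simply forward to the injected embedder instance):

```javascript
// Delegation wrappers: the provider itself never embeds text directly.
class ProviderWithEmbedder {
  constructor(embedder) {
    this.embedder = embedder;
  }

  async embedTextInput(textInput) {
    return await this.embedder.embedTextInput(textInput);
  }

  async embedChunks(textChunks = []) {
    return await this.embedder.embedChunks(textChunks);
  }
}
```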
This pattern allows the chat engine to inject custom embedders (e.g., OpenAiEmbedder, AzureOpenAiEmbedder) when needed, while maintaining a consistent interface across all providers.
Sources: server/utils/AiProviders/openAi/index.js:16 server/utils/AiProviders/openAi/index.js:273-278 server/utils/AiProviders/anthropic/index.js:35 server/utils/AiProviders/anthropic/index.js:269-274 server/utils/AiProviders/gemini/index.js:52 server/utils/AiProviders/gemini/index.js:441-446
Provider classes implement comprehensive error handling and performance monitoring through the LLMPerformanceMonitor system:
| Metric | Description | Calculation |
|---|---|---|
| `prompt_tokens` | Input token count | From API response |
| `completion_tokens` | Output token count | From API response |
| `total_tokens` | Combined token count | Sum of prompt + completion |
| `outputTps` | Output tokens per second | `completion_tokens / duration` |
| `duration` | Request duration | Measured by performance monitor |
Sources: server/utils/AiProviders/openAi/index.js:151-178 server/utils/AiProviders/azureOpenAi/index.js:138-161