This guide provides a step-by-step process for implementing a new LLM provider class in AnythingLLM. All providers implement a standardized interface while accommodating provider-specific features such as streaming protocols, authentication mechanisms, and model discovery.
New LLM providers are added to the codebase by creating a class that implements the standard provider interface. The class must be placed in server/utils/AiProviders/<provider-name>/index.js and registered in the provider factory at server/utils/helpers/chat/index.js.
Diagram: Provider Integration Architecture
Sources: server/utils/AiProviders/openAi/index.js:13-297 server/utils/AiProviders/anthropic/index.js:13-331 server/utils/AiProviders/ollama/index.js:14-489
Create a new directory under server/utils/AiProviders/<provider-name>/ with an index.js file. The basic class structure follows this template:
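A minimal sketch of such a class, using hypothetical `MyProviderLLM` and `MY_PROVIDER_*` names (the real providers also resolve an embedder instance and per-model context limits):

```javascript
// Hypothetical provider skeleton — the class name, env variable names, and
// the fixed 8192 window are illustrative, not taken from the codebase.
class MyProviderLLM {
  constructor(embedder = null, modelPreference = null) {
    if (!process.env.MY_PROVIDER_API_KEY)
      throw new Error("No MyProvider API key was set.");

    this.model =
      modelPreference || process.env.MY_PROVIDER_MODEL_PREF || "my-default-model";
    // Providers default to NativeEmbedder when none is injected; the sketch
    // simply stores whatever is passed in.
    this.embedder = embedder;
    // 15% history / 15% system / 70% user split, as described below.
    this.limits = {
      history: this.promptWindowLimit() * 0.15,
      system: this.promptWindowLimit() * 0.15,
      user: this.promptWindowLimit() * 0.7,
    };
    this.defaultTemp = 0.7;
  }

  streamingEnabled() {
    // Streaming is available only if the class implements the stream method.
    return "streamGetChatCompletion" in this;
  }

  promptWindowLimit() {
    return 8192; // a real provider looks this up per model
  }
}
```

The remaining interface methods from the table below are then filled in one by one.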
Sources: server/utils/AiProviders/openAi/index.js:13-34 server/utils/AiProviders/ollama/index.js:14-46 server/utils/AiProviders/gemini/index.js:27-62
All provider classes must implement these methods:
| Method | Return Type | Purpose |
|---|---|---|
| `constructPrompt()` | `Array<Message>` | Build message array from prompt components |
| `getChatCompletion()` | `Promise<{textResponse, metrics}>` | Non-streaming completion |
| `streamGetChatCompletion()` | `Promise<Stream>` | Initialize streaming completion |
| `handleStream()` | `Promise<string>` | Process stream chunks and write responses |
| `compressMessages()` | `Promise<Array<Message>>` | Fit messages within context window |
| `embedTextInput()` | `Promise<Array<number>>` | Delegate to embedder |
| `embedChunks()` | `Promise<Array<Array<number>>>` | Delegate to embedder |
| `promptWindowLimit()` | `number` | Return context window size |
| `streamingEnabled()` | `boolean` | Check if streaming is available |
| `isValidChatCompletionModel()` | `Promise<boolean>` | Validate model name |
Sources: server/utils/AiProviders/openAi/index.js:52-296 server/utils/AiProviders/anthropic/index.js:46-327
This method assembles the message array from component parts:
Note: Some providers (e.g., Anthropic, Azure reasoning models) require different message structures. See provider-specific patterns below.
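A `constructPrompt()` sketch in the common OpenAI-compatible shape — the `[CONTEXT n]` delimiters follow the pattern most providers use, but attachment handling and exact parameter names are simplified here:

```javascript
// Assemble [system, ...history, user] from the prompt components.
// Context snippets are appended to the system prompt.
function constructPrompt({
  systemPrompt = "",
  contextTexts = [],
  chatHistory = [],
  userPrompt = "",
}) {
  const context = contextTexts
    .map((text, i) => `[CONTEXT ${i}]:\n${text}\n[END CONTEXT ${i}]\n\n`)
    .join("");
  const prompt = { role: "system", content: `${systemPrompt}${context}` };
  return [prompt, ...chatHistory, { role: "user", content: userPrompt }];
}
```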
Sources: server/utils/AiProviders/openAi/index.js:107-126 server/utils/AiProviders/ollama/index.js:246-265 server/utils/AiProviders/anthropic/index.js:128-148
This method executes a non-streaming completion request:
Key Points:

- Wrap the API call in LLMPerformanceMonitor.measureAsyncFunction() so duration and token metrics are captured.

Sources: server/utils/AiProviders/openAi/index.js:146-183 server/utils/AiProviders/lmStudio/index.js:215-248
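A hedged sketch of the flow — the inline `measureAsyncFunction` is a stand-in for `LLMPerformanceMonitor.measureAsyncFunction()`, and the SDK client is injected so the example stays self-contained:

```javascript
// Stand-in for LLMPerformanceMonitor.measureAsyncFunction(): awaits the
// promise and reports the elapsed time in seconds.
async function measureAsyncFunction(promise) {
  const start = Date.now();
  const output = await promise;
  return { output, duration: (Date.now() - start) / 1000 };
}

// Sketch of getChatCompletion(): returns { textResponse, metrics } or null.
async function getChatCompletion(client, model, messages, { temperature = 0.7 } = {}) {
  const result = await measureAsyncFunction(
    client.chat.completions.create({ model, messages, temperature })
  );
  if (!result.output?.choices?.length) return null;

  const usage = result.output.usage ?? {};
  return {
    textResponse: result.output.choices[0].message.content,
    metrics: {
      prompt_tokens: usage.prompt_tokens ?? 0,
      completion_tokens: usage.completion_tokens ?? 0,
      total_tokens: usage.total_tokens ?? 0,
      outputTps: (usage.completion_tokens ?? 0) / result.duration,
      duration: result.duration,
    },
  };
}
```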
Initialize the streaming request with performance monitoring:
Sources: server/utils/AiProviders/openAi/index.js:185-206 server/utils/AiProviders/gemini/index.js:415-433
Process stream chunks and write to response:
For providers with standard OpenAI-compatible streaming, use the helper:
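What such a helper does can be sketched as follows — the chunk shape assumes the OpenAI delta format, and `writeResponseChunk` is passed in rather than imported so the example runs standalone (the real helper also handles client aborts and final metrics):

```javascript
// Accumulate delta chunks from an OpenAI-compatible stream, emitting each
// token to the client, then a final closing chunk.
async function handleOpenAiCompatibleStream(stream, writeResponseChunk, uuid) {
  let fullText = "";
  for await (const chunk of stream) {
    const token = chunk?.choices?.[0]?.delta?.content ?? "";
    if (!token) continue;
    fullText += token;
    writeResponseChunk({
      uuid,
      type: "textResponseChunk",
      textResponse: token,
      close: false,
      error: false,
    });
  }
  // Signal completion to the client.
  writeResponseChunk({
    uuid,
    type: "textResponseChunk",
    textResponse: "",
    close: true,
    error: false,
  });
  return fullText;
}
```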
Sources: server/utils/AiProviders/ollama/index.js:351-469 server/utils/AiProviders/gemini/index.js:435-437 server/utils/AiProviders/lmStudio/index.js:271-273
Sources: server/utils/AiProviders/openAi/index.js:56-62 server/utils/AiProviders/ollama/index.js:170-207 server/utils/AiProviders/gemini/index.js:107-141
Sources: server/utils/AiProviders/openAi/index.js:285-290 server/utils/AiProviders/ollama/index.js:472-477
For providers with unique message formats (e.g., Anthropic):
Sources: server/utils/AiProviders/openAi/index.js:292-296 server/utils/AiProviders/anthropic/index.js:310-318
Sources: server/utils/AiProviders/openAi/index.js:69-80 server/utils/AiProviders/localAi/index.js:64-66 server/utils/AiProviders/lmStudio/index.js:160-164
Some providers require the system prompt outside the messages array:
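A sketch of the transformation, assuming Anthropic's top-level `system` parameter (the helper name is illustrative):

```javascript
// Split the system message out of an OpenAI-style array and build the
// request body shape Anthropic's messages API expects.
function toAnthropicRequest(model, messages, maxTokens = 4096) {
  const system = messages.find((m) => m.role === "system")?.content ?? "";
  const chat = messages.filter((m) => m.role !== "system");
  return { model, max_tokens: maxTokens, system, messages: chat };
}
```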
Sources: server/utils/AiProviders/anthropic/index.js:150-162
Reasoning models (o1, o3) require special handling:
Sources: server/utils/AiProviders/azureOpenAi/index.js:30-31 server/utils/AiProviders/azureOpenAi/index.js:137-139 server/utils/AiProviders/azureOpenAi/index.js:160-161
Ollama exposes reasoning tokens separately, and they should be wrapped in `<think></think>` tags before being streamed to the client:
Sources: server/utils/AiProviders/ollama/index.js:398-450
Some models don't support system prompts and require emulation:
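A sketch of the emulation, assuming a leading user/assistant exchange carries the instructions (the exact acknowledgment text is illustrative):

```javascript
// When a model rejects the "system" role, prepend a user turn with the
// instructions and a short assistant acknowledgment.
function emulateSystemPrompt(systemPrompt, chatHistory) {
  if (!systemPrompt) return chatHistory;
  return [
    { role: "user", content: systemPrompt },
    { role: "assistant", content: "Okay, I will follow those instructions." },
    ...chatHistory,
  ];
}
```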
Sources: server/utils/AiProviders/gemini/index.js:70-72 server/utils/AiProviders/gemini/index.js:349-368
For providers with model listing APIs:
Sources: server/utils/AiProviders/gemini/index.js:172-302 server/utils/AiProviders/togetherAi/index.js:18-78
For self-hosted providers that expose model metadata:
Sources: server/utils/AiProviders/ollama/index.js:78-115 server/utils/AiProviders/lmStudio/index.js:73-112
Add the provider to getLLMProvider() in server/utils/helpers/chat/index.js:
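The registration can be sketched as a switch case; `"my-provider"` is a hypothetical identifier, and the two classes here are stubs standing in for the real imports:

```javascript
// Stubs standing in for the real provider imports.
class OpenAiLLM {
  constructor(embedder, model) {
    this.embedder = embedder;
    this.model = model;
  }
}
class MyProviderLLM {
  constructor(embedder, model) {
    this.embedder = embedder;
    this.model = model;
  }
}

// Sketch of the dispatch inside getLLMProvider(): match the provider
// identifier and return a constructed instance.
function getLLMProvider({ provider = null, model = null } = {}) {
  const embedder = null; // the real factory resolves the embedding engine here
  switch (provider) {
    case "openai":
      return new OpenAiLLM(embedder, model);
    case "my-provider": // hypothetical new provider registration
      return new MyProviderLLM(embedder, model);
    default:
      throw new Error(`ENV: No valid LLM provider found for ${provider}`);
  }
}
```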
Add to server/utils/helpers/updateENV.js KEY_MAPPING:
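A sketch of the corresponding KEY_MAPPING entries — the key names are hypothetical, and `isNotEmpty` is reimplemented inline as a stand-in for the validator functions in updateENV.js:

```javascript
// Stand-in validator: returns an error string on failure, null on success.
function isNotEmpty(input = "") {
  return !input || !input.length ? "Value cannot be empty" : null;
}

// Sketch of KEY_MAPPING entries: envKey names the process env variable,
// checks lists validation functions run on update.
const KEY_MAPPING = {
  MyProviderApiKey: {
    envKey: "MY_PROVIDER_API_KEY",
    checks: [isNotEmpty],
  },
  MyProviderModelPref: {
    envKey: "MY_PROVIDER_MODEL_PREF",
    checks: [],
  },
};
```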
Create frontend/src/components/LLMSelection/YourProviderOptions/index.jsx:
Sources: frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:1-138 frontend/src/components/LLMSelection/AnthropicAiOptions/index.jsx:1-190
Test these scenarios for your provider:
| Test Case | Validation |
|---|---|
| Constructor initialization | Environment variables validated, client initialized |
| Non-streaming completion | Returns {textResponse, metrics} with correct format |
| Streaming completion | Writes chunks via writeResponseChunk() |
| Context window limits | Returns correct token limit for models |
| Message compression | Fits within context window |
| Image attachment handling | Multimodal content formatted correctly |
| Error handling | Graceful failures with user-friendly messages |
| Abort handling | Clean disconnect on client abort |
| Token counting | Accurate metrics returned |
Diagrams: Configuration Flow, Chat Flow, Edge Cases.
Sources: server/utils/AiProviders/ollama/index.js:230-239 server/utils/AiProviders/openAi/index.js:146-183
For providers with OpenAI-compatible APIs (LMStudio, LocalAI, Mistral, etc.):
- Use the OpenAI SDK from the openai package
- Set baseURL in the constructor
- Use handleDefaultStreamResponseV2() for streaming

Examples: server/utils/AiProviders/lmStudio/index.js:12-289 server/utils/AiProviders/localAi/index.js:10-198 server/utils/AiProviders/mistral/index.js:10-191
For providers with dedicated SDKs (Anthropic, Ollama):
Examples: server/utils/AiProviders/anthropic/index.js:13-331 server/utils/AiProviders/ollama/index.js:14-489
For providers with multiple API versions (Gemini):
Examples: server/utils/AiProviders/gemini/index.js:27-457
| Issue | Cause | Solution |
|---|---|---|
| "Provider not found" | Not registered in factory | Add to getLLMProvider() |
| Streaming doesn't work | handleStream() not implemented | Implement or use handleDefaultStreamResponseV2() |
| Context window errors | Incorrect promptWindowLimit() | Verify model context limits |
| Missing metrics | Not using LLMPerformanceMonitor | Wrap API calls correctly |
| Embeddings fail | Wrong embedder instance | Check embedder initialization |
| Frontend model list empty | Model fetch failing | Check API endpoint and caching |
Sources: server/utils/AiProviders/openAi/index.js:146-183 server/utils/AiProviders/ollama/index.js:351-469
The OpenAiLLM class at server/utils/AiProviders/openAi/index.js:13-297 provides integration with OpenAI's API, supporting GPT models including GPT-4, GPT-4o, and o-series reasoning models.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `OPEN_AI_KEY` | - | Yes | OpenAI API authentication |
| `OPEN_MODEL_PREF` | `gpt-4o` | No | Default model selection |
Sources: server/utils/AiProviders/openAi/index.js:14-22
OpenAI provider implements fast-path validation for GPT models to avoid unnecessary API calls:
This optimization prevents API latency on every chat request since GPT model names are well-known patterns.
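The fast path can be sketched as a prefix check (the exact patterns in the source may differ):

```javascript
// Well-known GPT / o-series model names skip the remote model-list lookup.
async function isValidChatCompletionModel(modelName = "") {
  const name = modelName.toLowerCase();
  const isPreset = name.includes("gpt") || name.startsWith("o");
  if (isPreset) return true;
  // Otherwise fall back to querying /v1/models (omitted in this sketch).
  return false;
}
```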
Sources: server/utils/AiProviders/openAi/index.js:64-80
The OpenAI provider implements special temperature handling for reasoning models:
Sources: server/utils/AiProviders/openAi/index.js:134-144
The OpenAI provider implements custom stream handling distinct from the standard OpenAI-compatible streaming helper:
Sources: server/utils/AiProviders/openAi/index.js:182-201 server/utils/AiProviders/openAi/index.js:203-277
Context limits are calculated as percentages of the model's prompt window from MODEL_MAP:
| Limit Type | Percentage | Purpose |
|---|---|---|
| `history` | 15% | Chat history allocation |
| `system` | 15% | System prompt allocation |
| `user` | 70% | User prompt allocation |
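As a worked example, for an 8,192-token window the split yields roughly 1,228 / 1,228 / 5,734 tokens (rounding down is a choice made here for illustration):

```javascript
// Compute the 15/15/70 allocation for a given prompt window.
function contextLimits(promptWindowLimit) {
  return {
    history: Math.floor(promptWindowLimit * 0.15),
    system: Math.floor(promptWindowLimit * 0.15),
    user: Math.floor(promptWindowLimit * 0.7),
  };
}
```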
Sources: server/utils/AiProviders/openAi/index.js:23-27 server/utils/AiProviders/openAi/index.js:56-62
The AnthropicLLM class at server/utils/AiProviders/anthropic/index.js:13-280 provides integration with Anthropic's Claude models using the official @anthropic-ai/sdk.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | - | Yes | Anthropic API key |
| `ANTHROPIC_MODEL_PREF` | `claude-3-5-sonnet-20241022` | No | Default model |
Sources: server/utils/AiProviders/anthropic/index.js:14-28
Unlike OpenAI-compatible providers, Anthropic requires the system message to be separated from the messages array:
Sources: server/utils/AiProviders/anthropic/index.js:84-104 server/utils/AiProviders/anthropic/index.js:109-116
Anthropic uses a different image format from OpenAI:
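A sketch of converting an OpenAI-style data-URL attachment into Anthropic's image content block (base64 source, no `data:` prefix); the `contentString` field name is an assumption:

```javascript
// Convert "data:image/png;base64,AAAA..." into Anthropic's image block.
function toAnthropicImageBlock(attachment) {
  const [header, base64Data] = attachment.contentString.split(",");
  const mediaType = header.replace("data:", "").replace(";base64", "");
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: base64Data },
  };
}
```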
Sources: server/utils/AiProviders/anthropic/index.js:65-82
Anthropic uses the native SDK's streaming interface with event-based chunk handling:
Sources: server/utils/AiProviders/anthropic/index.js:136-150 server/utils/AiProviders/anthropic/index.js:159-244
Anthropic uses messageStringCompressor instead of the standard messageArrayCompressor due to its unique API structure:
Sources: server/utils/AiProviders/anthropic/index.js:258-266
The AzureOpenAiLLM class at server/utils/AiProviders/azureOpenAi/index.js:10-206 extends OpenAI functionality for Microsoft Azure's OpenAI service with enterprise authentication and reasoning model support.
| Environment Variable | Required | Purpose |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Yes | Azure service endpoint URL |
| `AZURE_OPENAI_KEY` | Yes | Azure API key |
| `OPEN_MODEL_PREF` | Yes | Deployment name (user-defined) |
| `AZURE_OPENAI_MODEL_TYPE` | No | Set to "reasoning" for o-models |
| `AZURE_OPENAI_TOKEN_LIMIT` | No | Custom context window (default 4096) |
Sources: server/utils/AiProviders/azureOpenAi/index.js:11-24
Azure OpenAI uses deployment names instead of standard model names. The deployment name can be any user-defined string:
Sources: server/utils/AiProviders/azureOpenAi/index.js:82-87
When AZURE_OPENAI_MODEL_TYPE=reasoning, the provider adjusts behavior:
| Aspect | Standard Models | Reasoning Models |
|---|---|---|
| Streaming | Enabled | Disabled |
| System Role | "system" | "user" |
| Temperature | Configurable | Omitted from request |
Sources: server/utils/AiProviders/azureOpenAi/index.js:25-26 server/utils/AiProviders/azureOpenAi/index.js:56-65 server/utils/AiProviders/azureOpenAi/index.js:119 server/utils/AiProviders/azureOpenAi/index.js:142
The provider uses a specific API version for compatibility:
Sources: server/utils/AiProviders/azureOpenAi/index.js:18
The GeminiLLM class at server/utils/AiProviders/gemini/index.js:27-453 integrates Google's Gemini models through the OpenAI-compatible v1beta API endpoint.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `GEMINI_API_KEY` | - | Yes | Google AI API key |
| `GEMINI_LLM_MODEL_PREF` | `gemini-2.0-flash-lite` | No | Default model |
Sources: server/utils/AiProviders/gemini/index.js:28-37
Gemini implements a sophisticated model caching system to avoid frequent API calls:
Cache staleness is checked using a 24-hour threshold: MAX_STALE = 8.64e7 (1 day in milliseconds).
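The check itself reduces to a millisecond comparison:

```javascript
// A cached model list older than MAX_STALE (24h in ms) must be refetched.
const MAX_STALE = 8.64e7; // 1 day in milliseconds

function cacheIsStale(cachedAtMs, nowMs = Date.now()) {
  return nowMs - cachedAtMs > MAX_STALE;
}
```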
Sources: server/utils/AiProviders/gemini/index.js:81-89 server/utils/AiProviders/gemini/index.js:172-302
Gemini distinguishes between stable (v1) and experimental (v1beta) models:
All models use the v1beta/openai/ endpoint regardless of their version, as this endpoint supports both stable and experimental models.
Sources: server/utils/AiProviders/gemini/index.js:39-40 server/utils/AiProviders/gemini/index.js:149-163
Certain Gemini models do not support system prompts:
For these models, the system prompt is emulated through a user-assistant exchange:
Sources: server/utils/AiProviders/gemini/index.js:20-25 server/utils/AiProviders/gemini/index.js:70-72 server/utils/AiProviders/gemini/index.js:349-368
Context windows are resolved from the cached model data:
Sources: server/utils/AiProviders/gemini/index.js:126-141
The Gemini frontend component implements dynamic model fetching and groups models by stability:
Diagram: GeminiLLMOptions Component Flow
The component groups models using this logic:
Sources: frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:64-138 frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:74-79
The OllamaAILLM class at server/utils/AiProviders/ollama/index.js:13-413 provides integration with locally-hosted Ollama instances, supporting both local and remote deployments.
| Environment Variable | Default | Required | Purpose |
|---|---|---|---|
| `OLLAMA_BASE_PATH` | - | Yes | Ollama API endpoint URL |
| `OLLAMA_MODEL_PREF` | - | Yes | Model name (e.g., llama2) |
| `OLLAMA_AUTH_TOKEN` | - | No | Bearer token for auth |
| `OLLAMA_KEEP_ALIVE_TIMEOUT` | 300 | No | Model load timeout (seconds) |
| `OLLAMA_PERFORMANCE_MODE` | base | No | `base` or `max` context |
| `OLLAMA_RESPONSE_TIMEOUT` | - | No | Custom response timeout (ms) |
| `OLLAMA_MODEL_TOKEN_LIMIT` | - | No | User-defined context override |
Sources: server/utils/AiProviders/ollama/index.js:17-49
Ollama automatically caches context windows for all available models on initialization:
The cache is stored in the static property OllamaAILLM.modelContextWindows and persists for the server lifetime.
Sources: server/utils/AiProviders/ollama/index.js:14-15 server/utils/AiProviders/ollama/index.js:68-104
Ollama supports two performance modes that control context usage:
| Mode | Behavior |
|---|---|
| `base` | Uses default context handling |
| `max` | Sets `num_ctx` to full `promptWindowLimit()` |
Sources: server/utils/AiProviders/ollama/index.js:24 server/utils/AiProviders/ollama/index.js:248-255
For slow hardware, Ollama supports custom response timeouts using undici Agent:
Sources: server/utils/AiProviders/ollama/index.js:124-153
Ollama uses base64-only format for images (no data URL prefix):
Messages with images use spread format: { role: "user", content: "text", images: [...] }
Sources: server/utils/AiProviders/ollama/index.js:191-197 server/utils/AiProviders/ollama/index.js:232
Ollama includes error handlers that provide more user-friendly messages:
Sources: server/utils/AiProviders/ollama/index.js:203-212
AnythingLLM supports multiple open-source and self-hosted provider options, all using OpenAI-compatible APIs.
The LMStudioLLM class at server/utils/AiProviders/lmStudio/index.js:12-284 integrates with LMStudio's local OpenAI-compatible server.
| Environment Variable | Default | Required |
|---|---|---|
| `LMSTUDIO_BASE_PATH` | - | Yes |
| `LMSTUDIO_MODEL_PREF` | "Loaded from Chat UI" | No |
| `LMSTUDIO_MODEL_TOKEN_LIMIT` | - | No |
The default model name "Loaded from Chat UI" is required for LMStudio versions ≥0.2.17 due to a bug in multi-model chat support.
Sources: server/utils/AiProviders/lmStudio/index.js:16-34
LMStudio automatically normalizes the base path to ensure it ends in /v1:
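The normalization reduces to a small string transform (the helper name here is illustrative):

```javascript
// Strip trailing slashes, then append /v1 unless it is already present.
function parseLMStudioBasePath(basePath = "") {
  const url = basePath.replace(/\/+$/, "");
  return url.endsWith("/v1") ? url : `${url}/v1`;
}
```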
Sources: server/utils/AiProviders/lmStudio/index.js:270-278
Similar to Ollama, LMStudio caches context windows by fetching from /api/v0/models:
Sources: server/utils/AiProviders/lmStudio/index.js:68-99
The LocalAiLLM class at server/utils/AiProviders/localAi/index.js:10-192 provides integration with LocalAI deployments.
| Environment Variable | Default | Required |
|---|---|---|
| `LOCAL_AI_BASE_PATH` | - | Yes |
| `LOCAL_AI_MODEL_PREF` | - | Yes |
| `LOCAL_AI_API_KEY` | null | No |
| `LOCAL_AI_MODEL_TOKEN_LIMIT` | 4096 | No |
LocalAI uses manual token counting for metrics since the API may not return usage data:
Sources: server/utils/AiProviders/localAi/index.js:11-29 server/utils/AiProviders/localAi/index.js:135-146
The TogetherAiLLM class at server/utils/AiProviders/togetherAi/index.js:80-258 integrates with Together.ai's model hosting platform.
TogetherAI implements 1-week model caching:
Models are filtered to only include chat-capable models and include organization metadata.
Sources: server/utils/AiProviders/togetherAi/index.js:18-78
The HuggingFaceLLM class at server/utils/AiProviders/huggingface/index.js:9-159 connects to HuggingFace Inference Endpoints.
| Environment Variable | Required |
|---|---|
| `HUGGING_FACE_LLM_ENDPOINT` | Yes |
| `HUGGING_FACE_LLM_API_KEY` | Yes |
| `HUGGING_FACE_LLM_TOKEN_LIMIT` | Yes |
The model parameter is stubbed to "tgi" (Text Generation Inference) since HuggingFace endpoints run a single model:
Sources: server/utils/AiProviders/huggingface/index.js:10-24
HuggingFace does not support system prompts natively, so they are emulated:
Sources: server/utils/AiProviders/huggingface/index.js:76-83
The MistralLLM class at server/utils/AiProviders/mistral/index.js:10-187 provides integration with Mistral AI's API.
| Environment Variable | Default | Required |
|---|---|---|
| `MISTRAL_API_KEY` | - | Yes |
| `MISTRAL_MODEL_PREF` | `mistral-tiny` | No |
Mistral uses a fixed 32K context window and defaults to zero temperature:
Sources: server/utils/AiProviders/mistral/index.js:11-30 server/utils/AiProviders/mistral/index.js:54-60
All LLM provider classes delegate embedding functionality to a separate embedder instance through standardized wrapper methods. By default, providers use NativeEmbedder, but this can be overridden:
Diagram: Embedding Delegation Pattern in Provider Classes
Example implementation from OpenAiLLM:
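A hedged sketch of the two wrapper methods (the class name is illustrative; both simply forward to the injected embedder instance):

```javascript
// Delegation wrappers: the provider itself never embeds text directly.
class ProviderWithEmbedder {
  constructor(embedder) {
    this.embedder = embedder;
  }

  async embedTextInput(textInput) {
    return await this.embedder.embedTextInput(textInput);
  }

  async embedChunks(textChunks = []) {
    return await this.embedder.embedChunks(textChunks);
  }
}
```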
This pattern allows the chat engine to inject custom embedders (e.g., OpenAiEmbedder, AzureOpenAiEmbedder) when needed, while maintaining a consistent interface across all providers.
Sources: server/utils/AiProviders/openAi/index.js:16 server/utils/AiProviders/openAi/index.js:273-278 server/utils/AiProviders/anthropic/index.js:35 server/utils/AiProviders/anthropic/index.js:269-274 server/utils/AiProviders/gemini/index.js:52 server/utils/AiProviders/gemini/index.js:441-446
Provider classes implement comprehensive error handling and performance monitoring through the LLMPerformanceMonitor system:
| Metric | Description | Calculation |
|---|---|---|
| `prompt_tokens` | Input token count | From API response |
| `completion_tokens` | Output token count | From API response |
| `total_tokens` | Combined token count | Sum of prompt + completion |
| `outputTps` | Output tokens per second | `completion_tokens / duration` |
| `duration` | Request duration | Measured by performance monitor |
Sources: server/utils/AiProviders/openAi/index.js:151-178 server/utils/AiProviders/azureOpenAi/index.js:138-161