This document covers the LLM provider architecture in AnythingLLM, including the common interface that all providers implement, the factory pattern for provider selection, core methods, and provider-specific features. It explains how to work with existing providers and how to add new ones.
For information about using LLM providers in the chat system, see Chat System Architecture. For workspace-level LLM configuration, see Workspace Configuration. For agent-specific provider usage, see Agent System.
AnythingLLM supports 30+ LLM providers through a polymorphic architecture where each provider implements a common interface. This design allows the system to treat all providers uniformly while handling vendor-specific quirks internally.
All LLM provider classes implement the same set of core methods:
| Method | Purpose | Returns |
|---|---|---|
constructPrompt() | Formats messages for provider API | Message array |
getChatCompletion() | Synchronous chat completion | {textResponse, metrics} |
streamGetChatCompletion() | Initiates streaming response | Stream object |
handleStream() | Processes streaming chunks | Promise resolving to full text |
embedTextInput() | Embeds query text | Vector array |
embedChunks() | Embeds document chunks | Vector arrays |
compressMessages() | Fits messages to context window | Compressed message array |
promptWindowLimit() | Returns context window size | Number (tokens) |
isValidChatCompletionModel() | Validates model name | Boolean |
streamingEnabled() | Checks streaming support | Boolean |
Sources: server/utils/AiProviders/openAi/index.js:13-301, server/utils/AiProviders/anthropic/index.js:13-331, server/utils/AiProviders/ollama/index.js:14-489, server/utils/AiProviders/gemini/index.js:27-457
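The shared interface can be sketched as a class skeleton. This is an illustrative stub, not the project's actual code: method names come from the table above, but the bodies, the `EXAMPLE_MODEL_PREF` variable, and the default window size are assumptions.

```javascript
// Hypothetical skeleton of the common provider interface.
// Bodies are placeholders; real providers call vendor SDKs.
class ExampleLLMProvider {
  constructor(embedder = null, modelPreference = null) {
    this.model =
      modelPreference ?? process.env.EXAMPLE_MODEL_PREF ?? "example-model";
    this.embedder = embedder;
    this.limits = { system: 0.15, user: 0.15, history: 0.7 }; // typical split
  }

  // Providers signal streaming support by defining streamGetChatCompletion.
  streamingEnabled() {
    return "streamGetChatCompletion" in this;
  }

  promptWindowLimit() {
    return 8192; // static example; real providers may consult MODEL_MAP or an API
  }

  async isValidChatCompletionModel(_modelName = "") {
    return true; // many providers validate against a fetched model list here
  }

  constructPrompt({
    systemPrompt = "",
    contextTexts = [],
    chatHistory = [],
    userPrompt = "",
  }) {
    const system = {
      role: "system",
      content: `${systemPrompt}${contextTexts.join("\n")}`,
    };
    return [system, ...chatHistory, { role: "user", content: userPrompt }];
  }
}
```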
All provider constructors follow a consistent pattern:
- The model comes from the `modelPreference` parameter or falls back to an environment variable
- The embedder defaults to the `NativeEmbedder` when none is supplied

Sources: server/utils/AiProviders/openAi/index.js:14-34, server/utils/AiProviders/ollama/index.js:18-46, server/utils/AiProviders/gemini/index.js:28-62, server/utils/AiProviders/anthropic/index.js:14-40
Provider selection uses a factory pattern (mentioned in Diagram 5 of the high-level overview). The getLLMProvider() function:
- Reads the `LLM_PROVIDER` environment variable or the workspace-level override
- Instantiates and returns the matching provider class

This allows runtime provider switching without code changes.
Sources: Diagram 5 from high-level architecture
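A minimal sketch of such a factory follows; the provider keys and class names are illustrative and the returned objects stand in for real provider instances:

```javascript
// Hedged sketch of a provider factory keyed on an env variable or a
// workspace override. Names are examples, not the project's actual map.
function getLLMProviderSketch({ provider = null } = {}) {
  // A workspace-level override wins; otherwise fall back to the env variable.
  const selected = provider ?? process.env.LLM_PROVIDER ?? "openai";
  switch (selected) {
    case "openai":
      return { name: "OpenAiLLM" }; // real code: return new OpenAiLLM(embedder, model)
    case "anthropic":
      return { name: "AnthropicLLM" };
    case "ollama":
      return { name: "OllamaAILLM" };
    default:
      throw new Error(`Unknown LLM provider: ${selected}`);
  }
}
```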
The constructPrompt() method formats input into provider-specific message arrays. All providers accept the same parameters:
Message Structure:
The system message combines `systemPrompt` with the formatted `contextTexts`. Context formatting uses the `#appendContext()` helper:
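An illustrative re-implementation of that helper is shown below; the exact tag wording is an assumption based on the description that follows:

```javascript
// Sketch of the context-wrapping helper: each retrieved chunk is wrapped
// in numbered XML-style tags so the model can tell chunks apart.
function appendContext(contextTexts = []) {
  if (!contextTexts || contextTexts.length === 0) return "";
  return (
    "\nContext:\n" +
    contextTexts
      .map((text, i) => `[CONTEXT ${i}]:\n${text}\n[END CONTEXT ${i}]\n\n`)
      .join("")
  );
}
```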
This wraps each context chunk in XML-style tags for clear delineation.
Attachment Handling:
Providers generate content arrays for multimodal messages:
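A sketch of that transformation, using OpenAI-style content parts (the `contentString` field holding a base64 data URL is an assumption):

```javascript
// Turn a user prompt plus image attachments into a multimodal content array.
// With no attachments, providers typically send a plain string.
function formatUserContent(userPrompt, attachments = []) {
  if (attachments.length === 0) return userPrompt;
  const content = [{ type: "text", text: userPrompt }];
  for (const attachment of attachments) {
    content.push({
      type: "image_url",
      image_url: { url: attachment.contentString }, // base64 data URL
    });
  }
  return content;
}
```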
Provider Variations:
| Provider | Message Format | Special Handling |
|---|---|---|
| OpenAI | Standard {role, content} | input_text type for O-models |
| Anthropic | Standard, system separate | System as separate parameter |
| Ollama | Standard | Spreads attachments with formatChatHistory |
| Gemini | Standard | Models without system support use user/assistant emulation |
| Azure | Standard | O-type models use user role for system |
Sources: server/utils/AiProviders/openAi/index.js:107-126, server/utils/AiProviders/ollama/index.js:246-265, server/utils/AiProviders/gemini/index.js:341-378, server/utils/AiProviders/anthropic/index.js:128-148, server/utils/AiProviders/azureOpenAi/index.js:129-148
Synchronous completion method that returns the full response at once. The standard implementation:

1. Validates the model via `isValidChatCompletionModel()`
2. Wraps the API call in `LLMPerformanceMonitor.measureAsyncFunction()`
3. Returns a `{textResponse, metrics}` object

Metrics Structure:
- `prompt_tokens` - Input tokens consumed
- `completion_tokens` - Output tokens generated
- `total_tokens` - Sum of prompt + completion
- `outputTps` - Tokens per second (completion/duration)
- `duration` - API call duration in seconds
- `model` - Model identifier used
- `provider` - Provider class name
- `timestamp` - Completion time

Sources: server/utils/AiProviders/openAi/index.js:146-183, server/utils/AiProviders/gemini/index.js:380-413, server/utils/AiProviders/anthropic/index.js:150-183
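To make the shape concrete, here is a simplified sketch of a monitored completion call. The monitor API is reduced to a timing wrapper; `sendToApi` and the response shape are assumptions:

```javascript
// Minimal stand-in for LLMPerformanceMonitor.measureAsyncFunction():
// times an async call and reports the duration in seconds.
async function measureAsyncFunction(fn) {
  const start = Date.now();
  const output = await fn();
  const duration = (Date.now() - start) / 1000;
  return { output, duration };
}

// Sketch of a getChatCompletion() implementation assembling the metrics
// fields listed above from the API's usage block.
async function getChatCompletionSketch(sendToApi, messages) {
  const result = await measureAsyncFunction(() => sendToApi(messages));
  const usage = result.output.usage ?? {};
  const completionTokens = usage.completion_tokens ?? 0;
  return {
    textResponse: result.output.text,
    metrics: {
      prompt_tokens: usage.prompt_tokens ?? 0,
      completion_tokens: completionTokens,
      total_tokens: usage.total_tokens ?? 0,
      outputTps: result.duration > 0 ? completionTokens / result.duration : 0,
      duration: result.duration,
    },
  };
}
```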
Initiates streaming response. Returns a MonitoredStream object wrapped by LLMPerformanceMonitor:
The `measureStream()` wrapper attaches an `endMeasurement()` method to the stream for finalizing metrics once streaming completes.

Sources: server/utils/AiProviders/openAi/index.js:185-206, server/utils/AiProviders/ollama/index.js:321-342, server/utils/AiProviders/gemini/index.js:415-433
Processes streaming chunks and writes to HTTP response. Standard pattern:
Chunk Processing Variations:
| Provider | Chunk Format | Notes |
|---|---|---|
| OpenAI | {type, delta} | Uses response.output_text.delta |
| Anthropic | {type, delta} | Custom event types content_block_delta, message_stop |
| Ollama | {message: {content}} | Special handling for thinking property (reasoning tokens) |
| Gemini | Standard OpenAI format | Via OpenAI-compatible endpoint |
Some providers use handleDefaultStreamResponseV2() helper for standard OpenAI-format streams.
Sources: server/utils/AiProviders/openAi/index.js:208-282, server/utils/AiProviders/ollama/index.js:351-469, server/utils/AiProviders/anthropic/index.js:211-296
Ensures messages fit within context window by intelligently truncating. Two strategies exist:
Array Compressor (for OpenAI-format messages):
String Compressor (for providers with different formats):
The compressor allocates the context window according to `this.limits` (typically a 15%/15%/70% split).

Sources: server/utils/AiProviders/openAi/index.js:292-296, server/utils/AiProviders/anthropic/index.js:310-318, server/utils/AiProviders/ollama/index.js:479-484
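A toy version of the idea is sketched below. The real compressor uses a proper tokenizer and the per-section limits; here token counts are approximated as characters/4, which is purely illustrative:

```javascript
// Illustrative message compressor: evicts the oldest history messages
// until the estimated token count fits the window. The 4-chars-per-token
// estimate is a stand-in for the real tokenizer.
function compressMessagesSketch(messages, windowLimit) {
  const estimate = (msgs) =>
    msgs.reduce((sum, m) => sum + Math.ceil(String(m.content).length / 4), 0);
  const [system, ...rest] = messages;
  const compressed = [...rest];
  // Always keep the system prompt and the most recent message.
  while (compressed.length > 1 && estimate([system, ...compressed]) > windowLimit) {
    compressed.shift(); // evict oldest history first
  }
  return [system, ...compressed];
}
```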
Context window management varies significantly across providers due to different model capabilities and API behaviors.
Static Context Windows:
Some providers have fixed limits defined in MODEL_MAP:
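The pattern amounts to a static lookup table with a fallback. The model names and limits below are examples, not the project's actual `MODEL_MAP` contents:

```javascript
// Example static context-window table and lookup with a conservative fallback.
const MODEL_MAP_EXAMPLE = {
  "gpt-4o": 128000,
  "claude-3-5-sonnet": 200000,
};

function promptWindowLimitSketch(modelName, fallback = 4096) {
  return MODEL_MAP_EXAMPLE[modelName] ?? fallback;
}
```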
Dynamic Context Windows: Others fetch limits from provider APIs:
Ollama Context Window Caching:
Ollama caches context windows on first initialization:
User Override: User-defined limits take precedence but cannot exceed system limits:
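That precedence rule reduces to a clamp; a minimal sketch, assuming the user limit arrives as a string or number from configuration:

```javascript
// Clamp a user-defined context window to the system-discovered limit.
// A missing or non-numeric user value falls back to the system limit.
function effectiveWindow(userLimit, systemLimit) {
  const parsed = Number(userLimit);
  if (!userLimit || Number.isNaN(parsed)) return systemLimit;
  return Math.min(parsed, systemLimit);
}
```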
Sources: server/utils/AiProviders/ollama/index.js:78-115, server/utils/AiProviders/ollama/index.js:170-207, server/utils/AiProviders/lmStudio/index.js:73-112
Gemini uses filesystem caching for model metadata:
Cache structure:
Cache expires after 1 day (8.64e7 milliseconds).
Sources: server/utils/AiProviders/gemini/index.js:172-302, server/utils/AiProviders/gemini/index.js:81-89, server/utils/AiProviders/gemini/index.js:107-141
All providers use a similar allocation strategy (15%/15%/70%):
Lazy initialization pattern for performance:
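The lazy pattern can be sketched with a getter that constructs the SDK client on first access. Class and field names here are illustrative:

```javascript
// Sketch of lazy client initialization: the underlying SDK client is
// only constructed on first use and then reused.
class LazyClientProvider {
  #client = null;

  get client() {
    if (this.#client === null) {
      this.#client = { createdAt: Date.now() }; // real code: new OpenAI({ ... })
    }
    return this.#client;
  }
}
```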
Sources: server/utils/AiProviders/openAi/index.js:23-27, server/utils/AiProviders/ollama/index.js:56-67
All provider calls are wrapped in LLMPerformanceMonitor for metrics collection.
For synchronous completions:
The monitor times the call and derives the metrics fields listed above.
Sources: server/utils/AiProviders/openAi/index.js:152-163, server/utils/AiProviders/gemini/index.js:381-392
For streaming completions:
The monitored stream:

- Counts prompt tokens when `runPromptTokenCalculation: true`
- Exposes an `endMeasurement(usage)` method to finalize metrics

Token Counting:
When runPromptTokenCalculation: true, the monitor uses tiktoken or similar to count tokens since some providers don't return usage metadata.
Finalizing Metrics:
Stream handlers call endMeasurement() with final usage data:
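The shape of that finalization can be sketched as follows; the monitor internals are simplified and the returned field names mirror the metrics structure described earlier:

```javascript
// Illustrative monitored stream: records the start time and exposes
// endMeasurement(usage) to compute duration and throughput.
function makeMonitoredStream() {
  const start = Date.now();
  return {
    endMeasurement(usage = {}) {
      const duration = (Date.now() - start) / 1000;
      const completion = usage.completion_tokens ?? 0;
      return {
        ...usage,
        duration,
        outputTps: duration > 0 ? completion / duration : 0,
      };
    },
  };
}
```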
Sources: server/utils/AiProviders/ollama/index.js:322-342, server/utils/AiProviders/openAi/index.js:191-206, server/utils/AiProviders/ollama/index.js:367-393

Sources: server/utils/AiProviders/openAi/index.js:146-183, server/utils/AiProviders/ollama/index.js:321-342, server/utils/AiProviders/ollama/index.js:351-469
Ollama v0.9.0+ supports reasoning tokens via the `thinking` property. These are wrapped in `<think>...</think>` tags so the chat UI can render the model's reasoning separately.

**Streaming:**

```javascript
// When reasoning ends and regular content begins, close the <think> block
if (reasoningText.length > 0 && content.length > 0) {
  const endTag = "</think>";
  writeResponseChunk(response, { /* ...other chunk fields... */ textResponse: endTag });
  fullText += reasoningText + endTag;
  reasoningText = "";
}
fullText += content;
writeResponseChunk(response, { /* ...other chunk fields... */ textResponse: content });
```

**Non-streaming:**

```javascript
let content = res.message.content;
if (res.message.thinking)
  content = `<think>${res.message.thinking}</think>${content}`;
```
This makes reasoning visible in the chat UI for models that support chain-of-thought.
Custom Timeout Support:
Ollama allows custom timeouts for slow responses via OLLAMA_RESPONSE_TIMEOUT:
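A sketch of reading that timeout from the environment follows; the five-minute default is an assumption for illustration, not the project's documented value:

```javascript
// Read a custom response timeout (in ms) from OLLAMA_RESPONSE_TIMEOUT,
// falling back to an assumed 5-minute default for slow local models.
function ollamaTimeoutMs() {
  const parsed = Number(process.env.OLLAMA_RESPONSE_TIMEOUT);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 5 * 60 * 1000;
}
```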
Sources: server/utils/AiProviders/ollama/index.js:398-451, server/utils/AiProviders/ollama/index.js:282-284, server/utils/AiProviders/ollama/index.js:135-164
Anthropic supports prompt caching to reduce costs for repeated system prompts:
System Prompt Builder:
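A hedged sketch of such a builder, using Anthropic's `cache_control` block field; the exact mapping from the env variable is an assumption consistent with the options listed below:

```javascript
// Build the system prompt as a content block, attaching cache_control
// when ANTHROPIC_CACHE_CONTROL is set to a supported TTL.
function buildSystemPrompt(systemText) {
  const ttl = process.env.ANTHROPIC_CACHE_CONTROL; // "5m", "1h", or unset
  const block = { type: "text", text: systemText };
  if (ttl === "5m" || ttl === "1h")
    block.cache_control = { type: "ephemeral", ttl };
  return [block];
}
```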
Configuration via ANTHROPIC_CACHE_CONTROL:
- `5m` - Cache for 5 minutes
- `1h` - Cache for 1 hour
- `none` or unset - No caching

Benefits:
Sources: server/utils/AiProviders/anthropic/index.js:73-86, server/utils/AiProviders/anthropic/index.js:93-102, frontend/src/components/LLMSelection/AnthropicAiOptions/index.jsx:64-85
Gemini distinguishes between stable (v1) and experimental (v1beta) models:
All models use the v1beta OpenAI-compatible endpoint:
System Prompt Support:
Some models don't support system prompts (Gemma variants):
Emulation for unsupported models:
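The emulation amounts to replaying the system text as a user/assistant exchange. A sketch, where the wording of the assistant acknowledgement is an assumption:

```javascript
// Emulate a system prompt for models without system-role support by
// prepending a synthetic user/assistant turn to the chat history.
function emulateSystemPrompt(systemPrompt, chatHistory = []) {
  return [
    { role: "user", content: systemPrompt },
    { role: "assistant", content: "Okay, I'll follow those instructions." },
    ...chatHistory,
  ];
}
```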
Frontend Model Grouping:
Gemini models are grouped by stability in UI:
Sources: server/utils/AiProviders/gemini/index.js:149-163, server/utils/AiProviders/gemini/index.js:39-44, server/utils/AiProviders/gemini/index.js:70-72, server/utils/AiProviders/gemini/index.js:349-368, frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:74-80
Azure OpenAI deployments support reasoning models (o1, o1-mini, o3-mini) which require different handling:
System Message Handling:
O-type models don't accept system role:
Temperature Override:
O-type models don't support temperature:
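Both adjustments can be sketched together; this is an illustrative request builder, not the actual Azure provider code, keyed on the `AZURE_OPENAI_MODEL_TYPE` flag described below:

```javascript
// For reasoning ("O-type") deployments: remap system messages to the
// user role and omit the unsupported temperature parameter.
function azureRequestSketch(messages, temperature) {
  const isReasoning = process.env.AZURE_OPENAI_MODEL_TYPE === "reasoning";
  const mapped = isReasoning
    ? messages.map((m) => (m.role === "system" ? { ...m, role: "user" } : m))
    : messages;
  const request = { messages: mapped };
  if (!isReasoning) request.temperature = temperature;
  return request;
}
```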
Note: Azure doesn't expose model metadata, so users must manually configure AZURE_OPENAI_MODEL_TYPE="reasoning" for reasoning models.
Sources: server/utils/AiProviders/azureOpenAi/index.js:30-31, server/utils/AiProviders/azureOpenAi/index.js:137-138, server/utils/AiProviders/azureOpenAi/index.js:160
TogetherAI fetches and caches model list from API:
Context Window Lookup:
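A sketch of that lookup against the cached list; the `{ id, context_length }` entry shape is an assumption based on TogetherAI's model-list response:

```javascript
// Look up a model's context length in the cached model list, with a
// conservative fallback when the model is not found.
function contextWindowFor(modelId, cachedModels, fallback = 4096) {
  const entry = cachedModels.find((m) => m.id === modelId);
  return entry?.context_length ?? fallback;
}
```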
Sources: server/utils/AiProviders/togetherAi/index.js:18-78, server/utils/AiProviders/togetherAi/index.js:148-152
To add a new LLM provider, follow this implementation checklist:
Create server/utils/AiProviders/{provider-name}/index.js:
Context Formatting:
Attachment Handling:
Construct Prompt:
Sync Completion:
Stream Completion:
Handle Stream:
Utility Methods:
Embedding Methods:
Message Compression:
Add to getLLMProvider() factory (location varies, typically in a provider loader):
Create frontend/src/components/LLMSelection/NewProviderOptions/index.jsx:
If provider supports dynamic model listing:
Add to server/.env.example:
- `constructPrompt()` formats messages correctly
- `getChatCompletion()` returns a valid response
- `streamGetChatCompletion()` streams chunks properly
- `handleStream()` processes all chunk types

Sources: server/utils/AiProviders/openAi/index.js:13-301, server/utils/AiProviders/mistral/index.js:10-191, frontend/src/components/LLMSelection/GeminiLLMOptions/index.jsx:4-138, frontend/src/hooks/useGetProvidersModels.js:58-86
| Provider | API Type | Streaming | Vision | Context Window Source | Model Discovery | Special Features |
|---|---|---|---|---|---|---|
| OpenAI | OpenAI SDK | ✓ | ✓ | MODEL_MAP + API | API models endpoint | O-model support |
| Anthropic | Anthropic SDK | ✓ | ✓ | MODEL_MAP | Hardcoded list | Prompt caching |
| Ollama | Ollama SDK | ✓ | ✓ | API discovery | /api/tags endpoint | Reasoning tokens, custom timeout |
| Gemini | OpenAI-compatible | ✓ | ✓ | API discovery (cached) | /v1/models + /v1beta/models | Experimental models, system prompt emulation |
| Azure OpenAI | OpenAI SDK | ✓ | ✓ | Environment variable | User-configured | O-type model support |
| LM Studio | OpenAI SDK | ✓ | ✓ | API discovery | /api/v0/models | Multi-model chat |
| TogetherAI | OpenAI SDK | ✓ | ✓ | API discovery (cached) | /v1/models | Large model catalog |
| HuggingFace | OpenAI-compatible | ✓ | ✗ | Environment variable | N/A | Inference endpoints |
| LocalAI | OpenAI SDK | ✓ | ✓ | Environment variable | User-configured | Self-hosted |
| Mistral | OpenAI SDK | ✓ | ✓ | Fixed (32000) | Hardcoded | - |
Sources: All provider implementation files listed above