The Chat Streaming Engine is the core component responsible for processing chat messages in AnythingLLM. It orchestrates the Retrieval-Augmented Generation (RAG) pipeline by assembling context from multiple sources, managing chat history, and coordinating with LLM providers to generate responses. This document focuses on the backend streaming logic implemented primarily in server/utils/chats/stream.js.
For information about the frontend chat components, see page 7.2. For agent-specific functionality, see page 7.4. For embedded chat widgets, see page 7.5.
Sources: server/utils/chats/stream.js1-316
The chat engine operates through a multi-stage pipeline that processes user messages, gathers relevant context, and streams responses from LLM providers. The main entry point is the streamChatWithWorkspace function, which handles both standard workspace chats and thread-scoped conversations.
Sources: server/utils/chats/stream.js18-311
The streamChatWithWorkspace function is exported from server/utils/chats/stream.js and serves as the primary entry point for all chat processing. This function coordinates the entire RAG pipeline from context assembly to LLM invocation and response streaming.
| Parameter | Type | Source | Description |
|---|---|---|---|
| response | Express Response | HTTP endpoint | Used for Server-Sent Events streaming via writeResponseChunk() |
| workspace | Object | Workspace.get({slug}) | Contains chatProvider, chatModel, openAiTemp, openAiHistory, similarityThreshold, topN, chatMode, queryRefusalResponse |
| message | String | Request body | Raw user input before command/agent preprocessing |
| chatMode | String | Request body or workspace default | Validated against VALID_CHAT_MODE = ["chat", "query"] |
| user | Object \| null | JWT authentication | Contains id field for scoping history and permissions |
| thread | Object \| null | WorkspaceThread.get() | Contains id field for thread-scoped history |
| attachments | Array | Request body | Parsed file attachments from WorkspaceParsedFiles |
Sources: server/utils/chats/stream.js18-26 server/utils/chats/stream.js16
AnythingLLM supports two distinct chat modes that control how the system handles context and responses:
| Aspect | Query Mode | Chat Mode |
|---|---|---|
| Purpose | Strict factual answers from documents | Conversational interaction with optional context |
| Context Requirement | Required - refuses without context | Optional - can use general knowledge |
| Behavior without context | Returns refusal response | Proceeds with LLM's general knowledge |
| Use Case | Precise Q&A from documents | General conversation with RAG support |
Query mode enforces strict context requirements through two validation checkpoints. Both checkpoints persist a refusal chat record with include: false to maintain conversation continuity.
Key Design Detail: Refusal responses use include: false so they appear in chat history but don't consume context window space in future interactions.
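A minimal sketch of this checkpoint logic, assuming simplified shapes for the workspace object and the persisted record (the real implementation streams the refusal via writeResponseChunk and saves through WorkspaceChats.new; the fallback message text here is illustrative):

```javascript
// Hypothetical sketch of the query-mode refusal checkpoint.
// Returns the refusal payload to stream and persist, or null to proceed.
function queryModeRefusal(workspace, contextTexts) {
  const isQueryMode = workspace.chatMode === "query";
  if (!isQueryMode || contextTexts.length > 0) return null;

  const text =
    workspace.queryRefusalResponse ??
    "There is no relevant information in this workspace to answer your query.";

  // include: false keeps the refusal visible in the UI history while
  // excluding it from the context window of future prompts.
  return { text, type: "textResponse", include: false };
}
```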
Sources: server/utils/chats/stream.js60-92 server/utils/chats/stream.js200-227
The chat engine assembles context from four distinct sources, executed in parallel where possible to optimize performance. The final context is a merged collection from all sources.
Sources: server/utils/chats/stream.js102-196
Pinned documents (workspace_documents.pinned = true) are always included in context regardless of relevance to the query. The DocumentManager class handles token-aware limiting to prevent context overflow.
Implementation Details:
- DocumentManager.pinnedDocs() queries workspace_documents where pinned = true
- maxTokens caps pinned content at ~80% of LLMConnector.promptWindowLimit() to reserve space for chat history and vector results
- sourceIdentifier(doc) generates "title:{title}-timestamp:{published}" for deduplication
- pinnedDocIdentifiers[] is passed to performSimilaritySearch() via the filterIdentifiers parameter

Sources: server/utils/chats/stream.js115-132 server/utils/chats/index.js107-110
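A sketch of the token-aware gathering described above, assuming a doc shape with pageContent and token_count_estimate fields; the method names mirror the text, but this loop is illustrative rather than the actual DocumentManager internals:

```javascript
// Accumulate pinned docs until ~80% of the provider's context window is
// consumed, reserving the remainder for chat history and vector results.
async function gatherPinnedDocs(documentManager, llmConnector) {
  const maxTokens = Math.floor(llmConnector.promptWindowLimit() * 0.8);
  const contextTexts = [];
  const pinnedDocIdentifiers = [];
  let used = 0;

  for (const doc of await documentManager.pinnedDocs()) {
    if (used + doc.token_count_estimate > maxTokens) break;
    used += doc.token_count_estimate;
    contextTexts.push(doc.pageContent);
    // Identifier format matches the deduplication scheme documented above.
    pinnedDocIdentifiers.push(`title:${doc.title}-timestamp:${doc.published}`);
  }
  return { contextTexts, pinnedDocIdentifiers };
}
```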
Parsed files are user-uploaded attachments that are scoped to the workspace, thread, or user session. These are retrieved via WorkspaceParsedFiles.getContextFiles().
Sources: server/utils/chats/stream.js135-148
Vector search is conditionally executed based on embeddingsCount and uses workspace-scoped configuration parameters.
Workspace Configuration Parameters:
| Field | Default | Validation | Vector DB Param |
|---|---|---|---|
| workspace.similarityThreshold | 0.25 | Float: 0.0-1.0 | similarityThreshold |
| workspace.topN | 4 | Int: ≥ 1 | topN |
| workspace.vectorSearchMode | "default" | "default" \| "rerank" | rerank boolean |
Vector DB Method Signature:
- Returns {contextTexts: string[], sources: object[], message: string|null}
- A non-null message field indicates search failure (connection error, timeout, etc.)
- filterIdentifiers prevents re-fetching documents already in pinnedDocIdentifiers[]

Sources: server/utils/chats/stream.js61 server/utils/chats/stream.js150-178 server/models/workspace.js80-94
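A sketch of how these workspace parameters flow into the search call. The VectorDb stub here only models the documented result shape; the exact option names and the namespace/input parameters are assumptions:

```javascript
// Stand-in for a vector DB adapter; a real adapter queries the store.
// message is non-null only when the search fails.
const VectorDb = {
  async performSimilaritySearch({ similarityThreshold, topN, filterIdentifiers }) {
    return { contextTexts: [], sources: [], message: null };
  },
};

async function searchWorkspace(workspace, input, pinnedDocIdentifiers) {
  const result = await VectorDb.performSimilaritySearch({
    namespace: workspace.slug,
    input,
    similarityThreshold: workspace.similarityThreshold ?? 0.25,
    topN: workspace.topN ?? 4,
    filterIdentifiers: pinnedDocIdentifiers, // skip already-pinned documents
    rerank: workspace.vectorSearchMode === "rerank",
  });
  if (result.message !== null) throw new Error(result.message); // search failure
  return result;
}
```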
Recent chat history is retrieved and converted to prompt format for the LLM:
The messageLimit defaults to the workspace's openAiHistory setting (default: 20 messages).
Sources: server/utils/chats/stream.js102-107 server/utils/chats/index.js61-82
The fillSourceWindow utility (imported from server/utils/helpers/chat/index.js) implements intelligent context augmentation by backfilling from chat history when vector search returns fewer than topN results.
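A minimal sketch of the backfill idea, assuming simplified shapes for search results and history records; the parameter names and dedup-by-text approach are assumptions, not the helper's exact signature:

```javascript
// When vector search returns fewer than nDocs snippets, pad contextTexts
// with source texts remembered from recent chat history, newest first.
function fillSourceWindow({ nDocs = 4, searchResults = [], history = [] }) {
  const contextTexts = searchResults.map((s) => s.text);
  if (contextTexts.length >= nDocs) return contextTexts;

  // Walk history newest-first, harvesting previously cited source texts.
  for (const chat of [...history].reverse()) {
    for (const src of chat.sources ?? []) {
      if (contextTexts.length >= nDocs) break;
      if (!contextTexts.includes(src.text)) contextTexts.push(src.text);
    }
  }
  return contextTexts;
}
```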
Sources: server/utils/helpers/chat/index.js
The context and sources arrays are merged differently to prevent UX confusion:
| Array | Merge Strategy | Rationale |
|---|---|---|
contextTexts | Includes filledSources.contextTexts | LLM needs full historical context for coherent responses |
sources | Only vectorSearchResults.sources | User sees only current search citations, not backfilled sources |
Comment from source code (lines 188-196):
"Why does contextTexts get all the info, but sources only get current search? This is to give the ability of the LLM to 'comprehend' a contextual response without populating the Citations under a response with documents the user 'thinks' are irrelevant due to how we manage backfilling of the context to keep chats with the LLM more correct in responses."
This prevents GitHub issues like "LLM citing document that has no answer in it" while keeping answers accurate.
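The asymmetric merge can be sketched as follows; the variable names mirror the table above, and the object shapes are illustrative:

```javascript
// LLM prompt context and user-facing citations are built from different sets.
function mergeContext(vectorSearchResults, filledSources) {
  return {
    // The LLM sees everything, including history-backfilled snippets.
    contextTexts: [...filledSources.contextTexts],
    // The user sees citations only from the current search.
    sources: [...vectorSearchResults.sources],
  };
}
```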
Sources: server/utils/chats/stream.js180-196
Before context assembly, messages undergo preprocessing to handle special commands and agent invocations.
Sources: server/utils/chats/stream.js28-51 server/utils/chats/index.js8-37
Slash command processing occurs in two phases: built-in command execution and user preset substitution.
VALID_COMMANDS Registry (server/utils/chats/index.js:8-10):
grepCommand Implementation:
- User presets are fetched via SlashCommandPresets.getUserPresets(user?.id)
- Each command is matched against the start of the message with a case-insensitive new RegExp(`^(${cmd})`, "i")
- Built-in commands (e.g., "/reset") execute directly
- Matching presets are substituted via updatedMessage.replace(regex, preset.prompt)

Database Schema:

- The slash_command_presets table stores user-defined commands
- Key columns: command (e.g., "/summarize"), prompt (replacement text), uid (user ID), userId (FK)

Sources: server/utils/chats/stream.js28-40 server/utils/chats/index.js8-37 server/prisma/schema.prisma283-295
The grepAgents function detects @agent mentions and transitions to the AIbitat framework execution path:
Agent Detection Flow:
- grepAgents() parses updatedMessage for the @agent pattern
- On detection, creates a WorkspaceAgentInvocations record with a uuid
- Signals the client to open an agent session (agentInitWebsocketConnection()) and returns true
- Otherwise returns false and the normal chat flow continues

Database Persistence:

- The workspace_agent_invocations table tracks all agent sessions
- Key columns: uuid, prompt, closed, user_id, thread_id, workspace_id

Sources: server/utils/chats/stream.js43-51 server/utils/chats/agents.js (imported at line 7), server/prisma/schema.prisma201-215
After context assembly, the engine prepares the final prompt for the LLM by compressing messages to fit within the token limit.
The compressMessages method ensures the final prompt fits within the LLM's context window by intelligently truncating chat history while preserving system prompt, user prompt, and context.
chatPrompt Function (server/utils/chats/index.js:91-100):
Prompt Construction Hierarchy:
- System prompt: workspace.openAiPrompt → SystemSettings.saneDefaultSystemPrompt (fallback)
- SystemPromptVariables.expandSystemPromptVariables() performs variable substitution (e.g., {{username}}, {{date}})
- Chat history is supplied in {role: "user"|"assistant", content: string}[] format
- The current user prompt is the preprocessed updatedMessage

Compression Algorithm:

- LLMConnector.compressMessages() calculates token counts for each component
- If the total exceeds promptWindowLimit(), it truncates chat history from oldest to newest

Sources: server/utils/chats/stream.js231-240 server/utils/chats/index.js91-100
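A toy version of the oldest-first truncation described above. The length/4 token heuristic and function shape are assumptions, purely to illustrate the eviction order:

```javascript
// Crude token estimate; real connectors use a proper tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Drop the oldest history messages until the estimated total fits the
// window; system prompt and user prompt are always preserved.
function compressMessages({ systemPrompt, history, userPrompt }, promptWindowLimit) {
  const fixed = estimateTokens(systemPrompt) + estimateTokens(userPrompt);
  const kept = [...history];
  while (
    kept.length > 0 &&
    fixed + kept.reduce((n, m) => n + estimateTokens(m.content), 0) > promptWindowLimit
  ) {
    kept.shift(); // evict the oldest message first
  }
  return [
    { role: "system", content: systemPrompt },
    ...kept,
    { role: "user", content: userPrompt },
  ];
}
```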
The engine supports both streaming and non-streaming responses depending on the LLM provider's capabilities.
The engine selects response mode based on LLMConnector.streamingEnabled(), which varies by provider implementation.
Sources: server/utils/chats/stream.js244-275
Sources: server/utils/chats/stream.js265-272
After the LLM response completes, the chat engine persists the conversation to the workspace_chats table via WorkspaceChats.new().
workspace_chats Table Schema (server/prisma/schema.prisma:186-199):
| Column | Type | Relation | Description |
|---|---|---|---|
| id | Int @id | Primary key | Auto-increment |
| workspaceId | Int | workspaces | Namespace for chat |
| prompt | String | - | User input (not preprocessed) |
| response | String | - | JSON.stringify({text, sources, type, attachments, metrics}) |
| include | Boolean | - | If false, excluded from recentChatHistory() |
| user_id | Int? | users | Multi-user mode scoping |
| thread_id | Int? | - | Thread scoping (no FK to avoid migration) |
| api_session_id | String? | - | API client partition key |
| feedbackScore | Boolean? | - | User feedback (true = positive, false = negative, null = none) |
| createdAt | DateTime | - | Auto-generated timestamp |
| lastUpdatedAt | DateTime | - | Auto-updated timestamp |
Key Implementation Notes:
- The response field stores a JSON-stringified object (not a Prisma JSON type, to maintain SQLite compatibility)
- include: false is used for refusal responses in query mode to exclude them from future context
- thread_id has no foreign key to prevent a full table migration when workspace_threads was added
- api_session_id enables stateful conversations for API clients without user accounts

Sources: server/utils/chats/stream.js277-309 server/prisma/schema.prisma186-199 server/models/workspaceChats.js5-31
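A sketch of the persisted record, assuming simplified inputs; the field names follow the schema table, but this builder function itself is hypothetical:

```javascript
// Assemble a workspace_chats row; response is JSON-stringified rather
// than stored as a Prisma JSON type, for SQLite compatibility.
function buildChatRecord({ workspaceId, prompt, text, sources, metrics, user, thread }) {
  return {
    workspaceId,
    prompt, // raw user input, not the preprocessed message
    response: JSON.stringify({ text, sources, type: "chat", attachments: [], metrics }),
    user_id: user?.id ?? null,
    thread_id: thread?.id ?? null,
    include: true,
  };
}
```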
The recentChatHistory function retrieves chat history filtered by workspace, user, thread, and include: true:
Returns { rawHistory, chatHistory } where:
- rawHistory: array of raw workspace_chats records
- chatHistory: converted to {role, content}[] format via convertToPromptHistory()

Sources: server/utils/chats/index.js61-82
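The conversion step can be sketched as follows; this is a simplified take on convertToPromptHistory(), assuming each row stores its text in the JSON response column:

```javascript
// Expand each workspace_chats row into a user/assistant message pair.
function convertToPromptHistory(rawHistory) {
  return rawHistory.flatMap((chat) => {
    const { text } = JSON.parse(chat.response);
    return [
      { role: "user", content: chat.prompt },
      { role: "assistant", content: text },
    ];
  });
}
```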
The sourceIdentifier function generates unique identifiers for source documents to prevent duplication:
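A simplified form of this helper, matching the "title:{title}-timestamp:{published}" format noted in the pinned-documents section; the default values here are an assumption:

```javascript
// Build a deduplication key from a document's title and publish timestamp.
function sourceIdentifier(doc = {}) {
  const { title = "unknown", published = "unknown" } = doc;
  return `title:${title}-timestamp:${published}`;
}

// Identifiers collected into pinnedDocIdentifiers[] are later passed to
// performSimilaritySearch() as filterIdentifiers to deduplicate results.
```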
Sources: server/utils/chats/index.js107-110
The writeResponseChunk utility writes streaming response chunks to the HTTP response, handling Server-Sent Events (SSE) formatting.
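A minimal sketch of SSE chunk writing: each chunk is serialized as a `data:` line terminated by a blank line, per the SSE wire format. The payload fields shown are illustrative; the real helper lives in server/utils/helpers/chat/responses.js:

```javascript
// Serialize one streaming chunk in Server-Sent Events framing.
function writeResponseChunk(response, data) {
  response.write(`data: ${JSON.stringify(data)}\n\n`);
}

// Usage with a stub Express-like response object:
const written = [];
const fakeResponse = { write: (s) => written.push(s) };
writeResponseChunk(fakeResponse, { uuid: "123", type: "textResponseChunk", textResponse: "Hel" });
```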
Sources: server/utils/helpers/chat/responses.js (referenced at server/utils/chats/stream.js6)
A parallel implementation for embedded chat widgets exists in server/utils/chats/embed.js. The streamChatWithForEmbed function reuses most of the RAG pipeline logic but differs in authentication, configuration, and persistence.
| Aspect | streamChatWithWorkspace | streamChatWithForEmbed |
|---|---|---|
| Function Signature | (response, workspace, message, chatMode, user, thread, attachments) | (response, embed, message, sessionId, {promptOverride, modelOverride, temperatureOverride, username}) |
| Entry Point | POST /workspace/:slug/stream-chat | POST /embed/:embedId/stream-chat |
| Authentication | JWT or instance password | sessionId UUID + allowlist domain check |
| Configuration Source | workspaces table | embed_configs + workspace relation |
| Persistence Target | workspace_chats table | embed_chats table |
| History Function | recentChatHistory({user, workspace, thread, messageLimit}) | recentEmbedChatHistory(sessionId, embed, messageLimit) |
| Source Handling | All sources included in response | filterSources() removes sources from /history endpoint |
| Rate Limiting | User daily limit or none | max_chats_per_day, max_chats_per_session |
| Middleware Stack | validApiKey or flexUserRoleValid | validEmbedConfig, setConnectionMeta, canRespond |
| Command Support | Slash commands + agent invocations | No commands/agents |
| Override Support | None (uses workspace config) | allow_prompt_override, allow_model_override, allow_temperature_override |
1. Configuration Overrides (Lines 22-28):
2. Rate Limiting (server/utils/middleware/embedMiddleware.js:107-157):
- max_chats_per_day: total chats for the embed across all sessions (last 24 hours)
- max_chats_per_session: total chats for a specific sessionId (last 24 hours)
- Exceeding either limit returns a 429 status with an errorMsg field for user-facing display

3. Source Filtering (server/models/embedChats.js:47-53):
4. Connection Metadata (server/utils/middleware/embedMiddleware.js:22-28):
Sources: server/utils/chats/embed.js11-223 server/utils/middleware/embedMiddleware.js9-171 server/models/embedChats.js47-53 server/endpoints/embed/index.js19-67
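The two rate-limit checks above can be sketched as below; the counts would come from embed_chats queries over the last 24 hours, and the function shape and error messages are illustrative rather than the middleware's actual helpers:

```javascript
// Evaluate embed rate limits; a falsy limit (null/0) means unlimited.
function canRespond(embed, counts) {
  const { max_chats_per_day, max_chats_per_session } = embed;
  if (max_chats_per_day && counts.dayCount >= max_chats_per_day)
    return { ok: false, status: 429, errorMsg: "Daily chat limit reached." };
  if (max_chats_per_session && counts.sessionCount >= max_chats_per_session)
    return { ok: false, status: 429, errorMsg: "Session chat limit reached." };
  return { ok: true };
}
```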
The chat engine reads configuration from the workspaces table:
| workspace Column | Default | Validation | Description |
|---|---|---|---|
| chatProvider | null | String or null | LLM provider key (e.g., "openai") |
| chatModel | null | String or null | Model identifier |
| chatMode | "chat" | "chat" \| "query" | Determines context requirement |
| openAiTemp | null | Float or null (≥ 0) | Temperature for LLM |
| openAiHistory | 20 | Int (≥ 0) | Message limit for history |
| openAiPrompt | null | String or null | System prompt (falls back to saneDefaultSystemPrompt) |
| similarityThreshold | 0.25 | Float (0.0-1.0) | Vector search cutoff |
| topN | 4 | Int (≥ 1) | Number of vector results |
| queryRefusalResponse | null | String or null | Custom refusal message |
| vectorSearchMode | "default" | "default" \| "rerank" | Enables LanceDB reranking |
These are validated by Workspace.validations object and applied in Workspace.update().
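Two of these validators can be sketched in the clamp-to-default style described above; the exact behavior of Workspace.validations is an assumption, and only the documented defaults and ranges are reused here:

```javascript
// Normalize incoming values, falling back to the documented defaults
// when input is missing or out of range.
const validations = {
  similarityThreshold: (v) => {
    const n = parseFloat(v);
    return Number.isFinite(n) && n >= 0 && n <= 1 ? n : 0.25;
  },
  topN: (v) => {
    const n = parseInt(v, 10);
    return Number.isInteger(n) && n >= 1 ? n : 4;
  },
};
```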
Sources: server/models/workspace.js60-131 server/prisma/schema.prisma126-155
The chat engine handles errors at multiple checkpoints:
1. Vector Search Failures
2. Empty Context in Query Mode
- Streams type: "textResponse" carrying the queryRefusalResponse
- Persists the refusal to workspace_chats with include: false

3. Command Execution Errors

- Only commands registered in VALID_COMMANDS are executed
- resetMemory returns success/error via writeResponseChunk

4. LLM Provider Errors

- Surfaced from getChatCompletion() or handleStream()

Sources: server/utils/chats/stream.js65-92 server/utils/chats/stream.js168-178 server/utils/chats/stream.js200-227
The chat engine optimizes performance through:
- Parallel context gathering where possible (pinned documents, parsed files, vector search)
- Token-aware context limiting via LLMConnector.promptWindowLimit()

Sources: server/utils/chats/stream.js115-196