This document provides an overview of the complete chat system architecture in AnythingLLM, covering the end-to-end flow from frontend user interaction to backend LLM response delivery.
The chat system implements a streaming RAG (Retrieval-Augmented Generation) pipeline with two execution paths:
- `streamChatWithWorkspace()` for context-aware responses
- `grepAgents()` for tool-augmented workflows

The system supports two chat modes configured per workspace:
- Chat mode (`chatMode: "chat"`): uses retrieved context plus general LLM knowledge
- Query mode (`chatMode: "query"`): uses only retrieved context; refuses to answer if no context is found

Sources: frontend/src/components/WorkspaceChat/ChatContainer/index.jsx27-325 server/utils/chats/stream.js18-311 server/utils/chats/index.js12-37 frontend/src/models/workspace.js117-203 frontend/src/utils/chat/index.js
The chat system handles multiple message types, each with specific rendering behavior:
| Type | Source | Structure | Frontend Handler |
|---|---|---|---|
| `textResponseChunk` | LLM streaming | `{type, textResponse, uuid, sources}` | Append to message content |
| `textResponse` | Non-streaming LLM | `{type, textResponse, uuid, sources, close: true}` | Set final message |
| `statusResponse` | Agent thinking | `{type, content, uuid}` | Render as status indicator |
| `rechartVisualize` | Agent chart generation | `{type, content, uuid}` | Render `<Chartable>` component |
| `agentInitWebsocketConnection` | Agent start | `{type, websocketUUID}` | Initialize WebSocket connection |
| `finalizeResponseStream` | Stream completion | `{type, chatId, metrics, close: true}` | Attach chatId for actions |
| `abort` | Error or cancellation | `{type, error, close: true}` | Display error message |
| `stopGeneration` | User cancellation | `{type: "stopGeneration"}` | Stop streaming, reset UI |
Sources: server/utils/helpers/chat/responses.js frontend/src/utils/chat/index.js server/utils/chats/stream.js69-77
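A simplified dispatcher over these chunk types might look like the following. This is a sketch, not the actual frontend handler (which lives in frontend/src/utils/chat/index.js); the reducer shape and field handling are assumptions for illustration:

```javascript
// Illustrative reducer applying a streamed chunk to the chat history array.
function applyChunk(history, chunk) {
  const next = [...history];
  const lastIdx = next.length - 1;
  switch (chunk.type) {
    case "textResponseChunk":
      // Append streamed text to the in-flight assistant message.
      next[lastIdx] = {
        ...next[lastIdx],
        content: (next[lastIdx].content || "") + chunk.textResponse,
        sources: chunk.sources || next[lastIdx].sources,
      };
      return next;
    case "textResponse":
      // Non-streaming providers deliver the whole reply at once.
      next[lastIdx] = {
        ...next[lastIdx],
        content: chunk.textResponse,
        sources: chunk.sources,
        closed: true,
      };
      return next;
    case "finalizeResponseStream":
      // Attach the persisted chatId so message actions (edit, feedback) work.
      next[lastIdx] = { ...next[lastIdx], chatId: chunk.chatId, closed: true };
      return next;
    case "abort":
      next[lastIdx] = { ...next[lastIdx], error: chunk.error, closed: true };
      return next;
    default:
      return next;
  }
}
```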
The backend chat engine is responsible for the complete RAG (Retrieval-Augmented Generation) pipeline, from context assembly through LLM response streaming. The main orchestrator is streamChatWithWorkspace() in server/utils/chats/stream.js18-311
For detailed documentation of the backend streaming engine, including:

- `workspace_chats` table

Key Backend Components:
| Component | File Path | Responsibility |
|---|---|---|
| `streamChatWithWorkspace()` | server/utils/chats/stream.js | Main chat orchestrator |
| `recentChatHistory()` | server/utils/chats/index.js:61-82 | Fetch chat history |
| `DocumentManager.pinnedDocs()` | server/utils/DocumentManager.js | Retrieve pinned documents |
| `WorkspaceParsedFiles.getContextFiles()` | server/models/workspaceParsedFiles.js | Get temporary attachments |
| `VectorDb.performSimilaritySearch()` | server/utils/vectorDbProviders/* | Semantic search |
| `fillSourceWindow()` | server/utils/helpers/chat/index.js | Backfill context from history |
| `LLMConnector.compressMessages()` | server/utils/AiProviders/*/index.js | Token limit enforcement |
| `writeResponseChunk()` | server/utils/helpers/chat/responses.js | Write SSE chunks |
Sources: server/utils/chats/stream.js18-311 server/utils/chats/index.js server/utils/DocumentManager.js
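The order in which these components run can be sketched at a high level. This is an illustrative outline only, assuming a `deps` object that stands in for the real modules; argument shapes and wiring are simplified, not the actual signatures:

```javascript
// High-level sketch of the orchestration order implied by the table above.
// `deps` stands in for the real modules; all shapes here are simplified assumptions.
async function streamChatSketch(deps, workspace, message, writeChunk) {
  const history = await deps.recentChatHistory(workspace);
  const pinned = await deps.pinnedDocs(workspace);          // DocumentManager.pinnedDocs()
  const parsed = await deps.getContextFiles(workspace);     // WorkspaceParsedFiles.getContextFiles()
  const vector = await deps.performSimilaritySearch(workspace, message);
  const contextTexts = [...pinned, ...parsed, ...vector];   // assembled in priority order
  const messages = await deps.compressMessages({ history, contextTexts, message });
  // Stream the completion back to the client as SSE-style chunks.
  for await (const text of deps.streamCompletion(messages)) {
    writeChunk({ type: "textResponseChunk", textResponse: text });
  }
  writeChunk({ type: "finalizeResponseStream", close: true });
}
```

Each `deps` entry maps to a row in the table above; the real orchestrator also persists the chat and handles aborts, which this sketch omits.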
The frontend chat interface is built with React components that handle user input, real-time streaming responses, and interactive message management. The architecture follows an event-driven pattern for communication between components.
For detailed documentation of frontend components, including:
- `ChatContainer` component structure and lifecycle
- `PromptInput` textarea with undo/redo support

Sources: frontend/src/components/WorkspaceChat/ChatContainer/index.jsx26-330 frontend/src/components/WorkspaceChat/ChatContainer/PromptInput/index.jsx32-377 frontend/src/components/WorkspaceChat/ChatContainer/DnDWrapper/index.jsx
The chatMode field on the workspaces table determines response behavior when no context is found:
Chat Mode (`workspace.chatMode === "chat"`): falls back to the LLM's general knowledge when no context is retrieved.
Query Mode (`workspace.chatMode === "query"`):

- Refuses to answer when `contextTexts.length === 0`
- Returns `workspace.queryRefusalResponse` or a default refusal message

Query Refusal Response Customization:
Early Exit Logic:
Sources: server/utils/chats/stream.js16-17 server/utils/chats/stream.js65-92 server/utils/chats/stream.js200-227 server/models/workspace.js95-97
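The early-exit check can be sketched as a small predicate. This is a simplification of the real logic in server/utils/chats/stream.js; the function name and the default refusal text here are assumptions:

```javascript
// Minimal sketch of query-mode early exit (illustrative; names and
// the default refusal text are assumptions, not the real implementation).
const DEFAULT_REFUSAL =
  "There is no relevant information in this workspace to answer your query.";

function queryModeRefusal(workspace, contextTexts) {
  // Chat mode always proceeds to the LLM, even with no retrieved context.
  if (workspace.chatMode !== "query") return null;
  // Query mode exits early when retrieval produced nothing.
  if (contextTexts.length > 0) return null;
  // Workspace-level override wins; otherwise fall back to the default message.
  return workspace.queryRefusalResponse || DEFAULT_REFUSAL;
}
```

When this returns a non-null string, the backend can write it as a single response chunk and skip the LLM call entirely.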
Event-Driven Communication: The frontend uses custom events to avoid prop drilling and prevent unnecessary re-renders:
| Event | Purpose | Defined | Dispatched | Handled |
|---|---|---|---|---|
| `PROMPT_INPUT_EVENT` | Update prompt text remotely | PromptInput/index.jsx30 | `setMessageEmit()` | PromptInput/index.jsx69-74 |
| `CLEAR_ATTACHMENTS_EVENT` | Clear attachment queue | DnDWrapper | After submit | DnDWrapper context |
| `ATTACHMENTS_PROCESSING_EVENT` | Disable send while processing | DnDWrapper | File upload start | PromptInput/index.jsx363 |
| `ATTACHMENTS_PROCESSED_EVENT` | Re-enable send button | DnDWrapper | File upload complete | PromptInput/index.jsx366 |
| `ABORT_STREAM_EVENT` | Cancel active chat stream | utils/chat/index.js3 | User clicks stop | workspace.js145 |
| `AGENT_SESSION_START` | Agent mode activated | chat/agent.js | WebSocket opens | UI updates |
| `AGENT_SESSION_END` | Agent mode closed | chat/agent.js | WebSocket closes | ChatContainer/index.jsx252 |
Sources: frontend/src/components/WorkspaceChat/ChatContainer/PromptInput/index.jsx30 frontend/src/components/WorkspaceChat/ChatContainer/DnDWrapper/index.jsx frontend/src/utils/chat/index.js3
State Management:
- `chatHistory` - Array of message objects `{content, role, chatId, sources, ...}`
- `loadingResponse` - Boolean controlling streaming state and UI
- `socketId` - UUID for agent WebSocket connection
- `websocket` - Active WebSocket instance for agent communication
- `files` - Attachment queue from `DndUploaderContext`

Sources: frontend/src/components/WorkspaceChat/ChatContainer/index.jsx28-33
Prompt Input Storage:
The usePromptInputStorage hook persists unsent messages per thread/workspace in localStorage:
Saves are debounced at 500ms, preventing data loss on refresh without writing to localStorage on every keystroke.
Sources: frontend/src/hooks/usePromptInputStorage.js1-63
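The persistence mechanics can be sketched outside of React. This is an illustrative store, not the real hook: the key format, function names, and injectable `storage` (the real hook uses `localStorage`) are assumptions; the 500ms debounce follows the description above:

```javascript
// Sketch of a debounced prompt-draft store keyed per workspace/thread.
// Key format and names are illustrative; the real hook is usePromptInputStorage.
function createPromptDraftStore(storage, delayMs = 500) {
  let timer = null;
  const keyFor = (workspaceSlug, threadSlug) =>
    `prompt_draft_${workspaceSlug}_${threadSlug || "main"}`;
  return {
    save(workspaceSlug, threadSlug, text) {
      // Debounce: only the last edit within the window is persisted.
      clearTimeout(timer);
      timer = setTimeout(() => {
        storage.setItem(keyFor(workspaceSlug, threadSlug), text);
      }, delayMs);
    },
    load(workspaceSlug, threadSlug) {
      return storage.getItem(keyFor(workspaceSlug, threadSlug)) || "";
    },
  };
}
```

On mount, the hook would `load()` the draft for the active thread back into the textarea.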
For detailed documentation of message rendering, including:
- `HistoricalMessage` component for individual messages

See Message Rendering and History.
Sources: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/index.jsx frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/HistoricalMessage/index.jsx
The agent system enables the LLM to invoke external tools and APIs through a structured invocation pattern. Users can trigger agent mode by mentioning @agent in their message. Agent responses are handled via WebSocket instead of SSE for bidirectional communication.
For detailed documentation of the agent system, including:
AgentHandler and tool invocation flowSee Agent System.
Sources: server/utils/chats/agents.js server/utils/chats/stream.js42-51 frontend/src/components/WorkspaceChat/ChatContainer/index.jsx229-300
Embedded chat widgets enable external websites to integrate AnythingLLM chat functionality. Each embed has its own configuration for access control, rate limiting, and behavioral overrides.
For detailed documentation of embedded chat widgets, including:
Key Differences from Workspace Chat:
- `embed_configs` and `embed_chats` tables

Sources: server/endpoints/embed/index.js19-67 server/utils/chats/embed.js11-207 server/utils/middleware/embedMiddleware.js9-171 server/models/embedChats.js
The streamChatWithWorkspace() function assembles context from five sources in priority order:
Context Assembly Pipeline:
Sources: server/utils/chats/stream.js102-196
Context Source Details:
| Stage | Function | Database/Storage | Priority | Token Limit |
|---|---|---|---|---|
| 1. History | recentChatHistory() | workspace_chats table | N/A | Via messageLimit |
| 2. Pinned | DocumentManager.pinnedDocs() | workspace_documents.pinned=true | Highest | 80% of context window |
| 3. Parsed | WorkspaceParsedFiles.getContextFiles() | workspace_parsed_files | High | No specific limit |
| 4. Vector | VectorDb.performSimilaritySearch() | Vector database | Medium | Via topN parameter |
| 5. Backfill | fillSourceWindow() | From history JSON | Low | Fills remaining space |
Deduplication via sourceIdentifier():
Sources: server/utils/chats/stream.js115-132 server/utils/chats/stream.js135-148 server/utils/chats/stream.js150-166 server/utils/chats/stream.js180-186 server/utils/chats/index.js107-110
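Deduplication can be pictured as keyed filtering: every candidate source yields an identifier, and later, lower-priority stages skip anything already seen. The sketch below is illustrative only; the fields used to derive the identifier are assumptions, and the real `sourceIdentifier()` lives in server/utils/chats/index.js:

```javascript
// Illustrative dedupe pass; identifier derivation is an assumption.
function sourceIdentifier(source = {}) {
  return `${source.title || "unknown"}-${source.published || "unknown"}`;
}

function mergeSources(...stages) {
  const seen = new Set();
  const merged = [];
  // Stages arrive in priority order (pinned > parsed > vector > backfill),
  // so the earlier, higher-priority copy of a document wins.
  for (const stage of stages) {
    for (const source of stage) {
      const id = sourceIdentifier(source);
      if (seen.has(id)) continue;
      seen.add(id);
      merged.push(source);
    }
  }
  return merged;
}
```

This is why a pinned document never appears twice even when the vector search also returns one of its chunks.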
The system maintains two separate arrays to prevent citation confusion:
contextTexts Array (sent to LLM):
sources Array (shown to user):
Why This Matters:
Sources: server/utils/chats/stream.js180-196
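The two-array separation can be sketched as a single split over the retrieval results. Field names beyond `text` are assumptions here; the point is that snippet text feeds the prompt while only metadata reaches the citation UI:

```javascript
// Illustrative split of retrieval results into LLM context vs. user citations.
function splitForPromptAndCitations(searchResults) {
  // contextTexts: raw snippet text, injected into the system prompt.
  const contextTexts = searchResults.map((r) => r.text);
  // sources: citation metadata shown to the user; the snippet text is
  // stripped so citation rendering stays independent of prompt contents.
  const sources = searchResults.map(({ text, ...metadata }) => metadata);
  return { contextTexts, sources };
}
```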
The LLMConnector.compressMessages() method ensures the prompt fits within the model's context window:
Compression Process:
- If the assembled prompt exceeds `LLMConnector.promptWindowLimit()`, trim history from oldest to newest

Model-Specific Limits:
Sources: server/utils/chats/stream.js231-240
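The oldest-first trimming loop can be sketched as follows. This is a simplified stand-in, not the provider implementations: token counting is faked with a word count (real providers use their own tokenizers), and the keep-system-and-latest rule is an assumption for illustration:

```javascript
// Sketch of oldest-first history trimming against a token budget.
// Token counting is faked with a word count; real providers use tokenizers.
function countTokens(message) {
  return String(message.content || "").split(/\s+/).filter(Boolean).length;
}

function compressMessages(messages, promptWindowLimit) {
  // Assumed invariant: the system prompt (index 0) and the latest
  // user message are never dropped, only intermediate history.
  const [system, ...rest] = messages;
  const latest = rest.pop();
  const history = [...rest];
  const total = (msgs) => msgs.reduce((sum, m) => sum + countTokens(m), 0);
  while (history.length && total([system, ...history, latest]) > promptWindowLimit) {
    history.shift(); // drop the oldest message first
  }
  return [system, ...history, latest];
}
```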
The system adapts to LLM provider capabilities:
Streaming Flow:
Non-Streaming Flow:
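Both flows converge on the same chunk format over Server-Sent Events. A sketch of `writeResponseChunk()`-style output follows; the real helper is in server/utils/helpers/chat/responses.js, and the response stub plus the two wrapper functions here are illustrative assumptions:

```javascript
// Illustrative SSE chunk writer; the real helper lives in responses.js.
function writeResponseChunk(response, data) {
  // Server-Sent Events framing: a `data:` line terminated by a blank line.
  response.write(`data: ${JSON.stringify(data)}\n\n`);
}

// Streaming providers emit many textResponseChunk frames...
function streamToClient(response, textChunks, uuid) {
  for (const text of textChunks) {
    writeResponseChunk(response, {
      uuid,
      type: "textResponseChunk",
      textResponse: text,
      close: false,
    });
  }
  writeResponseChunk(response, { uuid, type: "finalizeResponseStream", close: true });
}

// ...while non-streaming providers send a single textResponse frame.
function respondToClient(response, fullText, uuid) {
  writeResponseChunk(response, {
    uuid,
    type: "textResponse",
    textResponse: fullText,
    close: true,
  });
}
```

Because both paths emit the same chunk schema, the frontend handlers in the message-type table above work unchanged regardless of provider capability.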