This document provides an overview of the complete chat system architecture in AnythingLLM, covering the end-to-end flow from frontend user interaction to backend LLM response delivery.
The chat system implements a streaming RAG (Retrieval-Augmented Generation) pipeline with two execution paths:
- `streamChatWithWorkspace()` for context-aware responses
- `grepAgents()` for tool-augmented workflows

The system supports two chat modes configured per workspace:
- Chat mode (`chatMode: "chat"`): uses retrieved context plus general LLM knowledge
- Query mode (`chatMode: "query"`): uses only retrieved context; refuses to answer if no context is found

Sources: frontend/src/components/WorkspaceChat/ChatContainer/index.jsx27-325 server/utils/chats/stream.js18-311 server/utils/chats/index.js12-37 frontend/src/models/workspace.js117-203 frontend/src/utils/chat/index.js
The chat system handles multiple message types, each with specific rendering behavior:
| Type | Source | Structure | Frontend Handler |
|---|---|---|---|
| `textResponseChunk` | LLM streaming | `{type, textResponse, uuid, sources}` | Append to message content |
| `textResponse` | Non-streaming LLM | `{type, textResponse, uuid, sources, close: true}` | Set final message |
| `statusResponse` | Agent thinking | `{type, content, uuid}` | Render as status indicator |
| `rechartVisualize` | Agent chart generation | `{type, content, uuid}` | Render `<Chartable>` component |
| `agentInitWebsocketConnection` | Agent start | `{type, websocketUUID}` | Initialize WebSocket connection |
| `finalizeResponseStream` | Stream completion | `{type, chatId, metrics, close: true}` | Attach chatId for actions |
| `abort` | Error or cancellation | `{type, error, close: true}` | Display error message |
| `stopGeneration` | User cancellation | `{type: "stopGeneration"}` | Stop streaming, reset UI |
Sources: server/utils/helpers/chat/responses.js frontend/src/utils/chat/index.js server/utils/chats/stream.js69-77
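A simplified dispatcher over these chunk types might look like the following. This is a sketch, not the actual frontend handler (which lives in frontend/src/utils/chat/index.js); the reducer shape and field handling are assumptions for illustration:

```javascript
// Illustrative reducer applying a streamed chunk to the chat history array.
function applyChunk(history, chunk) {
  const next = [...history];
  const lastIdx = next.length - 1;
  switch (chunk.type) {
    case "textResponseChunk":
      // Append streamed text to the in-flight assistant message.
      next[lastIdx] = {
        ...next[lastIdx],
        content: (next[lastIdx].content || "") + chunk.textResponse,
        sources: chunk.sources || next[lastIdx].sources,
      };
      return next;
    case "textResponse":
      // Non-streaming providers deliver the whole reply at once.
      next[lastIdx] = {
        ...next[lastIdx],
        content: chunk.textResponse,
        sources: chunk.sources,
        closed: true,
      };
      return next;
    case "finalizeResponseStream":
      // Attach the persisted chatId so message actions (edit, feedback) work.
      next[lastIdx] = { ...next[lastIdx], chatId: chunk.chatId, closed: true };
      return next;
    case "abort":
      next[lastIdx] = { ...next[lastIdx], error: chunk.error, closed: true };
      return next;
    default:
      return next;
  }
}
```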
The backend chat engine is responsible for the complete RAG (Retrieval-Augmented Generation) pipeline, from context assembly through LLM response streaming. The main orchestrator is streamChatWithWorkspace() in server/utils/chats/stream.js18-311
For detailed documentation of the backend streaming engine, including:

- `workspace_chats` table

Key Backend Components:
| Component | File Path | Responsibility |
|---|---|---|
| `streamChatWithWorkspace()` | server/utils/chats/stream.js | Main chat orchestrator |
| `recentChatHistory()` | server/utils/chats/index.js:61-82 | Fetch chat history |
| `DocumentManager.pinnedDocs()` | server/utils/DocumentManager.js | Retrieve pinned documents |
| `WorkspaceParsedFiles.getContextFiles()` | server/models/workspaceParsedFiles.js | Get temporary attachments |
| `VectorDb.performSimilaritySearch()` | server/utils/vectorDbProviders/* | Semantic search |
| `fillSourceWindow()` | server/utils/helpers/chat/index.js | Backfill context from history |
| `LLMConnector.compressMessages()` | server/utils/AiProviders/*/index.js | Token limit enforcement |
| `writeResponseChunk()` | server/utils/helpers/chat/responses.js | Write SSE chunks |
Sources: server/utils/chats/stream.js18-311 server/utils/chats/index.js server/utils/DocumentManager.js
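The order in which these components run can be sketched at a high level. This is an illustrative outline only, assuming a `deps` object that stands in for the real modules; argument shapes and wiring are simplified, not the actual signatures:

```javascript
// High-level sketch of the orchestration order implied by the table above.
// `deps` stands in for the real modules; all shapes here are simplified assumptions.
async function streamChatSketch(deps, workspace, message, writeChunk) {
  const history = await deps.recentChatHistory(workspace);
  const pinned = await deps.pinnedDocs(workspace);          // DocumentManager.pinnedDocs()
  const parsed = await deps.getContextFiles(workspace);     // WorkspaceParsedFiles.getContextFiles()
  const vector = await deps.performSimilaritySearch(workspace, message);
  const contextTexts = [...pinned, ...parsed, ...vector];   // assembled in priority order
  const messages = await deps.compressMessages({ history, contextTexts, message });
  // Stream the completion back to the client as SSE-style chunks.
  for await (const text of deps.streamCompletion(messages)) {
    writeChunk({ type: "textResponseChunk", textResponse: text });
  }
  writeChunk({ type: "finalizeResponseStream", close: true });
}
```

Each `deps` entry maps to a row in the table above; the real orchestrator also persists the chat and handles aborts, which this sketch omits.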
The frontend chat interface is built with React components that handle user input, real-time streaming responses, and interactive message management. The architecture follows an event-driven pattern for communication between components.
For detailed documentation of frontend components, including:
- `ChatContainer` component structure and lifecycle
- `PromptInput` textarea with undo/redo support

Sources: frontend/src/components/WorkspaceChat/ChatContainer/index.jsx26-330 frontend/src/components/WorkspaceChat/ChatContainer/PromptInput/index.jsx32-377 frontend/src/components/WorkspaceChat/ChatContainer/DnDWrapper/index.jsx
The chatMode field on the workspaces table determines response behavior when no context is found:
Chat Mode (`workspace.chatMode === "chat"`): falls back to the LLM's general knowledge when no context is retrieved.
Query Mode (`workspace.chatMode === "query"`):

- Refuses to answer when `contextTexts.length === 0`
- Returns `workspace.queryRefusalResponse` or a default refusal message

Query Refusal Response Customization:
Early Exit Logic:
Sources: server/utils/chats/stream.js16-17 server/utils/chats/stream.js65-92 server/utils/chats/stream.js200-227 server/models/workspace.js95-97
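The early-exit check can be sketched as a small predicate. This is a simplification of the real logic in server/utils/chats/stream.js; the function name and the default refusal text here are assumptions:

```javascript
// Minimal sketch of query-mode early exit (illustrative; names and
// the default refusal text are assumptions, not the real implementation).
const DEFAULT_REFUSAL =
  "There is no relevant information in this workspace to answer your query.";

function queryModeRefusal(workspace, contextTexts) {
  // Chat mode always proceeds to the LLM, even with no retrieved context.
  if (workspace.chatMode !== "query") return null;
  // Query mode exits early when retrieval produced nothing.
  if (contextTexts.length > 0) return null;
  // Workspace-level override wins; otherwise fall back to the default message.
  return workspace.queryRefusalResponse || DEFAULT_REFUSAL;
}
```

When this returns a non-null string, the backend can write it as a single response chunk and skip the LLM call entirely.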
Event-Driven Communication: The frontend uses custom events to avoid prop drilling and prevent unnecessary re-renders:
| Event | Purpose | Defined | Dispatched | Handled |
|---|---|---|---|---|
| `PROMPT_INPUT_EVENT` | Update prompt text remotely | PromptInput/index.jsx30 | `setMessageEmit()` | PromptInput/index.jsx69-74 |
| `CLEAR_ATTACHMENTS_EVENT` | Clear attachment queue | DnDWrapper | After submit | DnDWrapper context |
| `ATTACHMENTS_PROCESSING_EVENT` | Disable send while processing | DnDWrapper | File upload start | PromptInput/index.jsx363 |
| `ATTACHMENTS_PROCESSED_EVENT` | Re-enable send button | DnDWrapper | File upload complete | PromptInput/index.jsx366 |
| `ABORT_STREAM_EVENT` | Cancel active chat stream | utils/chat/index.js3 | User clicks stop | workspace.js145 |
| `AGENT_SESSION_START` | Agent mode activated | chat/agent.js | WebSocket opens | UI updates |
| `AGENT_SESSION_END` | Agent mode closed | chat/agent.js | WebSocket closes | ChatContainer/index.jsx252 |
Sources: frontend/src/components/WorkspaceChat/ChatContainer/PromptInput/index.jsx30 frontend/src/components/WorkspaceChat/ChatContainer/DnDWrapper/index.jsx frontend/src/utils/chat/index.js3
State Management:
- `chatHistory` - Array of message objects `{content, role, chatId, sources, ...}`
- `loadingResponse` - Boolean controlling streaming state and UI
- `socketId` - UUID for agent WebSocket connection
- `websocket` - Active WebSocket instance for agent communication
- `files` - Attachment queue from `DndUploaderContext`

Sources: frontend/src/components/WorkspaceChat/ChatContainer/index.jsx28-33
Prompt Input Storage:
The usePromptInputStorage hook persists unsent messages per thread/workspace in localStorage:
Saves are debounced at 500ms, preventing data loss on refresh without writing to localStorage on every keystroke.
Sources: frontend/src/hooks/usePromptInputStorage.js1-63
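The persistence mechanics can be sketched outside of React. This is an illustrative store, not the real hook: the key format, function names, and injectable `storage` (the real hook uses `localStorage`) are assumptions; the 500ms debounce follows the description above:

```javascript
// Sketch of a debounced prompt-draft store keyed per workspace/thread.
// Key format and names are illustrative; the real hook is usePromptInputStorage.
function createPromptDraftStore(storage, delayMs = 500) {
  let timer = null;
  const keyFor = (workspaceSlug, threadSlug) =>
    `prompt_draft_${workspaceSlug}_${threadSlug || "main"}`;
  return {
    save(workspaceSlug, threadSlug, text) {
      // Debounce: only the last edit within the window is persisted.
      clearTimeout(timer);
      timer = setTimeout(() => {
        storage.setItem(keyFor(workspaceSlug, threadSlug), text);
      }, delayMs);
    },
    load(workspaceSlug, threadSlug) {
      return storage.getItem(keyFor(workspaceSlug, threadSlug)) || "";
    },
  };
}
```

On mount, the hook would `load()` the draft for the active thread back into the textarea.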
For detailed documentation of message rendering, including:
- `HistoricalMessage` component for individual messages

See Message Rendering and History.
Sources: frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/index.jsx frontend/src/components/WorkspaceChat/ChatContainer/ChatHistory/HistoricalMessage/index.jsx
The agent system enables the LLM to invoke external tools and APIs through a structured invocation pattern. Users can trigger agent mode by mentioning @agent in their message. Agent responses are handled via WebSocket instead of SSE for bidirectional communication.
For detailed documentation of the agent system, including:
AgentHandler and tool invocation flowSee Agent System.
Sources: server/utils/chats/agents.js server/utils/chats/stream.js42-51 frontend/src/components/WorkspaceChat/ChatContainer/index.jsx229-300
Embedded chat widgets enable external websites to integrate AnythingLLM chat functionality. Each embed has its own configuration for access control, rate limiting, and behavioral overrides.
For detailed documentation of embedded chat widgets, including:
Key Differences from Workspace Chat:
- `embed_configs` and `embed_chats` tables

Sources: server/endpoints/embed/index.js19-67 server/utils/chats/embed.js11-207 server/utils/middleware/embedMiddleware.js9-171 server/models/embedChats.js
The streamChatWithWorkspace() function assembles context from five sources in priority order:
Context Assembly Pipeline:
Sources: server/utils/chats/stream.js102-196
Context Source Details:
| Stage | Function | Database/Storage | Priority | Token Limit |
|---|---|---|---|---|
| 1. History | recentChatHistory() | workspace_chats table | N/A | Via messageLimit |
| 2. Pinned | DocumentManager.pinnedDocs() | workspace_documents.pinned=true | Highest | 80% of context window |
| 3. Parsed | WorkspaceParsedFiles.getContextFiles() | workspace_parsed_files | High | No specific limit |
| 4. Vector | VectorDb.performSimilaritySearch() | Vector database | Medium | Via topN parameter |
| 5. Backfill | fillSourceWindow() | From history JSON | Low | Fills remaining space |
Deduplication via sourceIdentifier():
Sources: server/utils/chats/stream.js115-132 server/utils/chats/stream.js135-148 server/utils/chats/stream.js150-166 server/utils/chats/stream.js180-186 server/utils/chats/index.js107-110
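Deduplication can be pictured as keyed filtering: every candidate source yields an identifier, and later, lower-priority stages skip anything already seen. The sketch below is illustrative only; the fields used to derive the identifier are assumptions, and the real `sourceIdentifier()` lives in server/utils/chats/index.js:

```javascript
// Illustrative dedupe pass; identifier derivation is an assumption.
function sourceIdentifier(source = {}) {
  return `${source.title || "unknown"}-${source.published || "unknown"}`;
}

function mergeSources(...stages) {
  const seen = new Set();
  const merged = [];
  // Stages arrive in priority order (pinned > parsed > vector > backfill),
  // so the earlier, higher-priority copy of a document wins.
  for (const stage of stages) {
    for (const source of stage) {
      const id = sourceIdentifier(source);
      if (seen.has(id)) continue;
      seen.add(id);
      merged.push(source);
    }
  }
  return merged;
}
```

This is why a pinned document never appears twice even when the vector search also returns one of its chunks.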
The system maintains two separate arrays to prevent citation confusion:
contextTexts Array (sent to LLM):
sources Array (shown to user):
Why This Matters:
Sources: server/utils/chats/stream.js180-196
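The two-array separation can be sketched as a single split over the retrieval results. Field names beyond `text` are assumptions here; the point is that snippet text feeds the prompt while only metadata reaches the citation UI:

```javascript
// Illustrative split of retrieval results into LLM context vs. user citations.
function splitForPromptAndCitations(searchResults) {
  // contextTexts: raw snippet text, injected into the system prompt.
  const contextTexts = searchResults.map((r) => r.text);
  // sources: citation metadata shown to the user; the snippet text is
  // stripped so citation rendering stays independent of prompt contents.
  const sources = searchResults.map(({ text, ...metadata }) => metadata);
  return { contextTexts, sources };
}
```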
The LLMConnector.compressMessages() method ensures the prompt fits within the model's context window:
Compression Process:
- If the assembled prompt exceeds `LLMConnector.promptWindowLimit()`, trim history from oldest to newest

Model-Specific Limits:
Sources: server/utils/chats/stream.js231-240
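The oldest-first trimming loop can be sketched as follows. This is a simplified stand-in, not the provider implementations: token counting is faked with a word count (real providers use their own tokenizers), and the keep-system-and-latest rule is an assumption for illustration:

```javascript
// Sketch of oldest-first history trimming against a token budget.
// Token counting is faked with a word count; real providers use tokenizers.
function countTokens(message) {
  return String(message.content || "").split(/\s+/).filter(Boolean).length;
}

function compressMessages(messages, promptWindowLimit) {
  // Assumed invariant: the system prompt (index 0) and the latest
  // user message are never dropped, only intermediate history.
  const [system, ...rest] = messages;
  const latest = rest.pop();
  const history = [...rest];
  const total = (msgs) => msgs.reduce((sum, m) => sum + countTokens(m), 0);
  while (history.length && total([system, ...history, latest]) > promptWindowLimit) {
    history.shift(); // drop the oldest message first
  }
  return [system, ...history, latest];
}
```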
The system adapts to LLM provider capabilities:
Streaming Flow:
Non-Streaming Flow:
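Both flows converge on the same chunk format over Server-Sent Events. A sketch of `writeResponseChunk()`-style output follows; the real helper is in server/utils/helpers/chat/responses.js, and the response stub plus the two wrapper functions here are illustrative assumptions:

```javascript
// Illustrative SSE chunk writer; the real helper lives in responses.js.
function writeResponseChunk(response, data) {
  // Server-Sent Events framing: a `data:` line terminated by a blank line.
  response.write(`data: ${JSON.stringify(data)}\n\n`);
}

// Streaming providers emit many textResponseChunk frames...
function streamToClient(response, textChunks, uuid) {
  for (const text of textChunks) {
    writeResponseChunk(response, {
      uuid,
      type: "textResponseChunk",
      textResponse: text,
      close: false,
    });
  }
  writeResponseChunk(response, { uuid, type: "finalizeResponseStream", close: true });
}

// ...while non-streaming providers send a single textResponse frame.
function respondToClient(response, fullText, uuid) {
  writeResponseChunk(response, {
    uuid,
    type: "textResponse",
    textResponse: fullText,
    close: true,
  });
}
```

Because both paths emit the same chunk schema, the frontend handlers in the message-type table above work unchanged regardless of provider capability.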