This document describes how RAGFlow's Canvas workflow system manages state and variables throughout agent execution. State management enables data flow between components, conversation context preservation, and variable resolution across the directed acyclic graph (DAG) of components.
For information about the Canvas execution engine and workflow orchestration, see Canvas Engine and DSL. For component architecture and lifecycle, see Component System Architecture.
The Canvas system maintains five distinct state buckets that together provide a complete execution context for workflow components. Each bucket serves a specific purpose in the agent's reasoning and data flow.
Sources: agent/canvas.py284-316 agent/canvas.py191-267
The globals bucket stores system-wide variables that are available to all components throughout the workflow execution.
System variables are automatically managed by the Canvas engine and follow the sys.* naming convention:
| Variable | Type | Description | Updated By |
|---|---|---|---|
| sys.query | string | Current user query | Canvas.run() |
| sys.user_id | string | Tenant/user identifier | Canvas initialization |
| sys.conversation_turns | integer | Number of conversation rounds | Incremented by Canvas.run() |
| sys.files | list[str] | Uploaded file contents (images as base64, docs as text) | Canvas.run() |
| sys.history | list[str] | Formatted conversation history | add_user_input(), workflow completion |
Sources: agent/canvas.py284-290 agent/canvas.py389-397
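A minimal sketch of how the sys.* entries in the table could be initialized and then refreshed at the start of each run. The helper names init_globals and start_turn are hypothetical; only the key names come from the table, and the real Canvas.run() may update them differently.

```python
from typing import Any, Optional


def init_globals(tenant_id: str) -> dict[str, Any]:
    # Hypothetical helper: the system-variable entries of the globals bucket.
    return {
        "sys.query": "",
        "sys.user_id": tenant_id,
        "sys.conversation_turns": 0,
        "sys.files": [],
        "sys.history": [],
    }


def start_turn(globals_: dict[str, Any], query: str, files: Optional[list[str]] = None) -> None:
    # Hypothetical helper: the per-turn updates the table attributes to Canvas.run().
    globals_["sys.query"] = query
    globals_["sys.files"] = list(files or [])
    globals_["sys.conversation_turns"] += 1


g = init_globals("tenant-123")
start_turn(g, "What is RAGFlow?")
start_turn(g, "And how does it store state?")
print(g["sys.conversation_turns"])  # 2
```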
User-defined variables follow the env.* naming convention and are configured in the Canvas DSL.
Environment variables are accessed as {env.api_key} in component parameters and are reset to their default values when Canvas.reset() is called.
Sources: agent/canvas.py310-367
Sources: agent/canvas.py324-367 agent/canvas.py389-397
The history bucket maintains the conversation context as a list of (role, content) tuples.
| Method | Parameters | Description |
|---|---|---|
| add_user_input(question) | question: str | Appends user input to history |
| get_history(window_size) | window_size: int | Returns the last window_size conversation turns |
Window Size Calculation:
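The last window_size turns are the slice history[window_size * -2:], since each turn consists of one user message plus one assistant message.
Sources: agent/canvas.py297-309 agent/canvas.py718-731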
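The window arithmetic can be sketched as follows. The History class and its add_assistant_output() helper are hypothetical, as is returning role/content dicts; the sketch mainly shows why a guard is needed, because window_size 0 would turn the slice into history[0:], the whole list.

```python
from typing import Any


class History:
    """Conversation history as (role, content) tuples (hypothetical wrapper)."""

    def __init__(self) -> None:
        self.history: list[tuple[str, Any]] = []

    def add_user_input(self, question: str) -> None:
        self.history.append(("user", question))

    def add_assistant_output(self, answer: str) -> None:
        # Hypothetical helper; only add_user_input() is documented above.
        self.history.append(("assistant", answer))

    def get_history(self, window_size: int) -> list[dict[str, str]]:
        # Guard first: history[0 * -2:] is history[0:], i.e. the WHOLE list.
        if window_size <= 0:
            return []
        # One turn = one user message + one assistant message, hence * -2.
        return [{"role": r, "content": str(c)} for r, c in self.history[window_size * -2:]]


h = History()
for q, a in [("hi", "hello"), ("what is RAG?", "retrieval-augmented generation")]:
    h.add_user_input(q)
    h.add_assistant_output(a)
print(h.get_history(1))
```

Asking for more turns than exist is harmless: a negative slice start beyond the list length simply returns the whole list.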
The retrieval bucket accumulates retrieved document chunks and document aggregations throughout workflow execution. Each workflow run appends a new retrieval result.
| Method | Parameters | Description |
|---|---|---|
| add_reference(chunks, doc_infos) | chunks: list, doc_infos: list | Adds chunks and doc aggregations to current retrieval |
| get_reference() | - | Returns latest retrieval result |
Hash-Based Deduplication:
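Chunks are keyed by hash_str2int(chunk_id, 500), which maps each chunk ID to one of 500 buckets, so a chunk retrieved twice within a single retrieval result is stored only once. Because the key space has only 500 values, two distinct chunk IDs can in principle collide and share a key.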
Sources: agent/canvas.py315-316 agent/canvas.py803-821 rag/prompts/generator.py40-59
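A sketch of hash-keyed deduplication within the latest retrieval result. The hash_str2int body here (SHA-1 digest modulo mod) is an assumption, as are the new_run() helper and keying doc aggregations by doc_name; setdefault keeps the first chunk stored under a key, so duplicates and colliding IDs are both dropped.

```python
import hashlib
from typing import Any


def hash_str2int(s: str, mod: int) -> int:
    # Assumed implementation: a stable SHA-1 digest reduced modulo `mod`.
    return int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16) % mod


class RetrievalState:
    def __init__(self) -> None:
        self.retrieval: list[dict[str, dict]] = []

    def new_run(self) -> None:
        # Hypothetical helper: each workflow run starts a fresh retrieval result.
        self.retrieval.append({"chunks": {}, "doc_aggs": {}})

    def add_reference(self, chunks: list[dict[str, Any]], doc_infos: list[dict[str, Any]]) -> None:
        if not self.retrieval:
            self.new_run()
        r = self.retrieval[-1]
        for ck in chunks:
            # First writer wins: a repeated (or colliding) key is not overwritten.
            r["chunks"].setdefault(hash_str2int(ck["id"], 500), ck)
        for doc in doc_infos:
            r["doc_aggs"].setdefault(doc["doc_name"], doc)

    def get_reference(self) -> dict[str, dict]:
        return self.retrieval[-1] if self.retrieval else {"chunks": {}, "doc_aggs": {}}


state = RetrievalState()
state.new_run()
state.add_reference([{"id": "c1"}, {"id": "c1"}, {"id": "c2"}], [{"doc_name": "a.pdf"}])
```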
The memory bucket stores summaries of tool invocations for long-term agent context. Each entry is a tuple of (user_request, assistant_response, summary).
| Method | Parameters | Description |
|---|---|---|
| add_memory(user, assist, summ) | user: str, assist: str, summ: str | Appends tool call summary to memory |
| get_memory() | - | Returns list of memory tuples |
Memory Generation:
The Agent component generates memory summaries using the tool_call_summary() function, which uses an LLM to create concise summaries of tool invocations.
Sources: agent/canvas.py823-827 rag/prompts/generator.py448-456
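A sketch of the memory bucket. The summarize callable stands in for the LLM-backed tool_call_summary(); record_tool_call and truncate_summary are hypothetical, and the toy summarizer only truncates text so the example runs without a model.

```python
from typing import Callable


class MemoryState:
    def __init__(self) -> None:
        self.memory: list[tuple[str, str, str]] = []

    def add_memory(self, user: str, assist: str, summ: str) -> None:
        self.memory.append((user, assist, summ))

    def get_memory(self) -> list[tuple[str, str, str]]:
        return self.memory


def record_tool_call(mem: MemoryState, user: str, assist: str,
                     summarize: Callable[[str, str], str]) -> None:
    # `summarize` is a stand-in for tool_call_summary(); any callable works here.
    mem.add_memory(user, assist, summarize(user, assist))


def truncate_summary(user: str, assist: str) -> str:
    # Toy summarizer for illustration only; the real one calls an LLM.
    return f"{user[:20]} -> {assist[:30]}"


mem = MemoryState()
record_tool_call(mem, "weather in Paris?", "search_tool returned: 18C, cloudy", truncate_summary)
```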
Components maintain their own input and output state through _param.inputs and _param.outputs dictionaries. These enable data flow between components via variable references.
RAGFlow's variable references fall into three namespaces: component outputs, system variables (sys.*), and environment variables (env.*):
| Syntax | Example | Description |
|---|---|---|
| {component_id@output_key} | {retrieval_0@content} | Reference component output |
| {component_id@output_key.path} | {component_id@output_key.field.subfield} | Reference nested output (dot-separated path) |
| {sys.variable} | {sys.query} | Reference system variable |
| {env.variable} | {env.api_key} | Reference environment variable |
Sources: agent/canvas.py164-235
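The get_variable_value() method handles all variable resolution. References without an @ (sys.* and env.*) are looked up in the globals bucket; a component_id@output_key reference reads the named output of that component, and any path after the output key is resolved by get_variable_param_value().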
Nested Path Traversal:
The get_variable_param_value() method supports accessing nested data structures: each dot-separated segment after the output key descends one level, as in {component_id@output_key.field} or {component_id@output_key.field.subfield}.
Sources: agent/canvas.py191-235
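The lookup and nested traversal can be sketched like this. It is a simplification under stated assumptions: outputs maps a component ID straight to a {key: value} dict, numeric segments index into lists, JSON strings are parsed before descending, and the sketch returns None for anything it cannot resolve. The component ID Agent:0 is illustrative.

```python
import json
from typing import Any


def get_variable_param_value(obj: Any, path: str) -> Any:
    """Follow a dot-separated path through dicts, lists and JSON strings (assumed behaviour)."""
    cur = obj
    for key in filter(None, path.split(".")):
        if isinstance(cur, str):
            try:
                cur = json.loads(cur)  # tolerate outputs that hold JSON text
            except ValueError:
                return None
        if isinstance(cur, dict):
            cur = cur.get(key)
        elif isinstance(cur, (list, tuple)):
            try:
                cur = cur[int(key)]  # numeric segments index into lists
            except (ValueError, IndexError):
                return None
        else:
            return None
        if cur is None:
            return None
    return cur


def get_variable_value(ref: str, globals_: dict[str, Any],
                       outputs: dict[str, dict[str, Any]]) -> Any:
    """Resolve a bare reference (braces already stripped)."""
    if "@" not in ref:  # sys.* and env.* live in the globals bucket
        return globals_.get(ref)
    cpn_id, _, rest = ref.partition("@")  # component_id @ output_key[.path]
    key, _, path = rest.partition(".")
    value = outputs.get(cpn_id, {}).get(key)
    return get_variable_param_value(value, path) if path else value


outputs = {"Agent:0": {"content": "hi", "structured": {"users": [{"name": "Ada"}]}}}
g = {"sys.query": "hello", "env.api_key": "k-123"}
print(get_variable_value("Agent:0@structured.users.0.name", g, outputs))  # Ada
```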
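The get_value_with_variable() method resolves variable references embedded in strings. It matches every reference with the pattern
\{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.-]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*
resolves each match through get_variable_value(), and substitutes the result back into the string. The leading \{* and trailing \}* let doubled braces and spaces, as in {{sys.query}}, match as well. Note that the component-ID part, [a-zA-Z:0-9]+, admits letters, digits, and colons but not underscores.
Sources: agent/canvas.py164-189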
Components read inputs and write outputs through their parameter objects. The Canvas engine resolves variable references during component invocation.
Sources: agent/component/base.py478-492
Components discover their input variables by scanning their parameter values for variable references.
This enables dynamic input form generation for component debugging.
Sources: agent/component/base.py500-511
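A sketch of input discovery: walk a (possibly nested) parameter structure and collect each distinct reference in first-seen order. find_input_refs is a hypothetical name, and reusing the get_value_with_variable() pattern for this scan is an assumption.

```python
import re
from typing import Any

VAR_REF = re.compile(
    r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.-]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*"
)


def find_input_refs(params: Any) -> list[str]:
    """Collect unique variable references from nested parameter values, in order."""
    found: dict[str, None] = {}  # a dict keeps insertion order and drops duplicates

    def walk(v: Any) -> None:
        if isinstance(v, str):
            for m in VAR_REF.finditer(v):
                found.setdefault(m.group(1), None)
        elif isinstance(v, dict):
            for item in v.values():
                walk(item)
        elif isinstance(v, (list, tuple)):
            for item in v:
                walk(item)

    walk(params)
    return list(found)


params = {
    "prompt": "Answer {sys.query} using {Retrieval:0@content}",
    "messages": [{"content": "Key: {env.api_key}"}, {"content": "{sys.query}"}],
}
print(find_input_refs(params))  # ['sys.query', 'Retrieval:0@content', 'env.api_key']
```

Each discovered reference could then become one field of a debugging input form.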
Component outputs are stored in _param.outputs, where each output name maps to a dictionary with value and type keys.
Reserved Output Keys:
| Key | Description |
|---|---|
| _ERROR | Error message if the component failed |
| _created_time | Timestamp when the component started |
| _elapsed_time | Execution duration |
| _next | For branching components (Categorize, Switch), the list of downstream component IDs |
Sources: agent/component/base.py453-461
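A toy component showing the value/type output layout and the reserved timing and error keys. The Component class, its set_output()/output() helpers, and recording the type as a Python type name on first write are assumptions for illustration; the upper-casing stands in for real work.

```python
import time
from types import SimpleNamespace
from typing import Any


class Component:
    """Toy component using the value/type output layout and reserved keys (hypothetical)."""

    def __init__(self) -> None:
        self._param = SimpleNamespace(inputs={}, outputs={})

    def set_output(self, key: str, value: Any) -> None:
        # Each output name maps to {"value": ..., "type": ...}; type recorded on first write.
        entry = self._param.outputs.setdefault(key, {"value": None, "type": type(value).__name__})
        entry["value"] = value

    def output(self, key: str) -> Any:
        return self._param.outputs.get(key, {}).get("value")

    def invoke(self, **kwargs: Any) -> None:
        start = time.perf_counter()
        self.set_output("_created_time", time.time())  # wall-clock start timestamp
        try:
            self.set_output("content", str(kwargs["query"]).upper())  # stand-in for real work
        except Exception as e:
            self.set_output("_ERROR", str(e))
        self.set_output("_elapsed_time", time.perf_counter() - start)


ok = Component()
ok.invoke(query="hello")
print(ok._param.outputs["content"])  # {'value': 'HELLO', 'type': 'str'}

failed = Component()
failed.invoke()  # no "query": the KeyError is captured in _ERROR
print(failed.output("_ERROR"))  # 'query'
```

A branching component would similarly write the IDs of its chosen downstream components to _next.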
Canvas state is persisted to support multi-turn conversations and session resumption.
The Canvas.__str__() method serializes the complete state to JSON.
Sources: agent/canvas.py109-128 agent/canvas.py318-322
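A sketch of persisting and restoring the state buckets through JSON. dump_canvas and load_canvas are hypothetical helpers and the top-level key layout is assumed; the point is what a JSON round trip does to this state: tuples come back as lists, and integer dictionary keys (such as hashed chunk IDs) come back as strings.

```python
import json
from typing import Any


def dump_canvas(dsl: dict[str, Any], history: list, retrieval: list,
                memory: list, globals_: dict[str, Any]) -> str:
    """Copy live state buckets into the DSL dict and serialize it (assumed layout)."""
    state = dict(dsl)  # shallow copy: the caller's DSL dict is not modified
    state.update(history=history, retrieval=retrieval, memory=memory, globals=globals_)
    return json.dumps(state, ensure_ascii=False)


def load_canvas(raw: str) -> dict[str, Any]:
    state = json.loads(raw)
    # JSON has no tuples, so (role, content) pairs come back as lists; restore them.
    state["history"] = [tuple(item) for item in state.get("history", [])]
    state["memory"] = [tuple(item) for item in state.get("memory", [])]
    # JSON object keys are always strings: integer chunk keys come back as "7"-style strings.
    return state


raw = dump_canvas(
    {"components": {"begin": {}}},
    history=[("user", "hi"), ("assistant", "hello")],
    retrieval=[{"chunks": {7: {"id": "c1"}}, "doc_aggs": {}}],
    memory=[("q", "a", "summary")],
    globals_={"sys.query": "hi"},
)
restored = load_canvas(raw)
```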
Sources: api/db/services/canvas_service.py192-251
The Canvas.reset() method clears state while preserving configuration.
Sources: agent/canvas.py324-367
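A sketch of a reset that empties the runtime buckets and sys.* values but leaves component configuration alone. reset_state and zero_value are hypothetical; keeping sys.user_id across a reset is an assumption of this sketch, and env.* defaults are not handled here.

```python
from typing import Any


def zero_value(v: Any) -> Any:
    """Empty value of the same type: bool() is False, int() is 0, str() is '' and so on."""
    if isinstance(v, (str, int, float, bool, list, dict)):
        return type(v)()
    return None


def reset_state(state: dict[str, Any], keep: frozenset = frozenset({"sys.user_id"})) -> None:
    """Clear runtime buckets in place; component configuration is left untouched."""
    state["history"], state["retrieval"], state["memory"] = [], [], []
    g = state["globals"]
    for k in list(g):
        # Assumption: the tenant ID survives a reset; other sys.* values are emptied.
        if k.startswith("sys.") and k not in keep:
            g[k] = zero_value(g[k])
    # env.* values are not touched by this sketch.


state = {
    "components": {"Agent:0": {"params": {"prompt": "{sys.query}"}}},
    "history": [("user", "hi")], "retrieval": [{}], "memory": [],
    "globals": {"sys.query": "hi", "sys.user_id": "t1",
                "sys.conversation_turns": 3, "sys.files": ["x"]},
}
reset_state(state)
```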
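Components can output functools.partial objects for streaming data.
The Canvas engine handles these partials specially: when one is referenced inside a string, get_value_with_variable() calls it and joins the yielded chunks into a single value, and run() forwards the chunks as they arrive instead of waiting for the complete output.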
Sources: agent/canvas.py175-179 agent/canvas.py506-549
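A sketch of the partial-based streaming pattern. stream_answer stands in for an LLM streaming call and consume is a hypothetical consumer; on_chunk is where a real engine might push an event to the client.

```python
from functools import partial
from typing import Any, Callable, Iterator


def stream_answer(text: str, chunk_size: int) -> Iterator[str]:
    """Stand-in for an LLM streaming call: yields the text in fixed-size chunks."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]


# The component stores a deferred generator instead of a finished string.
outputs = {"content": partial(stream_answer, "Streaming keeps latency low.", 8)}


def consume(value: Any, on_chunk: Callable[[str], None]) -> str:
    """Stream chunks if `value` is a partial; otherwise treat it as a finished value."""
    if isinstance(value, partial):
        pieces = []
        for chunk in value():  # calling the partial starts a fresh generator
            on_chunk(chunk)    # e.g. push a message event to the client
            pieces.append(chunk)
        return "".join(pieces)
    return str(value)


seen: list[str] = []
full = consume(outputs["content"], seen.append)
```

One design consequence: every call of the partial re-runs the underlying work, so a consumer that needs the text twice should keep the joined result rather than call the partial again.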
Components can track which component's retrieval results they are using.
This enables citation generation when multiple retrieval components exist.
Sources: agent/component/base.py507-509
Environment variables support type definitions. The declared type determines the default value that Canvas.reset() assigns to a variable that has no configured value.
Sources: agent/canvas.py346-366
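A sketch of type-driven defaults for env.* variables. The variable schema ({"type": ..., "value": ...}), the type names, and the rule that an empty string or None counts as "not configured" are assumptions; env_default and reset_env are hypothetical helpers.

```python
from typing import Any


def env_default(var: dict[str, Any]) -> Any:
    """Configured value if present, else an empty value of the declared type."""
    if var.get("value") not in (None, ""):
        return var["value"]
    t = var.get("type", "string")
    if t.startswith("array"):  # any type name beginning with "array" (assumed naming)
        return []
    return {"string": "", "number": 0, "boolean": False, "object": {}}.get(t, "")


def reset_env(globals_: dict[str, Any], variables: dict[str, dict[str, Any]]) -> None:
    # Each declared variable is exposed in globals under the env. prefix.
    for name, var in variables.items():
        globals_[f"env.{name}"] = env_default(var)


globals_: dict[str, Any] = {}
variables = {
    "api_key": {"type": "string", "value": "k-1"},
    "top_n": {"type": "number", "value": None},
    "tags": {"type": "array_string", "value": ""},
    "flag": {"type": "boolean", "value": None},
}
reset_env(globals_, variables)
```

In this sketch a configured 0 or False is kept as-is, because only None and the empty string count as unset.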
Sources: agent/component/llm.py226-259
Sources: agent/component/agent_with_tools.py278-410
Sources: agent/component/categorize.py108-156
The Canvas provides an API to inspect component inputs for debugging.
It returns the component's input elements with their variable references resolved.
Sources: api/apps/canvas_app.py312-329
The debug endpoint allows testing individual components with mock inputs.
The debug flow loads the provided parameters into _param.debug_inputs and then runs the component on its own with those values.
Sources: api/apps/canvas_app.py332-366
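A sketch of how debug inputs might take precedence over normal reference resolution. DebuggableComponent and get_inputs are hypothetical; the assumption is that a non-empty _param.debug_inputs replaces the resolved inputs wholesale.

```python
from types import SimpleNamespace
from typing import Any, Callable


class DebuggableComponent:
    """Hypothetical component: inputs come from _param.debug_inputs when a debug run sets them."""

    def __init__(self, refs: dict[str, str]) -> None:
        self.refs = refs  # input name -> variable reference, e.g. {"query": "sys.query"}
        self._param = SimpleNamespace(debug_inputs={})

    def get_inputs(self, resolve: Callable[[str], Any]) -> dict[str, Any]:
        if self._param.debug_inputs:  # debug run: use the supplied mock values as-is
            return dict(self._param.debug_inputs)
        return {name: resolve(ref) for name, ref in self.refs.items()}


live = {"sys.query": "real user question"}
cpn = DebuggableComponent({"query": "sys.query"})
print(cpn.get_inputs(live.get))  # {'query': 'real user question'}
cpn._param.debug_inputs = {"query": "mock question"}
print(cpn.get_inputs(live.get))  # {'query': 'mock question'}
```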
Agent components log tool invocations to Redis for debugging.
Traces are retrieved through the canvas API.
Sources: agent/canvas.py779-801 api/apps/canvas_app.py551-562
Sources: agent/canvas.py369-661
RAGFlow's state management system supports complex agent workflows through five distinct state buckets (globals, history, retrieval, memory, variables) and a flexible variable resolution system. Its reference syntax ({component@output}, {sys.*}, {env.*}) enables data flow between components, with support for nested paths.
This design allows RAG applications with complex data flows to be built while keeping their state inspectable and debuggable.
Sources: agent/canvas.py40-831 agent/component/base.py365-585 api/db/services/canvas_service.py192-251