State and Variable Management

This document describes how RAGFlow's Canvas workflow system manages state and variables throughout agent execution. State management enables data flow between components, conversation context preservation, and variable resolution across the directed acyclic graph (DAG) of components.

For information about the Canvas execution engine and workflow orchestration, see Canvas Engine and DSL. For component architecture and lifecycle, see Component System Architecture.

State Architecture Overview

The Canvas system maintains five distinct state buckets that together provide a complete execution context for workflow components. Each bucket serves a specific purpose in the agent's reasoning and data flow.

The Five State Buckets

Sources: agent/canvas.py284-316 agent/canvas.py191-267

Globals Bucket

The globals bucket stores system-wide variables that are available to all components throughout the workflow execution.

System Variables

System variables are automatically managed by the Canvas engine and follow the sys.* naming convention:

Variable	Type	Description	Updated By
`sys.query`	string	Current user query	Canvas.run()
`sys.user_id`	string	Tenant/user identifier	Canvas initialization
`sys.conversation_turns`	integer	Number of conversation rounds	Canvas.run() increment
`sys.files`	list[str]	Uploaded file contents (images as base64, docs as text)	Canvas.run()
`sys.history`	list[str]	Formatted conversation history	add_user_input(), workflow completion

Sources: agent/canvas.py284-290 agent/canvas.py389-397

Environment Variables

User-defined variables follow the env.* naming convention and are configured in the Canvas DSL:

Environment variables are accessed as {env.api_key} in component parameters and are reset to their default values when Canvas.reset() is called.
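As a rough illustration of the reset behavior described above, the sketch below models globals as a flat dictionary and restores each env.* entry from a table of declared defaults. The `variables` layout, the `reset_env` helper, and the sample values are assumptions made for illustration; they are not taken from agent/canvas.py, and the sketch deliberately leaves sys.* entries alone.

```python
# Hypothetical declaration table: one default per user-defined variable.
variables = {
    "api_key": {"type": "string", "value": "default-key"},
    "max_results": {"type": "number", "value": 5},
}


def reset_env(globals_: dict, variables: dict) -> dict:
    """Return a copy of globals_ with every env.* entry restored to its default."""
    restored = dict(globals_)
    for name, spec in variables.items():
        restored[f"env.{name}"] = spec["value"]
    return restored


# Values changed during a run...
globals_ = {"sys.query": "hello", "env.api_key": "overridden", "env.max_results": 50}
# ...are put back to their declared defaults on reset.
globals_ = reset_env(globals_, variables)
```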

Sources: agent/canvas.py310-367

Global Variable Lifecycle

Sources: agent/canvas.py324-367 agent/canvas.py389-397

History Bucket

The history bucket maintains the conversation context as a list of (role, content) tuples.

History Structure

History Management Methods

Method	Parameters	Description
`add_user_input(question)`	question: str	Appends user input to history
`get_history(window_size)`	window_size: int	Returns last N conversation turns

Window Size Calculation:

A window size of 13 returns the last 26 entries (13 * 2, one user entry and one assistant entry per turn)
The slice is taken as history[window_size * -2:]
The window limits how much conversation context is sent to LLMs
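The slicing rule above can be checked with a small example. This is a minimal sketch that assumes history is a plain list of (role, content) tuples; `get_history_window` is a hypothetical stand-in for get_history(), and the guard for non-positive window sizes is an addition of the sketch (without it, a window of 0 would slice from index 0 and return the whole list).

```python
history = [
    ("user", "q1"), ("assistant", "a1"),
    ("user", "q2"), ("assistant", "a2"),
    ("user", "q3"), ("assistant", "a3"),
]


def get_history_window(history: list, window_size: int) -> list:
    """Return the last `window_size` user/assistant pairs."""
    if window_size <= 0:
        return []
    # Two entries per turn, counted from the end of the list.
    return history[window_size * -2:]


recent = get_history_window(history, 2)      # last two turns, four entries
everything = get_history_window(history, 13)  # window larger than the history
```

When the window is larger than the stored history, the slice simply returns every entry.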

Sources: agent/canvas.py297-309 agent/canvas.py718-731

Retrieval Bucket

The retrieval bucket accumulates retrieved document chunks and document aggregations throughout workflow execution. Each workflow run appends a new retrieval result.

Retrieval Structure

Retrieval Methods

Method	Parameters	Description
`add_reference(chunks, doc_infos)`	chunks: list, doc_infos: list	Adds chunks and doc aggregations to current retrieval
`get_reference()`	-	Returns latest retrieval result

Hash-Based Deduplication: Chunk IDs are hashed to 500 buckets using hash_str2int(chunk_id, 500) to prevent duplicate chunks within a single retrieval result.
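A minimal sketch of bucket-based deduplication, under stated assumptions: `hash_to_bucket` is a stand-in written for this example (the real hash_str2int lives in rag/prompts/generator.py and may differ), and chunks are assumed to be dictionaries with an "id" key. The first chunk seen for a bucket wins.

```python
import hashlib


def hash_to_bucket(text: str, mod: int = 500) -> int:
    """Stand-in hash: map a string to an integer in [0, mod)."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16) % mod


def dedupe_chunks(chunks: list[dict]) -> dict[int, dict]:
    """Keep the first chunk seen for each hash bucket."""
    seen: dict[int, dict] = {}
    for chunk in chunks:
        seen.setdefault(hash_to_bucket(chunk["id"]), chunk)
    return seen


chunks = [
    {"id": "c1", "text": "alpha"},
    {"id": "c2", "text": "beta"},
    {"id": "c1", "text": "alpha"},  # repeated chunk, dropped by the sketch
]
unique = dedupe_chunks(chunks)
```

One consequence of a 500-bucket space is that two different chunk IDs can land in the same bucket; in this sketch the later chunk would be dropped. How the actual implementation treats such collisions is not stated here.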

Sources: agent/canvas.py315-316 agent/canvas.py803-821 rag/prompts/generator.py40-59

Memory Bucket

The memory bucket stores summaries of tool invocations for long-term agent context. Each entry is a tuple of (user_request, assistant_response, summary).

Memory Structure

Memory Methods

Method	Parameters	Description
`add_memory(user, assist, summ)`	user: str, assist: str, summ: str	Appends tool call summary to memory
`get_memory()`	-	Returns list of memory tuples

Memory Generation: The Agent component generates memory summaries using the tool_call_summary() function, which uses an LLM to create concise summaries of tool invocations.
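The tuple layout and the two methods in the table can be sketched as follows. `MemoryBucket` is a hypothetical container written for this example; in RAGFlow these methods belong to the Canvas, and the summary string would come from tool_call_summary() rather than being hard-coded.

```python
class MemoryBucket:
    """Hypothetical container mirroring the add_memory/get_memory methods."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, str]] = []

    def add_memory(self, user: str, assist: str, summ: str) -> None:
        # Each entry is (user_request, assistant_response, summary).
        self._entries.append((user, assist, summ))

    def get_memory(self) -> list[tuple[str, str, str]]:
        return self._entries


bucket = MemoryBucket()
bucket.add_memory(
    "Find flights to Oslo",
    "Found 3 options",
    "Searched flights; returned 3 results",  # sample text, not LLM output
)
user_request, assistant_response, summary = bucket.get_memory()[0]
```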

Sources: agent/canvas.py823-827 rag/prompts/generator.py448-456

Component Variables and Data Flow

Components maintain their own input and output state through _param.inputs and _param.outputs dictionaries. These enable data flow between components via variable references.

Variable Reference Syntax

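RAGFlow supports three kinds of variable reference: component outputs (optionally followed by a nested path), system variables, and environment variables: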

Syntax	Example	Description
`{component_id@output_key}`	`{retrieval_0@content}`	Reference component output
`{component_id@output_key.path}`	`{retrieval_0@content.key}`	Reference nested output
`{sys.variable}`	`{sys.query}`	Reference system variable
`{env.variable}`	`{env.api_key}`	Reference environment variable

Variable Resolution Pattern

Sources: agent/canvas.py164-235

Variable Resolution Implementation

The get_variable_value() method handles all variable resolution:

Nested Path Traversal: The get_variable_param_value() method supports accessing nested data structures:

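Dictionary access: a key name as a path segment, in the form {component_id@output_key.key}
List access: a numeric index as a path segment, in the form {component_id@output_key.0}
Object attributes: an attribute name as a path segment, in the form {component_id@output_key.attribute}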
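To make the three access modes concrete, here is a minimal sketch of dotted-path traversal. `resolve_path` is a hypothetical helper, not the body of get_variable_param_value(); returning None for a missing key or a bad index is an assumption of the sketch.

```python
from types import SimpleNamespace
from typing import Any


def resolve_path(obj: Any, path: str) -> Any:
    """Walk a dotted path through dicts, lists, and object attributes."""
    cur = obj
    if not path:
        return cur
    for key in path.split("."):
        if cur is None:
            return None
        if isinstance(cur, dict):
            cur = cur.get(key)               # dictionary access
        elif isinstance(cur, (list, tuple)):
            try:
                cur = cur[int(key)]          # list access by numeric segment
            except (ValueError, IndexError):
                return None
        else:
            cur = getattr(cur, key, None)    # object attribute access
    return cur


output = {
    "result": {"items": [{"title": "first"}, {"title": "second"}]},
    "meta": SimpleNamespace(source="kb"),
}
title = resolve_path(output, "result.items.1.title")
source = resolve_path(output, "meta.source")
missing = resolve_path(output, "result.absent.title")
```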

Sources: agent/canvas.py191-235

String Template Resolution

The get_value_with_variable() method resolves variable references within strings:

The method:

Finds all variable patterns using regex: \{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.-]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*
Resolves each variable via get_variable_value()
Handles partial functions (streaming outputs) by consuming the stream
Replaces variable references with resolved values
Converts non-string values to JSON

Sources: agent/canvas.py164-189

Component Input/Output Flow

Components read inputs and write outputs through their parameter objects. The Canvas engine resolves variable references during component invocation.

Component Input Resolution

Sources: agent/component/base.py478-492

Input Element Discovery

Components discover their input variables by scanning parameter values for variable references:

This enables dynamic input form generation for component debugging.

Sources: agent/component/base.py500-511

Output Structure

Component outputs are stored in _param.outputs as dictionaries with value and type keys:

Reserved Output Keys:

_ERROR: Error message if component failed
_created_time: Timestamp when component started
_elapsed_time: Execution duration
_next: For branching components (Categorize, Switch), list of downstream component IDs
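As a rough picture of how regular outputs and reserved keys might sit side by side, the sketch below wraps a callable and records its result along with timing and error information. `run_with_outputs` is hypothetical, and storing the reserved keys as bare values (rather than value/type dictionaries) is an assumption of the sketch, not a statement about agent/component/base.py.

```python
import time
from typing import Callable


def run_with_outputs(fn: Callable[[], str]) -> dict:
    """Run fn() and record its result plus reserved bookkeeping keys."""
    outputs: dict = {"_created_time": time.perf_counter()}
    try:
        # Regular outputs carry both the value and a type label.
        outputs["content"] = {"value": fn(), "type": "string"}
    except Exception as exc:
        outputs["_ERROR"] = str(exc)
    outputs["_elapsed_time"] = time.perf_counter() - outputs["_created_time"]
    return outputs


def failing_component() -> str:
    raise ValueError("upstream service unavailable")


ok = run_with_outputs(lambda: "retrieved text")
bad = run_with_outputs(failing_component)
```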

Sources: agent/component/base.py453-461

State Persistence and Session Management

Canvas state is persisted to support multi-turn conversations and session resumption.

DSL State Serialization

The Canvas.__str__() method serializes the complete state to JSON:

Sources: agent/canvas.py109-128 agent/canvas.py318-322

Session-Based State Management

Sources: api/db/services/canvas_service.py192-251

State Reset Behavior

The Canvas.reset() method clears state while preserving configuration:

Sources: agent/canvas.py324-367

Advanced Variable Patterns

Streaming Outputs with Partial Functions

Components can output functools.partial objects for streaming data:

The Canvas engine handles partial functions specially:

During variable resolution, the stream is consumed and concatenated
Message components stream partial outputs in real-time
After streaming completes, the full text is set as the output value
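The consume-and-store behavior can be sketched like this. It assumes an output value may be a functools.partial wrapping a generator function; `stream_answer` and `resolve_output` are hypothetical names, and real-time forwarding by Message components is not modeled.

```python
from functools import partial
from typing import Iterable, Iterator


def stream_answer(tokens: Iterable[str]) -> Iterator[str]:
    """Generator standing in for a streaming LLM response."""
    for token in tokens:
        yield token


outputs = {
    "content": {"value": partial(stream_answer, ["Hello", ", ", "world"]), "type": "string"},
}


def resolve_output(outputs: dict, key: str) -> str:
    """Return the output value, consuming a pending stream if there is one."""
    value = outputs[key]["value"]
    if isinstance(value, partial):
        text = "".join(value())          # consume and concatenate the stream
        outputs[key]["value"] = text     # store the full text as the output
        return text
    return value


first = resolve_output(outputs, "content")
second = resolve_output(outputs, "content")  # already materialized
```

Storing the concatenated text back into the output means later references read the same value without invoking the stream again.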

Sources: agent/canvas.py175-179 agent/canvas.py506-549

Retrieval Reference Tracking

Components can track which component's retrieval results they're using:

This enables citation generation when multiple retrieval components exist.

Sources: agent/component/base.py507-509

Variable Type Validation

Environment variables support type definitions:

Types are used for:

UI rendering (text input vs number input vs checkbox)
Default value assignment during reset
Documentation generation

Sources: agent/canvas.py346-366

State Access Patterns in Common Components

LLM Component

Sources: agent/component/llm.py226-259

Retrieval Component

Agent Component

Sources: agent/component/agent_with_tools.py278-410

Categorize Component

Sources: agent/component/categorize.py108-156

Debugging and Inspection

Component Input Form Generation

The Canvas provides an API to inspect component inputs for debugging:

This returns resolved input elements:

Sources: api/apps/canvas_app.py312-329

Component Debug Execution

The debug endpoint allows testing individual components with mock inputs:

The debug flow:

Creates fresh Canvas instance
Resets state
Sets _param.debug_inputs with provided params
Invokes component with mock inputs
Consumes any streaming outputs
Returns all outputs

Sources: api/apps/canvas_app.py332-366

Tool Call Trace Logging

Agent components log tool invocations to Redis for debugging:

Traces are retrieved via:

Sources: agent/canvas.py779-801 api/apps/canvas_app.py551-562

State Lifecycle in Workflow Execution

Sources: agent/canvas.py369-661

Summary

RAGFlow's state management system provides a robust foundation for complex agent workflows through five distinct state buckets (globals, history, retrieval, memory, variables) and a flexible variable resolution system. Key characteristics:

Separation of Concerns: Each bucket serves a specific purpose (system state, conversation context, retrieved data, long-term memory, user variables)
Component Isolation: Components read from and write to their own input/output dictionaries while accessing shared state through variable references
Variable Resolution: A unified syntax ({component@output}, {sys.*}, {env.*}) enables data flow between components with nested path support
State Persistence: Complete state serialization to JSON enables session resumption and multi-turn conversations
Debugging Support: Input form generation, debug execution, and trace logging facilitate component development and troubleshooting

This architecture enables building sophisticated RAG applications with complex data flows while maintaining clarity and debuggability.

Sources: agent/canvas.py40-831 agent/component/base.py365-585 api/db/services/canvas_service.py192-251

