This document covers n8n's distributed execution architecture, which enables horizontal scaling by distributing workflow executions across multiple worker processes. This includes the queue-based execution system, Bull/Redis integration, worker coordination, concurrency control, and the communication patterns between main and worker processes.
For information about the workflow execution lifecycle within a single process, see Workflow Execution Lifecycle. For details on the overall runtime architecture including process types, see Runtime Architecture and Process Models.
n8n supports two primary execution modes that determine how workflow executions are processed:
| Mode | Description | Use Case |
|---|---|---|
| Regular | Workflows execute directly in the main process | Single-instance deployments, development |
| Queue | Workflows are enqueued to Redis and processed by worker instances | Production deployments requiring horizontal scaling |
The execution mode is determined by the EXECUTIONS_MODE environment variable and accessed via ExecutionsConfig.mode.
Mode Selection in WorkflowRunner
When WorkflowRunner.run() is called, it decides whether to enqueue or execute directly:
```
shouldEnqueue = mode === 'queue' && executionMode !== 'manual'
```
By default, manual executions run in the main process even in queue mode, though this can be overridden with OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true.
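The decision above can be sketched as a small predicate. This is illustrative, not n8n's actual internal function; the two mode strings and the `OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS` override are from the text, while the function and parameter names are assumptions:

```typescript
// Sketch of the enqueue decision described above. Names other than the
// environment-variable semantics are illustrative.
type ExecutionMode = 'manual' | 'webhook' | 'trigger' | 'cli';

function shouldEnqueue(
  executionsMode: 'regular' | 'queue', // EXECUTIONS_MODE
  executionMode: ExecutionMode,
  offloadManualToWorkers: boolean, // OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS
): boolean {
  // Regular mode always executes in the main process.
  if (executionsMode !== 'queue') return false;
  // Manual executions stay in the main process unless explicitly offloaded.
  if (executionMode === 'manual' && !offloadManualToWorkers) return false;
  return true;
}
```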
Sources: packages/cli/src/workflow-runner.ts170-179
n8n uses Bull, a Redis-backed job queue, to coordinate distributed execution. The queue infrastructure is managed by ScalingService.
Queue Initialization
The queue is initialized with a prefix (default: bull) and settings including maxStalledCount: 0 to disable Bull's automatic stall recovery (n8n implements its own queue recovery mechanism).
Sources: packages/cli/src/scaling/scaling.service.ts57-81
Each job in the queue carries the following data:
| Field | Type | Description |
|---|---|---|
| executionId | string | Unique execution identifier |
| workflowId | string | Workflow being executed |
| loadStaticData | boolean | Whether to load workflow static data from DB |
| pushRef | string? | WebSocket reference for UI updates |
| streamingEnabled | boolean? | Whether streaming responses are enabled |
Sources: packages/cli/src/scaling/scaling.types.ts17-23
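The table corresponds to a payload shaped roughly like the following. This is a sketch derived from the table, not the exact declaration in scaling.types.ts:

```typescript
// Sketch of the job payload carried on the Bull queue (fields from the
// table above; optionality follows the `?` markers in the Type column).
interface JobData {
  executionId: string;       // unique execution identifier
  workflowId: string;        // workflow being executed
  loadStaticData: boolean;   // load workflow static data from DB
  pushRef?: string;          // WebSocket reference for UI updates
  streamingEnabled?: boolean; // streaming responses enabled
}

// Example payload; IDs are made up for illustration.
const jobData: JobData = {
  executionId: '1042',
  workflowId: 'abc123',
  loadStaticData: true,
};
```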
Sources: packages/cli/src/workflow-runner.ts370-410 packages/cli/src/scaling/job-processor.ts54-276 packages/cli/src/scaling/scaling.service.ts87-106
The main process enqueues executions when in queue mode. The WorkflowRunner.enqueueExecution() method handles this:
Enqueue Process
1. If ScalingService is not yet initialized, it is dynamically imported and the queue is set up
2. A JobData object is built with execution metadata
3. ScalingService.addJob() enqueues the job with a priority (realtime: 50, non-realtime: 100)
4. A PCancelable wrapper is returned that allows cancellation via stopJob()

```
jobData = {
  workflowId,
  executionId,
  loadStaticData: boolean,
  pushRef,
  streamingEnabled
}
```
Sources: packages/cli/src/workflow-runner.ts370-410 packages/cli/src/scaling/scaling.service.ts191-214
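The priority rule above, plus the removeOnComplete/removeOnFail cleanup options mentioned later in this page, can be sketched as a helper. `buildJobOptions` and `isRealtime` are illustrative names, not n8n's API; the values match the ones described in this document:

```typescript
// Bull job options as described: realtime jobs get priority 50, others 100
// (lower value = dequeued first), and jobs are removed from Redis after
// completion or failure since execution data is persisted in the database.
interface JobOptions {
  priority: number;
  removeOnComplete: boolean;
  removeOnFail: boolean;
}

function buildJobOptions(isRealtime: boolean): JobOptions {
  return {
    priority: isRealtime ? 50 : 100,
    removeOnComplete: true,
    removeOnFail: true,
  };
}
```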
The main process uses getLifecycleHooksForScalingMain() to set up minimal hooks that only fire workflowExecuteBefore. The actual post-execution hooks run on the worker to avoid split execution logging.
Sources: packages/cli/src/workflow-runner.ts398-402
Main and webhook processes register listeners on the global:progress event to receive messages from workers:
Message Types Handled by Main
| Message Kind | Purpose | Handler Action |
|---|---|---|
| send-chunk | Stream chunk to client | Forward to ActiveExecutions.sendChunk() |
| respond-to-webhook | Send webhook response | Resolve response promise with decoded response |
| job-finished | Execution completed | Resolve response promise, log completion |
| job-failed | Execution failed | Log error with stack trace |
Main/Webhook Listener Implementation
The listener examines the message kind and routes accordingly:
```typescript
switch (msg.kind) {
  case 'send-chunk':
    activeExecutions.sendChunk(msg.executionId, msg.chunkText);
    break;
  case 'respond-to-webhook':
    const decodedResponse = decodeWebhookResponse(msg.response);
    activeExecutions.resolveResponsePromise(msg.executionId, decodedResponse);
    break;
  case 'job-finished':
    activeExecutions.resolveResponsePromise(msg.executionId, success ? {} : error);
    break;
  case 'job-failed':
    logger.error(msg.errorMsg + msg.errorStack);
    break;
}
```
Sources: packages/cli/src/scaling/scaling.service.ts300-370
The leader main instance periodically checks for "dangling" executions: those marked as running in the database but missing from the queue. This handles cases where executions were not properly cleaned up.
Recovery Process
Configuration
- N8N_EXECUTIONS_QUEUE_RECOVERY_INTERVAL: Minutes between checks (default: 180)
- N8N_EXECUTIONS_QUEUE_RECOVERY_BATCH_SIZE: Max executions per check (default: 100)

The recovery mechanism accelerates if it finds a full batch, indicating more dangling executions may exist.
Sources: packages/cli/src/scaling/scaling.service.ts458-523
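The "accelerate on a full batch" behavior can be sketched as a scheduling rule. The halving factor below is an assumption for illustration; the source only states that recovery speeds up when a full batch is found:

```typescript
// Sketch: choose the wait before the next recovery pass. A full batch
// suggests more dangling executions remain, so check again sooner.
// The divide-by-two factor is illustrative, not n8n's actual constant.
function nextRecoveryWaitMs(
  foundCount: number,     // dangling executions found in this pass
  batchSize: number,      // N8N_EXECUTIONS_QUEUE_RECOVERY_BATCH_SIZE
  baseIntervalMs: number, // N8N_EXECUTIONS_QUEUE_RECOVERY_INTERVAL, as ms
): number {
  return foundCount >= batchSize ? baseIntervalMs / 2 : baseIntervalMs;
}
```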
When N8N_METRICS_INCLUDE_QUEUE_METRICS=true, the main instance collects queue metrics periodically:
Metrics Collected
- active: Jobs currently being processed
- waiting: Jobs in queue waiting for workers
- completed: Jobs finished in current interval
- failed: Jobs failed in current interval

These are emitted as job-counts-updated events for Prometheus exposition.
Sources: packages/cli/src/scaling/scaling.service.ts422-447
Workers are initialized via the n8n worker command, which calls ScalingService.setupWorker(concurrency):
Worker Configuration
```typescript
queue.process(JOB_TYPE_NAME, concurrency, async (job) => {
  await jobProcessor.processJob(job);
});
```
The concurrency parameter (from N8N_CONCURRENCY_PRODUCTION_LIMIT) determines how many jobs a single worker can process simultaneously.
Sources: packages/cli/src/scaling/scaling.service.ts83-109
JobProcessor.processJob() Workflow
Key Components
- Loads the execution's workflowData, data, and mode
- Loads workflow static data when loadStaticData=true
- Uses getLifecycleHooksForScalingWorker(), which includes the full hook suite
- Runs the execution via WorkflowExecute or ManualExecutionService based on execution type

Sources: packages/cli/src/scaling/job-processor.ts54-276
Workers use getLifecycleHooksForScalingWorker() which includes handlers for:
Hook Handlers
| Hook | Purpose |
|---|---|
| sendResponse | Send webhook responses via job.progress(RespondToWebhookMessage) |
| sendChunk | Stream chunks to main via job.progress(SendChunkMessage) |
| workflowExecuteBefore | Standard pre-execution hook |
| workflowExecuteAfter | Full post-execution processing including DB save, error workflows, statistics |
| nodeExecuteBefore | Node-level pre-execution |
| nodeExecuteAfter | Node-level post-execution, save progress if enabled |
The key difference from main hooks: workers handle the full workflowExecuteAfter lifecycle including database persistence, while main instances only handle minimal UI notification.
Sources: packages/cli/src/execution-lifecycle/execution-lifecycle-hooks.ts331-392
Workers listen for abort-job messages on the global:progress event:
```typescript
queue.on('global:progress', (jobId, msg) => {
  if (msg.kind === 'abort-job') {
    jobProcessor.stopJob(jobId);
  }
});
```
When received, the worker cancels the running execution via workflowExecution.cancel().
Sources: packages/cli/src/scaling/scaling.service.ts274-279 packages/cli/src/scaling/job-processor.ts287-301
Main and worker processes communicate via Bull's job.progress() mechanism, which publishes messages to the Redis pub/sub channel. All messages implement the JobMessage union type.
Job Message Types
Message Flow Examples
- SendChunkMessage → Main → ActiveExecutions.sendChunk() → HTTP Response
- RespondToWebhookMessage → Main → ActiveExecutions.resolveResponsePromise()
- JobFinishedMessage → Main → Resolve post-execute promise
- AbortJobMessage → Worker → Cancel PCancelable

Sources: packages/cli/src/scaling/scaling.types.ts40-72 packages/cli/src/scaling/job-processor.ts149-169 packages/cli/src/scaling/scaling.service.ts312-364
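The JobMessage union can be sketched as a discriminated union on `kind`. The `kind` strings and most fields appear elsewhere on this page; the exact field lists are a simplification of the types in scaling.types.ts:

```typescript
// Sketch of the JobMessage union described above; field sets are
// approximate, discriminated on the `kind` tag.
type JobMessage =
  | { kind: 'send-chunk'; executionId: string; chunkText: string }
  | { kind: 'respond-to-webhook'; executionId: string; response: unknown }
  | { kind: 'job-finished'; executionId: string; success: boolean }
  | { kind: 'job-failed'; executionId: string; errorMsg: string; errorStack: string }
  | { kind: 'abort-job' };

// TypeScript narrows the union by checking `kind`, which is how the
// main-process listener routes messages.
function describe(msg: JobMessage): string {
  return msg.kind === 'job-failed' ? `failed: ${msg.errorMsg}` : msg.kind;
}
```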
When sending webhook responses that contain buffers, workers encode them as base64 strings:
```typescript
if (Buffer.isBuffer(response.body)) {
  response.body = {
    '__@N8nEncodedBuffer@__': response.body.toString(BINARY_ENCODING),
  };
}
```

Main decodes these before resolving the response promise:

```typescript
if ('__@N8nEncodedBuffer@__' in response.body) {
  response.body = Buffer.from(response.body['__@N8nEncodedBuffer@__'], BINARY_ENCODING);
}
```
Sources: packages/cli/src/scaling/job-processor.ts311-321 packages/cli/src/scaling/scaling.service.ts379-393
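The encode/decode pair round-trips as a self-contained sketch. The sentinel key is copied from the snippets above; BINARY_ENCODING is assumed to be 'base64', and the helper function names are illustrative:

```typescript
// Round-trip of the buffer encoding shown above, assuming
// BINARY_ENCODING === 'base64'. Function names are illustrative.
const ENCODED_BUFFER_KEY = '__@N8nEncodedBuffer@__';

function encodeBody(body: unknown): unknown {
  if (Buffer.isBuffer(body)) {
    // Buffers cannot travel through Redis pub/sub as-is, so wrap as base64.
    return { [ENCODED_BUFFER_KEY]: body.toString('base64') };
  }
  return body;
}

function decodeBody(body: unknown): unknown {
  if (typeof body === 'object' && body !== null && ENCODED_BUFFER_KEY in body) {
    // Restore the original Buffer on the receiving side.
    return Buffer.from((body as Record<string, string>)[ENCODED_BUFFER_KEY], 'base64');
  }
  return body;
}
```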
While job messages handle execution coordination, UI updates flow through the Push service (WebSocket-based). Workers do not directly send push messages; instead, their lifecycle hooks persist execution data to the database, and main instances poll or receive events to update the UI.
n8n implements concurrency throttling to limit the number of simultaneous executions, preventing resource exhaustion. This system is separate from the Bull queue's job concurrency.
The ConcurrencyControlService maintains two separate queues:
| Queue Type | Purpose | Config Variable |
|---|---|---|
production | Production executions (webhook, trigger, etc.) | N8N_CONCURRENCY_PRODUCTION_LIMIT |
evaluation | Evaluation/manual test executions | N8N_CONCURRENCY_EVALUATION_LIMIT |
Each queue has an independent capacity. Setting a limit to -1 disables throttling for that queue.
Sources: packages/cli/src/concurrency/concurrency-control.service.ts15-77
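The queue selection and the "-1 disables throttling" rule can be sketched as two small helpers; these names are illustrative, not the service's actual methods:

```typescript
// Sketch: route an execution to the production or evaluation queue, and
// check whether a configured limit disables throttling entirely.
type QueueType = 'production' | 'evaluation';

function queueFor(mode: string): QueueType {
  // Evaluation executions get their own queue; everything else
  // (webhook, trigger, etc.) counts as production.
  return mode === 'evaluation' ? 'evaluation' : 'production';
}

function isThrottlingDisabled(limit: number): boolean {
  return limit === -1; // -1 means unlimited for that queue
}
```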
Concurrency Control Flow
ConcurrencyQueue Implementation
The ConcurrencyQueue maintains a capacity counter and a FIFO queue of waiting executions:
```
capacity = limit
queue = [{executionId, resolve}]

enqueue(executionId):
  capacity--
  if capacity < 0:
    return new Promise(resolve => queue.push({executionId, resolve}))

dequeue():
  capacity++
  if queue.length > 0:
    {executionId, resolve} = queue.shift()
    resolve()
```
Sources: packages/cli/src/concurrency/concurrency-queue.ts1-62 packages/cli/src/concurrency/concurrency-control.service.ts107-188
Executions reserve capacity before starting and release it upon completion. The ConcurrencyCapacityReservation class encapsulates this:
```typescript
const reservation = new ConcurrencyCapacityReservation(concurrencyControl);
try {
  await reservation.reserve({ mode, executionId });
  // ... run execution ...
} finally {
  reservation.release();
}
```
This ensures capacity is always released, even if execution creation fails.
Sources: packages/cli/src/concurrency/concurrency-capacity-reservation.ts1-48 packages/cli/src/active-executions.ts56-154
Important: Concurrency control is disabled in queue mode (EXECUTIONS_MODE=queue). The throttling mechanism only applies to regular mode. In queue mode, concurrency is controlled by:
- Bull's job concurrency (N8N_CONCURRENCY_PRODUCTION_LIMIT on each worker)

Sources: packages/cli/src/concurrency/concurrency-control.service.ts64-69
Multiple main instances can run simultaneously for high availability. One instance is elected as the "leader" and handles singleton responsibilities:
Leader Responsibilities
Leader Election
n8n uses Redis-based leader election via the @n8n/multi-main-setup package. The InstanceSettings.isLeader flag determines leader status, and decorators trigger actions:
- @OnLeaderTakeover(): Executed when becoming leader
- @OnLeaderStepdown(): Executed when losing leadership

Sources: packages/cli/src/scaling/scaling.service.ts458-488
Worker processes can be horizontally scaled by running multiple n8n worker instances. Each worker:
Worker Scaling Considerations
In large deployments, webhook handling can be offloaded to dedicated processes:
Webhook instances:
This separation allows scaling webhook ingestion independently from execution processing.
Sources: packages/cli/src/scaling/scaling.service.ts260-269
The ActiveExecutions service tracks in-progress executions in the current process, not globally. This is crucial in distributed mode.
State Tracking
Each active execution stores:
Sources: packages/cli/src/interfaces.ts116-126 packages/cli/src/active-executions.ts56-154
When a workflow enters a waiting state (e.g., via Wait node or Form node), it remains in activeExecutions but releases the workflowExecution reference. This prevents memory leaks while maintaining the ability to resume.
Resume Behavior
On resume, ActiveExecutions.add() is called with the existing executionId. The service:
- Resets the startedAt timestamp
- Creates a new responsePromise for webhook workflows
- Sets the status back to running

Sources: packages/cli/src/active-executions.ts193-207
During graceful shutdown:
Regular Mode

- Active executions are cancelled with cancelAll=true

Queue Mode (Worker)
Sources: packages/cli/src/active-executions.ts290-322 packages/cli/src/scaling/scaling.service.ts155-169
| Variable | Default | Description |
|---|---|---|
| EXECUTIONS_MODE | regular | Execution mode: regular or queue |
| OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS | false | Whether to enqueue manual executions in queue mode |
| Variable | Default | Description |
|---|---|---|
| QUEUE_BULL_PREFIX | bull | Redis key prefix for Bull |
| QUEUE_BULL_REDIS_HOST | localhost | Redis host |
| QUEUE_BULL_REDIS_PORT | 6379 | Redis port |
| QUEUE_BULL_REDIS_DB | 0 | Redis database number |
| QUEUE_BULL_REDIS_PASSWORD | - | Redis password |
| QUEUE_BULL_REDIS_TIMEOUT_THRESHOLD | 10000 | Redis connection timeout (ms) |
| Variable | Default | Description |
|---|---|---|
| N8N_CONCURRENCY_PRODUCTION_LIMIT | -1 | Max concurrent production executions per worker |
| Variable | Default | Description |
|---|---|---|
| N8N_EXECUTIONS_QUEUE_RECOVERY_INTERVAL | 180 | Minutes between recovery checks |
| N8N_EXECUTIONS_QUEUE_RECOVERY_BATCH_SIZE | 100 | Max executions per recovery check |
| Variable | Default | Description |
|---|---|---|
| N8N_CONCURRENCY_PRODUCTION_LIMIT | -1 | Max concurrent production executions |
| N8N_CONCURRENCY_EVALUATION_LIMIT | -1 | Max concurrent evaluation executions |
Sources: packages/cli/src/scaling/scaling.service.ts57-73 packages/cli/src/concurrency/concurrency-control.service.ts26-77
Bull has a built-in stalled job detection mechanism, but n8n disables it (maxStalledCount: 0) in favor of its own queue recovery mechanism.
If a "job stalled" error is detected, n8n wraps it in MaxStalledCountError and emits a job-stalled event.
Sources: packages/cli/src/scaling/scaling.service.ts70 packages/cli/src/workflow-runner.ts434-447
The queue recovery mechanism (described earlier) handles dangling executions by:
- Marking them as crashed

This runs periodically on the leader instance to catch executions that were never properly cleaned up due to crashes or network issues.
Sources: packages/cli/src/scaling/scaling.service.ts490-523
Jobs can be cancelled in two ways:
1. Before Processing (Waiting in Queue)

```typescript
await job.remove();
```

2. During Processing (Active)

```typescript
await job.progress({ kind: 'abort-job' });
await job.discard(); // prevent retries
await job.moveToFailed(error, true);
```
The worker receives the abort message and cancels the execution via PCancelable.cancel().
Sources: packages/cli/src/scaling/scaling.service.ts226-252
Jobs are enqueued with priority values:
Lower priority values are processed first.
Sources: packages/cli/src/workflow-runner.ts396 packages/cli/src/scaling/scaling.service.ts191-214
Worker Concurrency
- Set N8N_CONCURRENCY_PRODUCTION_LIMIT on each worker
- Total Capacity = Workers × Per-Worker Concurrency

Regular Mode Concurrency

- Set N8N_CONCURRENCY_PRODUCTION_LIMIT on the main instance

Waiting Executions

- Release their workflowExecution reference to free memory
- Remain tracked in ActiveExecutions

Job Data

- Removed from Redis after completion via removeOnComplete: true and removeOnFail: true in job options

Sources: packages/cli/src/scaling/scaling.service.ts194-198 packages/cli/src/active-executions.ts136-148
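The capacity formula above is simple multiplication; the helper name below is illustrative:

```typescript
// Total cluster throughput = number of workers times per-worker
// concurrency (N8N_CONCURRENCY_PRODUCTION_LIMIT on each worker).
function totalCapacity(workers: number, perWorkerConcurrency: number): number {
  return workers * perWorkerConcurrency;
}
```

For example, four workers each allowing ten concurrent jobs give a cluster capacity of forty simultaneous executions.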