Gas Cost Collection

Relevant source files

Purpose and Scope

This page documents the gas cost collection system in fuel-core, which processes VM benchmark results and generates gas cost tables used by the consensus layer. The system takes raw timing measurements from Criterion benchmarks and converts them into normalized gas costs that represent the computational expense of VM operations.

For information about the benchmark execution system that generates the input data, see VM Benchmarking System. For details on how these gas costs are used in consensus, see Consensus Parameters and Gas Costs.

Overview

The gas cost collection system is a post-processing utility that transforms benchmark timing data into consensus-layer gas cost configurations. Every VM operation in the Fuel VM has an associated gas cost that determines how much "gas" is consumed when executing that operation. These costs must:

Reflect actual computational expense: Operations that take longer to execute should cost more gas
Be consistent across implementations: Different nodes must agree on costs to maintain consensus
Be updatable: As the VM evolves or hardware changes, costs need recalibration

The collection system achieves this by:

Using a baseline operation (noop) as the unit of measurement
Expressing all costs relative to this baseline
Modeling size-dependent operations with linear regression
Supporting multiple output formats for different use cases

Key Terminology:

Baseline: The reference operation (typically noop) used to normalize all measurements
Relative Cost: Fixed cost expressed as multiples of baseline time
Dependent Cost: Variable cost that depends on input size (bytes processed, iterations, etc.)
Light Operation: Cost grows slowly with input size (many units per gas)
Heavy Operation: Cost grows quickly with input size (much gas per unit)

Sources: benches/src/bin/collect.rs1-250

Architecture and Data Flow

collect.rs CLI Tool Architecture

The system is implemented as a command-line binary at benches/src/bin/collect.rs that processes benchmark results through several stages:

Input Parsing: Reads Criterion JSON output from stdin or files
State Accumulation: Collects timing data, throughput measurements, and benchmark groups
Cost Computation: Applies linear regression for dependent costs, baseline comparison for fixed costs
Output Generation: Formats results as YAML, JSON, Rust code, or ConsensusParameters

Sources: benches/src/bin/collect.rs32-58 benches/src/bin/collect.rs142-249

State Management

State Data Structure

The State struct maintains all collected benchmark data during processing:

Field	Type	Purpose
`baseline`	`String`	ID of baseline operation (e.g., "noop/noop")
`all`	`bool`	Whether to include all sample values
`ids`	`HashMap<String, Duration>`	Map of benchmark ID to mean execution time
`throughput`	`HashMap<String, u64>`	Map of benchmark ID to throughput (bytes/iteration)
`groups`	`HashMap<String, Vec<String>>`	Map of group name to member benchmark IDs

Sources: benches/src/bin/collect.rs100-124

Input Format: Criterion JSON

The collection tool parses two types of JSON messages from Criterion:

Benchmark Complete Message

This message contains:

id: Unique benchmark identifier (format: group/name or group/name/size)
mean.estimate: Average execution time
mean.unit: Time unit (ns, us, ms, s)
throughput: Optional array with bytes/iterations processed

Group Complete Message

This message indicates:

All benchmarks in a group have completed
The list of benchmark IDs belonging to the group

The decode_input function parses these messages and returns Output::Mean or Output::Group variants.

Sources: benches/src/bin/collect.rs277-325

Cost Models

Cost Type Hierarchy

Cost Enum Structure

The system models VM operation costs using a two-level enum:

Cost enum (benches/src/bin/collect.rs136-140):

Relative(u64): Fixed cost expressed as multiple of baseline
Dependent(DependentCost): Variable cost that depends on input size

DependentCost enum (benches/src/bin/collect.rs130-133):

LightOperation { base: u64, units_per_gas: u64 }: When cost grows slowly
- Example: aloc (memory allocation) - 300 units per gas unit
- Formula: cost = base + (size / units_per_gas)
HeavyOperation { base: u64, gas_per_unit: u64 }: When cost grows quickly
- Example: scwq (state clear word queue) - 515 gas per unit
- Formula: cost = base + (size * gas_per_unit)

Sources: benches/src/bin/collect.rs129-140

Cost Computation Pipeline

Two-Stage Processing

The into_costs method (benches/src/bin/collect.rs513-516) processes benchmark data in two stages:

Stage 1: Dependent Cost Analysis

The into_dependent_costs method (benches/src/bin/collect.rs533-580) processes operations with variable costs:

Filter for throughput: Identify benchmarks with throughput measurements
Group by operation: Collect all size variants (e.g., mcp/10000, mcp/100000)
Convert to baseline units: Divide each mean time by baseline time
Perform linear regression: Analyze (size, cost) relationship
Classify operation type: Determine if Light or Heavy operation
Remove from state: Processed IDs are removed from ids map

Stage 2: Relative Cost Calculation

The into_relative_costs method (benches/src/bin/collect.rs582-599) processes remaining operations:

Iterate through groups: For each benchmark group
Find representative sample: Look for benchmark matching group name
Calculate ratio: Divide mean time by baseline time using map_to_ratio
Store as relative cost: Add to Costs hashmap

The map_to_ratio function (benches/src/bin/collect.rs327-333) performs ceiling division to ensure minimum cost of 1.

Sources: benches/src/bin/collect.rs513-599 benches/src/bin/collect.rs327-333

Linear Regression Analysis

Algorithm Overview

The linear regression analysis determines how operation cost scales with input size:

dependent_cost Function

The dependent_cost function (benches/src/bin/collect.rs640-758) analyzes the relationship between input size and execution cost:

1. Linear Regression Calculation

The linear_regression function (benches/src/bin/collect.rs608-638) computes the slope:

Given points (x_i, y_i):
  avg_x = Σx_i / n
  avg_y = Σy_i / n
  
  B = Σ((x_i - avg_x) * (y_i - avg_y)) / Σ((x_i - avg_x)²)
  
  slope = 1/B  (inverted to get units per cost)

2. Type Classification

Operations are classified by comparing first and last data points:

Type	Condition	Meaning
Linear	`abs(first.amount() - slope) < 20%` AND `abs(last.amount() - slope) < 20%`	Cost scales linearly with size
Logarithm	`first.price() > last.price()`	Cost per unit decreases (sublinear growth)
Exp	`first.price() < last.price()`	Cost per unit increases (superlinear growth)

Where:

point.price() = cost per element = y / x
point.amount() = elements per cost = x / y

3. Cost Calculation

For Linear operations:

Base cost = first point's y value
Amount = minimum (worst-case) of all points' amount()

For Logarithm/Exp operations:

Find base point where slope stabilizes
Use minimum subsequent amount as rate

4. Operation Classification

if amount > 1.0:
  → LightOperation { base, units_per_gas: amount }
else:
  → HeavyOperation { base, gas_per_unit: 1.0 / amount }

Sources: benches/src/bin/collect.rs608-758

Output Formats

YAML/JSON Format

The to_yaml (benches/src/bin/collect.rs498-507) and to_json (benches/src/bin/collect.rs509-511) methods serialize the Costs hashmap:

Rust Code Generation

The to_rust_code method (benches/src/bin/collect.rs375-492) generates a Rust source file:

Remapping and Cleanup:

Renames keywords: mod → mod_op, move → move_op
Removes script-specific variants: ret_script, rvrt_script, retd_script
Unifies contract variants: ret_contract → ret

Template Structure (benches/src/bin/collect.rs360-372):

Missing Key Detection: The generator compares generated keys against GasCostsValuesV6 structure and warns about any missing fields.

ConsensusParameters Format

The ConsensusParameters output format (benches/src/bin/collect.rs240-246) wraps the gas costs in a full consensus configuration:

Sources: benches/src/bin/collect.rs232-246 benches/src/bin/collect.rs375-492 benches/src/bin/collect.rs498-511

Usage

Command-Line Arguments

CLI Arguments (benches/src/bin/collect.rs32-58):

Argument	Short	Type	Default	Description
`--baseline`	`-b`	String	`"noop/noop"`	Benchmark ID to use as baseline
`--output`	`-o`	PathBuf	Current dir	Output file path (directory or file)
`--input`	`-i`	PathBuf	stdin	Input file or directory
`--debug`	`-d`	Flag	false	Print input/output for debugging
`--all`	`-a`	Flag	false	Include all sample values
`--format`	`-f`	Enum	`Yaml`	Output format (Yaml/Json/Rust/ConsensusParameters)

Typical Workflow

Step 1: Run Benchmarks

Step 2: Process Results

Step 3: Integration The generated Rust code is typically copied to benches/src/default_gas_costs.rs and used to update the consensus parameters.

Interactive Mode

The tool supports reading from stdin with Ctrl-C handling (benches/src/bin/collect.rs157-159):

The tool accumulates data until interrupted, allowing real-time processing of benchmark runs.

Sources: benches/src/bin/collect.rs32-58 benches/src/bin/collect.rs142-249

Integration with Benchmarking System

The gas cost collection tool is designed to work with the VM benchmarking system:

Benchmark Setup

VM benchmarks use the VmBench struct (benches/src/lib.rs110-187) to configure test scenarios:

Memory pools: Reusable VM instances to reduce allocation overhead
Database setup: Uses BenchDb (benches/benches/vm_set/blockchain.rs73-188) with RocksDB and contract state
Throughput annotation: Benchmarks specify bytes/iterations processed

Criterion Integration

The run_group_ref function (benches/benches/vm.rs28-76) executes benchmarks:

Prepares VM with VmBenchPrepared containing initial state
Measures instruction execution time with quanta::Clock
Resets VM state using diff to enable repeated measurements
Criterion outputs JSON with mean time and throughput

Example: Memory Copy Benchmark

For an operation like mcp (memory copy):

This generates multiple data points that collect.rs uses for linear regression.

Sources: benches/benches/vm.rs28-76 benches/src/lib.rs110-187 benches/benches/vm_set/blockchain.rs73-188

Example: Processing Memory Copy Costs

This example traces how the mcp (memory copy) operation is processed:

Input (Criterion JSON):

State After Parsing:

ids: {
  "mcp/10000" → 141.6ns,
  "mcp/100000" → 1416.0ns
}
throughput: {
  "mcp/10000" → 10000,
  "mcp/100000" → 100000
}
groups: {
  "mcp" → ["mcp/10000", "mcp/100000"]
}

Processing (assuming baseline = 20ns):

Convert to baseline units: 141.6 / 20 = 7, 1416.0 / 20 = 71
Create points: [(10000, 7), (100000, 71)]
Linear regression: slope ≈ 1408 units per gas unit
Classify as Linear (consistent scaling)
Output: LightOperation { base: 7, units_per_gas: 1408 }

Output (YAML):

Note: Actual values differ due to multiple sample points and statistical analysis.

Sources: benches/src/bin/collect.rs533-758

Gas Cost Collection

Relevant source files

Purpose and Scope

Overview

Reflect actual computational expense: Operations that take longer to execute should cost more gas
Be consistent across implementations: Different nodes must agree on costs to maintain consensus
Be updatable: As the VM evolves or hardware changes, costs need recalibration

The collection system achieves this by:

Using a baseline operation (noop) as the unit of measurement
Expressing all costs relative to this baseline
Modeling size-dependent operations with linear regression
Supporting multiple output formats for different use cases

Key Terminology:

Baseline: The reference operation (typically noop) used to normalize all measurements
Relative Cost: Fixed cost expressed as multiples of baseline time
Dependent Cost: Variable cost that depends on input size (bytes processed, iterations, etc.)
Light Operation: Cost grows slowly with input size (many units per gas)
Heavy Operation: Cost grows quickly with input size (much gas per unit)

Sources: benches/src/bin/collect.rs1-250

Architecture and Data Flow

collect.rs CLI Tool Architecture

The system is implemented as a command-line binary at benches/src/bin/collect.rs that processes benchmark results through several stages:

Input Parsing: Reads Criterion JSON output from stdin or files
State Accumulation: Collects timing data, throughput measurements, and benchmark groups
Cost Computation: Applies linear regression for dependent costs, baseline comparison for fixed costs
Output Generation: Formats results as YAML, JSON, Rust code, or ConsensusParameters

Sources: benches/src/bin/collect.rs32-58 benches/src/bin/collect.rs142-249

State Management

State Data Structure

The State struct maintains all collected benchmark data during processing:

Field	Type	Purpose
`baseline`	`String`	ID of baseline operation (e.g., "noop/noop")
`all`	`bool`	Whether to include all sample values
`ids`	`HashMap<String, Duration>`	Map of benchmark ID to mean execution time
`throughput`	`HashMap<String, u64>`	Map of benchmark ID to throughput (bytes/iteration)
`groups`	`HashMap<String, Vec<String>>`	Map of group name to member benchmark IDs

Sources: benches/src/bin/collect.rs100-124

Input Format: Criterion JSON

The collection tool parses two types of JSON messages from Criterion:

Benchmark Complete Message

This message contains:

id: Unique benchmark identifier (format: group/name or group/name/size)
mean.estimate: Average execution time
mean.unit: Time unit (ns, us, ms, s)
throughput: Optional array with bytes/iterations processed

Group Complete Message

This message indicates:

All benchmarks in a group have completed
The list of benchmark IDs belonging to the group

The decode_input function parses these messages and returns Output::Mean or Output::Group variants.

Sources: benches/src/bin/collect.rs277-325

Cost Models

Cost Type Hierarchy

Cost Enum Structure

The system models VM operation costs using a two-level enum:

Cost enum (benches/src/bin/collect.rs136-140):

Relative(u64): Fixed cost expressed as multiple of baseline
Dependent(DependentCost): Variable cost that depends on input size

DependentCost enum (benches/src/bin/collect.rs130-133):

LightOperation { base: u64, units_per_gas: u64 }: When cost grows slowly
- Example: aloc (memory allocation) - 300 units per gas unit
- Formula: cost = base + (size / units_per_gas)
HeavyOperation { base: u64, gas_per_unit: u64 }: When cost grows quickly
- Example: scwq (state clear word queue) - 515 gas per unit
- Formula: cost = base + (size * gas_per_unit)

Sources: benches/src/bin/collect.rs129-140

Cost Computation Pipeline

Two-Stage Processing

The into_costs method (benches/src/bin/collect.rs513-516) processes benchmark data in two stages:

Stage 1: Dependent Cost Analysis

The into_dependent_costs method (benches/src/bin/collect.rs533-580) processes operations with variable costs:

Filter for throughput: Identify benchmarks with throughput measurements
Group by operation: Collect all size variants (e.g., mcp/10000, mcp/100000)
Convert to baseline units: Divide each mean time by baseline time
Perform linear regression: Analyze (size, cost) relationship
Classify operation type: Determine if Light or Heavy operation
Remove from state: Processed IDs are removed from ids map

Stage 2: Relative Cost Calculation

The into_relative_costs method (benches/src/bin/collect.rs582-599) processes remaining operations:

Iterate through groups: For each benchmark group
Find representative sample: Look for benchmark matching group name
Calculate ratio: Divide mean time by baseline time using map_to_ratio
Store as relative cost: Add to Costs hashmap

The map_to_ratio function (benches/src/bin/collect.rs327-333) performs ceiling division to ensure minimum cost of 1.

Sources: benches/src/bin/collect.rs513-599 benches/src/bin/collect.rs327-333

Linear Regression Analysis

Algorithm Overview

The linear regression analysis determines how operation cost scales with input size:

dependent_cost Function

The dependent_cost function (benches/src/bin/collect.rs640-758) analyzes the relationship between input size and execution cost:

1. Linear Regression Calculation

The linear_regression function (benches/src/bin/collect.rs608-638) computes the slope:

Given points (x_i, y_i):
  avg_x = Σx_i / n
  avg_y = Σy_i / n
  
  B = Σ((x_i - avg_x) * (y_i - avg_y)) / Σ((x_i - avg_x)²)
  
  slope = 1/B  (inverted to get units per cost)

2. Type Classification

Operations are classified by comparing first and last data points:

Type	Condition	Meaning
Linear	`abs(first.amount() - slope) < 20%` AND `abs(last.amount() - slope) < 20%`	Cost scales linearly with size
Logarithm	`first.price() > last.price()`	Cost per unit decreases (sublinear growth)
Exp	`first.price() < last.price()`	Cost per unit increases (superlinear growth)

Where:

point.price() = cost per element = y / x
point.amount() = elements per cost = x / y

3. Cost Calculation

For Linear operations:

Base cost = first point's y value
Amount = minimum (worst-case) of all points' amount()

For Logarithm/Exp operations:

Find base point where slope stabilizes
Use minimum subsequent amount as rate

4. Operation Classification

if amount > 1.0:
  → LightOperation { base, units_per_gas: amount }
else:
  → HeavyOperation { base, gas_per_unit: 1.0 / amount }

Sources: benches/src/bin/collect.rs608-758

Output Formats

YAML/JSON Format

The to_yaml (benches/src/bin/collect.rs498-507) and to_json (benches/src/bin/collect.rs509-511) methods serialize the Costs hashmap:

Rust Code Generation

The to_rust_code method (benches/src/bin/collect.rs375-492) generates a Rust source file:

Remapping and Cleanup:

Renames keywords: mod → mod_op, move → move_op
Removes script-specific variants: ret_script, rvrt_script, retd_script
Unifies contract variants: ret_contract → ret

Template Structure (benches/src/bin/collect.rs360-372):

Missing Key Detection: The generator compares generated keys against GasCostsValuesV6 structure and warns about any missing fields.

ConsensusParameters Format

The ConsensusParameters output format (benches/src/bin/collect.rs240-246) wraps the gas costs in a full consensus configuration:

Sources: benches/src/bin/collect.rs232-246 benches/src/bin/collect.rs375-492 benches/src/bin/collect.rs498-511

Usage

Command-Line Arguments

CLI Arguments (benches/src/bin/collect.rs32-58):

Argument	Short	Type	Default	Description
`--baseline`	`-b`	String	`"noop/noop"`	Benchmark ID to use as baseline
`--output`	`-o`	PathBuf	Current dir	Output file path (directory or file)
`--input`	`-i`	PathBuf	stdin	Input file or directory
`--debug`	`-d`	Flag	false	Print input/output for debugging
`--all`	`-a`	Flag	false	Include all sample values
`--format`	`-f`	Enum	`Yaml`	Output format (Yaml/Json/Rust/ConsensusParameters)

Typical Workflow

Step 1: Run Benchmarks

Step 2: Process Results

Step 3: Integration The generated Rust code is typically copied to benches/src/default_gas_costs.rs and used to update the consensus parameters.

Interactive Mode

The tool supports reading from stdin with Ctrl-C handling (benches/src/bin/collect.rs157-159):

The tool accumulates data until interrupted, allowing real-time processing of benchmark runs.

Sources: benches/src/bin/collect.rs32-58 benches/src/bin/collect.rs142-249

Integration with Benchmarking System

The gas cost collection tool is designed to work with the VM benchmarking system:

Benchmark Setup

VM benchmarks use the VmBench struct (benches/src/lib.rs110-187) to configure test scenarios:

Memory pools: Reusable VM instances to reduce allocation overhead
Database setup: Uses BenchDb (benches/benches/vm_set/blockchain.rs73-188) with RocksDB and contract state
Throughput annotation: Benchmarks specify bytes/iterations processed

Criterion Integration

The run_group_ref function (benches/benches/vm.rs28-76) executes benchmarks:

Prepares VM with VmBenchPrepared containing initial state
Measures instruction execution time with quanta::Clock
Resets VM state using diff to enable repeated measurements
Criterion outputs JSON with mean time and throughput

Example: Memory Copy Benchmark

For an operation like mcp (memory copy):

This generates multiple data points that collect.rs uses for linear regression.

Sources: benches/benches/vm.rs28-76 benches/src/lib.rs110-187 benches/benches/vm_set/blockchain.rs73-188

Example: Processing Memory Copy Costs

This example traces how the mcp (memory copy) operation is processed:

Input (Criterion JSON):

State After Parsing:

ids: {
  "mcp/10000" → 141.6ns,
  "mcp/100000" → 1416.0ns
}
throughput: {
  "mcp/10000" → 10000,
  "mcp/100000" → 100000
}
groups: {
  "mcp" → ["mcp/10000", "mcp/100000"]
}

Processing (assuming baseline = 20ns):

Convert to baseline units: 141.6 / 20 = 7, 1416.0 / 20 = 71
Create points: [(10000, 7), (100000, 71)]
Linear regression: slope ≈ 1408 units per gas unit
Classify as Linear (consistent scaling)
Output: LightOperation { base: 7, units_per_gas: 1408 }

Output (YAML):

Note: Actual values differ due to multiple sample points and statistical analysis.

Sources: benches/src/bin/collect.rs533-758

Gas Cost Collection

Purpose and Scope

Overview

Architecture and Data Flow

State Management

Input Format: Criterion JSON

Benchmark Complete Message

Group Complete Message

Cost Models

Cost Type Hierarchy

Cost Computation Pipeline

Stage 1: Dependent Cost Analysis

Stage 2: Relative Cost Calculation

Linear Regression Analysis

Algorithm Overview

1. Linear Regression Calculation

2. Type Classification

3. Cost Calculation

4. Operation Classification

Output Formats

YAML/JSON Format

Rust Code Generation

ConsensusParameters Format

Usage

Command-Line Arguments

Typical Workflow

Interactive Mode

Integration with Benchmarking System

Benchmark Setup

Criterion Integration

Example: Memory Copy Benchmark

Example: Processing Memory Copy Costs

Input (Criterion JSON):

State After Parsing:

Processing (assuming baseline = 20ns):

Output (YAML):

On this page

Gas Cost Collection

Purpose and Scope

Overview

Architecture and Data Flow

State Management

Input Format: Criterion JSON

Benchmark Complete Message

Group Complete Message

Cost Models

Cost Type Hierarchy

Cost Computation Pipeline

Stage 1: Dependent Cost Analysis

Stage 2: Relative Cost Calculation

Linear Regression Analysis

Algorithm Overview

1. Linear Regression Calculation

2. Type Classification

3. Cost Calculation

4. Operation Classification

Output Formats

YAML/JSON Format

Rust Code Generation

ConsensusParameters Format

Usage

Command-Line Arguments

Typical Workflow

Interactive Mode

Integration with Benchmarking System

Benchmark Setup

Criterion Integration

Example: Memory Copy Benchmark

Example: Processing Memory Copy Costs

Input (Criterion JSON):

State After Parsing:

Processing (assuming baseline = 20ns):

Output (YAML):

On this page