This document describes the training loop execution and optimization components in PaddleOCR. It covers the main training iteration logic, optimizer setup, learning rate scheduling, automatic mixed precision (AMP) training, gradient computation, and parameter updates.
The training system is orchestrated by the train() function in tools/program.py200-659 which implements the main training loop. The training process is initiated by tools/train.py which builds all necessary components (model, optimizer, loss, dataloaders) and then delegates to program.train().
The training loop operates on a per-epoch and per-batch basis, performing forward passes, loss computation, backward propagation, and parameter updates. It integrates periodic evaluation, learning rate scheduling, and checkpoint saving.
Key responsibilities:
- Executing the per-epoch, per-batch loop: forward pass, loss computation, backward propagation, parameter update
- Stepping the learning rate scheduler after each batch
- Running periodic evaluation and tracking the best model
- Logging training statistics and saving checkpoints
Sources: tools/program.py200-659 tools/train.py46-246
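The per-epoch, per-batch structure described above can be sketched as a simplified outline. This is not the actual program.train() code; every helper callable here (evaluate, save_checkpoint, etc.) is an illustrative stand-in:

```python
# Simplified outline of a train()-style loop: forward, loss, backward,
# parameter update, per-batch LR stepping, periodic evaluation, and
# per-epoch checkpointing. All callables are illustrative stand-ins.

def train_loop(model, loss_fn, optimizer, lr_scheduler, loader,
               epochs, eval_every, evaluate, save_checkpoint):
    global_step = 0
    for epoch in range(epochs):
        for batch in loader:
            preds = model(batch[0])                # forward pass
            loss = loss_fn(preds, batch)["loss"]   # dict; "loss" is backpropagated
            loss.backward()                        # backward pass
            optimizer.step()                       # parameter update
            optimizer.clear_grad()                 # reset gradients
            lr_scheduler.step()                    # per-batch LR schedule step
            global_step += 1
            if global_step % eval_every == 0:
                evaluate(model)                    # periodic evaluation
        save_checkpoint(model, epoch)              # checkpoint each epoch
    return global_step
```

The ordering mirrors the sections that follow: forward and loss, backward and step, gradient clearing, LR stepping, then evaluation and checkpointing.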
Optimizers and learning rate schedulers are built using the build_optimizer() function from the ppocr.optimizer module. The configuration specifies the optimizer type, learning rate schedule, and hyperparameters.
build_optimizer() in ppocr/optimizer/__init__.py34-66 is called from tools/train.py156-162 and performs three sequential steps:
1. Calls build_lr_scheduler() to build the LR schedule object from the lr sub-config
2. Builds the regularizer from the regularizer or weight_decay config key
3. Builds gradient clipping if clip_norm or clip_norm_global is present

It returns two values:

- An optimizer: a paddle.optimizer instance, operating only on parameters where param.trainable == True
- A learning rate schedule: a paddle.optimizer.lr.LRScheduler instance, or a float for constant LR

Optimizer wrapper classes are defined in ppocr/optimizer/optimizer.py. Each wraps a Paddle optimizer and filters parameters to only trainable ones:
| Class | Underlying API | Notable Parameters |
|---|---|---|
| Momentum | paddle.optimizer.Momentum | momentum, weight_decay, grad_clip |
| Adam | paddle.optimizer.Adam | beta1, beta2, epsilon, weight_decay, grad_clip, group_lr |
The Adam class supports a group_lr mode that assigns different learning rates to specific parameter groups (used in VisionLAN multi-stage training via training_step).
The optimizer build flow runs from the Optimizer config through build_lr_scheduler(), regularizer construction, and optional gradient clipping to the final optimizer instance.
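The three steps can be sketched with plain-Python stand-ins rather than real Paddle objects (the function name and the tuple/dict representations below are illustrative, not PaddleOCR's actual code):

```python
# Illustrative stand-in for build_optimizer()'s flow: derive the LR setting,
# the weight-decay setting, and optional gradient clipping from the config,
# then assemble an optimizer over trainable parameters only.

def build_optimizer_sketch(config, parameters):
    lr = config["lr"].get("learning_rate", 0.001)    # build_lr_scheduler() stand-in
    weight_decay = config.get("weight_decay")        # regularizer / weight_decay key
    if "clip_norm" in config:                        # per-tensor clipping
        grad_clip = ("by_norm", config["clip_norm"])
    elif "clip_norm_global" in config:               # global-norm clipping
        grad_clip = ("by_global_norm", config["clip_norm_global"])
    else:
        grad_clip = None                             # no clipping configured
    trainable = [p for p in parameters if p["trainable"]]
    optimizer = {
        "type": config["name"],
        "parameters": trainable,                     # only trainable params
        "weight_decay": weight_decay,
        "grad_clip": grad_clip,
    }
    return optimizer, lr
```

As in the real build_optimizer(), two values come back: the optimizer (over trainable parameters only) and the learning rate setting.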
Sources: tools/train.py156-162 ppocr/optimizer/__init__.py34-66 ppocr/optimizer/optimizer.py
The learning rate scheduler is stepped after each batch in tools/program.py448-449:
The current learning rate is retrieved before each batch update in tools/program.py334:
This learning rate value is logged along with other training statistics to track optimization progress over time.
Available learning rate schedule classes are defined in ppocr/optimizer/learning_rate.py and ppocr/optimizer/lr_scheduler.py. The schedule is selected via the lr.name key in the Optimizer config:
| Class | Schedule Type | Key Parameters |
|---|---|---|
| Linear | Polynomial decay (paddle.optimizer.lr.PolynomialDecay) | learning_rate, end_lr, power, warmup_epoch |
| Cosine | Cosine annealing (paddle.optimizer.lr.CosineAnnealingDecay) | learning_rate, warmup_epoch |
| LinearWarmupCosine | Linear warmup then cosine | learning_rate, warmup_steps, start_lr, min_lr |
| Step | Step-based decay | learning_rate, step_size, gamma |
| CyclicalCosineDecay | Cyclical cosine | learning_rate, T_max, cycle, eta_min |
| OneCycleDecay | One-cycle policy | max_lr, epochs, steps_per_epoch, pct_start, div_factor |
| TwoStepCosineDecay | Two-step cosine | see ppocr/optimizer/lr_scheduler.py |
Classes that accept warmup_epoch automatically wrap the primary schedule with paddle.optimizer.lr.LinearWarmup. The default schedule name when name is omitted is "Const" (constant learning rate).
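The warmup wrapping can be illustrated with a small self-contained formula: a linear ramp up to the base learning rate, followed here by cosine annealing. This mirrors the general shape of these schedules, not PaddleOCR's exact implementation:

```python
import math

def warmup_cosine_lr(step, warmup_steps, total_steps, base_lr, min_lr=0.0):
    """Linear warmup from 0 to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Warmup phase: LR grows linearly with the step count.
        return base_lr * step / max(1, warmup_steps)
    # Cosine phase: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Since the scheduler is stepped per batch, `step` here corresponds to the global batch counter, and warmup_epoch-style configs convert epochs to steps using the number of batches per epoch.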
Sources: tools/program.py334 tools/program.py448-449 ppocr/optimizer/learning_rate.py ppocr/optimizer/lr_scheduler.py
Regularization applies weight decay during optimization. The regularizer or weight_decay key in the Optimizer config controls this. Available classes in ppocr/optimizer/regularizer.py:
| Class | Penalty | Underlying API |
|---|---|---|
| L1Decay | Sum of absolute weights | paddle.regularizer.L1Decay(coeff) |
| L2Decay | Sum of squared weights | Returns coeff as a float passed to weight_decay |
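The two penalties differ only in the term added to the training objective; a quick numerical sketch in plain Python (illustrative, operating on flat weight lists):

```python
def l1_penalty(weights, coeff):
    # L1Decay: coefficient times the sum of absolute weight values.
    return coeff * sum(abs(w) for w in weights)

def l2_penalty(weights, coeff):
    # L2Decay: coefficient times the sum of squared weight values.
    return coeff * sum(w * w for w in weights)
```

L1 pushes weights toward exact zeros (sparsity), while L2 shrinks all weights proportionally, which is why L2 can be implemented as a simple weight_decay coefficient.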
Gradient Clipping is configured via the Optimizer config and applied inside build_optimizer() at ppocr/optimizer/__init__.py55-62:
| Config Key | Clipping API | Behavior |
|---|---|---|
| clip_norm | paddle.nn.ClipGradByNorm | Clips each gradient tensor independently by a per-tensor norm bound |
| clip_norm_global | paddle.nn.ClipGradByGlobalNorm | Clips the global norm of all gradients together |
If neither key is present, no gradient clipping is applied (grad_clip=None).
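The behavioral difference between the two modes can be seen in a small numeric sketch (plain Python over flat lists, illustrative only; the real classes operate on Paddle tensors):

```python
import math

def clip_by_norm(grad, clip_norm):
    """Per-tensor clipping: rescale one gradient if its own norm exceeds the bound."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return grad
    return [g * clip_norm / norm for g in grad]

def clip_by_global_norm(grads, clip_norm):
    """Global clipping: one shared scale factor computed from all gradients' norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        return grads
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads]
```

Per-tensor clipping can rescale each parameter's gradient by a different factor, whereas global clipping preserves the relative direction of the full gradient vector.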
Sources: ppocr/optimizer/regularizer.py ppocr/optimizer/__init__.py55-62
The forward and backward pass logic varies based on the model type and whether AMP is enabled. The training loop handles multiple model architectures with different input requirements.
The forward pass execution is implemented in tools/program.py346-388:
The model type is determined by configuration and affects how batch data is passed:
- Some model types receive only the images tensor, while others also receive additional data from batch[1:]

Sources: tools/program.py346-388
Loss calculation and backward propagation occur in tools/program.py365-392:
Without AMP:
The loss calculation returns a dictionary where the key "loss" contains the main loss value to be backpropagated. After backward pass, optimizer.step() updates model parameters.
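The loss-dictionary contract can be sketched as follows. The term names and the weighting here are invented for illustration; the point is that build_loss() functions return a dict of named terms, of which only "loss" is backpropagated:

```python
# Sketch of the loss-dict contract: a loss function may compute several
# named terms, but the training loop only backpropagates the value stored
# under "loss". The term names and weight below are hypothetical.

def combined_loss(primary_term, auxiliary_term, aux_weight=0.05):
    total = primary_term + aux_weight * auxiliary_term
    return {
        "loss": total,                    # backpropagated by the training loop
        "primary_loss": primary_term,     # logged only
        "auxiliary_loss": auxiliary_term, # logged only
    }
```

The extra entries flow into TrainingStats for logging, so per-term behavior stays visible even though only the combined value drives optimization.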
Sources: tools/program.py365-392
After each optimizer step, gradients are cleared in tools/program.py394:
This prevents gradient accumulation across batches (unless explicitly desired for gradient accumulation strategies).
Sources: tools/program.py394
PaddleOCR supports Automatic Mixed Precision training to accelerate training and reduce memory usage. AMP automatically manages the use of float16 or bfloat16 precision during forward and backward passes while maintaining float32 precision for critical operations.
AMP is configured in tools/train.py171-212:
| Configuration Parameter | Description | Default |
|---|---|---|
| use_amp | Enable AMP training | False |
| amp_level | Precision level: "O1" or "O2" | "O2" |
| amp_dtype | Data type: "float16" or "bfloat16" | "float16" |
| amp_custom_black_list | Ops to keep in float32 | [] |
| amp_custom_white_list | Ops to force to lower precision | [] |
| scale_loss | Initial loss scaling factor | 1.0 |
| use_dynamic_loss_scaling | Enable dynamic loss scaling | False |
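As a hedged example, these keys might appear under the Global section of a training config roughly as follows (the specific values are illustrative, not recommended defaults):

```yaml
Global:
  use_amp: True
  amp_level: O2
  amp_dtype: float16
  amp_custom_black_list: []
  amp_custom_white_list: []
  scale_loss: 1024.0
  use_dynamic_loss_scaling: True
```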
Sources: tools/train.py171-212
Sources: tools/train.py185-212 tools/program.py339-369
When AMP is enabled, the forward and backward pass is wrapped in tools/program.py339-369:
Key AMP operations: the forward pass runs under an auto-cast context, the loss is scaled before the backward pass (per scale_loss), and the scaler steps the optimizer, unscaling gradients and adjusting the loss scale when use_dynamic_loss_scaling is enabled.
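Dynamic loss scaling can be illustrated with a toy scaler. This mimics the general behavior of an AMP gradient scaler (multiply the loss so small float16 gradients don't underflow; back off on overflow, grow after a run of stable steps); it is not Paddle's implementation, and all constants are illustrative:

```python
class ToyLossScaler:
    """Toy dynamic loss scaler illustrating the idea behind AMP loss scaling."""

    def __init__(self, init_scale=1024.0, growth_interval=2000,
                 growth_factor=2.0, backoff_factor=0.5):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self._good_steps = 0

    def scale_loss(self, loss):
        # Scale the loss before backward so gradients are scaled too.
        return loss * self.scale

    def update(self, found_inf):
        if found_inf:
            # Overflow detected: shrink the scale and skip this update.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            # Stable step: grow the scale after enough consecutive successes.
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0
```

Gradients must be divided by the same scale (unscaled) before the optimizer applies them, which the real scaler handles when stepping the optimizer.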
Sources: tools/program.py180-197 tools/program.py339-369
For CUDA devices, additional flags are set in tools/train.py186-194:
These flags optimize batch normalization and GEMM operations for AMP training.
Sources: tools/train.py186-194
Training statistics are tracked using the TrainingStats class, which maintains running averages of metrics using a smoothing window.
The statistics tracker is initialized in tools/program.py262:
The log_smooth_window parameter (from config) determines how many iterations to average over for smooth metric reporting.
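A minimal sketch of window-smoothed tracking in the spirit of TrainingStats (not the actual ppocr.utils.stats code; the class and method names are illustrative):

```python
from collections import defaultdict, deque

class SmoothedStats:
    """Keep the last `window` values per metric and report their mean."""

    def __init__(self, window=20):
        self.window = window
        # Each metric gets a bounded deque; old values fall off automatically.
        self.values = defaultdict(lambda: deque(maxlen=self.window))

    def update(self, stats):
        for name, value in stats.items():
            self.values[name].append(float(value))

    def log(self):
        # Windowed average per metric, as reported in the periodic log line.
        return {name: sum(v) / len(v) for name, v in self.values.items()}
```

Averaging over a window rather than reporting raw per-batch values keeps the logged loss curve readable despite batch-to-batch noise.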
Statistics are updated at two points during training:
1. Training Metrics (loss, learning rate) in tools/program.py452-457:
2. Evaluation Metrics (during cal_metric_during_train) in tools/program.py396-440:
After computing metrics with the evaluation class, they are updated:
Sources: tools/program.py262 tools/program.py396-457
Training progress is logged every print_batch_step iterations in tools/program.py464-495:
The log output includes:

- Epoch and global step
- Smoothed loss values and the current learning rate
- Average data loading time (avg_reader_cost) and batch processing time (avg_batch_cost)
- Average samples per batch, throughput (ips), ETA, and peak GPU memory

Example log format:
```
epoch: [2/300], global_step: 1500, loss: 1.234, lr: 0.001,
avg_reader_cost: 0.050 s, avg_batch_cost: 0.200 s, avg_samples: 32,
ips: 160.0 samples/s, eta: 12:34:56, max_mem_reserved: 2048 MB,
max_mem_allocated: 1536 MB
```
Sources: tools/program.py464-495
If a visualizer logger is provided (e.g., VisualDL, Weights & Biases), metrics are logged in tools/program.py459-462:
Sources: tools/program.py459-462
Periodic evaluation during training allows early stopping and best model selection. Evaluation is triggered based on global step counts.
Evaluation timing is configured in tools/program.py226-254:
| Configuration Parameter | Description |
|---|---|
| eval_batch_step | List [start_step, interval_step] or single value |
| eval_batch_epoch | Alternative: evaluate every N epochs |
| start_eval_step | First step at which to begin evaluation |
If eval_batch_step is a list like [0, 2000], evaluation starts at step 0 and repeats every 2000 steps. If eval_batch_epoch is used instead, the step interval is derived as step_per_epoch * eval_batch_epoch.
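The trigger condition reduces to a small predicate over the global step counter (an illustrative sketch, not the exact tools/program.py code):

```python
def should_evaluate(global_step, start_eval_step, eval_interval):
    """Evaluate once global_step reaches start_eval_step, then every
    eval_interval steps thereafter."""
    if global_step < start_eval_step:
        return False
    return (global_step - start_eval_step) % eval_interval == 0
```

With eval_batch_step set to [0, 2000], this fires at steps 0, 2000, 4000, and so on; a nonzero start_eval_step delays the first (typically expensive) evaluation until the model has trained for a while.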
Sources: tools/program.py226-254
Evaluation is triggered in tools/program.py501-590 when the condition is met:
The evaluation process calls eval() (tools/program.py661-770) on the validation dataloader, computes metrics with the evaluation class, and compares the result against the current best model.
Sources: tools/program.py501-590
The best model is tracked using best_model_dict in tools/program.py259-261:
The main_indicator is the primary metric to optimize (e.g., "acc" for accuracy, "hmean" for F1-score). When a new best is found in tools/program.py538-568 the model is saved with prefix "best_accuracy".
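Best-model tracking amounts to comparing the new metrics against the stored best on the main indicator (a sketch; the real best_model_dict also carries bookkeeping such as the step and epoch of the best result):

```python
def update_best(best_model_dict, metrics, main_indicator="hmean"):
    """Update best_model_dict in place and return True when metrics beat
    the stored best on main_indicator."""
    current = metrics[main_indicator]
    if current > best_model_dict.get(main_indicator, float("-inf")):
        best_model_dict.update(metrics)
        return True   # caller saves a checkpoint with prefix "best_accuracy"
    return False
```

The float("-inf") default means the very first evaluation always establishes an initial best.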
Sources: tools/program.py259-261 tools/program.py538-568
For certain models like SRN, model averaging is applied before evaluation in tools/program.py506-513:
This technique averages model parameters over recent iterations to improve stability and performance.
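The idea can be sketched as a running average over recent parameter snapshots (plain Python over flat parameter lists; Paddle's actual averaging utility differs in detail):

```python
class ParamAverager:
    """Average flat parameter vectors over the last `window` recorded steps."""

    def __init__(self, window=10):
        self.window = window
        self.snapshots = []

    def record(self, params):
        # Keep only the most recent `window` snapshots.
        self.snapshots.append(list(params))
        if len(self.snapshots) > self.window:
            self.snapshots.pop(0)

    def averaged(self):
        # Element-wise mean across the retained snapshots.
        n = len(self.snapshots)
        return [sum(col) / n for col in zip(*self.snapshots)]
```

Evaluating with the averaged parameters instead of the latest ones smooths out step-to-step oscillation, which is the stability benefit described above.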
Sources: tools/program.py337 tools/program.py506-513
The following table summarizes the key components and their file locations:
| Component | Primary Location | Key Functions/Classes |
|---|---|---|
| Training Entry | tools/train.py | main() |
| Training Orchestration | tools/program.py200-659 | train() |
| Optimizer Building | ppocr.optimizer (imported) | build_optimizer() |
| Loss Building | ppocr.losses (imported) | build_loss() |
| Evaluation Loop | tools/program.py661-770 | eval() |
| Statistics Tracking | ppocr.utils.stats (imported) | TrainingStats |
| Model Checkpointing | ppocr.utils.save_load (imported) | save_model(), load_model() |
| AMP Utilities | tools/program.py180-197 | to_float32() |
Sources: tools/train.py tools/program.py
The complete training iteration flow from a single batch perspective:
Sources: tools/program.py328-591