This document describes the training loop execution and optimization components in PaddleOCR. It covers the main training iteration logic, optimizer setup, learning rate scheduling, automatic mixed precision (AMP) training, gradient computation, and parameter updates.
The training system is orchestrated by the train() function in tools/program.py200-659 which implements the main training loop. The training process is initiated by tools/train.py which builds all necessary components (model, optimizer, loss, dataloaders) and then delegates to program.train().
The training loop operates on a per-epoch and per-batch basis, performing forward passes, loss computation, backward propagation, and parameter updates. It integrates periodic evaluation, learning rate scheduling, and checkpoint saving.
Key responsibilities:
- Executing the per-epoch, per-batch loop: forward pass, loss computation, backward propagation, parameter update
- Stepping the learning rate scheduler after each batch
- Running periodic evaluation and tracking the best model
- Logging training statistics and saving checkpoints
Sources: tools/program.py200-659 tools/train.py46-246
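The per-epoch, per-batch structure described above can be sketched as a simplified outline. This is not the actual program.train() code; every helper callable here (evaluate, save_checkpoint, etc.) is an illustrative stand-in:

```python
# Simplified outline of a train()-style loop: forward, loss, backward,
# parameter update, per-batch LR stepping, periodic evaluation, and
# per-epoch checkpointing. All callables are illustrative stand-ins.

def train_loop(model, loss_fn, optimizer, lr_scheduler, loader,
               epochs, eval_every, evaluate, save_checkpoint):
    global_step = 0
    for epoch in range(epochs):
        for batch in loader:
            preds = model(batch[0])                # forward pass
            loss = loss_fn(preds, batch)["loss"]   # dict; "loss" is backpropagated
            loss.backward()                        # backward pass
            optimizer.step()                       # parameter update
            optimizer.clear_grad()                 # reset gradients
            lr_scheduler.step()                    # per-batch LR schedule step
            global_step += 1
            if global_step % eval_every == 0:
                evaluate(model)                    # periodic evaluation
        save_checkpoint(model, epoch)              # checkpoint each epoch
    return global_step
```

The ordering mirrors the sections that follow: forward and loss, backward and step, gradient clearing, LR stepping, then evaluation and checkpointing.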
Optimizers and learning rate schedulers are built using the build_optimizer() function from the ppocr.optimizer module. The configuration specifies the optimizer type, learning rate schedule, and hyperparameters.
build_optimizer() in ppocr/optimizer/__init__.py34-66 is called from tools/train.py156-162 and performs three sequential steps:
1. Calls build_lr_scheduler() to build the LR schedule object from the lr sub-config
2. Builds the regularizer from the regularizer or weight_decay config key
3. Builds gradient clipping if clip_norm or clip_norm_global is present

It returns two values:

- An optimizer: a paddle.optimizer instance, operating only on parameters where param.trainable == True
- A learning rate schedule: a paddle.optimizer.lr.LRScheduler instance, or a float for constant LR

Optimizer wrapper classes are defined in ppocr/optimizer/optimizer.py. Each wraps a Paddle optimizer and filters parameters to only trainable ones:
| Class | Underlying API | Notable Parameters |
|---|---|---|
| Momentum | paddle.optimizer.Momentum | momentum, weight_decay, grad_clip |
| Adam | paddle.optimizer.Adam | beta1, beta2, epsilon, weight_decay, grad_clip, group_lr |
The Adam class supports a group_lr mode that assigns different learning rates to specific parameter groups (used in VisionLAN multi-stage training via training_step).
The optimizer build flow runs from the Optimizer config through build_lr_scheduler(), regularizer construction, and optional gradient clipping to the final optimizer instance.
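The three steps can be sketched with plain-Python stand-ins rather than real Paddle objects (the function name and the tuple/dict representations below are illustrative, not PaddleOCR's actual code):

```python
# Illustrative stand-in for build_optimizer()'s flow: derive the LR setting,
# the weight-decay setting, and optional gradient clipping from the config,
# then assemble an optimizer over trainable parameters only.

def build_optimizer_sketch(config, parameters):
    lr = config["lr"].get("learning_rate", 0.001)    # build_lr_scheduler() stand-in
    weight_decay = config.get("weight_decay")        # regularizer / weight_decay key
    if "clip_norm" in config:                        # per-tensor clipping
        grad_clip = ("by_norm", config["clip_norm"])
    elif "clip_norm_global" in config:               # global-norm clipping
        grad_clip = ("by_global_norm", config["clip_norm_global"])
    else:
        grad_clip = None                             # no clipping configured
    trainable = [p for p in parameters if p["trainable"]]
    optimizer = {
        "type": config["name"],
        "parameters": trainable,                     # only trainable params
        "weight_decay": weight_decay,
        "grad_clip": grad_clip,
    }
    return optimizer, lr
```

As in the real build_optimizer(), two values come back: the optimizer (over trainable parameters only) and the learning rate setting.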
Sources: tools/train.py156-162 ppocr/optimizer/__init__.py34-66 ppocr/optimizer/optimizer.py
The learning rate scheduler is stepped after each batch in tools/program.py448-449:
The current learning rate is retrieved before each batch update in tools/program.py334:
This learning rate value is logged along with other training statistics to track optimization progress over time.
Available learning rate schedule classes are defined in ppocr/optimizer/learning_rate.py and ppocr/optimizer/lr_scheduler.py. The schedule is selected via the lr.name key in the Optimizer config:
| Class | Schedule Type | Key Parameters |
|---|---|---|
| Linear | Polynomial decay (paddle.optimizer.lr.PolynomialDecay) | learning_rate, end_lr, power, warmup_epoch |
| Cosine | Cosine annealing (paddle.optimizer.lr.CosineAnnealingDecay) | learning_rate, warmup_epoch |
| LinearWarmupCosine | Linear warmup then cosine | learning_rate, warmup_steps, start_lr, min_lr |
| Step | Step-based decay | learning_rate, step_size, gamma |
| CyclicalCosineDecay | Cyclical cosine | learning_rate, T_max, cycle, eta_min |
| OneCycleDecay | One-cycle policy | max_lr, epochs, steps_per_epoch, pct_start, div_factor |
| TwoStepCosineDecay | Two-step cosine | see ppocr/optimizer/lr_scheduler.py |
Classes that accept warmup_epoch automatically wrap the primary schedule with paddle.optimizer.lr.LinearWarmup. The default schedule name when name is omitted is "Const" (constant learning rate).
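The warmup wrapping can be illustrated with a small self-contained formula: a linear ramp up to the base learning rate, followed here by cosine annealing. This mirrors the general shape of these schedules, not PaddleOCR's exact implementation:

```python
import math

def warmup_cosine_lr(step, warmup_steps, total_steps, base_lr, min_lr=0.0):
    """Linear warmup from 0 to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Warmup phase: LR grows linearly with the step count.
        return base_lr * step / max(1, warmup_steps)
    # Cosine phase: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Since the scheduler is stepped per batch, `step` here corresponds to the global batch counter, and warmup_epoch-style configs convert epochs to steps using the number of batches per epoch.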
Sources: tools/program.py334 tools/program.py448-449 ppocr/optimizer/learning_rate.py ppocr/optimizer/lr_scheduler.py
Regularization applies weight decay during optimization. The regularizer or weight_decay key in the Optimizer config controls this. Available classes in ppocr/optimizer/regularizer.py:
| Class | Penalty | Underlying API |
|---|---|---|
| L1Decay | Sum of absolute weights | paddle.regularizer.L1Decay(coeff) |
| L2Decay | Sum of squared weights | Returns coeff as a float passed to weight_decay |
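The two penalties differ only in the term added to the training objective; a quick numerical sketch in plain Python (illustrative, operating on flat weight lists):

```python
def l1_penalty(weights, coeff):
    # L1Decay: coefficient times the sum of absolute weight values.
    return coeff * sum(abs(w) for w in weights)

def l2_penalty(weights, coeff):
    # L2Decay: coefficient times the sum of squared weight values.
    return coeff * sum(w * w for w in weights)
```

L1 pushes weights toward exact zeros (sparsity), while L2 shrinks all weights proportionally, which is why L2 can be implemented as a simple weight_decay coefficient.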
Gradient Clipping is configured via the Optimizer config and applied inside build_optimizer() at ppocr/optimizer/__init__.py55-62:
| Config Key | Clipping API | Behavior |
|---|---|---|
| clip_norm | paddle.nn.ClipGradByNorm | Clips each gradient tensor independently by a per-tensor norm bound |
| clip_norm_global | paddle.nn.ClipGradByGlobalNorm | Clips the global norm of all gradients together |
If neither key is present, no gradient clipping is applied (grad_clip=None).
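The behavioral difference between the two modes can be seen in a small numeric sketch (plain Python over flat lists, illustrative only; the real classes operate on Paddle tensors):

```python
import math

def clip_by_norm(grad, clip_norm):
    """Per-tensor clipping: rescale one gradient if its own norm exceeds the bound."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return grad
    return [g * clip_norm / norm for g in grad]

def clip_by_global_norm(grads, clip_norm):
    """Global clipping: one shared scale factor computed from all gradients' norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        return grads
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads]
```

Per-tensor clipping can rescale each parameter's gradient by a different factor, whereas global clipping preserves the relative direction of the full gradient vector.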
Sources: ppocr/optimizer/regularizer.py ppocr/optimizer/__init__.py55-62
The forward and backward pass logic varies based on the model type and whether AMP is enabled. The training loop handles multiple model architectures with different input requirements.
The forward pass execution is implemented in tools/program.py346-388:
The model type is determined by configuration and affects how batch data is passed:
- Some model types receive only the images tensor, while others also receive additional data from batch[1:]

Sources: tools/program.py346-388
Loss calculation and backward propagation occur in tools/program.py365-392:
Without AMP:
The loss calculation returns a dictionary where the key "loss" contains the main loss value to be backpropagated. After backward pass, optimizer.step() updates model parameters.
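The loss-dictionary contract can be sketched as follows. The term names and the weighting here are invented for illustration; the point is that build_loss() functions return a dict of named terms, of which only "loss" is backpropagated:

```python
# Sketch of the loss-dict contract: a loss function may compute several
# named terms, but the training loop only backpropagates the value stored
# under "loss". The term names and weight below are hypothetical.

def combined_loss(primary_term, auxiliary_term, aux_weight=0.05):
    total = primary_term + aux_weight * auxiliary_term
    return {
        "loss": total,                    # backpropagated by the training loop
        "primary_loss": primary_term,     # logged only
        "auxiliary_loss": auxiliary_term, # logged only
    }
```

The extra entries flow into TrainingStats for logging, so per-term behavior stays visible even though only the combined value drives optimization.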
Sources: tools/program.py365-392
After each optimizer step, gradients are cleared in tools/program.py394:
This prevents gradient accumulation across batches (unless explicitly desired for gradient accumulation strategies).
Sources: tools/program.py394
PaddleOCR supports Automatic Mixed Precision training to accelerate training and reduce memory usage. AMP automatically manages the use of float16 or bfloat16 precision during forward and backward passes while maintaining float32 precision for critical operations.
AMP is configured in tools/train.py171-212:
| Configuration Parameter | Description | Default |
|---|---|---|
| use_amp | Enable AMP training | False |
| amp_level | Precision level: "O1" or "O2" | "O2" |
| amp_dtype | Data type: "float16" or "bfloat16" | "float16" |
| amp_custom_black_list | Ops to keep in float32 | [] |
| amp_custom_white_list | Ops to force to lower precision | [] |
| scale_loss | Initial loss scaling factor | 1.0 |
| use_dynamic_loss_scaling | Enable dynamic loss scaling | False |
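As a hedged example, these keys might appear under the Global section of a training config roughly as follows (the specific values are illustrative, not recommended defaults):

```yaml
Global:
  use_amp: True
  amp_level: O2
  amp_dtype: float16
  amp_custom_black_list: []
  amp_custom_white_list: []
  scale_loss: 1024.0
  use_dynamic_loss_scaling: True
```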
Sources: tools/train.py171-212
Sources: tools/train.py185-212 tools/program.py339-369
When AMP is enabled, the forward and backward pass is wrapped in tools/program.py339-369:
Key AMP operations: the forward pass runs under an auto-cast context, the loss is scaled before the backward pass (per scale_loss), and the scaler steps the optimizer, unscaling gradients and adjusting the loss scale when use_dynamic_loss_scaling is enabled.
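Dynamic loss scaling can be illustrated with a toy scaler. This mimics the general behavior of an AMP gradient scaler (multiply the loss so small float16 gradients don't underflow; back off on overflow, grow after a run of stable steps); it is not Paddle's implementation, and all constants are illustrative:

```python
class ToyLossScaler:
    """Toy dynamic loss scaler illustrating the idea behind AMP loss scaling."""

    def __init__(self, init_scale=1024.0, growth_interval=2000,
                 growth_factor=2.0, backoff_factor=0.5):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self._good_steps = 0

    def scale_loss(self, loss):
        # Scale the loss before backward so gradients are scaled too.
        return loss * self.scale

    def update(self, found_inf):
        if found_inf:
            # Overflow detected: shrink the scale and skip this update.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            # Stable step: grow the scale after enough consecutive successes.
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0
```

Gradients must be divided by the same scale (unscaled) before the optimizer applies them, which the real scaler handles when stepping the optimizer.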
Sources: tools/program.py180-197 tools/program.py339-369
For CUDA devices, additional flags are set in tools/train.py186-194:
These flags optimize batch normalization and GEMM operations for AMP training.
Sources: tools/train.py186-194
Training statistics are tracked using the TrainingStats class, which maintains running averages of metrics using a smoothing window.
The statistics tracker is initialized in tools/program.py262:
The log_smooth_window parameter (from config) determines how many iterations to average over for smooth metric reporting.
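A minimal sketch of window-smoothed tracking in the spirit of TrainingStats (not the actual ppocr.utils.stats code; the class and method names are illustrative):

```python
from collections import defaultdict, deque

class SmoothedStats:
    """Keep the last `window` values per metric and report their mean."""

    def __init__(self, window=20):
        self.window = window
        # Each metric gets a bounded deque; old values fall off automatically.
        self.values = defaultdict(lambda: deque(maxlen=self.window))

    def update(self, stats):
        for name, value in stats.items():
            self.values[name].append(float(value))

    def log(self):
        # Windowed average per metric, as reported in the periodic log line.
        return {name: sum(v) / len(v) for name, v in self.values.items()}
```

Averaging over a window rather than reporting raw per-batch values keeps the logged loss curve readable despite batch-to-batch noise.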
Statistics are updated at two points during training:
1. Training Metrics (loss, learning rate) in tools/program.py452-457:
2. Evaluation Metrics (during cal_metric_during_train) in tools/program.py396-440:
After computing metrics with the evaluation class, they are updated:
Sources: tools/program.py262 tools/program.py396-457
Training progress is logged every print_batch_step iterations in tools/program.py464-495:
The log output includes:

- Epoch and global step
- Smoothed loss values and the current learning rate
- Average data loading time (avg_reader_cost) and batch processing time (avg_batch_cost)
- Average samples per batch, throughput (ips), ETA, and peak GPU memory

Example log format:
```
epoch: [2/300], global_step: 1500, loss: 1.234, lr: 0.001,
avg_reader_cost: 0.050 s, avg_batch_cost: 0.200 s, avg_samples: 32,
ips: 160.0 samples/s, eta: 12:34:56, max_mem_reserved: 2048 MB,
max_mem_allocated: 1536 MB
```
Sources: tools/program.py464-495
If a visualizer logger is provided (e.g., VisualDL, Weights & Biases), metrics are logged in tools/program.py459-462:
Sources: tools/program.py459-462
Periodic evaluation during training allows early stopping and best model selection. Evaluation is triggered based on global step counts.
Evaluation timing is configured in tools/program.py226-254:
| Configuration Parameter | Description |
|---|---|
| eval_batch_step | List [start_step, interval_step] or single value |
| eval_batch_epoch | Alternative: evaluate every N epochs |
| start_eval_step | First step at which to begin evaluation |
If eval_batch_step is a list like [0, 2000], evaluation starts at step 0 and repeats every 2000 steps. If eval_batch_epoch is used instead, the step interval is derived as step_per_epoch * eval_batch_epoch.
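The trigger condition reduces to a small predicate over the global step counter (an illustrative sketch, not the exact tools/program.py code):

```python
def should_evaluate(global_step, start_eval_step, eval_interval):
    """Evaluate once global_step reaches start_eval_step, then every
    eval_interval steps thereafter."""
    if global_step < start_eval_step:
        return False
    return (global_step - start_eval_step) % eval_interval == 0
```

With eval_batch_step set to [0, 2000], this fires at steps 0, 2000, 4000, and so on; a nonzero start_eval_step delays the first (typically expensive) evaluation until the model has trained for a while.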
Sources: tools/program.py226-254
Evaluation is triggered in tools/program.py501-590 when the condition is met:
The evaluation process calls eval() (tools/program.py661-770) on the validation dataloader, computes metrics with the evaluation class, and compares the result against the current best model.
Sources: tools/program.py501-590
The best model is tracked using best_model_dict in tools/program.py259-261:
The main_indicator is the primary metric to optimize (e.g., "acc" for accuracy, "hmean" for F1-score). When a new best is found in tools/program.py538-568 the model is saved with prefix "best_accuracy".
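Best-model tracking amounts to comparing the new metrics against the stored best on the main indicator (a sketch; the real best_model_dict also carries bookkeeping such as the step and epoch of the best result):

```python
def update_best(best_model_dict, metrics, main_indicator="hmean"):
    """Update best_model_dict in place and return True when metrics beat
    the stored best on main_indicator."""
    current = metrics[main_indicator]
    if current > best_model_dict.get(main_indicator, float("-inf")):
        best_model_dict.update(metrics)
        return True   # caller saves a checkpoint with prefix "best_accuracy"
    return False
```

The float("-inf") default means the very first evaluation always establishes an initial best.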
Sources: tools/program.py259-261 tools/program.py538-568
For certain models like SRN, model averaging is applied before evaluation in tools/program.py506-513:
This technique averages model parameters over recent iterations to improve stability and performance.
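The idea can be sketched as a running average over recent parameter snapshots (plain Python over flat parameter lists; Paddle's actual averaging utility differs in detail):

```python
class ParamAverager:
    """Average flat parameter vectors over the last `window` recorded steps."""

    def __init__(self, window=10):
        self.window = window
        self.snapshots = []

    def record(self, params):
        # Keep only the most recent `window` snapshots.
        self.snapshots.append(list(params))
        if len(self.snapshots) > self.window:
            self.snapshots.pop(0)

    def averaged(self):
        # Element-wise mean across the retained snapshots.
        n = len(self.snapshots)
        return [sum(col) / n for col in zip(*self.snapshots)]
```

Evaluating with the averaged parameters instead of the latest ones smooths out step-to-step oscillation, which is the stability benefit described above.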
Sources: tools/program.py337 tools/program.py506-513
The following table summarizes the key components and their file locations:
| Component | Primary Location | Key Functions/Classes |
|---|---|---|
| Training Entry | tools/train.py | main() |
| Training Orchestration | tools/program.py200-659 | train() |
| Optimizer Building | ppocr.optimizer (imported) | build_optimizer() |
| Loss Building | ppocr.losses (imported) | build_loss() |
| Evaluation Loop | tools/program.py661-770 | eval() |
| Statistics Tracking | ppocr.utils.stats (imported) | TrainingStats |
| Model Checkpointing | ppocr.utils.save_load (imported) | save_model(), load_model() |
| AMP Utilities | tools/program.py180-197 | to_float32() |
Sources: tools/train.py tools/program.py
The complete training iteration flow from a single batch perspective:
Sources: tools/program.py328-591