This page documents how PaddleOCR saves training checkpoints, resumes from them, and exports models to the static inference format used by the deployment system. For inference using exported models, see Deployment and Inference. For training configuration including the Global section where checkpoint paths are set, see Model Configuration Files.
The checkpointing and export system has three distinct responsibilities:

- Saving checkpoints (weights, optimizer state, and training metadata) during training
- Resuming training from a checkpoint, or initializing from pretrained weights
- Exporting the dynamic model to the static inference format used for deployment
The central implementation lives in ppocr/utils/save_load.py, orchestrated by the training loop in tools/program.py.
Each checkpoint consists of up to four files written with a shared path prefix:
| File | Content |
|---|---|
| {prefix}.pdparams | Model parameter weights (serialized state dict) |
| {prefix}.pdopt | Optimizer state dict (momentum, etc.) |
| {prefix}.states | Training metadata: best metric, epoch number (pickled dict) |
| {prefix}.info.json | Optional structured metadata (epoch, metric), written when uniform_output_enabled=True |
For the best_accuracy prefix, all files are also duplicated under a best_model/ subdirectory as model.pdparams and model.pdopt, providing a stable path independent of epoch numbering.
KIE/NLP models (LayoutLM family) use a different protocol: instead of paddle.save, the backbone calls save_pretrained() (the HuggingFace-style save), and no .pdparams file is written.
Sources: ppocr/utils/save_load.py214-280
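The file layout above can be sketched with a stand-in writer. This is illustrative only: pickle stands in for paddle.save (which the real save_model uses), and the function and argument names are hypothetical.

```python
import os
import pickle
import shutil

def write_checkpoint(save_dir, prefix, weights, opt_state, is_best=False, **states):
    """Illustrative sketch of the checkpoint file layout; pickle stands in
    for paddle.save, which PaddleOCR uses for .pdparams/.pdopt."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, prefix)
    with open(path + ".pdparams", "wb") as f:   # model weights
        pickle.dump(weights, f)
    with open(path + ".pdopt", "wb") as f:      # optimizer state
        pickle.dump(opt_state, f)
    with open(path + ".states", "wb") as f:     # training metadata (epoch, best metric, ...)
        pickle.dump(states, f)
    if is_best:
        # Mirror the best checkpoint under a stable, epoch-independent path,
        # matching the best_model/model.pdparams convention described above.
        best = os.path.join(save_dir, "best_model")
        os.makedirs(best, exist_ok=True)
        shutil.copy(path + ".pdparams", os.path.join(best, "model.pdparams"))
        shutil.copy(path + ".pdopt", os.path.join(best, "model.pdopt"))
```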
The train() function in tools/program.py calls save_model in three situations:
Save event flow during training:
Sources: tools/program.py500-650
| Trigger | Prefix | When |
|---|---|---|
| New best metric | best_accuracy | Evaluation score exceeds previous best |
| End of every epoch | latest | Every epoch, on rank 0 only |
| Periodic epoch | iter_epoch_{epoch} | Every save_epoch_step epochs, on rank 0 |
The uniform_output_enabled flag (set in Global) causes export() to run automatically alongside save_model at each save event, keeping a static inference model in sync with the training checkpoint.
Sources: tools/program.py538-650
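The trigger table above can be condensed into a rough decision sketch. The function name is hypothetical, and the per-epoch framing is a simplification: in reality the best-metric check fires on the evaluation step schedule, not once per epoch.

```python
def save_events(epoch, metric, best_metric, save_epoch_step, rank):
    """Return the checkpoint prefixes to write at the end of `epoch` (sketch)."""
    prefixes = []
    if rank != 0:
        return prefixes                      # only rank 0 writes checkpoints
    if metric > best_metric:
        prefixes.append("best_accuracy")     # new best evaluation score
    prefixes.append("latest")                # written every epoch
    if epoch % save_epoch_step == 0:
        prefixes.append(f"iter_epoch_{epoch}")  # periodic save
    return prefixes
```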
save_model Function

Located in ppocr/utils/save_load.py214-280
save_model(model, optimizer, model_path, logger, config,
is_best=False, prefix="ppocr", **kwargs)
Key behaviors:
- Creates model_path if it doesn't exist.
- Saves {prefix}.pdopt (optimizer state).
- Saves {prefix}.pdparams via paddle.save(model.state_dict(), ...).
- Writes {prefix}.states as a pickled dict of kwargs (contains best_model_dict, epoch, global_step).
- If the save_model_info kwarg is truthy, writes {prefix}.info.json and calls update_train_results().

update_train_results

Implemented in ppocr/utils/save_load.py283-390
Maintains {save_model_dir}/train_result.json, a structured manifest of model artifacts:
{
"model_name": "...",
"label_dict": "path/to/dict.txt",
"train_log": "train.log",
"config": "config.yaml",
"models": {
"best": { "score": 0.95, "pdparams": "...", "inference_config": "..." },
"last_1": { ... },
"last_2": { ... },
...
"last_5": { ... }
},
"done_flag": false
}
The last_N slots shift on each periodic save (a rolling window of 5). Inference artifact keys tracked: inference_config, pdmodel, pdiparams.
Sources: ppocr/utils/save_load.py283-390
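The rolling window can be sketched as a simple shift over the models dict (function name is hypothetical; the real bookkeeping lives in update_train_results):

```python
def shift_last_n(models, new_entry, n=5):
    """Sketch of the rolling last_N window: last_1 is the most recent
    periodic save; older entries shift toward last_n and fall off the end."""
    for i in range(n, 1, -1):
        prev = models.get(f"last_{i - 1}")
        if prev is not None:
            models[f"last_{i}"] = prev
    models["last_1"] = new_entry
    return models
```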
The load_model function in ppocr/utils/save_load.py66-169 handles two distinct loading modes controlled by Global.checkpoints and Global.pretrained_model in the YAML config.
Load resolution diagram:
Sources: ppocr/utils/save_load.py66-169
| Config key | Files loaded | Optimizer restored | Epoch counter restored |
|---|---|---|---|
| Global.checkpoints | .pdparams, .pdopt, .states | Yes | Yes (from .states) |
| Global.pretrained_model | .pdparams only | No | No |
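One plausible resolution order for the two modes is sketched below. The function name is hypothetical, and the assumption that checkpoints takes precedence over pretrained_model (as resume semantics suggest) is mine, not stated by this page.

```python
def resolve_load_mode(config):
    """Sketch of load-mode resolution: full resume from Global.checkpoints,
    weights-only init from Global.pretrained_model, else train from scratch."""
    g = config.get("Global", {})
    if g.get("checkpoints"):
        return {"mode": "resume", "files": [".pdparams", ".pdopt", ".states"]}
    if g.get("pretrained_model"):
        return {"mode": "pretrained", "files": [".pdparams"]}
    return {"mode": "scratch", "files": []}
```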
During loading, load_model and load_pretrained_params log warnings (not errors) for:
- Parameters stored as float16 in the checkpoint — these are upcast to float32 and a notice is logged.

Sources: ppocr/utils/save_load.py118-210
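The float16 handling can be sketched with tensors represented as (dtype, values) pairs. This is a toy stand-in: the real code inspects Paddle tensor dtypes and casts the tensors themselves.

```python
def upcast_fp16_params(state_dict, logger=print):
    """Toy sketch of the float16 handling on load: float16 entries are
    upcast to float32 with a warning, never an error."""
    out = {}
    for name, (dtype, values) in state_dict.items():
        if dtype == "float16":
            logger(f"parameter {name} is float16; casting to float32")
            dtype = "float32"
        out[name] = (dtype, values)
    return out
```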
There are two ways to trigger model export:
1. Standalone script: tools/export_model.py — parses config and calls export(config).
2. Automatic: with Global.uniform_output_enabled: True, export(config, model, save_path) is called inside the training loop at each save event.

Sources: tools/export_model.py1-37 tools/program.py542-548
export Function

Implemented in ppocr/utils/export_model.py (imported as from ppocr.utils.export_model import export).
The function converts the dynamic PaddlePaddle model to a static inference graph. The output format depends on the Paddle version and the FLAGS_json_format_model environment variable:
| Condition | Model file | Params file |
|---|---|---|
| Paddle >= 3.0 or FLAGS_json_format_model=1 | inference.json | inference.pdiparams |
| Older Paddle with flag unset | inference.pdmodel | inference.pdiparams, inference.pdiparams.info |
An inference.yml config file is always written alongside the model files.
Sources: ppocr/utils/save_load.py292-305
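The format selection above can be sketched as a pure function of the Paddle version and the flag (function name is hypothetical; the real export() also builds the static graph via Paddle's tracing/export APIs):

```python
import os

def export_filenames(paddle_version, env=os.environ):
    """Sketch of export format selection: Paddle >= 3.0, or the
    FLAGS_json_format_model=1 escape hatch, selects the JSON program
    format; older versions emit the legacy protobuf .pdmodel."""
    major = int(paddle_version.split(".")[0])
    if major >= 3 or env.get("FLAGS_json_format_model") == "1":
        return ["inference.json", "inference.pdiparams", "inference.yml"]
    return ["inference.pdmodel", "inference.pdiparams",
            "inference.pdiparams.info", "inference.yml"]
```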
When uniform_output_enabled=True, the save_model_dir ends up with:
output/
config.yml
train.log
train_result.json
best_accuracy.pdparams
best_accuracy.pdopt
best_accuracy.states
best_accuracy.info.json
best_accuracy/
inference/
inference.json
inference.pdiparams
inference.yml
latest.pdparams
latest.pdopt
latest.states
latest/
inference/
...
iter_epoch_10.pdparams
...
best_model/
model.pdparams
model.pdopt
All checkpointing behavior is controlled from the Global section of the training YAML. See Model Configuration Files for the full config structure.
| Key | Type | Description |
|---|---|---|
| save_model_dir | str | Root directory for all checkpoint output |
| save_epoch_step | int | Epoch interval for periodic iter_epoch_N saves |
| eval_batch_step | int or list | Step interval for evaluation (triggers best-model saves) |
| checkpoints | str | Path prefix to resume from (e.g., output/best_accuracy) |
| pretrained_model | str | Path prefix or URL for pretrained weights |
| uniform_output_enabled | bool | Automatically export an inference model on each save |
Sources: tools/program.py255-260 ppocr/utils/save_load.py66-75
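Put together, the keys above combine in the Global section like this (paths and values are illustrative, not defaults):

```yaml
Global:
  save_model_dir: ./output/rec_model/
  save_epoch_step: 10            # write iter_epoch_N every 10 epochs
  eval_batch_step: [0, 2000]     # evaluate every 2000 steps (may trigger best-model save)
  checkpoints: ./output/rec_model/latest   # resume full training state
  # pretrained_model: ./pretrain/rec_model # or: load weights only, no optimizer/epoch
  uniform_output_enabled: True   # export an inference model at each save event
```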
Sources: tools/train.py214-217 tools/program.py35-42 ppocr/utils/save_load.py38 tools/export_model.py25-33