This page documents how PaddleOCR saves training checkpoints, resumes from them, and exports models to the static inference format used by the deployment system. For inference using exported models, see Deployment and Inference. For training configuration including the Global section where checkpoint paths are set, see Model Configuration Files.
The checkpointing and export system has three distinct responsibilities:

- Saving checkpoints (weights, optimizer state, and training metadata) during training
- Resuming training from a checkpoint, or initializing from pretrained weights
- Exporting the dynamic model to the static inference format used for deployment
The central implementation lives in ppocr/utils/save_load.py, orchestrated by the training loop in tools/program.py.
Each checkpoint consists of up to four files written with a shared path prefix:
| File | Content |
|---|---|
| {prefix}.pdparams | Model parameter weights (serialized state dict) |
| {prefix}.pdopt | Optimizer state dict (momentum, etc.) |
| {prefix}.states | Training metadata: best metric, epoch number (pickled dict) |
| {prefix}.info.json | Optional structured metadata (epoch, metric), written when uniform_output_enabled=True |
For the best_accuracy prefix, all files are also duplicated under a best_model/ subdirectory as model.pdparams and model.pdopt, providing a stable path independent of epoch numbering.
KIE/NLP models (LayoutLM family) use a different protocol: instead of paddle.save, the backbone calls save_pretrained() (the HuggingFace-style save), and no .pdparams file is written.
Sources: ppocr/utils/save_load.py214-280
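The file layout above can be sketched with a stand-in writer. This is illustrative only: pickle stands in for paddle.save (which the real save_model uses), and the function and argument names are hypothetical.

```python
import os
import pickle
import shutil

def write_checkpoint(save_dir, prefix, weights, opt_state, is_best=False, **states):
    """Illustrative sketch of the checkpoint file layout; pickle stands in
    for paddle.save, which PaddleOCR uses for .pdparams/.pdopt."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, prefix)
    with open(path + ".pdparams", "wb") as f:   # model weights
        pickle.dump(weights, f)
    with open(path + ".pdopt", "wb") as f:      # optimizer state
        pickle.dump(opt_state, f)
    with open(path + ".states", "wb") as f:     # training metadata (epoch, best metric, ...)
        pickle.dump(states, f)
    if is_best:
        # Mirror the best checkpoint under a stable, epoch-independent path,
        # matching the best_model/model.pdparams convention described above.
        best = os.path.join(save_dir, "best_model")
        os.makedirs(best, exist_ok=True)
        shutil.copy(path + ".pdparams", os.path.join(best, "model.pdparams"))
        shutil.copy(path + ".pdopt", os.path.join(best, "model.pdopt"))
```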
The train() function in tools/program.py calls save_model in three situations:
Save event flow during training:
Sources: tools/program.py500-650
| Trigger | Prefix | When |
|---|---|---|
| New best metric | best_accuracy | Evaluation score exceeds previous best |
| End of every epoch | latest | Every epoch, on rank 0 only |
| Periodic epoch | iter_epoch_{epoch} | Every save_epoch_step epochs, on rank 0 |
The uniform_output_enabled flag (set in Global) causes export() to run automatically alongside save_model at each save event, keeping a static inference model in sync with the training checkpoint.
Sources: tools/program.py538-650
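The trigger table above can be condensed into a rough decision sketch. The function name is hypothetical, and the per-epoch framing is a simplification: in reality the best-metric check fires on the evaluation step schedule, not once per epoch.

```python
def save_events(epoch, metric, best_metric, save_epoch_step, rank):
    """Return the checkpoint prefixes to write at the end of `epoch` (sketch)."""
    prefixes = []
    if rank != 0:
        return prefixes                      # only rank 0 writes checkpoints
    if metric > best_metric:
        prefixes.append("best_accuracy")     # new best evaluation score
    prefixes.append("latest")                # written every epoch
    if epoch % save_epoch_step == 0:
        prefixes.append(f"iter_epoch_{epoch}")  # periodic save
    return prefixes
```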
save_model Function

Located in ppocr/utils/save_load.py214-280
save_model(model, optimizer, model_path, logger, config,
is_best=False, prefix="ppocr", **kwargs)
Key behaviors:
- Creates model_path if it doesn't exist.
- Saves {prefix}.pdopt (optimizer state).
- Saves {prefix}.pdparams via paddle.save(model.state_dict(), ...).
- Writes {prefix}.states as a pickled dict of kwargs (contains best_model_dict, epoch, global_step).
- If the save_model_info kwarg is truthy, writes {prefix}.info.json and calls update_train_results().

update_train_results

Implemented in ppocr/utils/save_load.py283-390
Maintains {save_model_dir}/train_result.json, a structured manifest of model artifacts:
{
"model_name": "...",
"label_dict": "path/to/dict.txt",
"train_log": "train.log",
"config": "config.yaml",
"models": {
"best": { "score": 0.95, "pdparams": "...", "inference_config": "..." },
"last_1": { ... },
"last_2": { ... },
...
"last_5": { ... }
},
"done_flag": false
}
The last_N slots shift on each periodic save (a rolling window of 5). Inference artifact keys tracked: inference_config, pdmodel, pdiparams.
Sources: ppocr/utils/save_load.py283-390
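The rolling window can be sketched as a simple shift over the models dict (function name is hypothetical; the real bookkeeping lives in update_train_results):

```python
def shift_last_n(models, new_entry, n=5):
    """Sketch of the rolling last_N window: last_1 is the most recent
    periodic save; older entries shift toward last_n and fall off the end."""
    for i in range(n, 1, -1):
        prev = models.get(f"last_{i - 1}")
        if prev is not None:
            models[f"last_{i}"] = prev
    models["last_1"] = new_entry
    return models
```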
The load_model function in ppocr/utils/save_load.py66-169 handles two distinct loading modes controlled by Global.checkpoints and Global.pretrained_model in the YAML config.
Load resolution diagram:
Sources: ppocr/utils/save_load.py66-169
| Config key | Files loaded | Optimizer restored | Epoch counter restored |
|---|---|---|---|
| Global.checkpoints | .pdparams, .pdopt, .states | Yes | Yes (from .states) |
| Global.pretrained_model | .pdparams only | No | No |
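One plausible resolution order for the two modes is sketched below. The function name is hypothetical, and the assumption that checkpoints takes precedence over pretrained_model (as resume semantics suggest) is mine, not stated by this page.

```python
def resolve_load_mode(config):
    """Sketch of load-mode resolution: full resume from Global.checkpoints,
    weights-only init from Global.pretrained_model, else train from scratch."""
    g = config.get("Global", {})
    if g.get("checkpoints"):
        return {"mode": "resume", "files": [".pdparams", ".pdopt", ".states"]}
    if g.get("pretrained_model"):
        return {"mode": "pretrained", "files": [".pdparams"]}
    return {"mode": "scratch", "files": []}
```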
During loading, load_model and load_pretrained_params log warnings (not errors) for:
- Parameters stored as float16 in the checkpoint — these are upcast to float32 and a notice is logged.

Sources: ppocr/utils/save_load.py118-210
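The float16 handling can be sketched with tensors represented as (dtype, values) pairs. This is a toy stand-in: the real code inspects Paddle tensor dtypes and casts the tensors themselves.

```python
def upcast_fp16_params(state_dict, logger=print):
    """Toy sketch of the float16 handling on load: float16 entries are
    upcast to float32 with a warning, never an error."""
    out = {}
    for name, (dtype, values) in state_dict.items():
        if dtype == "float16":
            logger(f"parameter {name} is float16; casting to float32")
            dtype = "float32"
        out[name] = (dtype, values)
    return out
```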
There are two ways to trigger model export:
1. Standalone script: tools/export_model.py — parses config and calls export(config).
2. Automatic: with Global.uniform_output_enabled: True, export(config, model, save_path) is called inside the training loop at each save event.

Sources: tools/export_model.py1-37 tools/program.py542-548
export Function

Implemented in ppocr/utils/export_model.py (imported as from ppocr.utils.export_model import export).
The function converts the dynamic PaddlePaddle model to a static inference graph. The output format depends on the Paddle version and the FLAGS_json_format_model environment variable:
| Condition | Model file | Params file |
|---|---|---|
| Paddle >= 3.0 or FLAGS_json_format_model=1 | inference.json | inference.pdiparams |
| Older Paddle with flag unset | inference.pdmodel | inference.pdiparams, inference.pdiparams.info |
An inference.yml config file is always written alongside the model files.
Sources: ppocr/utils/save_load.py292-305
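The format selection above can be sketched as a pure function of the Paddle version and the flag (function name is hypothetical; the real export() also builds the static graph via Paddle's tracing/export APIs):

```python
import os

def export_filenames(paddle_version, env=os.environ):
    """Sketch of export format selection: Paddle >= 3.0, or the
    FLAGS_json_format_model=1 escape hatch, selects the JSON program
    format; older versions emit the legacy protobuf .pdmodel."""
    major = int(paddle_version.split(".")[0])
    if major >= 3 or env.get("FLAGS_json_format_model") == "1":
        return ["inference.json", "inference.pdiparams", "inference.yml"]
    return ["inference.pdmodel", "inference.pdiparams",
            "inference.pdiparams.info", "inference.yml"]
```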
When uniform_output_enabled=True, the save_model_dir ends up with:
output/
config.yml
train.log
train_result.json
best_accuracy.pdparams
best_accuracy.pdopt
best_accuracy.states
best_accuracy.info.json
best_accuracy/
inference/
inference.json
inference.pdiparams
inference.yml
latest.pdparams
latest.pdopt
latest.states
latest/
inference/
...
iter_epoch_10.pdparams
...
best_model/
model.pdparams
model.pdopt
All checkpointing behavior is controlled from the Global section of the training YAML. See Model Configuration Files for the full config structure.
| Key | Type | Description |
|---|---|---|
| save_model_dir | str | Root directory for all checkpoint output |
| save_epoch_step | int | Epoch interval for periodic iter_epoch_N saves |
| eval_batch_step | int or list | Step interval for evaluation (triggers best-model saves) |
| checkpoints | str | Path prefix to resume from (e.g., output/best_accuracy) |
| pretrained_model | str | Path prefix or URL for pretrained weights |
| uniform_output_enabled | bool | Automatically export an inference model on each save |
Sources: tools/program.py255-260 ppocr/utils/save_load.py66-75
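Put together, the keys above combine in the Global section like this (paths and values are illustrative, not defaults):

```yaml
Global:
  save_model_dir: ./output/rec_model/
  save_epoch_step: 10            # write iter_epoch_N every 10 epochs
  eval_batch_step: [0, 2000]     # evaluate every 2000 steps (may trigger best-model save)
  checkpoints: ./output/rec_model/latest   # resume full training state
  # pretrained_model: ./pretrain/rec_model # or: load weights only, no optimizer/epoch
  uniform_output_enabled: True   # export an inference model at each save event
```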
Sources: tools/train.py214-217 tools/program.py35-42 ppocr/utils/save_load.py38 tools/export_model.py25-33