Model configuration files are YAML files that define all aspects of model training, evaluation, and inference in PaddleOCR. They provide a declarative way to specify model architecture, training hyperparameters, data processing pipelines, and evaluation metrics without modifying code.
For information about inference configuration and deployment settings, see page 5.2. For training orchestration that uses these configurations, see page 4.1. For how model architectures are constructed from these configs, see page 4.2. For data processing pipelines driven by these configs, see page 4.3.
Configuration files follow a hierarchical structure with eight top-level sections: Global, Architecture, Loss, Optimizer, Train, Eval, PostProcess, and Metric.
Diagram: YAML Config File — Top-Level Sections and Subsections
Sources: configs/table/table_mv3.yml:1-130, configs/rec/rec_icdar15_train.yml:1-99
The Global section controls training execution, device settings, logging, checkpoint management, and task-specific settings shared across all pipeline stages.
| Parameter | Type | Description | Example |
|---|---|---|---|
| use_gpu | bool | Enable GPU training | true |
| epoch_num | int | Total training epochs | 72, 400 |
| log_smooth_window | int | Smoothing window for logged metrics | 20 |
| print_batch_step | int | Log interval in iterations | 5, 10 |
| save_model_dir | str | Checkpoint output directory | ./output/rec/ic15/ |
| save_epoch_step | int | Checkpoint frequency in epochs | 3, 400 |
| eval_batch_step | list[int] | Evaluation schedule [start_iter, interval] | [0, 2000] |
| cal_metric_during_train | bool | Compute metrics during training | True |
| pretrained_model | str | Path to pretrained weights (optional) | |
| checkpoints | str | Path to resume training from checkpoint | |
| save_inference_dir | str | Directory for exported inference model | ./ |
| use_visualdl | bool | Enable VisualDL logging | False |
| infer_img | str | Sample image path for inference mode | doc/imgs_words_en/word_10.png |
| character_dict_path | str | Path to character dictionary file | ppocr/utils/en_dict.txt |
| max_text_length | int | Maximum sequence length for recognition | 25, 500 |
| infer_mode | bool | Run in inference-only mode | False |
| use_space_char | bool | Include space character in recognition | False |
| save_res_path | str | Path to write inference results | ./output/rec/predicts_ic15.txt |
| amp_custom_black_list | list | Ops excluded from AMP | ['matmul_v2'] |
Example Global Configuration:
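The sketch below assembles the parameters from the table into a Global block in the style of rec_icdar15_train.yml; treat it as illustrative rather than a verbatim copy of the shipped config:

```yaml
Global:
  use_gpu: true
  epoch_num: 72
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/ic15/
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir: ./
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 25
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_ic15.txt
```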
The &max_text_length YAML anchor syntax (as used in table_mv3.yml) creates a reusable value that can be referenced elsewhere in the same file using *max_text_length, reducing duplication when multiple sections need the same parameter.
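A minimal sketch of the anchor pattern (section layout abbreviated; table_mv3.yml applies the same idea across more sections):

```yaml
# Anchor defined once in Global ...
Global:
  max_text_length: &max_text_length 500

# ... and referenced elsewhere in the same file
Architecture:
  Head:
    name: TableAttentionHead
    max_text_length: *max_text_length  # resolves to 500
```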
Sources: configs/rec/rec_icdar15_train.yml:1-21, configs/table/table_mv3.yml:1-23
The Architecture section defines the neural network structure using a modular backbone-head pattern:
Diagram: Model Architecture Components
Architecture Configuration Structure:
| Field | Description | Example Values |
|---|---|---|
| model_type | Task category | det, rec, table, cls |
| algorithm | Specific algorithm | DB, SVTR_LCNet, TableAttn |
| Backbone.name | Backbone network | MobileNetV3, ResNet, PPLCNet |
| Backbone.* | Backbone-specific params | scale, model_name, etc. |
| Head.name | Head network | DBHead, CTCHead, TableAttentionHead |
| Head.out_channels | Output dimension (for recognition) | Character dictionary size |
| Head.* | Head-specific params | hidden_size, max_text_length, etc. |
Example Architecture Configuration:
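An illustrative Architecture block for the table model, assembled from the fields in the table above (parameter values are representative and may not match table_mv3.yml exactly):

```yaml
Architecture:
  model_type: table
  algorithm: TableAttn
  Backbone:
    name: MobileNetV3
    scale: 1.0
    model_name: large
  Head:
    name: TableAttentionHead
    hidden_size: 256
    max_text_length: 500
```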
The out_channels parameter for recognition heads is automatically calculated from the character dictionary during training initialization.
Sources: configs/table/table_mv3.yml:36-49, tools/train.py:73-138, ppocr/modeling/backbones/__init__.py:18-152, ppocr/modeling/heads/__init__.py:18-152
The Loss section specifies the loss function and its hyperparameters:
| Loss Type | Use Case | Key Parameters |
|---|---|---|
| DBLoss | Text detection (DB algorithm) | alpha, beta, ohem_ratio |
| CTCLoss | Text recognition (CTC decoder) | - |
| AttentionLoss | Text recognition (attention decoder) | - |
| TableAttentionLoss | Table structure recognition | structure_weight, loc_weight |
| MultiLoss | Multi-head models | loss_config_list |
| CombinedLoss | Multiple loss combination | loss_config_list |
Example Loss Configuration:
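An illustrative single-loss block for table structure recognition (the weights are representative values, not necessarily those in table_mv3.yml):

```yaml
Loss:
  name: TableAttentionLoss
  structure_weight: 100.0
  loc_weight: 2.0
```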
For multi-loss scenarios:
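A hedged sketch of the loss_config_list pattern; the SARLoss entry is assumed here only to show a second head alongside CTC:

```yaml
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:
```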
Sources: configs/table/table_mv3.yml:50-53, ppocr/losses/__init__.py:77-127
The Optimizer section configures the optimization algorithm and learning rate schedule:
Optimizer Configuration:
| Optimizer Type | Common Parameters |
|---|---|
| Adam | beta1, beta2 |
| SGD | momentum |
| AdamW | beta1, beta2, weight_decay |

Learning Rate Schedules:

| Schedule Type | Parameters | Behavior |
|---|---|---|
| Cosine | learning_rate, warmup_epoch | Cosine annealing with warmup |
| Linear | learning_rate, epochs, end_lr | Linear decay |
| Piecewise | learning_rate, decay_epochs, gamma | Step-wise decay at specified epochs |
| MultiStepDecay | learning_rate, milestones, gamma | Decay at milestone epochs |
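Putting the optimizer and schedule tables together, an Optimizer block might look like this (the lr sub-block nesting follows the referenced configs; values are illustrative):

```yaml
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
```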
Sources: configs/table/table_mv3.yml:25-34, tools/train.py:156-162
The Train and Eval sections define data loading and preprocessing pipelines. Both follow the same structure: a dataset block (class name, data paths, transforms list) and a loader block (batch size, workers, shuffle).
Data pipeline flow through build_dataloader():
Sources: configs/rec/rec_icdar15_train.yml:59-99, configs/table/table_mv3.yml:63-130, ppocr/data/imaug/__init__.py
Common Dataset Classes:
| Class | Format | Typical Use |
|---|---|---|
| SimpleDataSet | Text file with image path + label per line | Detection, recognition (flat files) |
| LMDBDataSet | LMDB binary database | Recognition (large-scale datasets) |
| PubTabDataSet | JSONL annotations | Table structure recognition |
Common Transform Operations:
| Transform | Purpose | Key Parameters |
|---|---|---|
| DecodeImage | Load image from bytes/file | img_mode, channel_first |
| DetLabelEncode | Encode detection polygon labels | — |
| CTCLabelEncode | Encode recognition labels (CTC) | Uses character_dict_path from Global |
| TableLabelEncode | Encode table structure labels | max_text_length, learn_empty_box, loc_reg_num |
| RecResizeImg | Resize image for recognition | image_shape: [C, H, W] |
| ResizeTableImage | Resize table image to max side | max_len |
| PaddingTableImage | Pad to fixed size | size: [H, W] |
| NormalizeImage | Normalize pixel values | scale, mean, std, order |
| ToCHWImage | Transpose HWC → CHW | — |
| KeepKeys | Select fields returned by dataloader | keep_keys list |
Loader Parameters:
| Parameter | Train | Eval | Description |
|---|---|---|---|
| shuffle | True | False | Shuffle dataset order |
| batch_size_per_card | e.g. 256 | e.g. 256 | Batch size per GPU |
| drop_last | True | False | Drop incomplete final batch |
| num_workers | e.g. 8 | e.g. 4 | Parallel data loading workers |
| use_shared_memory | False | False | Use shared memory for workers |
Recognition Train section example (SimpleDataSet, CTC pipeline):
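A condensed sketch of such a Train section, modeled on rec_icdar15_train.yml (paths and values are illustrative):

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode:        # uses character_dict_path from Global
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length']
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
```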
The Eval section uses identical structure but omits data augmentation transforms and sets shuffle: False and drop_last: False.
Sources: configs/rec/rec_icdar15_train.yml:59-99, configs/rec/rec_mv3_none_bilstm_ctc.yml:59-96, configs/table/table_mv3.yml:63-130, ppocr/data/imaug/__init__.py, ppocr/data/imaug/label_ops.py
The PostProcess section configures output decoding, converting raw model predictions into structured results:
PostProcess Classes by Task:
| Task | PostProcess Class | Purpose |
|---|---|---|
| Detection | DBPostProcess | Decode binary segmentation maps to text boxes |
| Recognition (CTC) | CTCLabelDecode | Decode CTC predictions to text |
| Recognition (Attention) | AttnLabelDecode | Decode attention predictions to text |
| Table | TableLabelDecode | Decode table structure and cell locations |
| Classification | ClsPostProcess | Decode classification logits |
The character_dict_path from the Global section is automatically passed to recognition post-processors.
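A minimal PostProcess block for CTC recognition looks like this; note that no dictionary path is needed here because it is injected from Global:

```yaml
PostProcess:
  name: CTCLabelDecode
```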
Sources: configs/table/table_mv3.yml:55-56, ppocr/postprocess/__init__.py:66-117
The Metric section defines evaluation metrics:
Metric Types:
| Task | Metric Class | Main Indicator | Description |
|---|---|---|---|
| Detection | DetMetric | hmean | F-score (harmonic mean of precision/recall) |
| Recognition | RecMetric | acc | Character-level accuracy |
| Table | TableMetric | acc | Structure accuracy (optional bbox IoU) |
| Classification | ClsMetric | acc | Classification accuracy |
The main_indicator field specifies which metric to use for model selection during training.
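For example, a recognition Metric block selecting the best checkpoint by accuracy:

```yaml
Metric:
  name: RecMetric
  main_indicator: acc
```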
Sources: configs/table/table_mv3.yml:58-61, tools/program.py:259-260
The configuration system supports YAML file loading and command-line overrides via dot-notation keys.
Configuration loading pipeline:
Sources: tools/program.py tools/train.py
Loading Process:
1. The YAML file is specified with the -c flag and loaded by load_config() into a Python dict.
2. Command-line overrides are specified with repeated -o flags, using dot-notation for nested keys.
3. merge_config() applies the overrides by traversing the config hierarchy with dot-separated keys and overwriting values.
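The dot-notation traversal can be sketched as follows; this is a simplified stand-in for illustration, not the actual PaddleOCR implementation:

```python
def merge_config(config: dict, opts: dict) -> dict:
    """Apply -o style overrides; keys may use dot-notation for nesting."""
    for key, value in opts.items():
        if "." not in key:
            config[key] = value  # top-level key: overwrite directly
            continue
        node = config
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # walk (or create) the hierarchy
        node[parts[-1]] = value
    return config

# Example: equivalent of -o Global.epoch_num=500 Train.loader.batch_size_per_card=32
cfg = {"Global": {"use_gpu": True, "epoch_num": 72}}
merge_config(cfg, {"Global.epoch_num": 500, "Train.loader.batch_size_per_card": 32})
```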
Command-line Override Syntax:
| Override Pattern | Effect |
|---|---|
| -o Global.use_gpu=false | Disable GPU |
| -o Global.epoch_num=500 | Set epoch count |
| -o Architecture.Backbone.scale=0.5 | Nested update |
| -o Train.loader.batch_size_per_card=32 | Batch size override |
| -o Global.pretrained_model=./output/best | Load pretrained weights |
merge_config() handles both top-level keys and dot-notation nested keys by traversing the config dictionary hierarchy.
Sources: tools/program.py tools/train.py
During training initialization (tools/train.py), the configuration dict is programmatically modified based on model type and task requirements before the model is built. This is especially important for recognition models where out_channels must match the character dictionary size.
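As an illustration of the CTC case, the channel count can be sketched like this; the helper name is hypothetical and the real logic lives in tools/train.py:

```python
def ctc_out_channels(dict_chars, use_space_char=False):
    """Illustrative only: CTC head output size =
    dictionary characters (+ optional space) + 1 blank token."""
    num = len(dict_chars)
    if use_space_char:
        num += 1  # Global.use_space_char adds the space character
    return num + 1  # CTC blank token

# A 10-digit dictionary yields 11 output channels under these assumptions.
channels = ctc_out_channels(list("0123456789"))
```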
Runtime config adjustment flow:
Sources: tools/train.py:74-138, tools/eval.py:46-81
Automatic Adjustments for Recognition Models:
Character Dictionary Processing:
- character_dict_path from Global is read by the PostProcess class
- out_channels = len(character) (total including special tokens)

Algorithm-Specific Token Counting:
- CTC (CTCLabelDecode): char_num includes the CTC blank token
- char_num - 2 for the base prediction, adds ignore_index
- char_num - 3 (blank + SOS + EOS excluded from head output)

Multi-Head Models:
- out_channels_list: a dict mapping each decoder name to its char_num, e.g. {"CTCLabelDecode": 6625, "SARLabelDecode": 6627}
- Written to config["Architecture"]["Head"]["out_channels_list"]

Sources: tools/train.py:74-138, tools/eval.py:46-81
Example: Recognition Config (rec_icdar15_train.yml)

A minimal CRNN recognition config with CTC decoder, MobileNetV3 backbone, and flat file dataset:
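An abridged sketch of such a file (the Neck sub-block and exact values are reconstructed from the CRNN architecture and may differ from the shipped config; Train/Eval sections omitted):

```yaml
Global:
  use_gpu: true
  epoch_num: 72
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 25

Architecture:
  model_type: rec
  algorithm: CRNN
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 96
  Head:
    name: CTCHead   # out_channels filled in from the character dictionary

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
```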
Sources: configs/rec/rec_icdar15_train.yml:1-99
Example: Table Config (table_mv3.yml)

A table structure recognition config using YAML anchors to share parameters across sections:
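An abridged sketch of the anchor-sharing pattern in that file (values and section contents are illustrative):

```yaml
Global:
  max_text_length: &max_text_length 500

Architecture:
  model_type: table
  algorithm: TableAttn
  Backbone:
    name: MobileNetV3
  Head:
    name: TableAttentionHead
    max_text_length: *max_text_length   # shared via anchor

Train:
  dataset:
    name: PubTabDataSet
    transforms:
      - TableLabelEncode:
          max_text_length: *max_text_length   # same value, no duplication
```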
Sources: configs/table/table_mv3.yml:1-130
1. YAML Anchors and References:
- Use &anchor_name to define reusable values
- Reference them with *anchor_name to avoid duplication

2. Command-line Overrides for Experimentation:

- Prefer repeated -o flags with dot-notation keys over editing the YAML file for quick experiments
3. Validation Configuration:
- Keep Eval transforms minimal (no augmentation)
- Set loader.shuffle=False for reproducible validation
- Enable cal_metric_during_train to monitor convergence

4. Character Dictionary Management:
- Set character_dict_path once in the Global section for consistency
- out_channels is auto-calculated from the dictionary

5. AMP Configuration:
- Use amp_custom_black_list to avoid numerical instability
- Example: ['matmul_v2', 'elementwise_add'] for tables
- Use amp_level: "O2" for maximum speed with careful black-listing

Sources: configs/table/table_mv3.yml:1-107, tools/train.py:171-211
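A sketch of these AMP settings in YAML; the use_amp switch name is an assumption based on how tools/train.py is commonly invoked:

```yaml
Global:
  use_amp: True    # assumed switch name enabling mixed precision
  amp_level: O2
  amp_custom_black_list: ['matmul_v2', 'elementwise_add']
```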