Configuration System


This page describes the runtime configuration system used by PaddleOCR 3.x pipelines during inference and deployment. It covers the two-tier parameter system, the PaddleX YAML pipeline configuration format, and how PaddleOCR translates its own API parameters into the underlying PaddleX configuration structure.

For training configuration (YAML files under configs/), see page 4.6. For the relationship between PaddleOCR and PaddleX in general, see page 3.1.

Overview

PaddleOCR 3.x uses PaddleX as its inference backend. Every PaddleOCR pipeline is a thin wrapper around a PaddleX pipeline. The configuration system has two tiers:

Tier	How it works	Who uses it
Simple parameters	Named arguments to pipeline constructors and `predict()`	Most users
PaddleX config file	YAML file exported from PaddleX or PaddleOCR, then edited and loaded back	Advanced users, service deployment

Simple parameters (e.g., text_detection_model_name, layout_threshold) are exposed directly on the PaddleOCR Python and CLI APIs. The PaddleX YAML configuration format exposes the full depth of the underlying PaddleX pipeline structure, including nested sub-pipelines and sub-modules, and is the authoritative source for advanced tuning.

Configuration Architecture

Figure: Configuration Flow in PaddleOCR Pipelines (how configuration flows from user input to the underlying PaddleX pipeline).

Sources: paddleocr/_pipelines/pp_structurev3.py:28-139, paddleocr/_pipelines/base.py, docs/version3.x/paddleocr_and_paddlex.en.md:54-98

Pipeline Classes and Their Config Methods

Every PaddleOCR pipeline inherits from PaddleXPipelineWrapper and implements _get_paddlex_config_overrides().

Figure: Class Hierarchy and Config-Related Methods

Sources: paddleocr/_pipelines/pp_structurev3.py:28-99, paddleocr/_pipelines/pp_chatocrv4_doc.py:23-78, paddleocr/_pipelines/pp_doctranslation.py:24-98, paddleocr/_pipelines/table_recognition_v2.py:25-66, paddleocr/_pipelines/seal_recognition.py:25-81, paddleocr/_pipelines/formula_recognition.py:25-54

PaddleOCR Pipeline to PaddleX Registration Name Mapping

Each pipeline class declares a _paddlex_pipeline_name property that maps it to a PaddleX registered pipeline name. This name is used to instantiate the underlying PaddleX pipeline and to look up default configurations.

PaddleOCR Class	`_paddlex_pipeline_name`
`PaddleOCR` (General OCR)	`OCR`
`PPStructureV3`	`PP-StructureV3`
`PPChatOCRv4Doc`	`PP-ChatOCRv4-doc`
`TableRecognitionPipelineV2`	`table_recognition_v2`
`FormulaRecognitionPipeline`	`formula_recognition`
`SealRecognition`	`seal_recognition`
`PPDocTranslation`	`PP-DocTranslation`

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:39-52, paddleocr/_pipelines/pp_structurev3.py:141-143, paddleocr/_pipelines/pp_chatocrv4_doc.py:80-82, paddleocr/_pipelines/pp_doctranslation.py:100-102

The PaddleX YAML Configuration File

Structure

The PaddleX YAML config represents a hierarchical pipeline structure. Pipelines can contain SubPipelines (composite pipelines) and SubModules (individual model runners). The hierarchy mirrors the internal PaddleX pipeline graph.

Figure: PaddleX Config Hierarchy (PP-StructureV3 example)

Sources: paddleocr/_pipelines/pp_structurev3.py:304-525

Dot-Notation Key Paths

The _get_paddlex_config_overrides() method in each pipeline class constructs a flat dictionary using dot-separated keys that address nodes in the YAML hierarchy. For example:

Dot-notation key	What it controls
`SubModules.LayoutDetection.model_name`	Layout detection model name
`SubModules.LayoutDetection.threshold`	Score threshold for layout detection
`SubPipelines.GeneralOCR.SubModules.TextDetection.model_name`	Text detection model inside the OCR sub-pipeline
`SubPipelines.GeneralOCR.SubModules.TextRecognition.batch_size`	Recognition batch size
`SubPipelines.DocPreprocessor.use_doc_orientation_classify`	Toggle orientation classification
`SubPipelines.TableRecognition.SubPipelines.GeneralOCR.SubModules.TextRecognition.score_thresh`	Text recognition score threshold inside table recognition

The create_config_from_structure function (in paddleocr/_pipelines/utils.py) converts this flat dict into a nested dictionary suitable for the PaddleX pipeline initializer.
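The flat-to-nested conversion can be sketched as follows. This is a minimal illustration, not the actual create_config_from_structure implementation: the helper name nest_dotted_keys and the example values (0.5, 8) are assumptions made for the example, while the key paths come from the table above.

```python
from typing import Any, Dict


def nest_dotted_keys(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dot-separated keys into a nested dict (hypothetical helper)."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate levels on demand and descend into them.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Illustrative values only; the key paths are taken from the table above.
flat_overrides = {
    "SubModules.LayoutDetection.threshold": 0.5,
    "SubPipelines.GeneralOCR.SubModules.TextDetection.model_name": "PP-OCRv5_server_det",
    "SubPipelines.GeneralOCR.SubModules.TextRecognition.batch_size": 8,
}
nested_config = nest_dotted_keys(flat_overrides)
```

Keys that share a prefix, such as the two SubPipelines.GeneralOCR entries, end up under the same nested node, which matches the shape of the YAML hierarchy.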

Sources: paddleocr/_pipelines/pp_structurev3.py:304-525, paddleocr/_pipelines/pp_chatocrv4_doc.py:285-400

Exporting a Configuration File

Call export_paddlex_config_to_yaml on any pipeline object to write the current effective configuration to a YAML file. This file can then be edited and loaded back.
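A minimal sketch of the export step, assuming paddleocr is installed. The wrapper function export_ocr_config and the file name ocr_config.yaml are placeholders chosen for this example; only PaddleOCR and export_paddlex_config_to_yaml come from the library.

```python
def export_ocr_config(yaml_path: str = "ocr_config.yaml") -> str:
    """Write the effective PaddleX config of a default OCR pipeline to YAML.

    Hypothetical wrapper for illustration; the output path is a placeholder.
    """
    # Imported lazily so that defining this helper does not require paddleocr.
    from paddleocr import PaddleOCR

    # Constructing the pipeline with defaults may download models on first use.
    pipeline = PaddleOCR()
    pipeline.export_paddlex_config_to_yaml(yaml_path)
    return yaml_path
```

Calling export_ocr_config() should leave a YAML file at the given path that can be edited by hand before being loaded back.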

Alternatively, the PaddleX CLI can retrieve a default config for any registered pipeline, for example `paddlex --get_pipeline_config OCR`.

The exported YAML includes all parameters exposed through the PaddleOCR Python API plus additional PaddleX-level parameters not surfaced through PaddleOCR directly.

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:58-76

Loading a Configuration File

Python API

Pass the YAML file path as paddlex_config when constructing the pipeline object. A pre-loaded configuration dictionary can be passed the same way.
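Both forms can be sketched with one helper, assuming paddleocr is installed. The function name load_ocr_pipeline is a placeholder for this example; the paddlex_config keyword is the constructor parameter described above.

```python
from typing import Any, Dict, Union


def load_ocr_pipeline(paddlex_config: Union[str, Dict[str, Any]]):
    """Build a PaddleOCR pipeline from a YAML path or a config dict.

    Hypothetical wrapper for illustration only.
    """
    # Imported lazily so that defining this helper does not require paddleocr.
    from paddleocr import PaddleOCR

    # The same keyword accepts either a file path or an already-loaded dict.
    return PaddleOCR(paddlex_config=paddlex_config)
```

For instance, load_ocr_pipeline("ocr_config.yaml") would use a file exported earlier, while passing a dictionary skips the file system entirely, which can be convenient when the configuration is assembled in code.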

CLI

Use the --paddlex_config flag to pass the path to the YAML file.

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:83-98

Configuration Priority

When a PaddleX config file is provided alongside individual constructor parameters, explicitly set constructor parameters take precedence over the values in the config file.

In practice:

The paddlex_config file or dict is loaded as the base configuration.
Any simple constructor arguments that are not None are applied on top as overrides (via _get_paddlex_config_overrides()).
Arguments that are None leave the corresponding value from the paddlex_config intact.

This means a paddlex_config file can set a default model name, but explicitly passing text_detection_model_name="PP-OCRv5_server_det" to the constructor will still override it.
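The precedence rule can be illustrated with a small recursive merge. This is a sketch under the assumption that overrides are merged key by key into the base configuration; the actual merge logic in PaddleOCR may differ, and the helper name apply_overrides and the thresh value are made up for the example.

```python
import copy
from typing import Any, Dict


def apply_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with overrides applied on top (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge one level deeper.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Leaf value (or type mismatch): the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Base configuration as it might be loaded from a paddlex_config file.
base_config = {
    "SubModules": {
        "TextDetection": {"model_name": "PP-OCRv5_mobile_det", "thresh": 0.3}
    }
}
# Nested overrides produced from text_detection_model_name="PP-OCRv5_server_det".
constructor_overrides = {
    "SubModules": {"TextDetection": {"model_name": "PP-OCRv5_server_det"}}
}
effective = apply_overrides(base_config, constructor_overrides)
```

In this sketch the model name comes from the constructor argument, while thresh, which was not passed to the constructor, keeps the value from the file.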

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:84-98, paddleocr/_pipelines/pp_structurev3.py:132-139

Internal Config Construction (`_get_paddlex_config_overrides`)

Each pipeline class stores its constructor arguments in self._params (a plain dict). The _get_paddlex_config_overrides method reads from self._params and returns a flat dict with dot-notation keys. Only non-None values become overrides.

The create_config_from_structure function (from paddleocr/_pipelines/utils.py) is then called on this flat dict (referred to as STRUCTURE) to convert it into a nested dict, which is merged with any base paddlex_config.
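The first step, turning constructor arguments into flat overrides, can be sketched as below. This is an illustrative reconstruction rather than the source of any pipeline class: PARAM_TO_PATH and build_flat_overrides are hypothetical names, and the three mappings are a subset of the reference table later on this page.

```python
from typing import Any, Dict

# Hypothetical subset of the parameter-to-path mapping for PPStructureV3.
PARAM_TO_PATH: Dict[str, str] = {
    "layout_threshold": "SubModules.LayoutDetection.threshold",
    "text_detection_model_name": "SubPipelines.GeneralOCR.SubModules.TextDetection.model_name",
    "use_doc_unwarping": "SubPipelines.DocPreprocessor.use_doc_unwarping",
}


def build_flat_overrides(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map non-None constructor arguments to dot-notation config keys."""
    return {
        PARAM_TO_PATH[name]: value
        for name, value in params.items()
        # None means "not set by the caller", so it produces no override.
        if name in PARAM_TO_PATH and value is not None
    }


# Stand-in for self._params: one unset value, one string, one explicit False.
params = {
    "layout_threshold": None,
    "text_detection_model_name": "PP-OCRv5_server_det",
    "use_doc_unwarping": False,
}
flat = build_flat_overrides(params)
```

Note that the check is against None rather than truthiness, so an explicit False such as use_doc_unwarping=False still becomes an override.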

Example from PPStructureV3._get_paddlex_config_overrides (abbreviated):

Sources: paddleocr/_pipelines/pp_structurev3.py:304-525, paddleocr/_pipelines/formula_recognition.py:120-176, paddleocr/_pipelines/seal_recognition.py:171-237

Simple Parameter to PaddleX Config Key Reference

The following table shows how selected simple constructor parameters map to their PaddleX config paths in the PPStructureV3 pipeline:

Simple Parameter	PaddleX Config Path
`layout_detection_model_name`	`SubModules.LayoutDetection.model_name`
`layout_threshold`	`SubModules.LayoutDetection.threshold`
`layout_nms`	`SubModules.LayoutDetection.layout_nms`
`text_detection_model_name`	`SubPipelines.GeneralOCR.SubModules.TextDetection.model_name`
`text_det_thresh`	`SubPipelines.GeneralOCR.SubModules.TextDetection.thresh`
`text_det_unclip_ratio`	`SubPipelines.GeneralOCR.SubModules.TextDetection.unclip_ratio`
`text_recognition_model_name`	`SubPipelines.GeneralOCR.SubModules.TextRecognition.model_name`
`text_recognition_batch_size`	`SubPipelines.GeneralOCR.SubModules.TextRecognition.batch_size`
`use_doc_orientation_classify`	`SubPipelines.DocPreprocessor.use_doc_orientation_classify`
`use_doc_unwarping`	`SubPipelines.DocPreprocessor.use_doc_unwarping`
`formula_recognition_model_name`	`SubPipelines.FormulaRecognition.SubModules.FormulaRecognition.model_name`
`seal_text_detection_model_name`	`SubPipelines.SealRecognition.SubPipelines.SealOCR.SubModules.TextDetection.model_name`

Sources: paddleocr/_pipelines/pp_structurev3.py:304-525

PaddleX Version Compatibility

The PaddleX pipeline configuration format is version-coupled. The table below shows which PaddleX version is required for each PaddleOCR release:

PaddleOCR Version	PaddleX Version
`3.0.x`	`3.0.x`
`3.1.x`	`>= 3.1.0, < 3.2.0`
`3.2.x`	`>= 3.2.0, < 3.3.0`
`3.3.x`	`>= 3.3.0, < 3.4.0`
`3.4.x`	`>= 3.4.0, < 3.5.0`

Using a YAML file exported from a different PaddleX version may cause compatibility issues.

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:26-36, pyproject.toml:41-46

