Document Preprocessing Modules

This page covers the three optional preprocessing modules in PaddleOCR that prepare document images for downstream text recognition: Document Image Orientation Classification, Text Image Unwarping, and Text Line Orientation Classification. It also covers the `doc_preprocessor` sub-pipeline, which bundles the first two modules into a reusable preprocessing stage.

These modules are consumed by the OCR, PP-StructureV3, PP-ChatOCRv4, Seal Recognition, Table Recognition V2, and Formula Recognition pipelines. For details on those downstream pipelines, see pages 2.1, 2.3, and 2.4. For module training, see page 4.

Overview

Document images frequently arrive in non-standard states: rotated 90° or 180°, physically bent or warped, or with individual text lines printed in reverse orientation. Running OCR directly on such images degrades detection and recognition accuracy. The preprocessing modules correct these conditions before passing images downstream.

All three modules are optional in every pipeline they appear in. They add latency, so they should be enabled only when the input image set contains the corresponding artifacts.

Module Summary

Module	Task	Default Model	Output
Document Image Orientation Classification	4-class rotation detection	`PP-LCNet_x1_0_doc_ori`	Rotation angle (0°/90°/180°/270°) + rotated image
Text Image Unwarping	Geometric dewarping	`UVDoc`	Flattened/unwarped image
Text Line Orientation Classification	2-class per-line flip detection	`PP-LCNet_x0_25_textline_ori`	Rotation angle (0°/180°) per text line

Sources: docs/version3.x/pipeline_usage/OCR.en.md15-21 docs/version3.x/pipeline_usage/doc_preprocessor.md1-14

The `doc_preprocessor` Sub-Pipeline

The `doc_preprocessor` pipeline packages the Document Image Orientation Classification and Text Image Unwarping modules into a single reusable sub-pipeline. PP-StructureV3, Seal Recognition, Table Recognition V2, and Formula Recognition embed it as an optional preprocessing stage. The OCR pipeline instead includes both of those modules directly, together with Text Line Orientation Classification, and does not embed `doc_preprocessor` as a sub-pipeline.

Data flow through doc_preprocessor
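
The flow can be sketched in a few lines of Python. This is a minimal illustration, not the PaddleOCR implementation: `run_doc_preprocessor` and the two stage callables are hypothetical names, and the stage order (orientation first, then unwarping) is assumed from the order in which the modules are listed above.

```python
from typing import Any, Callable, Dict

Image = Any  # placeholder type; a real pipeline would pass an image array


def run_doc_preprocessor(
    image: Image,
    classify_and_rotate: Callable[[Image], Image],
    unwarp: Callable[[Image], Image],
    use_doc_orientation_classify: bool = True,
    use_doc_unwarping: bool = True,
) -> Dict[str, Any]:
    """Apply the two optional stages in order and record which ones ran."""
    stages_run = []
    if use_doc_orientation_classify:
        image = classify_and_rotate(image)
        stages_run.append("doc_orientation_classify")
    if use_doc_unwarping:
        image = unwarp(image)
        stages_run.append("doc_unwarping")
    return {"image": image, "stages_run": stages_run}


# Stand-in stages that only tag the input, so the flow is visible.
result = run_doc_preprocessor(
    "raw",
    classify_and_rotate=lambda img: img + "->upright",
    unwarp=lambda img: img + "->flat",
    use_doc_unwarping=False,
)
print(result)
```

Because each stage is guarded by its own flag, disabling a stage passes the image through unchanged, which is why the embedding pipelines can treat the whole sub-pipeline as optional.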

Sources: docs/version3.x/pipeline_usage/doc_preprocessor.md1-14 docs/version3.x/pipeline_usage/doc_preprocessor.en.md1-20

Module 1: Document Image Orientation Classification

Purpose: Detects the rotational orientation of the entire document image and rotates it to 0° before further processing. Handles cases such as scanned documents fed upside-down or sideways.

Classes: 0°, 90°, 180°, 270°
Architecture: PP-LCNet_x1_0 (lightweight classification backbone)
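
As a rough illustration of the classify-then-rotate step, the sketch below picks the highest-scoring class and undoes the rotation on a small 2-D grid standing in for an image. The helper names are hypothetical, and the rotation convention (a predicted angle of A means the content appears turned A degrees counter-clockwise) is an assumption made for this example; check the library's own convention before relying on it.

```python
from typing import List, Sequence, Tuple

ANGLES = (0, 90, 180, 270)  # class order assumed to match the list above

Grid = List[List[int]]


def rotate_ccw(grid: Grid, quarter_turns: int) -> Grid:
    """Rotate a 2-D grid counter-clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        # Transpose, then reverse the row order: one 90-degree CCW turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def correct_orientation(grid: Grid, scores: Sequence[float]) -> Tuple[int, Grid]:
    """Pick the highest-scoring angle and rotate the grid back to 0 degrees.

    Assumption: a predicted angle of A means the content appears rotated
    A degrees counter-clockwise, so it is undone by a clockwise turn of A.
    """
    best = max(range(len(ANGLES)), key=lambda i: scores[i])
    angle = ANGLES[best]
    # A clockwise turn of A degrees equals a CCW turn of (360 - A) degrees.
    return angle, rotate_ccw(grid, (360 - angle) // 90)


upright = [[1, 2], [3, 4]]
sideways = rotate_ccw(upright, 1)  # simulate a page scanned sideways
angle, restored = correct_orientation(sideways, [0.01, 0.97, 0.01, 0.01])
print(angle, restored == upright)
```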

Available Model

Model	Top-1 Acc (%)	GPU Inference ms [Std / HP]	CPU Inference ms [Std / HP]	Size (MB)
`PP-LCNet_x1_0_doc_ori`	99.06	2.62 / 0.59	3.24 / 1.19	7

Benchmark: PaddleOCR self-built dataset covering ID cards and documents; 1,000 images.
Hardware: NVIDIA Tesla T4 GPU, Intel Xeon Gold 6271C CPU.
Std = standard mode (FP32, no TensorRT); HP = high-performance mode, which uses the best pre-selected backend (Paddle / OpenVINO / TRT).

CLI / API Parameters

Parameter	Type	Default	Description
`use_doc_orientation_classify`	bool	`True`	Enable/disable this module
`doc_orientation_classify_model_name`	str	—	Override model name
`doc_orientation_classify_model_dir`	str	—	Local model directory; triggers download if unset

Sources: docs/version3.x/pipeline_usage/OCR.en.md27-51 docs/version3.x/pipeline_usage/OCR.md876-816

Module 2: Text Image Unwarping

Purpose: Corrects geometric distortion in document images caused by physical curvature, camera perspective, or page curl. Produces a flat, rectified image. Evaluated using Character Error Rate (CER) on the DocUNet benchmark.

Architecture: UVDoc (UV-space document unwarping)

Available Model

Model	CER	GPU Inference ms [Std / HP]	CPU Inference ms [Std / HP]	Size (MB)
`UVDoc`	0.179	19.05 / 19.05	— / 869.82	30.3

Note: CPU standard mode is not benchmarked for this model. GPU high-performance mode provides no speedup over standard mode for UVDoc.
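
For readers unfamiliar with the CER column, the sketch below shows the usual definition: the character-level edit distance between the recognized text and the reference, divided by the reference length. It illustrates the metric only; the OCR engine and text normalization used to obtain the 0.179 figure are not described on this page, and the function names are chosen for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete from ref
                            curr[j - 1] + 1,      # insert into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of seven.
print(round(character_error_rate("invoice", "invo1ce"), 3))  # 0.143
```

Lower is better: a perfectly unwarped page that OCR reads without mistakes would score 0.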

CLI / API Parameters

Parameter	Type	Default	Description
`use_doc_unwarping`	bool	`True`	Enable/disable this module
`doc_unwarping_model_name`	str	—	Override model name
`doc_unwarping_model_dir`	str	—	Local model directory

Sources: docs/version3.x/pipeline_usage/OCR.en.md54-79 docs/version3.x/pipeline_usage/OCR.md54-79

Module 3: Text Line Orientation Classification

Purpose: Operates on individual detected text line crops (not the full document image). Identifies whether each text line is printed normally (0°) or upside-down (180°). Applied after text detection and before text recognition in the OCR pipeline.

Classes: 0°, 180°
Architecture: PP-LCNet_x0_25 (ultra-lightweight) or PP-LCNet_x1_0 (higher accuracy)

Available Models

Model	Top-1 Acc (%)	GPU Inference ms [Std / HP]	CPU Inference ms [Std / HP]	Size (MB)
`PP-LCNet_x0_25_textline_ori`	98.85	2.16 / 0.41	2.37 / 0.73	0.96
`PP-LCNet_x1_0_textline_ori`	99.42	— / —	2.98 / 2.98	6.5

Benchmark: PaddleOCR self-built dataset covering ID cards and documents; 1,000 images.

CLI / API Parameters

Parameter	Type	Default	Description
`use_textline_orientation`	bool	`True`	Enable/disable this module
`textline_orientation_model_name`	str	—	Override model name
`textline_orientation_model_dir`	str	—	Local model directory
`textline_orientation_batch_size`	int	`1`	Batch size for inference

This module only exists inside the OCR pipeline. It is not part of the doc_preprocessor sub-pipeline.
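
The per-crop behavior and the role of `textline_orientation_batch_size` can be sketched as follows. This is an illustrative stand-in, not PaddleOCR code: the crops are small 2-D lists, `classify_batch` is a hypothetical callable that returns one angle (0 or 180) per crop, and the helper names are invented for the example.

```python
from typing import Callable, Iterator, List, Sequence

Crop = List[List[int]]


def batched(items: Sequence[Crop], batch_size: int) -> Iterator[Sequence[Crop]]:
    """Yield consecutive slices of at most batch_size crops."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def rotate_180(crop: Crop) -> Crop:
    """Reverse the row order and each row: a 180-degree rotation."""
    return [row[::-1] for row in crop[::-1]]


def fix_textline_orientation(
    crops: Sequence[Crop],
    classify_batch: Callable[[Sequence[Crop]], List[int]],
    batch_size: int = 1,
) -> List[Crop]:
    """Classify crops batch by batch and flip those predicted as 180 degrees."""
    fixed = []
    for batch in batched(crops, batch_size):
        angles = classify_batch(batch)  # one angle (0 or 180) per crop
        for crop, angle in zip(batch, angles):
            fixed.append(rotate_180(crop) if angle == 180 else crop)
    return fixed


# Stand-in classifier: call a crop upside-down if its values run high to low.
def fake_classifier(batch: Sequence[Crop]) -> List[int]:
    return [180 if crop[0][0] > crop[-1][-1] else 0 for crop in batch]


crops = [[[1, 2, 3]], [[3, 2, 1]], [[1, 2, 3]]]
print(fix_textline_orientation(crops, fake_classifier, batch_size=2))
```

A larger batch size reduces the number of classifier calls for pages with many text lines; the output is the same regardless of batch size.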

Sources: docs/version3.x/pipeline_usage/OCR.en.md81-115 docs/version3.x/pipeline_usage/OCR.md838-854

Code Entity Map

The diagram below maps user-visible module names to their default model identifiers and the CLI flags that control them.

Module names → model names → CLI flags
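
The same mapping can be written as a plain lookup table. The sketch below only restates the module names, default models, and parameter names from the tables on this page; the dictionary layout and the `flags_for` helper are illustrative choices, not part of PaddleOCR.

```python
PREPROCESSING_MODULES = {
    "Document Image Orientation Classification": {
        "default_model": "PP-LCNet_x1_0_doc_ori",
        "enable_flag": "use_doc_orientation_classify",
        "model_name_flag": "doc_orientation_classify_model_name",
        "model_dir_flag": "doc_orientation_classify_model_dir",
    },
    "Text Image Unwarping": {
        "default_model": "UVDoc",
        "enable_flag": "use_doc_unwarping",
        "model_name_flag": "doc_unwarping_model_name",
        "model_dir_flag": "doc_unwarping_model_dir",
    },
    "Text Line Orientation Classification": {
        "default_model": "PP-LCNet_x0_25_textline_ori",
        "enable_flag": "use_textline_orientation",
        "model_name_flag": "textline_orientation_model_name",
        "model_dir_flag": "textline_orientation_model_dir",
    },
}


def flags_for(module_name: str) -> list:
    """Return the flag names that control one module."""
    entry = PREPROCESSING_MODULES[module_name]
    return [entry["enable_flag"], entry["model_name_flag"], entry["model_dir_flag"]]


for name, entry in PREPROCESSING_MODULES.items():
    print(f"{name} -> {entry['default_model']} -> --{entry['enable_flag']}")
```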

Sources: docs/version3.x/pipeline_usage/OCR.md804-854 docs/version3.x/pipeline_usage/OCR.en.md800-870

Pipeline Integration

The following diagram shows which preprocessing modules are optional components in each PaddleOCR pipeline.

Preprocessing modules across pipelines

Dashed-border modules are optional in all pipelines where they appear.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.en.md11-19 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md13-24 docs/version3.x/pipeline_usage/seal_recognition.md14-21 docs/version3.x/pipeline_usage/table_recognition_v2.md18-27 docs/version3.x/pipeline_usage/formula_recognition.md15-18

Enabling and Disabling Modules

All three modules default to `True` (enabled) in the OCR pipeline. To disable a module, set its flag to `False` on the command line.

In Python, these are constructor parameters on the pipeline object (see page 3.3 for Python API details).
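
A small sketch of both forms follows. The three flag names come from the parameter tables above; the `paddleocr ocr -i` command shape and the `PaddleOCR(...)` constructor call are assumed from the PaddleOCR 3.x usage documentation, `input.png` is a placeholder path, and the helper functions are hypothetical. The script only builds and prints the command, so it runs without PaddleOCR installed.

```python
import shlex
from typing import Dict, List


def preprocessing_options(
    orientation: bool = True,
    unwarping: bool = True,
    textline: bool = True,
) -> Dict[str, bool]:
    """Collect the three enable/disable flags under their documented names."""
    return {
        "use_doc_orientation_classify": orientation,
        "use_doc_unwarping": unwarping,
        "use_textline_orientation": textline,
    }


def to_cli_args(options: Dict[str, bool]) -> List[str]:
    """Turn the option dict into '--flag Value' pairs."""
    args: List[str] = []
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


# Flat, upright scans: skip both document-level modules.
options = preprocessing_options(orientation=False, unwarping=False)
command = ["paddleocr", "ocr", "-i", "input.png", *to_cli_args(options)]
print(shlex.join(command))

# Python equivalent (assumes the paddleocr package is installed):
#   from paddleocr import PaddleOCR
#   ocr = PaddleOCR(**options)
#   results = ocr.predict("input.png")
```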

When to disable:

Set `use_doc_orientation_classify` to `False` when input images are always correctly oriented.
Set `use_doc_unwarping` to `False` when input images are flat scans rather than photographs of curved documents; this saves about 19 ms of GPU latency per image.
Set `use_textline_orientation` to `False` when text lines are uniformly right-side up; this saves under 3 ms per batch with the default model (2.16 ms GPU, 2.37 ms CPU in standard mode).
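
To get a feel for the trade-off, the standard-mode GPU times from the tables on this page can be added up. This is back-of-the-envelope arithmetic under stated assumptions, not a benchmark: it counts model inference only, treats the two document-level modules as running once per image, and treats the text line classifier as running once per batch of crops. The function name is hypothetical.

```python
# Standard-mode GPU inference times (ms) copied from the tables on this page.
STD_GPU_MS = {
    "use_doc_orientation_classify": 2.62,
    "use_doc_unwarping": 19.05,
    "use_textline_orientation": 2.16,
}


def estimated_overhead_ms(enabled: dict, textline_batches: int = 1) -> float:
    """Sum the inference time of every enabled preprocessing module."""
    total = 0.0
    for flag, ms in STD_GPU_MS.items():
        if not enabled.get(flag, True):
            continue
        # Assumption: the text line classifier runs once per batch of crops,
        # while the two document-level modules run once per image.
        runs = textline_batches if flag == "use_textline_orientation" else 1
        total += ms * runs
    return round(total, 2)


everything_on = estimated_overhead_ms({})
without_unwarping = estimated_overhead_ms({"use_doc_unwarping": False})
print(everything_on, without_unwarping)
```

Under these assumptions, unwarping accounts for most of the added GPU time, so it is the first module to consider disabling for flat scans.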

Sources: docs/version3.x/pipeline_usage/OCR.md763-776 docs/version3.x/pipeline_usage/OCR.en.md763-776

Benchmark Environment

All inference times reported on this page were measured under the following conditions:

Item	Specification
GPU	NVIDIA Tesla T4
CPU	Intel Xeon Gold 6271C @ 2.60GHz
OS / CUDA	Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9
TensorRT	8.6.1.6
PaddlePaddle	3.0.0
PaddleOCR	3.0.3

Mode	GPU Config	CPU Config	Backend
Standard	FP32 / no TRT	FP32 / 8 threads	PaddleInference
High-Performance	Best prior precision + acceleration	FP32 / 8 threads	Best prior backend (Paddle / OpenVINO / TRT)

Sources: docs/version3.x/pipeline_usage/OCR.md636-698
