This page covers the three optional preprocessing modules in PaddleOCR that prepare document images for downstream text recognition: Document Image Orientation Classification, Text Image Unwarping, and Text Line Orientation Classification. It also covers the doc_preprocessor sub-pipeline, which bundles the first two modules into a reusable preprocessing stage.
These modules are consumed by the OCR, PP-StructureV3, PP-ChatOCRv4, Seal Recognition, Table Recognition V2, and Formula Recognition pipelines. For details on those downstream pipelines, see pages 2.1, 2.3, and 2.4. For module training, see page 4.
Document images frequently arrive in non-standard states: rotated 90° or 180°, physically bent or warped, or with individual text lines printed in reverse orientation. Running OCR directly on such images degrades detection and recognition accuracy. The preprocessing modules correct these conditions before passing images downstream.
All three modules are optional in every pipeline they appear in. They add latency, so they should be enabled only when the input image set contains the corresponding artifacts.
Module Summary
| Module | Task | Default Model | Output |
|---|---|---|---|
| Document Image Orientation Classification | 4-class rotation detection | PP-LCNet_x1_0_doc_ori | Rotation angle (0°/90°/180°/270°) + rotated image |
| Text Image Unwarping | Geometric dewarping | UVDoc | Flattened/unwarped image |
| Text Line Orientation Classification | 2-class per-line flip detection | PP-LCNet_x0_25_textline_ori | Rotation angle (0°/180°) per text line |
Sources: docs/version3.x/pipeline_usage/OCR.en.md15-21 docs/version3.x/pipeline_usage/doc_preprocessor.md1-14
doc_preprocessor Sub-Pipeline
The doc_preprocessor pipeline packages the Document Image Orientation Classification and Text Image Unwarping modules into a single reusable sub-pipeline. PP-StructureV3, Seal Recognition, Table Recognition V2, and Formula Recognition embed it as an optional preprocessing stage. The OCR pipeline includes both those modules directly (plus Text Line Orientation Classification) rather than embedding doc_preprocessor as a sub-pipeline.
Data flow through doc_preprocessor
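The data flow can be sketched in plain Python. This is an illustrative toy, not the library's implementation: the function names (`doc_preprocess`, `undo_rotation`), the list-of-lists image type, and the stub models are all hypothetical. It also assumes that orientation correction runs before unwarping, and that the reported angle is the clockwise rotation of the content, which the sketch undoes by rotating counterclockwise.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # toy image: rows of pixel values


def rotate_ccw_90(img: Image) -> Image:
    """Rotate a row-major image 90 degrees counterclockwise."""
    return [list(col) for col in zip(*img)][::-1]


def undo_rotation(img: Image, angle: int) -> Image:
    """Undo a clockwise rotation of `angle` degrees (assumed convention)."""
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {angle}")
    for _ in range(angle // 90):
        img = rotate_ccw_90(img)
    return img


def doc_preprocess(
    img: Image,
    classify_orientation: Callable[[Image], int],
    unwarp: Callable[[Image], Image],
    use_doc_orientation_classify: bool = True,
    use_doc_unwarping: bool = True,
) -> Dict[str, object]:
    """Run the two optional stages in order; a disabled stage is skipped."""
    angle: Optional[int] = None
    if use_doc_orientation_classify:
        angle = classify_orientation(img)  # one of 0, 90, 180, 270
        img = undo_rotation(img, angle)
    if use_doc_unwarping:
        img = unwarp(img)
    return {"angle": angle, "output_img": img}


# Stub callables stand in for PP-LCNet_x1_0_doc_ori and UVDoc.
rotated_page = [[3, 1], [4, 2]]  # [[1, 2], [3, 4]] turned 90 degrees clockwise
result = doc_preprocess(rotated_page, lambda im: 90, lambda im: im)
print(result["angle"], result["output_img"])  # prints: 90 [[1, 2], [3, 4]]
```

With both flags set to False the function returns the input unchanged, which mirrors how the sub-pipeline is skipped when neither artifact is expected.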
Sources: docs/version3.x/pipeline_usage/doc_preprocessor.md1-14 docs/version3.x/pipeline_usage/doc_preprocessor.en.md1-20
Document Image Orientation Classification
Purpose: Detects the rotational orientation of the entire document image and rotates it to 0° before further processing. Handles cases such as scanned documents fed upside-down or sideways.
Classes: 0°, 90°, 180°, 270°
Architecture: PP-LCNet_x1_0 (lightweight classification backbone)
| Model | Top-1 Acc (%) | GPU Inference ms [Std / HP] | CPU Inference ms [Std / HP] | Size (MB) |
|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | 99.06 | 2.62 / 0.59 | 3.24 / 1.19 | 7 |
Benchmark: PaddleOCR self-built dataset covering ID cards and documents; 1,000 images.
Hardware: NVIDIA Tesla T4 GPU, Intel Xeon Gold 6271C CPU.
Std = FP32 / no TRT; HP = best prior backend (Paddle / OpenVINO / TRT).
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_doc_orientation_classify | bool | True | Enable/disable this module |
| doc_orientation_classify_model_name | str | — | Override model name |
| doc_orientation_classify_model_dir | str | — | Local model directory; triggers download if unset |
Sources: docs/version3.x/pipeline_usage/OCR.en.md27-51 docs/version3.x/pipeline_usage/OCR.md876-816
Text Image Unwarping
Purpose: Corrects geometric distortion in document images caused by physical curvature, camera perspective, or page curl. Produces a flat, rectified image. Evaluated using Character Error Rate (CER) on the DocUNet benchmark.
Architecture: UVDoc (UV-space document unwarping)
| Model | CER | GPU Inference ms [Std / HP] | CPU Inference ms [Std / HP] | Size (MB) |
|---|---|---|---|---|
| UVDoc | 0.179 | 19.05 / 19.05 | — / 869.82 | 30.3 |
Note: CPU standard mode is not benchmarked for this model. GPU high-performance mode provides no speedup over standard mode for UVDoc.
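For readers unfamiliar with the CER column, the sketch below computes Character Error Rate under its standard definition: Levenshtein edit distance between the recognized text and the reference, divided by the reference length. Lower is better. The exact evaluation protocol used for the DocUNet figure (OCR engine, text normalization) is not stated on this page, so treat this as the generic metric rather than the benchmark script. The sample strings are made up.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate relative to the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of seven reference characters.
print(round(cer("invoice", "inv0ice"), 3))  # prints: 0.143
```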
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_doc_unwarping | bool | True | Enable/disable this module |
| doc_unwarping_model_name | str | — | Override model name |
| doc_unwarping_model_dir | str | — | Local model directory |
Sources: docs/version3.x/pipeline_usage/OCR.en.md54-79 docs/version3.x/pipeline_usage/OCR.md54-79
Text Line Orientation Classification
Purpose: Operates on individual detected text line crops (not the full document image). Identifies whether each text line is printed normally (0°) or upside-down (180°). Applied after text detection and before text recognition in the OCR pipeline.
Classes: 0°, 180°
Architecture: PP-LCNet_x0_25 (ultra-lightweight) or PP-LCNet_x1_0 (higher accuracy)
| Model | Top-1 Acc (%) | GPU Inference ms [Std / HP] | CPU Inference ms [Std / HP] | Size (MB) |
|---|---|---|---|---|
| PP-LCNet_x0_25_textline_ori | 98.85 | 2.16 / 0.41 | 2.37 / 0.73 | 0.96 |
| PP-LCNet_x1_0_textline_ori | 99.42 | — / — | 2.98 / 2.98 | 6.5 |
Benchmark: PaddleOCR self-built dataset covering ID cards and documents; 1,000 images.
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_textline_orientation | bool | True | Enable/disable this module |
| textline_orientation_model_name | str | — | Override model name |
| textline_orientation_model_dir | str | — | Local model directory |
| textline_orientation_batch_size | int | 1 | Batch size for inference |
This module only exists inside the OCR pipeline. It is not part of the doc_preprocessor sub-pipeline.
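The per-line step described above can be sketched as a small batching loop: crops are classified in groups of `batch_size`, and any crop labeled 180° is flipped before recognition. Everything here is hypothetical scaffolding (`fix_textline_orientation`, the list-of-lists crop type, and the stub classifier); only the two classes and the batch size parameter come from this page.

```python
from typing import Callable, List, Sequence

Crop = List[List[int]]  # toy text line crop: rows of pixel values


def rotate_180(crop: Crop) -> Crop:
    """Flip a crop upside-down by reversing row order and each row."""
    return [row[::-1] for row in crop[::-1]]


def fix_textline_orientation(
    crops: Sequence[Crop],
    classify_batch: Callable[[Sequence[Crop]], List[int]],
    batch_size: int = 1,
) -> List[Crop]:
    """Classify crops batch by batch and flip those reported as 180 degrees."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fixed: List[Crop] = []
    for start in range(0, len(crops), batch_size):
        batch = crops[start:start + batch_size]
        angles = classify_batch(batch)  # one label per crop: 0 or 180
        for crop, angle in zip(batch, angles):
            fixed.append(rotate_180(crop) if angle == 180 else crop)
    return fixed


def stub_classifier(batch: Sequence[Crop]) -> List[int]:
    """Stand-in model: treats a descending first row as an upside-down line."""
    return [180 if crop[0][0] > crop[0][-1] else 0 for crop in batch]


lines = [[[1, 2, 3]], [[3, 2, 1]]]
print(fix_textline_orientation(lines, stub_classifier, batch_size=2))
# prints: [[[1, 2, 3]], [[1, 2, 3]]]
```

The batch size changes only how many crops go to the classifier per call; the corrected output is the same for any valid value.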
Sources: docs/version3.x/pipeline_usage/OCR.en.md81-115 docs/version3.x/pipeline_usage/OCR.md838-854
The diagram below maps user-visible module names to their default model identifiers and the CLI flags that control them.
Module names → model names → CLI flags
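The same mapping can be written down as a lookup table. The module names, default models, and parameter names below are taken from the tables on this page; the `MODULES` dictionary and `cli_flags` helper are hypothetical, and the helper assumes each CLI flag is simply the parameter name prefixed with `--`.

```python
from typing import Dict, List

# Module name -> default model and the parameters that control it.
MODULES: Dict[str, Dict[str, object]] = {
    "Document Image Orientation Classification": {
        "default_model": "PP-LCNet_x1_0_doc_ori",
        "params": [
            "use_doc_orientation_classify",
            "doc_orientation_classify_model_name",
            "doc_orientation_classify_model_dir",
        ],
    },
    "Text Image Unwarping": {
        "default_model": "UVDoc",
        "params": [
            "use_doc_unwarping",
            "doc_unwarping_model_name",
            "doc_unwarping_model_dir",
        ],
    },
    "Text Line Orientation Classification": {
        "default_model": "PP-LCNet_x0_25_textline_ori",
        "params": [
            "use_textline_orientation",
            "textline_orientation_model_name",
            "textline_orientation_model_dir",
        ],
    },
}


def cli_flags(module: str) -> List[str]:
    """Derive CLI flags from parameter names (assumed '--<param>' form)."""
    return [f"--{name}" for name in MODULES[module]["params"]]


for name, info in MODULES.items():
    print(name, "->", info["default_model"], "->", cli_flags(name)[0])
```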
Sources: docs/version3.x/pipeline_usage/OCR.md804-854 docs/version3.x/pipeline_usage/OCR.en.md800-870
The following diagram shows which preprocessing modules are optional components in each PaddleOCR pipeline.
Preprocessing modules across pipelines
Dashed-border modules are optional in all pipelines where they appear.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.en.md11-19 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md13-24 docs/version3.x/pipeline_usage/seal_recognition.md14-21 docs/version3.x/pipeline_usage/table_recognition_v2.md18-27 docs/version3.x/pipeline_usage/formula_recognition.md15-18
All three modules default to True (enabled) in the OCR pipeline. To disable one, set its flag to False, for example --use_doc_unwarping False on the command line.
In Python, these are constructor parameters on the pipeline object (see page 3.3 for Python API details).
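A minimal sketch of the Python side follows. It assumes the `PaddleOCR` class from the `paddleocr` package accepts the three flags as keyword arguments, as the parameter tables on this page suggest; page 3.3 remains the authoritative reference. The helpers `ocr_kwargs` and `build_ocr` are hypothetical wrappers, and the import is deferred so the flag logic can be run without the package installed.

```python
from typing import Dict

# Defaults as described on this page: all three modules enabled.
DEFAULT_FLAGS: Dict[str, bool] = {
    "use_doc_orientation_classify": True,
    "use_doc_unwarping": True,
    "use_textline_orientation": True,
}


def ocr_kwargs(**overrides: bool) -> Dict[str, bool]:
    """Merge overrides into the defaults, rejecting unknown flag names."""
    unknown = set(overrides) - set(DEFAULT_FLAGS)
    if unknown:
        raise KeyError(f"unknown preprocessing flag(s): {sorted(unknown)}")
    return {**DEFAULT_FLAGS, **overrides}


def build_ocr(**overrides: bool):
    """Construct the OCR pipeline (assumes paddleocr is installed)."""
    from paddleocr import PaddleOCR  # deferred import; not needed for ocr_kwargs
    return PaddleOCR(**ocr_kwargs(**overrides))


# Flat scans that may still arrive rotated: skip unwarping only.
flat_scan_flags = ocr_kwargs(use_doc_unwarping=False)
print(flat_scan_flags)

# With paddleocr installed, the pipeline would be built like this:
# ocr = build_ocr(use_doc_unwarping=False)
```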
When to disable:
- use_doc_orientation_classify False: input images are always correctly oriented
- use_doc_unwarping False: input images are flat scans (not photographs of curved documents); saves ~19 ms GPU latency per image
- use_textline_orientation False: text lines are uniformly right-side up; saves <3 ms per batch

Sources: docs/version3.x/pipeline_usage/OCR.md763-776 docs/version3.x/pipeline_usage/OCR.en.md763-776
All inference times reported in this page were measured under the following conditions:
| Item | Specification |
|---|---|
| GPU | NVIDIA Tesla T4 |
| CPU | Intel Xeon Gold 6271C @ 2.60GHz |
| OS / CUDA | Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9 |
| TensorRT | 8.6.1.6 |
| PaddlePaddle | 3.0.0 |
| PaddleOCR | 3.0.3 |
| Mode | GPU Config | CPU Config | Backend |
|---|---|---|---|
| Standard | FP32 / no TRT | FP32 / 8 threads | PaddleInference |
| High-Performance | Best prior precision + acceleration | FP32 / 8 threads | Best prior backend (Paddle / OpenVINO / TRT) |
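To make the Std / HP columns easier to compare, the sketch below computes the speedup factor (standard time divided by high-performance time) from the figures in the tables on this page. The `BENCHMARKS_MS` layout and `speedup` helper are illustrative only. Entries with a missing measurement (UVDoc on CPU standard mode, PP-LCNet_x1_0_textline_ori on GPU) are left out.

```python
from typing import Dict, Tuple

# (model, device) -> (standard ms, high-performance ms), copied from this page.
BENCHMARKS_MS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("PP-LCNet_x1_0_doc_ori", "GPU"): (2.62, 0.59),
    ("PP-LCNet_x1_0_doc_ori", "CPU"): (3.24, 1.19),
    ("UVDoc", "GPU"): (19.05, 19.05),
    ("PP-LCNet_x0_25_textline_ori", "GPU"): (2.16, 0.41),
    ("PP-LCNet_x0_25_textline_ori", "CPU"): (2.37, 0.73),
}


def speedup(std_ms: float, hp_ms: float) -> float:
    """How many times faster high-performance mode is than standard mode."""
    if hp_ms <= 0:
        raise ValueError("high-performance time must be positive")
    return std_ms / hp_ms


for (model, device), (std_ms, hp_ms) in BENCHMARKS_MS.items():
    print(f"{model} [{device}]: {speedup(std_ms, hp_ms):.2f}x")
```

A factor of 1.00x for UVDoc on GPU reflects the note above that high-performance mode gives that model no benefit.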