This document provides an overview of PaddleOCR's core pipelines and specialized modules that form the foundation of the system's OCR and document understanding capabilities. Each pipeline combines multiple processing modules to solve specific document analysis tasks, from basic text recognition to complex document parsing and intelligent information extraction.
For installation and setup instructions, see Installation and Environment Setup. For detailed usage of individual pipelines, refer to their respective subsections: PP-OCRv5, PaddleOCR-VL, PP-StructureV3, and PP-ChatOCRv4.
PaddleOCR 3.x is built around four primary pipelines that address different document understanding tasks. These pipelines share common processing modules but are optimized for specific use cases.
Pipeline Characteristics Table
| Pipeline | Primary Use Case | Key Capabilities | Output Format |
|---|---|---|---|
| PP-OCRv5 | Universal text recognition | Single model supports 5 text types (CN, TCN, EN, JP, Pinyin) | Text + coordinates |
| PaddleOCR-VL | Multilingual document parsing | 109+ languages, complex elements (text, tables, formulas, charts) | Structured data (JSON/Markdown) |
| PP-StructureV3 | Complex document structure | Layout analysis, multi-column reading order, element recognition | Markdown/JSON with preserved structure |
| PP-ChatOCRv4 | Intelligent Q&A | LLM-integrated document understanding with ERNIE 4.5 | Precise answers to queries |
Sources: README.md1-100 docs/index.en.md1-92 docs/version3.x/pipeline_usage/OCR.md1-50
Sources: docs/version3.x/pipeline_usage/OCR.md14-22 docs/version3.x/pipeline_usage/PP-StructureV3.md11-20
PP-OCRv5 is the foundation pipeline for general text recognition tasks. It provides high-accuracy text detection and recognition in a unified framework.
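The "text + coordinates" output can be post-processed with a few lines of Python. The sketch below is illustrative rather than the pipeline's own code: the result keys `rec_texts`, `rec_scores` and `rec_polys`, the dict-like result objects, and the helper names `pair_text_with_boxes` and `run_ocr` are assumptions to check against the installed PaddleOCR 3.x version.

```python
from typing import Any, Dict, List, Tuple


def pair_text_with_boxes(
    result: Dict[str, Any], min_score: float = 0.5
) -> List[Tuple[str, float, Any]]:
    """Return (text, score, polygon) triples at or above a confidence threshold.

    Assumption: the result exposes parallel lists under the keys
    "rec_texts", "rec_scores" and "rec_polys".
    """
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    polys = result.get("rec_polys", [])
    return [
        (text, float(score), poly)
        for text, score, poly in zip(texts, scores, polys)
        if score >= min_score
    ]


def run_ocr(image_path: str, min_score: float = 0.5):
    """Run detection and recognition on one image (needs paddleocr installed)."""
    # Imported lazily so the helper above stays usable without the package.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(
        text_detection_model_name="PP-OCRv5_mobile_det",
        text_recognition_model_name="PP-OCRv5_mobile_rec",
    )
    # Assumption: each prediction result behaves like a dict.
    return [pair_text_with_boxes(res, min_score) for res in ocr.predict(image_path)]
```

Calling `run_ocr("page.png")` (a placeholder path) would return one list of triples per input page; swapping the mobile model names for the server variants trades speed for accuracy.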
Detection Models
| Model | mAP(0.5) | GPU Time (ms) | CPU Time (ms) | Size (MB) |
|---|---|---|---|---|
| PP-OCRv5_server_det | 83.8% | 89.55 / 70.19 | 383.15 / 383.15 | 84.3 |
| PP-OCRv5_mobile_det | 79.0% | 10.67 / 6.36 | 57.77 / 28.15 | 4.7 |
Recognition Models (Multi-Language Support)
| Model | Avg Accuracy | Languages Supported | Size (MB) |
|---|---|---|---|
| PP-OCRv5_server_rec | 86.38% | CN, TCN, EN, JP, Pinyin | 81 |
| PP-OCRv5_mobile_rec | 81.29% | CN, TCN, EN, JP, Pinyin | 16 |
| en_PP-OCRv5_mobile_rec | 85.25% | English + Numbers | 7.5 |
| korean_PP-OCRv5_mobile_rec | 88.0% | Korean + EN + Numbers | 14 |
| arabic_PP-OCRv5_mobile_rec | 81.27% | Arabic + Numbers | 7.6 |
| cyrillic_PP-OCRv5_mobile_rec | 80.27% | Cyrillic + Numbers | 7.7 |
| devanagari_PP-OCRv5_mobile_rec | 84.96% | Devanagari + Numbers | 7.5 |
Module Structure
The pipeline combines these modules defined in:
Sources: docs/version3.x/pipeline_usage/OCR.md1-700 docs/version3.x/pipeline_usage/OCR.en.md1-700 README.md67-76
PaddleOCR-VL is a compact 0.9B-parameter vision-language model designed specifically for document parsing. It achieves state-of-the-art (SOTA) performance on multilingual document understanding while keeping resource consumption low.
| Feature | Specification |
|---|---|
| Model Size | 0.9B parameters |
| Visual Encoder | NaViT-style dynamic resolution |
| Language Model | ERNIE-4.5-0.3B |
| Language Support | 111 languages |
| OmniDocBench v1.5 | 94.5% accuracy |
| Complex Elements | Text, tables, formulas, charts |
| Special Capabilities | Irregular shape positioning, cross-page table merging |
PaddleOCR-VL supports multiple inference backends:
Sources: README.md61-96 README.md103-110 docs/index.en.md17-29
PP-StructureV3 converts complex document images and PDFs into structured formats (Markdown/JSON) while preserving the original layout and hierarchical structure. It integrates multiple specialized recognition modules.
PP-StructureV3 uses layout detection to identify 20 common document element types:
| Category | Element Types |
|---|---|
| Text Elements | Document title, paragraph title, text, abstract, references, footnotes, sidebar text |
| Structural | Page number, header, footer, table of contents |
| Technical | Algorithm, formula, formula number |
| Visual | Image, table, chart, seal |
| Captions | Figure/table/chart captions |
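
Once layout detection has labeled each region, every element can be handed to the module suited to its type. The following is a minimal sketch of that routing idea, not PP-StructureV3's actual dispatch logic: the label strings, the module names in `ROUTES`, and the function `route_elements` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical mapping from a detected element label to a downstream module.
ROUTES = {
    "table": "table_recognition",
    "formula": "formula_recognition",
    "chart": "chart_parsing",  # e.g. PP-Chart2Table
    "seal": "seal_recognition",
    "image": "passthrough",  # kept as an image, no text extraction
}
DEFAULT_ROUTE = "text_recognition"  # titles, paragraphs, captions, footnotes, ...


def route_elements(
    elements: Iterable[Dict[str, str]],
) -> Dict[str, List[Dict[str, str]]]:
    """Group layout elements by the module that should process them."""
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for element in elements:
        module = ROUTES.get(element["label"], DEFAULT_ROUTE)
        grouped[module].append(element)
    return dict(grouped)
```

Elements within each group keep their input order, which matters when the reading order has already been resolved upstream.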
Key Layout Models
| Model | mAP(0.5) | GPU Time (ms) | Size (MB) | Description |
|---|---|---|---|---|
| PP-DocLayout_plus-L | 83.2% | 53.03 / 17.23 | 126.01 | Highest accuracy, 20 categories |
| PP-DocBlockLayout | 95.9% | 34.60 / 28.54 | 123.92 | Block detection for multi-column layouts |
| PP-DocLayout-L | 90.4% | 33.59 / 33.59 | 123.76 | 23 categories |
Table Recognition V2 Architecture
Formula Recognition Models
| Model | ExpRate | BLEU | Size (MB) |
|---|---|---|---|
| PP-FormulaNet-S | 87.10% | 96.80% | 40.8 |
| UniMERNet | 95.50% | 98.20% | 950.0 |
Chart Parsing
The PP-Chart2Table module converts chart images to tabular data for analysis.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1-320 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1-320 docs/version3.x/pipeline_usage/table_recognition_v2.md1-100
PP-ChatOCRv4 combines OCR with Large Language Models (ERNIE 4.5) to enable intelligent question-answering over document content. It performs document parsing followed by LLM-based information extraction.
PP-ChatOCRv4 integrates the following modules:
| Module | Purpose | Key Models |
|---|---|---|
| Layout Detection | Identify document structure | PP-DocLayout-L/M/S |
| Table Recognition | Extract table structures | SLANeXt + TableCells_YOLO |
| Text Detection | Locate text regions | PP-OCRv5_server_det |
| Text Recognition | Extract text content | PP-OCRv5_server_rec |
| Formula Recognition | Extract LaTeX formulas | PP-FormulaNet, UniMERNet |
| Seal Detection | Detect seal locations | PP-OCRv4_server_seal_det |
The PPChatOCRv4 class provides these methods:
- save_vector() - Store document vectors for retrieval
- load_vector() - Load stored document vectors
- save_visual_info_list() - Store visual element information
- load_visual_info_list() - Load visual element information

Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-100 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md1-100 README.md151-153
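
The save/load method pairs suggest a cache-on-disk pattern: parse or embed a document once, then reload the stored result on later runs. A minimal sketch of that pattern follows, using JSON stand-ins so it runs without the pipeline. The helper `load_or_build` is hypothetical, and the assumption that the real methods take `(data, path)` and `(path)` arguments should be verified before substituting them.

```python
import json
import os
from typing import Any, Callable


def load_or_build(
    path: str,
    build: Callable[[], Any],
    save: Callable[[Any, str], None],
    load: Callable[[str], Any],
) -> Any:
    """Return cached data from `path`, building and saving it on first use."""
    if os.path.exists(path):
        return load(path)
    value = build()
    save(value, path)
    return value


def save_json(value: Any, path: str) -> None:
    """Stand-in for a pipeline save method such as save_vector()."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(value, handle)


def load_json(path: str) -> Any:
    """Stand-in for a pipeline load method such as load_vector()."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

With a real pipeline object, the stand-ins would be replaced by the corresponding save and load methods, and `build` would wrap the expensive parsing or vector-building call.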
PP-DocTranslation extends PP-StructureV3 with translation capabilities, converting documents across languages while preserving structure.
Key Features
| Feature | Description |
|---|---|
| Input Formats | PDF, images, Markdown |
| Output Format | Markdown with preserved structure |
| Translation Engine | ERNIE 4.5 |
| Structure Preservation | Full layout and hierarchy maintained |
| Custom Glossary | Support for domain-specific terms |
The PPDocTranslation class provides a translate() method with glossary and llm_request_interval parameters.
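
To make the two parameters concrete, here is one plausible reading of what they control: the glossary is handed to every translation request, and the interval spaces out consecutive LLM calls. This is a sketch under those assumptions and not the library's implementation. `translate_chunks` and `translate_one` are hypothetical names, and the real `translate()` method may handle both parameters differently.

```python
import time
from typing import Callable, Dict, List, Optional


def translate_chunks(
    chunks: List[str],
    translate_one: Callable[[str, Dict[str, str]], str],
    glossary: Optional[Dict[str, str]] = None,
    llm_request_interval: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Translate chunks in order, pausing between consecutive requests."""
    terms = glossary or {}
    results: List[str] = []
    for index, chunk in enumerate(chunks):
        # Wait before every request except the first to respect rate limits.
        if index > 0 and llm_request_interval > 0:
            sleep(llm_request_interval)
        results.append(translate_one(chunk, terms))
    return results
```

The `sleep` argument is injectable so the pacing can be checked in tests without actually waiting.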
Sources: README.md172-173 README.md154-159
The Seal Recognition pipeline is a specialized pipeline for extracting text from circular seals and stamps.
Seal Detection Models
| Model | Hmean | GPU Time (ms) | Size (MB) |
|---|---|---|---|
| PP-OCRv4_server_seal_det | 98.21% | 85.19 / 78.16 | 109 |
Sources: docs/version3.x/pipeline_usage/seal_recognition.md1-100 docs/version3.x/pipeline_usage/seal_recognition.en.md1-100
The Formula Recognition pipeline converts mathematical formulas in images to LaTeX source code.
Model Comparison
| Model | Type Support | ExpRate | BLEU | Size (MB) |
|---|---|---|---|---|
| PP-FormulaNet-S | Printed simple | 87.10% | 96.80% | 40.8 |
| PP-FormulaNet-L | Printed complex | 89.70% | 97.60% | 133.3 |
| PP-FormulaNet-H | Handwritten | 90.80% | 98.20% | 331.0 |
| UniMERNet | All types | 95.50% | 98.20% | 950.0 |
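
The comparison above is essentially an accuracy versus size trade-off, which can be expressed as a small selection helper. The sketch below copies the table values and picks the highest-ExpRate model that fits a size budget. `FormulaModel` and `best_within_budget` are illustrative names, and the helper ignores the "Type Support" column, which may matter for handwritten input.

```python
from typing import List, NamedTuple, Optional


class FormulaModel(NamedTuple):
    name: str
    exp_rate: float  # ExpRate in percent, from the table above
    bleu: float  # BLEU in percent, from the table above
    size_mb: float


MODELS: List[FormulaModel] = [
    FormulaModel("PP-FormulaNet-S", 87.10, 96.80, 40.8),
    FormulaModel("PP-FormulaNet-L", 89.70, 97.60, 133.3),
    FormulaModel("PP-FormulaNet-H", 90.80, 98.20, 331.0),
    FormulaModel("UniMERNet", 95.50, 98.20, 950.0),
]


def best_within_budget(max_size_mb: float) -> Optional[FormulaModel]:
    """Return the highest-ExpRate model whose size fits the budget, if any."""
    candidates = [model for model in MODELS if model.size_mb <= max_size_mb]
    return max(candidates, key=lambda model: model.exp_rate, default=None)
```

A budget smaller than the smallest model yields `None`, which callers should treat as "no formula model fits".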
Sources: docs/version3.x/pipeline_usage/formula_recognition.md1-100 docs/version3.x/pipeline_usage/formula_recognition.en.md1-100
Table Recognition V2 enhances table recognition with classification-guided processing.
Table Structure Models
| Model | Accuracy | GPU Time (ms) | Size (MB) |
|---|---|---|---|
| SLANeXt_wired | 69.65% | 85.92 / 85.92 | 351 |
| SLANeXt_wireless | 69.65% | 85.92 / 85.92 | 351 |
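
"Classification-guided" can be read as: classify the table as wired or wireless first, then run the matching structure model. A minimal sketch of that dispatch follows. The label strings and the `classify` callable are assumptions standing in for a real table classification model.

```python
from typing import Any, Callable

# Structure model per table class, using the model names from the table above.
STRUCTURE_MODELS = {
    "wired": "SLANeXt_wired",
    "wireless": "SLANeXt_wireless",
}


def pick_structure_model(table_image: Any, classify: Callable[[Any], str]) -> str:
    """Choose the structure model that matches the classifier's label."""
    label = classify(table_image)
    if label not in STRUCTURE_MODELS:
        raise ValueError(f"unexpected table class: {label!r}")
    return STRUCTURE_MODELS[label]
```

Raising on an unknown label keeps a misconfigured classifier from silently falling back to the wrong structure model.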
Sources: docs/version3.x/pipeline_usage/table_recognition_v2.md1-150 docs/version3.x/pipeline_usage/table_recognition_v2.en.md1-150
Document preprocessing provides optional modules for handling non-standard document orientations and distortions.
Preprocessing Models
| Module | Model | Accuracy/CER | Size (MB) |
|---|---|---|---|
| Orientation | PP-LCNet_x1_0_doc_ori | 99.06% Top-1 | 7 |
| Unwarping | UVDoc | 0.179 CER | 30.3 |
Sources: docs/version3.x/pipeline_usage/doc_preprocessor.md1-100
The pipelines share common modules through a modular architecture. This design allows efficient reuse and consistent behavior across different pipelines.
Each pipeline uses a hierarchical configuration structure that allows customization of individual modules:
| Configuration Level | Description | Example Parameters |
|---|---|---|
| Pipeline Level | Top-level pipeline settings | device, save_path, enable_hpi |
| Module Level | Individual module configurations | text_detection_model_name, layout_detection_model_name |
| Model Level | Specific model parameters | det_limit_side_len, rec_batch_size |
Example Configuration Transformation
The system uses PaddleXPipelineWrapper as a base class that transforms flat configuration parameters into hierarchical structures for the underlying PaddleX infrastructure.
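
The flat-to-hierarchical transformation can be pictured as a lookup from each flat parameter name to a path in a nested dictionary. The sketch below illustrates the idea only. `PARAM_PATHS`, the nested key names such as `SubModules`, and `to_hierarchical` are hypothetical and do not reproduce the actual wrapper code.

```python
from typing import Any, Dict, Tuple

# Hypothetical routing table: flat parameter name -> path in the nested config.
PARAM_PATHS: Dict[str, Tuple[str, ...]] = {
    "device": ("device",),
    "text_detection_model_name": ("SubModules", "TextDetection", "model_name"),
    "det_limit_side_len": ("SubModules", "TextDetection", "limit_side_len"),
    "text_recognition_model_name": ("SubModules", "TextRecognition", "model_name"),
    "rec_batch_size": ("SubModules", "TextRecognition", "batch_size"),
}


def to_hierarchical(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand flat keyword arguments into a nested configuration dict."""
    nested: Dict[str, Any] = {}
    for name, value in flat.items():
        if value is None:
            continue  # unset parameters leave the pipeline default in place
        if name not in PARAM_PATHS:
            raise KeyError(f"unknown parameter: {name}")
        path = PARAM_PATHS[name]
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested
```

Skipping `None` values means a caller only overrides what it explicitly sets, so the remaining settings fall through to whatever defaults the underlying pipeline defines.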
Sources: README.md198-251 docs/version3.x/pipeline_usage/OCR.md14-633 docs/version3.x/pipeline_usage/PP-StructureV3.md11-320
| Use Case | Recommended Pipeline | Key Models |
|---|---|---|
| Basic text extraction | PP-OCRv5 | PP-OCRv5_server_det + PP-OCRv5_server_rec |
| Multilingual documents (109+ languages) | PaddleOCR-VL | PaddleOCR-VL-0.9B or PaddleOCR-VL-1.5 |
| Complex document structure | PP-StructureV3 | PP-DocLayout_plus-L + SLANeXt + PP-Chart2Table |
| Document Q&A | PP-ChatOCRv4 | PP-StructureV3 + ERNIE 4.5 |
| Document translation | PP-DocTranslation | PP-StructureV3 + ERNIE 4.5 |
| Seal text only | Seal Recognition | PP-OCRv4_server_seal_det + PP-OCRv5_server_rec |
| Mathematical formulas | Formula Recognition | UniMERNet (high accuracy) or PP-FormulaNet (fast) |
| Target | Model Size | Recommended Variants |
|---|---|---|
| Server/Cloud | Large, high accuracy | *_server_* models |
| Edge/Mobile | Small, fast | *_mobile_* models |
| CPU-only | Optimized for CPU | Set enable_mkldnn=True |
| GPU | Optimized for GPU | Use with enable_hpi=True for TensorRT |
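
The deployment guidance above can be turned into a small helper that builds constructor options. This is a sketch under assumptions: `deployment_options` is a hypothetical helper, the `device` values are illustrative, and whether `enable_hpi` takes effect depends on the high-performance inference components being installed.

```python
from typing import Any, Dict


def deployment_options(tier: str, hardware: str) -> Dict[str, Any]:
    """Build PP-OCRv5 options for a model tier and a hardware target."""
    if tier not in ("server", "mobile"):
        raise ValueError(f"unknown tier: {tier!r}")
    options: Dict[str, Any] = {
        "text_detection_model_name": f"PP-OCRv5_{tier}_det",
        "text_recognition_model_name": f"PP-OCRv5_{tier}_rec",
    }
    if hardware == "cpu":
        options["device"] = "cpu"
        options["enable_mkldnn"] = True  # CPU acceleration, per the table above
    elif hardware == "gpu":
        options["device"] = "gpu"
        options["enable_hpi"] = True  # high-performance inference, per the table above
    else:
        raise ValueError(f"unknown hardware: {hardware!r}")
    return options
```

The returned dictionary is meant to be unpacked into the pipeline constructor, for example `PaddleOCR(**deployment_options("mobile", "cpu"))`, assuming the installed version accepts these keyword arguments.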
| Language Category | Model | Supported Languages |
|---|---|---|
| Multi-type (single model) | PP-OCRv5_server_rec | CN, TCN, EN, JP, Pinyin |
| English | en_PP-OCRv5_mobile_rec | English, numbers |
| Korean | korean_PP-OCRv5_mobile_rec | Korean, English, numbers |
| Latin script | latin_PP-OCRv5_mobile_rec | Most Latin-based languages |
| Arabic script | arabic_PP-OCRv5_mobile_rec | Arabic letters, numbers |
| Cyrillic | cyrillic_PP-OCRv5_mobile_rec | Slavic languages |
| Devanagari | devanagari_PP-OCRv5_mobile_rec | Hindi, Sanskrit, numbers |
| Telugu | te_PP-OCRv5_mobile_rec | Telugu, numbers |
| Tamil | ta_PP-OCRv5_mobile_rec | Tamil, numbers |
| Thai | th_PP-OCRv5_mobile_rec | Thai, English, numbers |
| Greek | el_PP-OCRv5_mobile_rec | Greek, English, numbers |
| 109+ languages | PaddleOCR-VL | Comprehensive multilingual support |
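
The language table maps naturally onto a lookup. The sketch below copies the model names from the table. The category keys and the function `recognition_model_for` are illustrative choices, not part of the PaddleOCR API.

```python
from typing import Dict, Optional

# Language category -> recognition model name, taken from the table above.
RECOGNITION_MODELS: Dict[str, str] = {
    "multi": "PP-OCRv5_server_rec",  # CN, TCN, EN, JP, Pinyin
    "english": "en_PP-OCRv5_mobile_rec",
    "korean": "korean_PP-OCRv5_mobile_rec",
    "latin": "latin_PP-OCRv5_mobile_rec",
    "arabic": "arabic_PP-OCRv5_mobile_rec",
    "cyrillic": "cyrillic_PP-OCRv5_mobile_rec",
    "devanagari": "devanagari_PP-OCRv5_mobile_rec",
    "telugu": "te_PP-OCRv5_mobile_rec",
    "tamil": "ta_PP-OCRv5_mobile_rec",
    "thai": "th_PP-OCRv5_mobile_rec",
    "greek": "el_PP-OCRv5_mobile_rec",
}


def recognition_model_for(category: str) -> Optional[str]:
    """Return the dedicated recognition model for a category, or None."""
    return RECOGNITION_MODELS.get(category.strip().lower())
```

A `None` result signals that no dedicated recognition model is listed for the category, in which case the PaddleOCR-VL pipeline is the broader multilingual option.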
Sources: docs/version3.x/pipeline_usage/OCR.md240-632 docs/version3.x/pipeline_usage/OCR.en.md240-632 README.md61-96