PP-StructureV3 is a complex document parsing pipeline that converts document images and PDFs into structured, machine-readable formats (Markdown and JSON) while preserving the original document layout and hierarchical structure. This page documents the architecture, components, configuration, and usage of PP-StructureV3.
Scope: This page covers the PP-StructureV3 pipeline architecture, module composition, configuration parameters, inference workflows, and output formats. Training individual modules is covered in their respective module documentation pages.
PP-StructureV3 is one of the four core pipelines in PaddleOCR 3.x, designed specifically for complex document understanding tasks. It extends basic OCR capabilities with comprehensive layout analysis and specialized element recognition.
Key Differentiators:
Sources: README.md:1-30, docs/version3.x/pipeline_usage/PP-StructureV3.md:1-20, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:1-20
PP-StructureV3 follows a modular pipeline architecture where different modules process specific document elements. The pipeline class PPStructureV3 inherits from PaddleXPipelineWrapper to integrate with the PaddleX infrastructure.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:11-20, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:11-20
PP-StructureV3 consists of seven modules or subpipelines:
| Module | Purpose | Optional | Models |
|---|---|---|---|
| Layout Detection | Identify document regions (text, table, formula, image, seal, etc.) | No | PP-DocLayout_plus-L, PP-DocBlockLayout, PP-DocLayout-L/M/S |
| General OCR | Extract text from detected regions | No | PP-OCRv5, PP-OCRv4 (detection + recognition) |
| Document Preprocessing | Correct orientation and distortion | Yes | PP-LCNet_x1_0_doc_ori, UVDoc |
| Table Recognition V2 | Parse table structure and content | Yes | SLANeXt_wired, SLANeXt_wireless, PP-TableMagic |
| Seal Text Recognition | Recognize curved seal text | Yes | PP-OCRv4 seal detection |
| Formula Recognition | Convert formulas to LaTeX | Yes | PP-FormulaNet, UniMERNet |
| Chart Parsing | Extract data from charts | Yes | PP-Chart2Table |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:11-20, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:11-20
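The mandatory/optional split in the table can be modeled as a simple lookup. The sketch below assumes a plain dict of per-module switches; the module keys are hypothetical identifiers chosen for illustration, not PaddleOCR names:

```python
# Hypothetical model of the seven PP-StructureV3 modules from the table above.
# Keys are illustrative identifiers; the bool records the table's "Optional" column.
MODULES = {
    "layout_detection": False,
    "general_ocr": False,
    "doc_preprocessing": True,
    "table_recognition_v2": True,
    "seal_text_recognition": True,
    "formula_recognition": True,
    "chart_parsing": True,
}


def active_modules(enabled: dict) -> list:
    """Return the modules that run, in table order.

    Mandatory modules always run, regardless of `enabled`; optional modules
    run only when enabled[name] is True (missing keys count as disabled).
    """
    return [
        name
        for name, optional in MODULES.items()
        if not optional or enabled.get(name, False)
    ]


if __name__ == "__main__":
    print(active_modules({"table_recognition_v2": True}))
    # ['layout_detection', 'general_ocr', 'table_recognition_v2']
```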
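The default layout detection model (PP-DocLayout_plus-L) recognizes 20 categories, including the region types handled by the processing stages: text, paragraph and document titles, tables, formulas, images, seals, charts, page numbers, headers, and footers.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:82-106, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:82-106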
The PP-StructureV3 pipeline processes documents through a multi-stage workflow, with each stage handling specific document elements.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:700-800, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:700-800
Each detected region is processed based on its category:
| Region Type | Processing Module | Output Format |
|---|---|---|
| text, paragraph title, document title | OCR Subpipeline | Plain text with bounding boxes |
| table | Table Recognition V2 | HTML table structure + cell content |
| formula | Formula Recognition | LaTeX source code |
| seal | Seal Text Recognition | Curved text content |
| chart | Chart Parsing | Extracted table data |
| image, figure | Image Handler | Image file path/reference |
| page number, header, footer | OCR or Skip | Text, or ignored, depending on configuration |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:800-900
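The routing in this table amounts to a lookup from region category to processing module. A minimal sketch, assuming regions arrive as dicts with a "type" key; the label spellings, the fallback to OCR for unknown categories, and the keep_auxiliary switch are all assumptions rather than documented PaddleOCR behavior:

```python
from typing import Dict, List, Optional

# Hypothetical label spellings for the region types in the table above.
ROUTES = {
    "text": "ocr",
    "paragraph_title": "ocr",
    "doc_title": "ocr",
    "table": "table_recognition_v2",
    "formula": "formula_recognition",
    "seal": "seal_text_recognition",
    "chart": "chart_parsing",
    "image": "image_handler",
    "figure": "image_handler",
}
# Page furniture: OCR'd or skipped depending on configuration.
AUXILIARY = {"page_number", "header", "footer"}


def route(category: str, keep_auxiliary: bool = True) -> Optional[str]:
    """Name of the module that should process a region, or None to skip it."""
    if category in AUXILIARY:
        return "ocr" if keep_auxiliary else None
    # Assumption: unknown categories fall back to plain OCR.
    return ROUTES.get(category, "ocr")


def group_regions(regions: List[dict], keep_auxiliary: bool = True) -> Dict[str, List[dict]]:
    """Bucket detected regions by target module, skipping routed-out regions."""
    buckets: Dict[str, List[dict]] = {}
    for region in regions:
        module = route(region["type"], keep_auxiliary)
        if module is not None:
            buckets.setdefault(module, []).append(region)
    return buckets
```

Grouping by module before processing is one natural way to let each module work on all of its regions in a single batch.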
The layout detection module identifies and localizes different document regions. PP-StructureV3 uses PP-DocLayout_plus-L by default.
Key Models:
- PP-DocLayout_plus-L: 83.2% mAP@0.5, 20 categories, 126MB
- PP-DocBlockLayout: 95.9% mAP@0.5, 1 category (Block), 124MB, used for sub-region detection
- PP-DocLayout-L/M/S: 23-category models (90.4% / 75.2% / 70.9% mAP)

Configuration:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:79-320, docs/version3.x/pipeline_usage/layout_detection.md
PP-StructureV3 uses an enhanced table recognition approach with separate handling for wired and wireless tables.
Architecture:
Key Models:
- SLANeXt_wired: 69.65% accuracy, 351MB, for tables with borders
- SLANeXt_wireless: 69.65% accuracy, 351MB, for borderless tables

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:322-360, docs/version3.x/pipeline_usage/table_recognition_v2.md
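The wired/wireless split can be pictured as a classifier-driven dispatch, followed by rendering the recognized cells into the HTML output format listed for tables. A sketch under stated assumptions: the "wired"/"wireless" labels and the plain cell-grid input are hypothetical, and real structure models emit richer output (including merged cells) than this grid:

```python
from html import escape
from typing import List

# Hypothetical classifier labels mapped to the structure models listed above.
TABLE_MODELS = {"wired": "SLANeXt_wired", "wireless": "SLANeXt_wireless"}


def pick_table_model(table_class: str) -> str:
    """Choose the structure-recognition model for a classified table."""
    try:
        return TABLE_MODELS[table_class]
    except KeyError:
        raise ValueError(f"unknown table class: {table_class!r}") from None


def cells_to_html(rows: List[List[str]]) -> str:
    """Render a recognized cell grid as a minimal HTML table.

    Simplification: every cell is a plain <td>; merged cells
    (rowspan/colspan) are not represented.
    """
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"
```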
Converts mathematical formulas in images to LaTeX source code.
Models:
- PP-FormulaNet: 85.09% BLEU score, 88MB, optimized for printed formulas
- UniMERNet: 90.83% ExpRate, 1030MB, handles complex and handwritten formulas

Configuration:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:360-420, docs/version3.x/pipeline_usage/formula_recognition.md
Extracts structured data from chart images using the PP-Chart2Table model.
Capabilities:
Configuration:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:420-460, README.md:171-173
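Chart parsing turns a chart image into tabular data. Assuming that data has already been extracted into a header row plus data rows (a hypothetical intermediate form, not PP-Chart2Table's documented output format), rendering it as a Markdown table for the document output might look like this:

```python
from typing import List, Sequence


def _row(cells: Sequence[object]) -> str:
    # Escape pipes so cell text cannot break the Markdown table layout.
    return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"


def chart_table_to_markdown(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Render extracted chart data (header + data rows) as a Markdown pipe table."""
    lines = [_row(header), "|" + "|".join("---" for _ in header) + "|"]
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} has {len(row)} cells, expected {len(header)}")
        lines.append(_row(row))
    return "\n".join(lines)
```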
Specialized for recognizing curved text in seals/stamps.
Components:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:460-520, docs/version3.x/pipeline_usage/seal_recognition.md
PP-StructureV3 configuration follows the PaddleX hierarchical structure with pipeline-level and module-level parameters.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:520-640
Pipeline-Level Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| layout_detection_model_name | str | "PP-DocLayout_plus-L" | Layout detection model |
| use_doc_orientation_classify | bool | False | Enable document orientation correction |
| use_doc_unwarping | bool | False | Enable document unwarping |
| use_doc_block_layout | bool | False | Enable sub-region detection |
| use_table_recognition | bool | True | Enable table parsing |
| use_formula_recognition | bool | True | Enable formula recognition |
| use_seal_text_detection | bool | True | Enable seal text recognition |
| use_chart_parsing | bool | True | Enable chart parsing |
| page_range | list | [0, None] | Page range for PDF processing |
| save_path | str | "./output" | Output directory |
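Mirroring the table as a typed object makes the defaults explicit and lets obviously invalid values fail early. A sketch, assuming the parameter names and defaults exactly as listed above; the class itself is hypothetical (not part of PaddleOCR), and the half-open [start, end) reading of page_range is an assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineOptions:
    """Hypothetical typed mirror of the pipeline-level parameter table."""

    layout_detection_model_name: str = "PP-DocLayout_plus-L"
    use_doc_orientation_classify: bool = False
    use_doc_unwarping: bool = False
    use_doc_block_layout: bool = False
    use_table_recognition: bool = True
    use_formula_recognition: bool = True
    use_seal_text_detection: bool = True
    use_chart_parsing: bool = True
    # Assumption: [start, end) page indices, with end=None meaning "last page".
    page_range: List[Optional[int]] = field(default_factory=lambda: [0, None])
    save_path: str = "./output"

    def __post_init__(self) -> None:
        start, end = self.page_range
        if start < 0 or (end is not None and end <= start):
            raise ValueError(f"invalid page_range: {self.page_range}")
```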
Module-Level Parameters:
| Module | Key Parameters |
|---|---|
| Layout Detection | layout_detection_batch_size, layout_detection_device |
| Text Detection | text_detection_model_name, text_detection_batch_size |
| Text Recognition | text_recognition_model_name, text_recognition_batch_size |
| Table Recognition | table_recognition_model_name, table_classifier_model_name |
| Formula Recognition | formula_recognition_model_name, formula_recognition_batch_size |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:640-700
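The module-level names share a module prefix, so a flat set of keyword options can be regrouped into the hierarchical per-module layout this section describes. A minimal sketch, assuming prefix matching is sufficient; it illustrates the idea and is not how PaddleX actually resolves parameters. Full keys are kept inside each group so that, for example, table_recognition_model_name and table_classifier_model_name do not collide:

```python
from typing import Any, Dict

# Hypothetical prefix -> module mapping, following the module-level table above.
PREFIX_TO_MODULE = {
    "layout_detection_": "Layout Detection",
    "text_detection_": "Text Detection",
    "text_recognition_": "Text Recognition",
    "table_recognition_": "Table Recognition",
    "table_classifier_": "Table Recognition",
    "formula_recognition_": "Formula Recognition",
}


def group_by_module(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split flat keyword options into per-module dicts; others stay pipeline-level."""
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        for prefix, module in PREFIX_TO_MODULE.items():
            if key.startswith(prefix):
                grouped.setdefault(module, {})[key] = value
                break
        else:  # no module prefix matched
            grouped.setdefault("Pipeline", {})[key] = value
    return grouped
```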
PP-StructureV3 supports YAML configuration files:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1800-1900
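A YAML config file parses into a nested mapping, and a common pattern is to layer a few overrides on top of a base file. A stdlib-only sketch of that layering, assuming the YAML has already been parsed into a dict (parsing itself needs a YAML library such as PyYAML). The key layout shown (SubModules, SubPipelines, DocPreprocessor) is an assumption modeled on PaddleX-style pipeline configs, not a copy of the shipped PP-StructureV3 file:

```python
import copy
from typing import Any, Dict

# Stand-in for a parsed YAML config; key names are illustrative assumptions.
BASE_CONFIG: Dict[str, Any] = {
    "pipeline_name": "PP-StructureV3",
    "SubModules": {
        "LayoutDetection": {"model_name": "PP-DocLayout_plus-L"},
        "FormulaRecognition": {"model_name": "PP-FormulaNet"},
    },
    "SubPipelines": {
        "DocPreprocessor": {"use_doc_orientation_classify": False, "use_doc_unwarping": False},
    },
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: override's values win; nested dicts merge recursively.

    Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, deep_merge(BASE_CONFIG, {"SubModules": {"FormulaRecognition": {"model_name": "UniMERNet"}}}) swaps only the formula model and leaves every other setting as in the base.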
Basic usage:
With configuration file:
Batch processing with options:
Specific page range:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:900-1000, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:900-1000
Basic inference:
With preprocessing:
Custom model configuration:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1000-1200, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:1000-1200
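A sketch of basic inference with optional preprocessing, based on the PaddleOCR 3.x Python API as shown in its usage guide (PPStructureV3, predict(input=...), save_to_markdown and save_to_json); verify these names against the reference for the installed version. The option names come from the parameter table, while build_pipeline_kwargs and run are hypothetical helpers. PaddleOCR is imported lazily, so only the pure helper runs without it installed:

```python
from typing import Any, Dict


def build_pipeline_kwargs(
    preprocess: bool = False,
    tables: bool = True,
    formulas: bool = True,
) -> Dict[str, Any]:
    """Constructor options, using parameter names from the table above."""
    return {
        # Orientation correction and unwarping are toggled together here.
        "use_doc_orientation_classify": preprocess,
        "use_doc_unwarping": preprocess,
        "use_table_recognition": tables,
        "use_formula_recognition": formulas,
    }


def run(input_path: str, save_dir: str = "./output", **options: bool) -> None:
    """Parse one image or PDF and write Markdown + JSON results to save_dir."""
    # Imported lazily so build_pipeline_kwargs works without PaddleOCR installed.
    from paddleocr import PPStructureV3  # assumes PaddleOCR 3.x

    pipeline = PPStructureV3(**build_pipeline_kwargs(**options))
    for res in pipeline.predict(input=input_path):  # typically one result per page
        res.save_to_markdown(save_path=save_dir)
        res.save_to_json(save_path=save_dir)
```

For example, run("report.pdf", preprocess=True) would enable orientation correction and unwarping before parsing; the file name is a placeholder.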
Process multiple files:
Process specific page ranges:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1200-1300
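Batch and page-range processing reduce to two small steps: gathering input files and mapping a page range onto page indices. A sketch, assuming the [start, end) reading of page_range (end=None meaning the last page) and the listed file suffixes; both are assumptions, and neither function is part of PaddleOCR:

```python
from pathlib import Path
from typing import List, Optional, Sequence

# Assumption: the file types a batch run should pick up.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_inputs(folder: str) -> List[Path]:
    """Sorted list of supported files directly inside `folder` (non-recursive)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def select_pages(num_pages: int, page_range: Sequence[Optional[int]] = (0, None)) -> List[int]:
    """0-based page indices in [start, end); end=None means "through the last page"."""
    start, end = page_range
    stop = num_pages if end is None else min(end, num_pages)
    return list(range(max(start, 0), stop))
```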
PP-StructureV3 generates structured outputs in multiple formats to support different downstream applications.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1600-1700
The Markdown output preserves document structure and hierarchy:
Features:
- Formulas embedded as LaTeX in $...$ (inline) or $$...$$ (display) blocks

Example:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1700-1800
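Assembling the Markdown amounts to walking regions in reading order and emitting one fragment per region type. A sketch, assuming regions carry the type/content/html/latex/order field names used by the JSON output; the type spellings and the heading levels chosen for titles are assumptions, and regions without text (such as images) are simply skipped here:

```python
from typing import Dict, List


def region_to_markdown(region: Dict) -> str:
    """One Markdown fragment per region; type labels are assumed spellings."""
    kind = region["type"]
    if kind == "doc_title":
        return f"# {region['content']}"
    if kind == "paragraph_title":
        return f"## {region['content']}"
    if kind == "formula":
        return f"$$\n{region['latex']}\n$$"  # display math block
    if kind == "table":
        return region["html"]  # CommonMark renderers pass raw HTML blocks through
    return region.get("content") or ""


def to_markdown(regions: List[Dict]) -> str:
    """Join non-empty fragments in reading order, separated by blank lines."""
    ordered = sorted(regions, key=lambda r: r["order"])
    fragments = (region_to_markdown(r) for r in ordered)
    return "\n\n".join(f for f in fragments if f)
```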
The JSON output provides machine-readable structured data:
Schema:
Field Descriptions:
- type: Region category (text, table, formula, image, seal, chart, etc.)
- bbox: Bounding box coordinates [x1, y1, x2, y2]
- score: Detection confidence score
- content: Recognized text content
- html: Table structure in HTML format
- latex: Formula in LaTeX format
- order: Reading order in document

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1800-2000
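Consuming the JSON from Python is easiest with a small typed record per region. A sketch, assuming the file holds a top-level list of region objects with the fields described above (the actual file layout may nest regions differently); unknown keys are ignored and regions are returned in reading order:

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Region:
    """One layout region, with field names taken from the list above."""

    type: str
    bbox: List[float]  # [x1, y1, x2, y2]
    score: float
    order: int
    content: Optional[str] = None
    html: Optional[str] = None
    latex: Optional[str] = None

    @property
    def area(self) -> float:
        x1, y1, x2, y2 = self.bbox
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)


_FIELD_NAMES = {f.name for f in fields(Region)}


def load_regions(text: str) -> List[Region]:
    """Parse a JSON list of region objects, ignoring unknown keys, in reading order."""
    items = json.loads(text)
    regions = [Region(**{k: v for k, v in item.items() if k in _FIELD_NAMES}) for item in items]
    return sorted(regions, key=lambda r: r.order)
```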
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:2000-2100
PP-StructureV3 includes reading order recovery for multi-column layouts, which is essential for preserving document semantics: without it, text from adjacent columns would be interleaved line by line.
The pipeline uses block-level layout detection and spatial analysis to determine reading order:
Algorithm:
- Block-level region detection with PP-DocBlockLayout (if enabled)

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:110-135
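As a much simpler stand-in for the block-based approach, a fixed-column heuristic shows the core idea of column-aware ordering. This sketch assumes equal-width columns and a known column count, and it mishandles full-width elements such as titles spanning both columns; it is not PaddleOCR's algorithm:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def reading_order(boxes: Sequence[Box], page_width: float, n_columns: int = 2) -> List[int]:
    """Indices of `boxes` in reading order for an evenly split multi-column page.

    Each box is assigned to the column containing its horizontal center;
    columns are read left to right, boxes within a column top to bottom.
    """
    col_width = page_width / n_columns

    def key(i: int) -> Tuple[int, float, float]:
        x1, y1, x2, _ = boxes[i]
        center = (x1 + x2) / 2
        # Clamp so boxes slightly outside the page still get a valid column.
        column = min(max(int(center // col_width), 0), n_columns - 1)
        return (column, y1, x1)

    return sorted(range(len(boxes)), key=key)
```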
| Scenario | Recommended Configuration | Rationale |
|---|---|---|
| High accuracy | PP-DocLayout_plus-L + PP-OCRv5_server + UniMERNet | Best accuracy across all modules |
| Balanced | PP-DocLayout-M + PP-OCRv5_mobile + PP-FormulaNet | Good accuracy/speed tradeoff |
| High speed | PP-DocLayout-S + PP-OCRv5_mobile + disabled optional modules | Minimal processing time |
| Server deployment | PP-DocLayout_plus-L + all features enabled | Leverage GPU acceleration |
| Edge/Mobile | PP-DocLayout-S + selective features | Minimize memory/compute |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:700-750
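The first three rows translate naturally into named presets. A sketch under stated assumptions: the key names (layout, ocr, formula) are hypothetical, the flag names are taken from the pipeline-level parameter table, and enabling every optional module outside the high-speed preset is an interpretation of the table rather than a documented default:

```python
import copy
from typing import Any, Dict

# Model choices copied from the scenario table; the dict layout is hypothetical.
PRESETS: Dict[str, Dict[str, Any]] = {
    "high_accuracy": {"layout": "PP-DocLayout_plus-L", "ocr": "PP-OCRv5_server", "formula": "UniMERNet"},
    "balanced": {"layout": "PP-DocLayout-M", "ocr": "PP-OCRv5_mobile", "formula": "PP-FormulaNet"},
    "high_speed": {"layout": "PP-DocLayout-S", "ocr": "PP-OCRv5_mobile"},
}
OPTIONAL_FLAGS = (
    "use_table_recognition",
    "use_formula_recognition",
    "use_seal_text_detection",
    "use_chart_parsing",
)


def preset(name: str) -> Dict[str, Any]:
    """Return a fresh config for a scenario; only "high_speed" disables optional modules."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    config = copy.deepcopy(PRESETS[name])
    enabled = name != "high_speed"
    config.update({flag: enabled for flag in OPTIONAL_FLAGS})
    return config
```

Returning a deep copy keeps callers from accidentally editing the shared preset table.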
GPU Memory Constraints:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:640-700
Disabling optional modules can significantly improve speed:
| Configuration | Relative Speed | Use Case |
|---|---|---|
| All features enabled | 1.0x (baseline) | Complete document understanding |
| No preprocessing | 1.3x | Clean, well-oriented documents |
| No table recognition | 2.0x | Documents without tables |
| No formula recognition | 1.5x | Non-technical documents |
| No chart parsing | 1.2x | Documents without charts |
| Text-only (all optional disabled) | 3.0x | Simple text extraction |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:2-100
PP-StructureV3 inherits from PaddleXPipelineWrapper to integrate with PaddleX infrastructure.
Key Integration Points:
Sources: README.md:80-82, high-level architecture diagrams
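The inheritance described above follows the wrapper (delegation) pattern: a thin subclass supplies defaults while the base class handles option merging and forwards prediction to the underlying engine. A minimal sketch of that pattern; every class here is hypothetical and does not reproduce PaddleXPipelineWrapper's actual internals:

```python
from typing import Any, Dict, Iterable, Iterator, List, Union


class _StubBackend:
    """Hypothetical stand-in for the underlying PaddleX pipeline object."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def predict(self, inputs: Iterable[str]) -> Iterator[Dict[str, Any]]:
        for path in inputs:
            yield {"input_path": path, "config": dict(self.config)}


class PipelineWrapper:
    """Hypothetical wrapper: validate overrides, merge them into defaults,
    build the backend once, then delegate predict() to it."""

    default_config: Dict[str, Any] = {}

    def __init__(self, **overrides: Any) -> None:
        unknown = set(overrides) - set(self.default_config)
        if unknown:
            raise TypeError(f"unexpected options: {sorted(unknown)}")
        self.config = {**self.default_config, **overrides}
        self._backend = _StubBackend(self.config)

    def predict(self, inputs: Union[str, List[str]]) -> List[Dict[str, Any]]:
        if isinstance(inputs, str):
            inputs = [inputs]
        return list(self._backend.predict(inputs))


class StructurePipeline(PipelineWrapper):
    """Subclass only supplies defaults; everything else is inherited."""

    default_config = {"use_table_recognition": True, "use_formula_recognition": True}
```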
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1-2500, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:1-2500