PP-StructureV2 System (Deprecated)

Purpose and Scope

⚠️ CRITICAL DEPRECATION NOTICE: PP-StructureV2 is the deprecated second-generation document structure analysis pipeline in PaddleOCR. This system is no longer maintained and will be removed in a future release. This documentation is preserved for reference only.

Users must migrate to:

PP-StructureV3 (2.3) for document parsing with 20 layout categories, improved table recognition, and chart parsing
PP-ChatOCRv4 (2.4) for key information extraction with LLM integration

PP-StructureV2 provides basic layout analysis, table recognition, OCR, formula recognition, and KIE capabilities through the StructureSystem class ppstructure/predict_system.py44-272. However, it has limitations that are resolved in PP-StructureV3:

Limited layout categories (5-10 vs. 20 in V3)
Lower table recognition accuracy (SLANet reaches 95.89% TEDS on PubTabNet; V3 uses the improved SLANeXt)
No chart recognition capability
No document preprocessing (orientation, unwarping)
Manual module coordination required

Related Documentation:

PP-StructureV3 (recommended): 2.3
PP-ChatOCRv4 for KIE: 2.4
Model training: 4
Deployment: 5

Sources: ppstructure/README.md1-17 ppstructure/predict_system.py44-272

Key Differences from PP-StructureV3

Before diving into PP-StructureV2 details, understand why migration to PP-StructureV3 is essential:

Aspect	PP-StructureV2	PP-StructureV3	Impact
Layout Categories	5-10 categories (PubLayNet/CDLA)	20 categories (PP-DocLayout)	V3 handles more document types
Table Recognition	3-model pipeline (DB+CRNN+SLANet)	SLANeXt with wired/wireless classification	10-15% accuracy improvement
Chart Support	None	PP-Chart2Table module	V3 extracts data from charts
Document Preprocessing	Manual orientation correction only	Built-in orientation + unwarping	Better handling of scanned docs
Reading Order	Basic top-to-bottom sort	Advanced multi-column detection	More accurate document recovery
API Integration	Standalone scripts	PaddleOCR 3.x wheel package	Easier installation and usage
Module Coordination	Manual `StructureSystem` configuration	Automatic pipeline management	Simplified development
Maintenance Status	Deprecated	Active development	V3 receives updates and bug fixes

Architecture Comparison:

Migration Urgency: The PP-StructureV2 code in the ppstructure/ directory will be removed in a future PaddleOCR release. All production systems should plan a migration to V3.

Sources: ppstructure/README.md1-17 README.md28-76

System Architecture Overview (PP-StructureV2)

PP-StructureV2 implements a modular pipeline architecture where documents are processed through sequential stages: layout detection, specialized recognition (OCR, tables, formulas), and optional post-processing for layout recovery. This architecture is replaced by PP-StructureV3's integrated pipeline 2.3.

High-Level Architecture

Key Code Entities:

StructureSystem ppstructure/predict_system.py44-272: Main orchestrator class
LayoutPredictor: Detects document regions (text, table, figure, title, etc.)
TextSystem: OCR detection + recognition from tools/infer/predict_system.py
TableSystem ppstructure/table/predict_table.py: Three-stage table recognition
TextRecognizer: Formula recognition with LaTeX output

Sources: ppstructure/predict_system.py44-272 ppstructure/utility.py28-156

Core Component: StructureSystem Class

The StructureSystem class is the central orchestrator that coordinates all recognition modules based on layout analysis results.

StructureSystem Initialization and Modules

Initialization Logic:

Mode Selection ppstructure/predict_system.py46-47: Supports structure (document parsing) or kie (key information extraction)
Conditional Module Loading ppstructure/predict_system.py60-89: Modules are only initialized if enabled via args
Shared Detectors ppstructure/predict_system.py76-79: Table system reuses text_system's detector/recognizer if available

Key Configuration Parameters ppstructure/utility.py28-156:

Parameter	Type	Default	Description
`mode`	str	"structure"	Pipeline mode: structure or kie
`layout`	bool	True	Enable layout analysis
`table`	bool	True	Enable table recognition
`ocr`	bool	True	Enable OCR for text regions
`formula`	bool	False	Enable formula recognition
`image_orientation`	bool	False	Enable orientation correction
`recovery`	bool	False	Enable layout recovery to DOCX
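The defaults in the table and the conditional module loading described above can be sketched as follows. This is a minimal illustration, not the StructureSystem constructor: make_args and enabled_modules are hypothetical helpers, and the flag names are taken from the table.

```python
from argparse import Namespace

# Defaults copied from the parameter table above.
DEFAULTS = dict(mode="structure", layout=True, table=True, ocr=True,
                formula=False, image_orientation=False, recovery=False)


def make_args(**overrides):
    """Return a Namespace holding the table defaults, updated by overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return Namespace(**{**DEFAULTS, **overrides})


def enabled_modules(args):
    """List the modules an orchestrator would load for these args (hypothetical helper)."""
    if args.mode == "kie":
        return ["kie"]
    flags = ["image_orientation", "layout", "table", "ocr", "formula"]
    return [name for name in flags if getattr(args, name)]


print(enabled_modules(make_args()))              # ['layout', 'table', 'ocr']
print(enabled_modules(make_args(formula=True)))  # ['layout', 'table', 'ocr', 'formula']
print(enabled_modules(make_args(mode="kie")))    # ['kie']
```

Only the modules returned for a given configuration would need their models loaded, which is the point of the conditional initialization.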

Sources: ppstructure/predict_system.py44-96 ppstructure/utility.py28-156

StructureSystem Inference Workflow

Key Optimizations ppstructure/predict_system.py137-148:

Global OCR First: Text is detected and recognized once for the entire image, then filtered per region
Intersection Filtering ppstructure/predict_system.py255-271: _filter_text_res method checks bbox overlap using _has_intersection
ROI Extraction ppstructure/predict_system.py152-158: Each region processes only its cropped area
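The intersection filtering step can be illustrated with a small sketch. It assumes axis-aligned boxes in [x1, y1, x2, y2] form and a list of OCR lines stored as dictionaries; the function names echo _has_intersection and _filter_text_res but this is not the code from predict_system.py.

```python
def has_intersection(a, b):
    """True if axis-aligned boxes a and b ([x1, y1, x2, y2]) overlap."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])


def filter_text_res(text_res, region_bbox):
    """Keep the global OCR lines whose box overlaps the layout region."""
    return [line for line in text_res if has_intersection(line["bbox"], region_bbox)]


# Global OCR ran once on the whole page; each layout region picks its own lines.
ocr_lines = [
    {"text": "Title", "bbox": [10, 10, 200, 40]},
    {"text": "Body", "bbox": [10, 60, 200, 90]},
]
print([l["text"] for l in filter_text_res(ocr_lines, [0, 50, 300, 100])])  # ['Body']
```

Running OCR once and filtering by overlap avoids re-running detection and recognition for every text region.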

Output Structure:

Sources: ppstructure/predict_system.py98-202 ppstructure/predict_system.py204-271

Table Recognition Pipeline

PP-StructureV2's table recognition uses a three-model pipeline that combines text detection, text recognition, and structure prediction.

Three-Model Architecture

Model Responsibilities:

Model	Input	Output	Purpose
DB (Detection)	Table image	Text box coordinates	Locate individual text lines
CRNN (Recognition)	Cropped text boxes	Text string + confidence	Read text content
SLANet (Structure)	Table image	HTML structure + cell coords	Predict table layout, cell spans

Key Implementation:

TableSystem Class ppstructure/table/predict_table.py: Orchestrates the three models
Coordinate Matching: Aligns detected text boxes with predicted table cells
HTML Generation: Combines structure prediction with recognized text to produce HTML tables
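The coordinate matching and HTML generation steps can be sketched as below. This is a simplified illustration under stated assumptions: every cell appears as a single "<td></td>" structure token, cell boxes arrive in the same order as those tokens, and matching uses plain IoU. The real TableSystem matcher may use a different criterion.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def build_html(structure_tokens, cell_boxes, text_boxes, texts):
    """Assign each recognized text to its best-overlapping cell, then fill the tokens.

    Assumes cell_boxes is non-empty and ordered like the '<td></td>' tokens.
    """
    cell_text = [[] for _ in cell_boxes]
    for box, text in zip(text_boxes, texts):
        best = max(range(len(cell_boxes)), key=lambda i: iou(box, cell_boxes[i]))
        if iou(box, cell_boxes[best]) > 0:
            cell_text[best].append(text)

    html, cell_idx = [], 0
    for tok in structure_tokens:
        if tok == "<td></td>":
            html.append("<td>" + " ".join(cell_text[cell_idx]) + "</td>")
            cell_idx += 1
        else:
            html.append(tok)
    return "".join(html)


tokens = ["<table>", "<tr>", "<td></td>", "<td></td>", "</tr>", "</table>"]
cells = [[0, 0, 100, 30], [100, 0, 200, 30]]
html = build_html(tokens, cells, [[5, 5, 60, 25], [110, 5, 180, 25]], ["Name", "Score"])
print(html)  # <table><tr><td>Name</td><td>Score</td></tr></table>
```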

Sources: ppstructure/table/README.md15-28 ppstructure/table/README_ch.md15-31

Table Recognition Performance

Performance on PubTabNet evaluation dataset:

Method	Acc	TEDS	Speed (CPU)
EDD	-	88.30%	-
TableRec-RARE	71.73%	93.88%	779ms
SLANet	76.31%	95.89%	766ms

Metrics:

Acc: Table structure accuracy (exact token match)
TEDS (Tree-Edit-Distance-based Similarity): Evaluates structure + text content
Speed: Single image inference on CPU with MKL
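For reference, TEDS is defined as 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|), where Ta and Tb are the predicted and ground-truth HTML trees and |T| is the node count. The sketch below only applies that formula to an already computed tree edit distance; computing the distance itself needs a tree-edit-distance implementation and is outside this example. The function name is hypothetical.

```python
def teds(edit_distance, n_pred_nodes, n_true_nodes):
    """TEDS = 1 - EditDist(T_pred, T_true) / max(|T_pred|, |T_true|)."""
    denom = max(n_pred_nodes, n_true_nodes)
    return 1.0 if denom == 0 else 1.0 - edit_distance / denom


# Two edits between a 40-node prediction and a 50-node ground truth.
print(round(teds(2, 40, 50), 2))  # 0.96
```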

Sources: ppstructure/table/README.md30-42 ppstructure/table/README_ch.md34-48

TEDS Evaluation

TEDS evaluation compares predicted HTML tables against ground truth:

Ground Truth Format ppstructure/table/README.md101-104:

PMC5755158_010_01.png    <html><body><table><thead><tr><td></td>...
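A ground-truth line of this shape can be split into an image name and an HTML string. The sketch assumes the two fields are separated by whitespace (a tab or spaces) and that the file name itself contains no whitespace; parse_gt_line is a hypothetical helper, not part of the evaluation script.

```python
def parse_gt_line(line):
    """Split 'image_name<whitespace>html' into (image_name, html)."""
    name, html = line.strip().split(None, 1)
    return name, html


name, html = parse_gt_line("PMC5755158_010_01.png\t<html><body><table></table></body></html>\n")
print(name)  # PMC5755158_010_01.png
```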

Evaluation Command ppstructure/table/README.md113-123:

Output:

teds: 95.89

Sources: ppstructure/table/README.md98-155 ppstructure/table/README_ch.md101-159

Layout Analysis Integration

Layout analysis identifies and classifies document regions before applying specialized recognition.

Layout Detection Models

PP-StructureV2 supports three layout analysis models based on the PicoDet architecture:

Model	Dataset	Supported Classes	Use Case
English Layout	PubLayNet	Text, Title, List, Table, Figure	English documents
Chinese Layout	CDLA	Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation	Chinese academic papers
Table Layout	TableBank	Table	Table-only detection

LayoutPredictor Interface ppstructure/layout/predict_layout.py:

Input: Document image
Output: List of regions with {bbox, label, score}
Backend: PaddleDetection PicoDet models
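Downstream code typically consumes the region list by label. The sketch below assumes each region is a dictionary with the bbox, label, and score keys listed above; the score_thresh parameter and the lowercase label strings are illustrative assumptions.

```python
from collections import defaultdict


def group_regions(regions, score_thresh=0.5):
    """Group region boxes by label, dropping low-confidence detections."""
    grouped = defaultdict(list)
    for region in regions:
        if region["score"] >= score_thresh:
            grouped[region["label"]].append(region["bbox"])
    return dict(grouped)


regions = [
    {"bbox": [30, 40, 500, 90], "label": "title", "score": 0.97},
    {"bbox": [30, 120, 500, 600], "label": "text", "score": 0.93},
    {"bbox": [30, 620, 500, 900], "label": "table", "score": 0.31},
]
print(group_regions(regions))
# {'title': [[30, 40, 500, 90]], 'text': [[30, 120, 500, 600]]}
```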

Configuration ppstructure/utility.py53-64:

Sources: ppstructure/layout/README.md24-26 ppstructure/utility.py53-64

Layout Classes and Detection

Model Details ppstructure/layout/README.md160-174:

Backbone: PP-LCNet (Lightweight CNN)
Training: Supports FGD (Focal and Global Knowledge Distillation)
Format: PaddlePaddle inference model (.pdmodel, .pdiparams)

Sources: ppstructure/layout/README.md24-33 ppstructure/layout/README_ch.md23-29

Formula Recognition

Formula recognition extracts mathematical equations and converts them to LaTeX format.

Integration ppstructure/predict_system.py83-89:

Processing ppstructure/predict_system.py171-174:

Input: Equation region ROI from layout detection
Model: LaTeX-OCR or similar recognition model
Output: {"latex": "\\frac{1}{2}"}

Configuration ppstructure/utility.py44-51:

Sources: ppstructure/predict_system.py83-89 ppstructure/predict_system.py171-174 ppstructure/utility.py44-51

Key Information Extraction (KIE)

⚠️ DEPRECATED: KIE functionality in PP-StructureV2 is deprecated. Use PP-ChatOCRv4 instead (2.4).

KIE Mode

When mode="kie", StructureSystem uses SerRePredictor for Semantic Entity Recognition (SER) and Relation Extraction (RE):

Configuration ppstructure/utility.py66-72:

Inference ppstructure/predict_system.py196-200:

Sources: ppstructure/predict_system.py91-94 ppstructure/predict_system.py196-200 ppstructure/utility.py66-72

Layout Recovery

Layout recovery converts document analysis results into editable formats (DOCX, Markdown) while preserving layout.

Recovery Methods Comparison

PP-StructureV2 provides two layout recovery approaches:

Method	Input	Advantages	Disadvantages
Standard PDF Parse	Standard PDF only	Better for non-paper docs, preserves pagination	Some Chinese/English text may come out garbled, table formatting issues
Image Format PDF Parse	PDF or images	Better OCR for papers, handles 111 languages	Layout depends on analysis accuracy, spacing/fonts need improvement

Method Selection ppstructure/predict_system.py320-330:

Sources: ppstructure/recovery/README.md22-33 ppstructure/recovery/README_ch.md27-31

OCR-Based Layout Recovery Pipeline

Key Functions:

sorted_layout_boxes ppstructure/recovery/recovery_to_doc.py87-155
- Sorts regions for reading order
- Detects single vs double column layout
- Tags each region with layout field
convert_info_docx ppstructure/recovery/recovery_to_doc.py32-84
- Creates python-docx Document
- Handles column layout changes via sections
- Embeds images, tables, and formatted text
HtmlToDocx ppstructure/recovery/table_process.py142-325
- Converts HTML tables to docx tables
- Handles cell merging (rowspan, colspan)
- Preserves table structure
convert_info_markdown ppstructure/recovery/recovery_to_markdown.py129-187
- Exports to Markdown format
- Merges paragraphs intelligently
- Embeds images and tables
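The reading-order step can be sketched as follows. This is a simplified heuristic for illustration, not the logic of sorted_layout_boxes: it classifies a region by comparing its box to the page midline, whereas the real function uses its own thresholds, and the layout tag values here are assumptions.

```python
def sort_reading_order(regions, page_width):
    """Order regions top-to-bottom, reading the left column before the right."""
    mid = page_width / 2

    def column(bbox):
        x1, _, x2, _ = bbox
        if x2 <= mid:
            return "left"
        if x1 >= mid:
            return "right"
        return "single"  # spans the midline, e.g. a title or full-width table

    tagged = [dict(r, layout=column(r["bbox"])) for r in regions]
    tagged.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))

    ordered, left, right = [], [], []
    for region in tagged:
        if region["layout"] == "single":
            ordered += left + right  # flush the two-column block above it
            left, right = [], []
            ordered.append(region)
        elif region["layout"] == "left":
            left.append(region)
        else:
            right.append(region)
    return ordered + left + right


page = [
    {"name": "r1", "bbox": [320, 80, 550, 200]},
    {"name": "l2", "bbox": [50, 320, 280, 500]},
    {"name": "title", "bbox": [50, 20, 550, 60]},
    {"name": "l1", "bbox": [50, 80, 280, 300]},
    {"name": "r2", "bbox": [320, 220, 550, 500]},
]
print([r["name"] for r in sort_reading_order(page, page_width=600)])
# ['title', 'l1', 'l2', 'r1', 'r2']
```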

Sources: ppstructure/recovery/recovery_to_doc.py32-155 ppstructure/recovery/table_process.py142-325 ppstructure/recovery/recovery_to_markdown.py129-187

Layout Recovery Configuration

Enable Recovery ppstructure/utility.py114-130:

Dependencies ppstructure/recovery/requirements.txt1-5:

python-docx: DOCX file creation
beautifulsoup4: HTML parsing
fonttools>=4.43.0: Font handling
fire>=0.3.0: Command-line interface

Execution Flow ppstructure/predict_system.py371-395:

Sources: ppstructure/predict_system.py371-395 ppstructure/utility.py114-130 ppstructure/recovery/requirements.txt1-5

Configuration and Usage

Command-Line Arguments

Core Pipeline Arguments ppstructure/utility.py28-156:

Sources: ppstructure/utility.py28-156

Usage Examples

Basic Structure Analysis:

Table Recognition Only ppstructure/table/README.md73-81:

Layout Recovery to DOCX ppstructure/recovery/README_ch.md183-195:

Sources: ppstructure/table/README.md73-81 ppstructure/recovery/README_ch.md183-195

Output Structure

Result Saving ppstructure/predict_system.py274-302:

Directory Structure:

output/
└── structure/
    └── image_name/
        ├── res_0.txt          # JSON results for page 0
        ├── show_0.jpg         # Visualization with bounding boxes
        ├── [20,30,400,500]_0.xlsx   # Table at bbox [20,30,400,500]
        ├── [50,600,350,800]_0.jpg   # Figure at bbox [50,600,350,800]
        └── image_name_ocr.docx      # Layout recovery (if enabled)
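The per-region file names in the tree can be reproduced with a small helper. The naming pattern is inferred from the listing above; region_filename and region_path are hypothetical, and the actual saving code may format the bbox differently (for example with spaces after the commas).

```python
import os


def region_filename(bbox, img_idx, ext):
    """Name a per-region file as '[x1,y1,x2,y2]_<page>.<ext>' (pattern assumed)."""
    box = "[" + ",".join(str(int(v)) for v in bbox) + "]"
    return f"{box}_{img_idx}.{ext}"


def region_path(output_dir, image_name, bbox, img_idx, ext):
    """Full path under <output_dir>/structure/<image_name>/."""
    return os.path.join(output_dir, "structure", image_name,
                        region_filename(bbox, img_idx, ext))


print(region_filename([20, 30, 400, 500], 0, "xlsx"))  # [20,30,400,500]_0.xlsx
```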

Sources: ppstructure/predict_system.py274-302 ppstructure/predict_system.py344-370

Visualization and Debugging

Structure Result Visualization

draw_structure_result ppstructure/utility.py159-240 generates annotated images:

Color Assignment:

Each region type gets a random color
Colors cached in catid2color dict
Labels have white text on blue background
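The color caching behavior can be sketched in a few lines. The dictionary name follows the text above; color_for is a hypothetical helper and the RGB tuple format is an assumption.

```python
import random

catid2color = {}  # label -> (r, g, b), filled lazily


def color_for(label, rng=random):
    """Return a stable random color for a region type, creating it on first use."""
    if label not in catid2color:
        catid2color[label] = tuple(rng.randint(0, 255) for _ in range(3))
    return catid2color[label]


first = color_for("table")
print(first == color_for("table"))  # True
```

Caching keeps every region of the same type in the same color within one run; colors still differ between runs unless the random generator is seeded.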

Word-Level Boxes ppstructure/utility.py243-298:

Function: cal_ocr_word_box(rec_str, box, rec_word_info)
Calculates bbox for each character/word
Useful for fine-grained text localization

Sources: ppstructure/utility.py159-298

Time Profiling

Performance Tracking ppstructure/predict_system.py99-109:

Each module accumulates timing information, logged at the end ppstructure/predict_system.py396:
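Accumulating per-stage timings can be sketched with a context manager. This is an illustration only: the stage names and the timed helper are assumptions, and the real time_dict is built inside StructureSystem rather than as a module-level variable.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

time_dict = defaultdict(float)  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Add the elapsed wall-clock time of the enclosed block to time_dict[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        time_dict[stage] += time.perf_counter() - start


with timed("all"):
    with timed("layout"):
        sum(range(10000))       # stand-in for layout inference
    for _ in range(2):          # two table regions on the page
        with timed("table"):
            sum(range(10000))   # stand-in for table recognition

print({stage: round(seconds, 4) for stage, seconds in time_dict.items()})
```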

Sources: ppstructure/predict_system.py99-109 ppstructure/predict_system.py396

Migration to PP-StructureV3

Why Migrate? Critical Reasons

⚠️ PP-StructureV2 is unmaintained. All production deployments must migrate to PP-StructureV3 to receive:

Bug Fixes: Security and stability patches only apply to V3
Model Updates: New PP-OCRv5, improved layout detection models
Hardware Support: Latest GPU drivers, XPU/NPU compatibility
Framework Compatibility: PaddlePaddle 3.x+ support
Community Support: V2 issues will not be addressed

Deprecation Timeline ppstructure/README.md4:

"This directory will be removed at an appropriate time in the future, and maintenance for the second generation will be discontinued."

The ppstructure/ directory containing V2 code will be deleted in a future PaddleOCR release; the README statement quoted above does not give a date.

Sources: ppstructure/README.md1-16 README.md28-76

Step-by-Step Migration Guide

1. Document Parsing Migration

Old Code (PP-StructureV2):

New Code (PP-StructureV3):

Key Differences:

No manual model path configuration (auto-download)
Simplified API with clear parameters
Built-in export methods (save_to_markdown, save_to_json)
Chart recognition enabled by default
Document preprocessing included
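The differences above can be illustrated with a short V3 call sequence. The sketch follows the PaddleOCR 3.x quick-start usage (PPStructureV3, predict, print, save_to_json, save_to_markdown) and has not been checked against a specific release; parse_with_v3, the input path, and the output directory are placeholders. The import sits inside the function so that the snippet can be loaded without the wheel installed.

```python
def parse_with_v3(input_path, save_dir="output"):
    """Parse one document with PP-StructureV3 and export JSON and Markdown."""
    from paddleocr import PPStructureV3  # requires the PaddleOCR 3.x wheel

    pipeline = PPStructureV3()  # default models are downloaded on first use
    for res in pipeline.predict(input=input_path):
        res.print()                               # log the structured result
        res.save_to_json(save_path=save_dir)      # one JSON file per page
        res.save_to_markdown(save_path=save_dir)  # Markdown export


# Example call (needs paddleocr installed and a real file):
# parse_with_v3("document.png")
```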

Sources: ppstructure/predict_system.py44-96 README.md156-173

2. KIE Migration to PP-ChatOCRv4

Old Code (PP-StructureV2 KIE):

New Code (PP-ChatOCRv4):

Why PP-ChatOCRv4 is Better:

Accuracy: roughly 15 percentage points higher than its predecessor, as reported in the PaddleOCR README
ERNIE 4.5 integration: Understands context better
Natural language queries: No need to define extraction schema
Automatic structure detection: Handles tables, forms automatically
RAG support: Can query across multiple documents

Sources: ppstructure/README.md8-9 README.md36-38

3. Table Recognition Migration

Old Code:

New Code:

V3 Table Improvements:

Wired/Wireless classification: Better handles borderless tables
RT-DETR structure detection: More accurate cell detection
Cross-page table merging: Handles tables spanning pages

Sources: ppstructure/table/README.md15-48 README.md171-172

Configuration Mapping Reference

Complete mapping of V2 arguments to V3 parameters:

PP-StructureV2 Argument	Type	PP-StructureV3 Parameter	Notes
`--mode='structure'`	str	Default behavior	No parameter needed
`--mode='kie'`	str	Use `PPChatOCRv4Doc` instead	Separate class
`--layout=True`	bool	`use_region_detection=True`	Default True in V3
`--table=True`	bool	`use_table_recognition=True`	Default True in V3
`--ocr=True`	bool	Always enabled	Cannot disable in V3
`--formula=True`	bool	`use_formula_recognition=True`	Uses UniMERNet in V3
`--image_orientation=True`	bool	`use_doc_orientation_classify=True`	Better accuracy in V3
`--recovery=True`	bool	Auto-enabled	Use `save_to_markdown()`
`--det_model_dir`	path	`text_detection_model_name="PP-OCRv5_server_det"`	Model name not path
`--rec_model_dir`	path	`text_recognition_model_name="PP-OCRv5_server_rec"`	Model name not path
`--table_model_dir`	path	`table_model_name="SLANeXt"`	Model name not path
`--layout_model_dir`	path	`layout_model_name="PP-DocLayoutV3"`	Model name not path
`--layout_dict_path`	path	Auto-configured	No manual dict needed
`--use_pdf2docx_api`	bool	Removed	Use OCR-based markdown
`--recovery_to_markdown`	bool	`res.save_to_markdown()`	Built-in method

Model Name Examples in V3:

Text Detection: "PP-OCRv5_server_det", "PP-OCRv5_mobile_det"
Text Recognition: "PP-OCRv5_server_rec", "PP-OCRv5_mobile_rec"
Layout: "PP-DocLayoutV3", "PP-DocLayoutV2"
Table: "SLANeXt", "SLANet"
Formula: "UniMERNet", "PP-FormulaNet"

Sources: ppstructure/utility.py28-156

Code Migration Checklist

Pre-Migration Steps:

✅ Review current V2 code and identify all ppstructure imports
✅ Document custom model paths and configurations
✅ Test V2 output quality on representative documents
✅ Install PaddleOCR 3.x: pip install "paddleocr[all]>=3.0" (the quotes keep the shell from treating >= as a redirect)

Migration Steps:

✅ Replace from ppstructure.predict_system import StructureSystem with from paddleocr import PPStructureV3
✅ Convert model paths to model names (remove _model_dir args)
✅ Replace args.mode="kie" code with PPChatOCRv4Doc class
✅ Update result parsing (use .print(), .save_to_*() methods)
✅ Enable new features: use_chart_recognition, use_doc_unwarping
✅ Test on same documents, compare quality
✅ Update deployment scripts to use PaddleOCR 3.x wheel
✅ Remove old V2 dependencies (if any custom modifications)

Post-Migration Validation:

✅ Verify output quality matches or exceeds V2
✅ Check performance (V3 may be slightly slower due to preprocessing)
✅ Test edge cases (rotated docs, scanned images)
✅ Update documentation and deployment guides
✅ Delete V2 code (ppstructure/ directory references)

Sources: ppstructure/README.md1-17

Breaking Changes and Known Issues

API Breaking Changes:

Module Location: V2 code in ppstructure/ directory → V3 in paddleocr wheel package
Class Names: StructureSystem → PPStructureV3
Return Format: V2 returns (result_list, time_dict) → V3 returns PipelineResult objects
Model Specification: V2 uses paths → V3 uses names with auto-download
KIE Separate: V2 mode="kie" → V3 requires separate PPChatOCRv4Doc class

Known Migration Issues:

Issue	V2 Behavior	V3 Solution
Custom trained models	Load via `--xxx_model_dir`	Not directly supported; use model conversion
PDF parsing with `pdf2docx`	`--use_pdf2docx_api=True`	Removed; use OCR-based markdown export
Time profiling dict	Returns detailed `time_dict`	Enable `show_log=True` for timing
Region filtering	Manual `_filter_text_res`	Auto-handled by pipeline
Custom preprocessing	Add before `StructureSystem`	Use `use_doc_preprocessor=True`

Compatibility Notes:

V2 inference models (.pdmodel, .pdiparams) are not compatible with the V3 pipeline
V2 config files (YAML) require conversion to V3 format
V2's save_structure_res JSON format differs from V3's save_to_json format

References:

PP-StructureV3 Full Documentation: 2.3
PP-ChatOCRv4 Documentation: 2.4
Version Migration Guide: 1.3
Installation Instructions: 1.2

Sources: ppstructure/README.md1-17 ppstructure/predict_system.py44-96

Parallel Processing Support

PP-StructureV2 supports multi-process inference for batch processing:

Multi-Process Launch ppstructure/predict_system.py401-415:

Image Distribution ppstructure/predict_system.py307:

Each process handles every Nth image, where N is total_process_num.
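The distribution rule amounts to a strided slice over the image list. The sketch below shows that behavior; shard is a hypothetical wrapper name, and the parameter names follow the process_id and total_process_num arguments mentioned above.

```python
def shard(image_file_list, process_id, total_process_num):
    """Return the images handled by one worker: every Nth file, offset by its id."""
    return image_file_list[process_id::total_process_num]


images = [f"img_{i}.jpg" for i in range(7)]
for pid in range(3):
    print(pid, shard(images, pid, 3))
# 0 ['img_0.jpg', 'img_3.jpg', 'img_6.jpg']
# 1 ['img_1.jpg', 'img_4.jpg']
# 2 ['img_2.jpg', 'img_5.jpg']
```

Every image is assigned to exactly one worker, and the load differs by at most one image between workers.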

Usage:

Sources: ppstructure/predict_system.py307 ppstructure/predict_system.py401-415

Summary

PP-StructureV2 provides a comprehensive document understanding system with:

Modular Architecture: StructureSystem orchestrates layout, OCR, table, and formula recognition
Three-Model Table Pipeline: DB + CRNN + SLANet achieves 95.89% TEDS
Layout Recovery: Exports to DOCX/Markdown with preserved formatting
Flexible Configuration: Enable/disable modules via command-line arguments
Multi-Process Support: Parallel inference for batch processing

⚠️ Migration Recommended: Users should transition to PP-StructureV3 (2.3) for improved accuracy and features, and PP-ChatOCRv4 (2.4) for KIE tasks.

Key Code Entry Points:

Main orchestrator: ppstructure/predict_system.py44-272
Table recognition: ppstructure/table/predict_table.py
Layout recovery: ppstructure/recovery/recovery_to_doc.py32-155
Command-line: ppstructure/predict_system.py399-415

Sources: ppstructure/predict_system.py44-415 ppstructure/README.md1-17

PP-StructureV2 System (Deprecated)

Relevant source files

Purpose and Scope

Users must migrate to:

PP-StructureV3 (2.3) for document parsing with 20 layout categories, improved table recognition, and chart parsing
PP-ChatOCRv4 (2.4) for key information extraction with LLM integration

Limited layout categories (5-10 vs. 20 in V3)
Lower table recognition accuracy (95.89% vs. improved SLANeXt in V3)
No chart recognition capability
No document preprocessing (orientation, unwarping)
Manual module coordination required

Related Documentation:

PP-StructureV3 (recommended): 2.3
PP-ChatOCRv4 for KIE: 2.4
Model training: 4
Deployment: 5

Sources: ppstructure/README.md1-17 ppstructure/predict_system.py44-272

Key Differences from PP-StructureV3

Before diving into PP-StructureV2 details, understand why migration to PP-StructureV3 is essential:

Aspect	PP-StructureV2	PP-StructureV3	Impact
Layout Categories	5-10 categories (PubLayNet/CDLA)	20 categories (PP-DocLayout)	V3 handles more document types
Table Recognition	3-model pipeline (DB+CRNN+SLANet)	SLANeXt with wired/wireless classification	10-15% accuracy improvement
Chart Support	None	PP-Chart2Table module	V3 extracts data from charts
Document Preprocessing	Manual orientation correction only	Built-in orientation + unwarping	Better handling of scanned docs
Reading Order	Basic top-to-bottom sort	Advanced multi-column detection	More accurate document recovery
API Integration	Standalone scripts	PaddleOCR 3.x wheel package	Easier installation and usage
Module Coordination	Manual `StructureSystem` configuration	Automatic pipeline management	Simplified development
Maintenance Status	Deprecated	Active development	V3 receives updates and bug fixes

Architecture Comparison:

Migration Urgency: PP-StructureV2 code in ppstructure/ directory will be completely removed in a future PaddleOCR release. All production systems must migrate to V3.

Sources: ppstructure/README.md1-17 README.md28-76

System Architecture Overview (PP-StructureV2)

High-Level Architecture

Key Code Entities:

StructureSystem ppstructure/predict_system.py44-272: Main orchestrator class
LayoutPredictor: Detects document regions (text, table, figure, title, etc.)
TextSystem: OCR detection + recognition from tools/infer/predict_system.py
TableSystem ppstructure/table/predict_table.py: Three-stage table recognition
TextRecognizer: Formula recognition with LaTeX output

Sources: ppstructure/predict_system.py44-272 ppstructure/utility.py28-156

Core Component: StructureSystem Class

The StructureSystem class is the central orchestrator that coordinates all recognition modules based on layout analysis results.

StructureSystem Initialization and Modules

Initialization Logic:

Mode Selection ppstructure/predict_system.py46-47: Supports structure (document parsing) or kie (key information extraction)
Conditional Module Loading ppstructure/predict_system.py60-89: Modules are only initialized if enabled via args
Shared Detectors ppstructure/predict_system.py76-79: Table system reuses text_system's detector/recognizer if available

Key Configuration Parameters ppstructure/utility.py28-156:

Parameter	Type	Default	Description
`mode`	str	"structure"	Pipeline mode: structure or kie
`layout`	bool	True	Enable layout analysis
`table`	bool	True	Enable table recognition
`ocr`	bool	True	Enable OCR for text regions
`formula`	bool	False	Enable formula recognition
`image_orientation`	bool	False	Enable orientation correction
`recovery`	bool	False	Enable layout recovery to DOCX

Sources: ppstructure/predict_system.py44-96 ppstructure/utility.py28-156

StructureSystem Inference Workflow

Key Optimizations ppstructure/predict_system.py137-148:

Global OCR First: Text is detected/recognized once for entire image, then filtered per region
Intersection Filtering ppstructure/predict_system.py255-271: _filter_text_res method checks bbox overlap using _has_intersection
ROI Extraction ppstructure/predict_system.py152-158: Each region processes only its cropped area

Output Structure:

Sources: ppstructure/predict_system.py98-202 ppstructure/predict_system.py204-271

Table Recognition Pipeline

PP-StructureV2's table recognition uses a three-model pipeline that combines text detection, text recognition, and structure prediction.

Three-Model Architecture

Model Responsibilities:

Model	Input	Output	Purpose
DB (Detection)	Table image	Text box coordinates	Locate individual text lines
CRNN (Recognition)	Cropped text boxes	Text string + confidence	Read text content
SLANet (Structure)	Table image	HTML structure + cell coords	Predict table layout, cell spans

Key Implementation:

TableSystem Class ppstructure/table/predict_table.py: Orchestrates the three models
Coordinate Matching: Aligns detected text boxes with predicted table cells
HTML Generation: Combines structure prediction with recognized text to produce HTML tables

Sources: ppstructure/table/README.md15-28 ppstructure/table/README_ch.md15-31

Table Recognition Performance

Performance on PubTabNet evaluation dataset:

Method	Acc	TEDS	Speed (CPU)
EDD	-	88.30%	-
TableRec-RARE	71.73%	93.88%	779ms
SLANet	76.31%	95.89%	766ms

Metrics:

Acc: Table structure accuracy (exact token match)
TEDS (Tree-Edit-Distance-based Similarity): Evaluates structure + text content
Speed: Single image inference on CPU with MKL

Sources: ppstructure/table/README.md30-42 ppstructure/table/README_ch.md34-48

TEDS Evaluation

TEDS evaluation compares predicted HTML tables against ground truth:

Ground Truth Format ppstructure/table/README.md101-104:

PMC5755158_010_01.png    <html><body><table><thead><tr><td></td>...

Evaluation Command ppstructure/table/README.md113-123:

Output:

teds: 95.89

Sources: ppstructure/table/README.md98-155 ppstructure/table/README_ch.md101-159

Layout Analysis Integration

Layout analysis identifies and classifies document regions before applying specialized recognition.

Layout Detection Models

PP-StructureV2 supports three layout analysis models based on PicoDet architecture:

Model	Dataset	Supported Classes	Use Case
English Layout	PubLayNet	Text, Title, List, Table, Figure	English documents
Chinese Layout	CDLA	Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation	Chinese academic papers
Table Layout	TableBank	Table	Table-only detection

LayoutPredictor Interface ppstructure/layout/predict_layout.py:

Input: Document image
Output: List of regions with {bbox, label, score}
Backend: PaddleDetection PicoDet models

Configuration ppstructure/utility.py53-64:

Sources: ppstructure/layout/README.md24-26 ppstructure/utility.py53-64

Layout Classes and Detection

Model Details ppstructure/layout/README.md160-174:

Backbone: PP-LCNet (Lightweight CNN)
Training: Supports FGD (Focal and Global Knowledge Distillation)
Format: PaddlePaddle inference model (.pdmodel, .pdiparams)

Sources: ppstructure/layout/README.md24-33 ppstructure/layout/README_ch.md23-29

Formula Recognition

Formula recognition extracts mathematical equations and converts them to LaTeX format.

Integration ppstructure/predict_system.py83-89:

Processing ppstructure/predict_system.py171-174:

Input: Equation region ROI from layout detection
Model: LaTeX-OCR or similar recognition model
Output: {"latex": "\\frac{1}{2}"}

Configuration ppstructure/utility.py44-51:

Sources: ppstructure/predict_system.py83-89 ppstructure/predict_system.py171-174 ppstructure/utility.py44-51

Key Information Extraction (KIE)

⚠️ DEPRECATED: KIE functionality in PP-StructureV2 is deprecated. Use PP-ChatOCRv4 instead (2.4).

KIE Mode

When mode="kie", StructureSystem uses SerRePredictor for Semantic Entity Recognition (SER) and Relation Extraction (RE):

Configuration ppstructure/utility.py66-72:

Inference ppstructure/predict_system.py196-200:

Sources: ppstructure/predict_system.py91-94 ppstructure/predict_system.py196-200 ppstructure/utility.py66-72

Layout Recovery

Layout recovery converts document analysis results into editable formats (DOCX, Markdown) while preserving layout.

Recovery Methods Comparison

PP-StructureV2 provides two layout recovery approaches:

Method	Input	Advantages	Disadvantages
Standard PDF Parse	Standard PDF only	Better for non-paper docs, preserves pagination	Some Chinese/English garbled, table formatting issues
Image Format PDF Parse	PDF or images	Better OCR for papers, handles 111 languages	Layout depends on analysis accuracy, spacing/fonts need improvement

Method Selection ppstructure/predict_system.py320-330:

Sources: ppstructure/recovery/README.md22-33 ppstructure/recovery/README_ch.md27-31

OCR-Based Layout Recovery Pipeline

Key Functions:

sorted_layout_boxes ppstructure/recovery/recovery_to_doc.py87-155
- Sorts regions for reading order
- Detects single vs double column layout
- Tags each region with layout field
convert_info_docx ppstructure/recovery/recovery_to_doc.py32-84
- Creates python-docx Document
- Handles column layout changes via sections
- Embeds images, tables, and formatted text
HtmlToDocx ppstructure/recovery/table_process.py142-325
- Converts HTML tables to docx tables
- Handles cell merging (rowspan, colspan)
- Preserves table structure
convert_info_markdown ppstructure/recovery/recovery_to_markdown.py129-187
- Exports to Markdown format
- Merges paragraphs intelligently
- Embeds images and tables

Sources: ppstructure/recovery/recovery_to_doc.py32-155 ppstructure/recovery/table_process.py142-325 ppstructure/recovery/recovery_to_markdown.py129-187

Layout Recovery Configuration

Enable Recovery ppstructure/utility.py114-130:

Dependencies ppstructure/recovery/requirements.txt1-5:

python-docx: DOCX file creation
beautifulsoup4: HTML parsing
fonttools>=4.43.0: Font handling
fire>=0.3.0: Command-line interface

Execution Flow ppstructure/predict_system.py371-395:

Sources: ppstructure/predict_system.py371-395 ppstructure/utility.py114-130 ppstructure/recovery/requirements.txt1-5

Configuration and Usage

Command-Line Arguments

Core Pipeline Arguments ppstructure/utility.py28-156:

Sources: ppstructure/utility.py28-156

Usage Examples

Basic Structure Analysis:

Table Recognition Only ppstructure/table/README.md73-81:

Layout Recovery to DOCX ppstructure/recovery/README_ch.md183-195:

Sources: ppstructure/table/README.md73-81 ppstructure/recovery/README_ch.md183-195

Output Structure

Result Saving ppstructure/predict_system.py274-302:

Directory Structure:

output/
└── structure/
    └── image_name/
        ├── res_0.txt          # JSON results for page 0
        ├── show_0.jpg         # Visualization with bounding boxes
        ├── [20,30,400,500]_0.xlsx   # Table at bbox [20,30,400,500]
        ├── [50,600,350,800]_0.jpg   # Figure at bbox [50,600,350,800]
        └── image_name_ocr.docx      # Layout recovery (if enabled)

Sources: ppstructure/predict_system.py274-302 ppstructure/predict_system.py344-370

Visualization and Debugging

Structure Result Visualization

draw_structure_result ppstructure/utility.py159-240 generates annotated images:

Color Assignment:

Each region type gets a random color
Colors cached in catid2color dict
Labels have white text on blue background

Word-Level Boxes ppstructure/utility.py243-298:

Function: cal_ocr_word_box(rec_str, box, rec_word_info)
Calculates bbox for each character/word
Useful for fine-grained text localization

Sources: ppstructure/utility.py159-298

Time Profiling

Performance Tracking ppstructure/predict_system.py99-109:

Each module accumulates timing information, logged at the end ppstructure/predict_system.py396:

Sources: ppstructure/predict_system.py99-109 ppstructure/predict_system.py396

Migration to PP-StructureV3

Why Migrate? Critical Reasons

⚠️ PP-StructureV2 is unmaintained. All production deployments must migrate to PP-StructureV3 to receive:

Bug Fixes: Security and stability patches only apply to V3
Model Updates: New PP-OCRv5, improved layout detection models
Hardware Support: Latest GPU drivers, XPU/NPU compatibility
Framework Compatibility: PaddlePaddle 3.x+ support
Community Support: V2 issues will not be addressed

Deprecation Timeline ppstructure/README.md4:

"This directory will be removed at an appropriate time in the future, and maintenance for the second generation will be discontinued."

The ppstructure/ directory containing V2 code will be completely deleted in a future PaddleOCR release (expected Q2 2025).

Sources: ppstructure/README.md1-16 README.md28-76

Step-by-Step Migration Guide

1. Document Parsing Migration

Old Code (PP-StructureV2):
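
The original snippet is not preserved on this page. As a stand-in, here is a sketch of typical V2 usage through the PaddleOCR 2.x wheel, assuming the PPStructure class and save_structure_res helper exported by that package. The wrapper run_structure_v2 and the helper count_region_types are hypothetical names, and the imports are deferred so the block loads without paddleocr installed.

```python
from collections import Counter


def run_structure_v2(image_path, output_dir="./output"):
    """Run the V2 pipeline on one image (requires paddleocr 2.x and opencv-python)."""
    import os

    import cv2
    from paddleocr import PPStructure, save_structure_res

    engine = PPStructure(show_log=True)
    img = cv2.imread(image_path)
    result = engine(img)  # list of region dicts
    stem = os.path.splitext(os.path.basename(image_path))[0]
    save_structure_res(result, output_dir, stem)
    return result


def count_region_types(result):
    """Summarize a V2 result list by region type."""
    return Counter(region["type"] for region in result)
```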

New Code (PP-StructureV3):
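
The original snippet is not preserved on this page. The sketch below shows the general shape of V3 usage with the PPStructureV3 class and the print, save_to_json, and save_to_markdown result methods. The wrapper names v3_options and run_structure_v3 are hypothetical, and the chosen defaults are illustrative rather than the library's own.

```python
def v3_options(fix_orientation=True, unwarp=False, parse_charts=False):
    """Collect the PPStructureV3 switches used in this sketch."""
    return {
        "use_doc_orientation_classify": fix_orientation,
        "use_doc_unwarping": unwarp,
        "use_chart_recognition": parse_charts,
    }


def run_structure_v3(input_path, output_dir="./output", **options):
    """Parse a document with PP-StructureV3 (requires paddleocr>=3.0)."""
    from paddleocr import PPStructureV3  # deferred import

    pipeline = PPStructureV3(**options)  # models are downloaded by name
    results = pipeline.predict(input=input_path)
    for res in results:
        res.print()
        res.save_to_json(save_path=output_dir)
        res.save_to_markdown(save_path=output_dir)
    return results
```

A call would look like run_structure_v3("doc.png", **v3_options(parse_charts=True)), where doc.png is a placeholder path.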

Key Differences:

No manual model path configuration (auto-download)
Simplified API with clear parameters
Built-in export methods (save_to_markdown, save_to_json)
Chart recognition available through use_chart_recognition (check the default in your installed release)
Document preprocessing included

Sources: ppstructure/predict_system.py44-96 README.md156-173

2. KIE Migration to PP-ChatOCRv4

Old Code (PP-StructureV2 KIE):

New Code (PP-ChatOCRv4):
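
The original snippet is not preserved on this page. The following is a sketch of the PPChatOCRv4Doc flow as it appears in the PaddleOCR 3.x documentation (visual_predict, build_vector, chat), and the exact arguments should be checked against the installed release. The helper names are hypothetical, and the model name, base URL, API type, and API key must come from your LLM provider, so they are left as parameters.

```python
def make_service_config(module_name, model_name, base_url, api_type, api_key):
    """Build the dict passed as chat_bot_config or retriever_config."""
    return {
        "module_name": module_name,
        "model_name": model_name,
        "base_url": base_url,
        "api_type": api_type,
        "api_key": api_key,
    }


def extract_keys(image_path, keys, chat_bot_config, retriever_config):
    """Ask the LLM-backed pipeline for the given keys (requires paddleocr>=3.0)."""
    from paddleocr import PPChatOCRv4Doc  # deferred import

    pipeline = PPChatOCRv4Doc()
    visual_results = pipeline.visual_predict(input=image_path)
    visual_info = [res["visual_info"] for res in visual_results]
    vector_info = pipeline.build_vector(
        visual_info,
        flag_save_bytes_vector=True,
        retriever_config=retriever_config,
    )
    return pipeline.chat(
        key_list=keys,
        visual_info=visual_info,
        vector_info=vector_info,
        chat_bot_config=chat_bot_config,
        retriever_config=retriever_config,
    )
```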

Why PP-ChatOCRv4 is Better:

15% accuracy improvement over V2 KIE
ERNIE 4.5 integration: Understands context better
Natural language queries: No need to define extraction schema
Automatic structure detection: Handles tables, forms automatically
RAG support: Can query across multiple documents

Sources: ppstructure/README.md8-9 README.md36-38

3. Table Recognition Migration

Old Code:

New Code:
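
The original snippet is not preserved on this page. For table-only workloads, one option is the standalone TableRecognitionPipelineV2 class shipped in the PaddleOCR 3.x wheel. This sketch assumes its result objects expose save_to_xlsx and save_to_html as documented. The wrapper recognize_tables is a hypothetical name.

```python
import os


def recognize_tables(image_path, output_dir="./output"):
    """Run table recognition only and export each result (requires paddleocr>=3.0)."""
    from paddleocr import TableRecognitionPipelineV2  # deferred import

    os.makedirs(output_dir, exist_ok=True)
    pipeline = TableRecognitionPipelineV2()
    results = pipeline.predict(image_path)
    for res in results:
        res.print()
        res.save_to_xlsx(output_dir)  # spreadsheet export, like V2's .xlsx files
        res.save_to_html(output_dir)
    return results
```

When tables are only one part of a full document, the same switches apply through PPStructureV3 with use_table_recognition=True.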

V3 Table Improvements:

Wired/Wireless classification: Better handles borderless tables
RT-DETR structure detection: More accurate cell detection
Cross-page table merging: Handles tables spanning pages

Sources: ppstructure/table/README.md15-48 README.md171-172

Configuration Mapping Reference

Complete mapping of V2 arguments to V3 parameters:

PP-StructureV2 Argument	Type	PP-StructureV3 Parameter	Notes
`--mode='structure'`	str	Default behavior	No parameter needed
`--mode='kie'`	str	Use `PPChatOCRv4Doc` instead	Separate class
`--layout=True`	bool	Always enabled	Layout detection is a core V3 module; `use_region_detection` toggles a separate region-level detector
`--table=True`	bool	`use_table_recognition=True`	Default True in V3
`--ocr=True`	bool	Always enabled	Cannot disable in V3
`--formula=True`	bool	`use_formula_recognition=True`	Model chosen via `formula_recognition_model_name` (PP-FormulaNet family or UniMERNet)
`--image_orientation=True`	bool	`use_doc_orientation_classify=True`	Better accuracy in V3
`--recovery=True`	bool	Auto-enabled	Use `save_to_markdown()`
`--det_model_dir`	path	`text_detection_model_name="PP-OCRv5_server_det"`	Model name not path
`--rec_model_dir`	path	`text_recognition_model_name="PP-OCRv5_server_rec"`	Model name not path
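`--table_model_dir`	path	`wired_table_structure_recognition_model_name="SLANeXt_wired"`, `wireless_table_structure_recognition_model_name="SLANeXt_wireless"`	Separate model names for wired and wireless tables
`--layout_model_dir`	path	`layout_detection_model_name="PP-DocLayout_plus-L"`	Model name not path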
`--layout_dict_path`	path	Auto-configured	No manual dict needed
`--use_pdf2docx_api`	bool	Removed	Use OCR-based markdown
`--recovery_to_markdown`	bool	`res.save_to_markdown()`	Built-in method

Model Name Examples in V3:

Text Detection: "PP-OCRv5_server_det", "PP-OCRv5_mobile_det"
Text Recognition: "PP-OCRv5_server_rec", "PP-OCRv5_mobile_rec"
Layout: "PP-DocLayout_plus-L", "PP-DocLayout-L"
Table: "SLANeXt_wired", "SLANeXt_wireless", "SLANet"
Formula: "PP-FormulaNet_plus-L", "UniMERNet"
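
Part of the mapping above can be automated. This is a minimal sketch that translates only the boolean flags with a one-to-one V3 keyword and reports everything else for manual review. The names V2_TO_V3_FLAGS, NO_V3_KWARG, and translate_args are hypothetical.

```python
# V2 flags that map one-to-one onto a V3 constructor keyword.
V2_TO_V3_FLAGS = {
    "image_orientation": "use_doc_orientation_classify",
    "table": "use_table_recognition",
    "formula": "use_formula_recognition",
}

# V2 flags that, per the table above, have no V3 keyword to set.
NO_V3_KWARG = {
    "mode",
    "ocr",
    "recovery",
    "layout_dict_path",
    "use_pdf2docx_api",
    "recovery_to_markdown",
}


def translate_args(v2_args):
    """Return (v3_kwargs, unmapped_names) for a dict of V2 arguments."""
    v3_kwargs, unmapped = {}, []
    for name, value in v2_args.items():
        if name in V2_TO_V3_FLAGS:
            v3_kwargs[V2_TO_V3_FLAGS[name]] = bool(value)
        elif name not in NO_V3_KWARG:
            unmapped.append(name)  # e.g. model paths that need a model name
    return v3_kwargs, sorted(unmapped)


kwargs, todo = translate_args(
    {"table": True, "formula": False, "ocr": True, "det_model_dir": "./det"}
)
```

Model path arguments are deliberately left unmapped, since choosing the replacement model name is a manual decision.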

Sources: ppstructure/utility.py28-156

Code Migration Checklist

Pre-Migration Steps:

✅ Review current V2 code and identify all ppstructure imports
✅ Document custom model paths and configurations
✅ Test V2 output quality on representative documents
✅ Install PaddleOCR 3.x: pip install "paddleocr[all]>=3.0" (quote the requirement so the shell does not treat >= as a redirect)

Migration Steps:

✅ Replace from ppstructure.predict_system import StructureSystem with from paddleocr import PPStructureV3
✅ Convert model paths to model names (remove _model_dir args)
✅ Replace args.mode="kie" code with PPChatOCRv4Doc class
✅ Update result parsing (use .print(), .save_to_*() methods)
✅ Enable new features: use_chart_recognition, use_doc_unwarping
✅ Test on same documents, compare quality
✅ Update deployment scripts to use PaddleOCR 3.x wheel
✅ Remove old V2 dependencies (if any custom modifications)

Post-Migration Validation:

✅ Verify output quality matches or exceeds V2
✅ Check performance (V3 may be slightly slower due to preprocessing)
✅ Test edge cases (rotated docs, scanned images)
✅ Update documentation and deployment guides
✅ Delete V2 code (ppstructure/ directory references)

Sources: ppstructure/README.md1-17

Breaking Changes and Known Issues

API Breaking Changes:

Module Location: V2 code in ppstructure/ directory → V3 in paddleocr wheel package
Class Names: StructureSystem → PPStructureV3
Return Format: V2 returns (result_list, time_dict) → V3 returns PipelineResult objects
Model Specification: V2 uses paths → V3 uses names with auto-download
KIE Separate: V2 mode="kie" → V3 requires separate PPChatOCRv4Doc class

Known Migration Issues:

Issue	V2 Behavior	V3 Solution
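Custom trained models	Load via `--xxx_model_dir`	Pass the matching `*_model_dir` parameter (e.g. `text_detection_model_dir`); the model must be exported in the PaddleOCR 3.x inference format
PDF parsing with `pdf2docx`	`--use_pdf2docx_api=True`	Removed; use OCR-based markdown export
Time profiling dict	Returns detailed `time_dict`	No `time_dict` is returned; time `predict()` calls in your own code (`show_log` is a 2.x argument)
Region filtering	Manual `_filter_text_res`	Auto-handled by pipeline
Custom preprocessing	Add before `StructureSystem`	Use `use_doc_orientation_classify=True` and `use_doc_unwarping=True`, or keep custom steps before `predict()`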

Compatibility Notes:

V2 inference models (.pdmodel, .pdiparams) are not directly compatible with the V3 pipeline
V2 config files (YAML) require conversion to V3 format
V2's save_structure_res JSON format differs from V3's save_to_json format

References:

PP-StructureV3 Full Documentation: 2.3
PP-ChatOCRv4 Documentation: 2.4
Version Migration Guide: 1.3
Installation Instructions: 1.2

Sources: ppstructure/README.md1-17 ppstructure/predict_system.py44-96

Parallel Processing Support

PP-StructureV2 supports multi-process inference for batch processing:

Multi-Process Launch ppstructure/predict_system.py401-415:
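
The launch step can be sketched as follows, assuming the parent re-invokes the same script once per worker with its own --process_id and with --use_mp switched off so workers do not spawn further processes. The helper names build_worker_commands and launch are hypothetical.

```python
import subprocess
import sys


def build_worker_commands(argv, total_process_num, python=sys.executable):
    """Build one command line per worker from the parent's argv."""
    return [
        [python, "-u", *argv, f"--process_id={pid}", "--use_mp=False"]
        for pid in range(total_process_num)
    ]


def launch(argv, total_process_num):
    """Start all workers, then wait for each to finish."""
    procs = [
        subprocess.Popen(cmd, stdout=sys.stdout, stderr=sys.stdout)
        for cmd in build_worker_commands(argv, total_process_num)
    ]
    return [p.wait() for p in procs]


cmds = build_worker_commands(
    ["predict_system.py", "--use_mp=True", "--total_process_num=3"],
    total_process_num=3,
    python="python",
)
```

The appended --use_mp=False comes after the original --use_mp=True, so this sketch relies on the argument parser letting the later value win.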

Image Distribution ppstructure/predict_system.py307:

Each process handles every Nth image, where N is total_process_num.
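
The every-Nth-image split is a strided slice. In this minimal sketch the function name shard and the file names are invented for illustration.

```python
def shard(image_files, process_id, total_process_num):
    """Return the images handled by one worker: every Nth file, offset by its id."""
    return image_files[process_id::total_process_num]


files = [f"page_{i}.jpg" for i in range(7)]
shards = [shard(files, pid, 3) for pid in range(3)]

for pid, part in enumerate(shards):
    print(pid, part)
```

The shards are disjoint and together cover the full list, so no image is processed twice or skipped, though shard sizes can differ by one.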

Usage:

Sources: ppstructure/predict_system.py307 ppstructure/predict_system.py401-415

Summary

PP-StructureV2 provides a comprehensive document understanding system with:

Modular Architecture: StructureSystem orchestrates layout, OCR, table, and formula recognition
Three-Model Table Pipeline: DB + CRNN + SLANet achieves 95.89% TEDS
Layout Recovery: Exports to DOCX/Markdown with preserved formatting
Flexible Configuration: Enable/disable modules via command-line arguments
Multi-Process Support: Parallel inference for batch processing

⚠️ Migration Recommended: Users should transition to PP-StructureV3 (2.3) for improved accuracy and features, and PP-ChatOCRv4 (2.4) for KIE tasks.

Key Code Entry Points:

Main orchestrator: ppstructure/predict_system.py44-272
Table recognition: ppstructure/table/predict_table.py
Layout recovery: ppstructure/recovery/recovery_to_doc.py32-155
Command-line: ppstructure/predict_system.py399-415

Sources: ppstructure/predict_system.py44-415 ppstructure/README.md1-17
