This page documents the development and testing tools provided by PaddleOCR to support model training, benchmarking, validation, and data annotation. These tools form the infrastructure that enables developers to train custom models, validate model performance across different configurations, and prepare training datasets.
For information about model training itself, see Model Training System. For deployment-related testing, see Deployment and Inference. This page focuses specifically on the tooling infrastructure that supports the development workflow.
PaddleOCR provides three primary development tool categories:
TIPC Testing System Architecture with Code Entities
The diagram maps shell scripts to their key functions and shows how configuration files flow through the testing pipeline. Function names like func_parser_value(), func_inference(), and add_profiler_step() are the actual implementations in the codebase.
Sources: test_tipc/prepare.sh1-202 test_tipc/test_train_inference_python.sh1-344 test_tipc/benchmark_train.sh1-295 test_tipc/common_func.sh ppocr/utils/profiler.py1-131
PaddleOCR integrates with Paddle's profiler API through the ppocr/utils/profiler.py module.
ProfilerOptions Class (ppocr/utils/profiler.py27-85)
The ProfilerOptions class parses semicolon-separated key-value configuration strings:
Configuration options stored in self._options dict:
| Option | Type | Default | Description |
|---|---|---|---|
| batch_range | list[int] | [10, 20] | Profiling range (start, end) iterations |
| state | str | "All" | Profiling scope: CPU, GPU, or All |
| sorted_key | str | "total" | Sort metric for summary: calls, total, max, min, ave |
| tracer_option | str | "Default" | Detail level: Default, OpDetail, AllOpDetail |
| profile_path | str | "/tmp/profile" | Output path for serialized profile data |
| timer_only | bool | True | If True, only throughput; if False, detailed operator statistics |
| exit_on_finished | bool | True | Exit after profiling completes |
The _parse_from_string() method (ppocr/utils/profiler.py62-79) splits on semicolons and equals signs to populate the options dictionary.
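The option-string format can be illustrated with a simplified reimplementation. This sketch mirrors the defaults in the table above but is not the actual `ProfilerOptions` class; `parse_profiler_options` is an illustrative name.

```python
# Simplified sketch of ProfilerOptions-style parsing (illustrative, not the
# actual PaddleOCR class). Splits "key=value" pairs on semicolons; the
# batch_range value is itself a bracketed list like "[10,20]".
def parse_profiler_options(options_str):
    options = {
        "batch_range": [10, 20],
        "state": "All",
        "sorted_key": "total",
        "tracer_option": "Default",
        "profile_path": "/tmp/profile",
        "timer_only": True,
        "exit_on_finished": True,
    }
    if not options_str:
        return options
    for kv in options_str.replace(" ", "").split(";"):
        key, _, value = kv.partition("=")
        if key == "batch_range":
            # "[10,20]" -> [10, 20]
            options[key] = [int(v) for v in value.strip("[]").split(",")]
        elif key in ("timer_only", "exit_on_finished"):
            options[key] = value.lower() == "true"
        elif key in options:
            options[key] = value
    return options
```

For example, `parse_profiler_options("batch_range=[10,20];state=GPU;timer_only=True")` overrides `state` and keeps the remaining defaults.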
add_profiler_step() Function (ppocr/utils/profiler.py87-131)
Global function called in training loops to enable profiling:
Implementation details:
- Maintains module-level globals `_prof`, `_profiler_step_id`, `_profiler_options`
- Creates `profiler.Profiler(scheduler=(...), on_trace_ready=profiler.export_chrome_tracing(...), timer_only=...)`
- Calls `_prof.start()` when the profiling range begins
- Calls `_prof.step()` each iteration to advance the profiler
- Calls `_prof.stop()` when the range ends
- Prints `_prof.summary(op_detail=True, thread_sep=False, time_unit="ms")`
- Writes Chrome tracing output to the `./profiler_log/` directory
- Exits the process after profiling when `exit_on_finished=True`

Sources: ppocr/utils/profiler.py1-131
TIPC Testing Workflow with Code-Level Details
This diagram shows the precise command-line invocations and parameter names used by TIPC scripts. Each box contains actual shell variables, command flags, and file paths used in the implementation.
Sources: test_tipc/prepare.sh10-277 test_tipc/test_train_inference_python.sh14-343 test_tipc/benchmark_train.sh67-294
TIPC (Training and Inference Pipeline Criterion) is an automated testing framework that validates PaddleOCR models across multiple dimensions: training modes, inference configurations, hardware backends, and precision levels. It provides reproducible testing workflows and performance benchmarking capabilities.
TIPC Testing Workflow and Configuration Processing
This diagram shows how TIPC processes configuration files through three main scripts: prepare.sh for environment setup, test_train_inference_python.sh for comprehensive testing, and benchmark_train.sh for performance profiling. Each component extracts parameters from the .txt configuration files and executes different testing phases.
Sources: test_tipc/prepare.sh1-50 test_tipc/test_train_inference_python.sh1-100 test_tipc/benchmark_train.sh1-150
TIPC uses structured text configuration files (.txt) that define the complete testing matrix. The file is divided into sections separated by ## markers:
| Section | Purpose | Key Parameters |
|---|---|---|
| train_params | Training configuration | model_name, python, gpu_list, epoch_num, batch_size_per_card |
| eval_params | Evaluation settings | eval_py, evaluation script parameters |
| infer_params | Inference testing matrix | use_gpu_list, use_mkldnn_list, cpu_threads_list, precision_list |
| train_benchmark_params | Benchmarking configuration | batch_size, fp_items, epoch, profile_option |
| to_static_train_benchmark_params | Dynamic-to-static training | Special trainer configuration |
Example configuration structure from test_tipc/configs/layoutxlm_ser/train_infer_python.txt1-60:
```
===========================train_params===========================
model_name:layoutxlm_ser
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:fp32
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=17
...
trainer:norm_train
norm_train:tools/train.py -c config.yml -o Global.print_batch_step=1
```
The configuration uses special syntax:
- `key:value` pairs for simple parameters
- `key:mode1=value1|mode2=value2` for mode-dependent values
- `func_parser_value()` and `func_parser_key()` functions extract these values in the shell scripts

Sources: test_tipc/configs/layoutxlm_ser/train_infer_python.txt1-60
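A mode-dependent line resolves as follows; the real parsing is done by the shell helpers in `test_tipc/common_func.sh`, and `parse_config_line` here is a hypothetical Python equivalent for illustration.

```python
# Hypothetical Python equivalent of the shell-side config-line parsing done by
# func_parser_key()/func_parser_value(). A bare alternative list such as
# "gpu_list:0|0,1" has no "=", so it is returned as-is (the scripts iterate it).
def parse_config_line(line, mode):
    key, _, raw = line.partition(":")
    value = raw
    for candidate in raw.split("|"):
        if "=" in candidate:
            cand_mode, _, cand_value = candidate.partition("=")
            if cand_mode == mode:
                value = cand_value
                break
    return key, value
```

For example, parsing `Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=17` under mode `whole_train_whole_infer` yields `("Global.epoch_num", "17")`.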
TIPC supports multiple testing modes defined by the MODE variable:
- `benchmark_train` - performance benchmarking; raw training logs go to `benchmark_log/train_log/`, speed metrics to `benchmark_log/index/`
- `lite_train_lite_infer` - fast training on a small dataset followed by lightweight inference checks
- `lite_train_whole_infer` - fast training followed by the full inference matrix
- `whole_train_whole_infer` - full training and full inference testing
- `whole_infer` - inference-only testing
Sources: test_tipc/prepare.sh6-11 test_tipc/test_train_inference_python.sh5-6
prepare.sh - Environment Setup
Key functions and workflow:
Mode parsing (test_tipc/prepare.sh10-22):
Conditional dataset preparation - Uses model name pattern matching to download the appropriate datasets from paddleocr.bj.bcebos.com/dataset/

Pretrained model download - Fetches model-specific pretrained weights
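The dispatch can be sketched as follows; the pattern-to-dataset mapping here is hypothetical, chosen only to illustrate the substring matching that `prepare.sh` performs on model names.

```python
# Illustrative sketch of prepare.sh-style resource selection: the model name
# is matched against substring patterns to decide which dataset archive to
# fetch. The pattern -> dataset table below is hypothetical, not the real one.
def select_dataset(model_name):
    patterns = [
        ("det", "icdar2015_lite.tar"),
        ("rec", "ic15_data.tar"),
        ("layoutxlm", "XFUND.tar"),
    ]
    for pattern, dataset in patterns:
        if pattern in model_name:
            return dataset
    return None  # no matching resource for this model
```

In the shell script this is a `case`/pattern-match block rather than a lookup table, but the effect is the same: the first matching pattern decides the download.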
test_train_inference_python.sh - Testing Execution
Main execution flow (test_tipc/test_train_inference_python.sh216-343):
Training phase:
- Multi-GPU runs launch via `paddle.distributed.launch`
- The command is assembled as `${python} ${run_train} ${set_use_gpu} ${set_save_model} ...`
- Output is logged to `${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}/train.log`

Evaluation phase (test_tipc/test_train_inference_python.sh308-316):
- Runs only when `eval_py != "null"`

Export phase (test_tipc/test_train_inference_python.sh318-326):
- Calls `tools/export_model.py` to convert the training checkpoint to an inference model
- Saves the result under the `${save_log}/` directory

Inference phase - Calls the `func_inference()` function (test_tipc/test_train_inference_python.sh99-179):
- CPU test matrix: `use_mkldnn × cpu_threads × batch_size × precision`, run as `${python} ${inference_py} use_gpu=False --enable_mkldnn=${use_mkldnn} --cpu_threads=${threads} ...`
- GPU test matrix: `use_trt × precision × batch_size`

Status checking - After each phase, `status_check()` verifies the exit status and logs results to `results_python.log`
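The CPU-side matrix expansion can be sketched with `itertools.product`; the value lists and the batch-size flag are illustrative, not the real configuration values.

```python
# Illustrative enumeration of the CPU inference test matrix
# (use_mkldnn x cpu_threads x batch_size x precision), mirroring the nested
# loops in func_inference(). The concrete value lists are made up here.
import itertools


def cpu_inference_commands(inference_py="tools/infer/predict_det.py"):
    use_mkldnn_list = ["True", "False"]
    cpu_threads_list = ["1", "6"]
    batch_size_list = ["1"]
    precision_list = ["fp32"]
    commands = []
    for mkldnn, threads, bs, prec in itertools.product(
        use_mkldnn_list, cpu_threads_list, batch_size_list, precision_list
    ):
        commands.append(
            f"python {inference_py} --use_gpu=False --enable_mkldnn={mkldnn} "
            f"--cpu_threads={threads} --batch_size={bs} --precision={prec}"
        )
    return commands
```

With the lists above this expands to 2 × 2 × 1 × 1 = 4 command lines; the GPU matrix (`use_trt × precision × batch_size`) is enumerated the same way.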
benchmark_train.sh - Performance Profiling
Specialized workflow for benchmarking (test_tipc/benchmark_train.sh1-295):
Configuration modification (test_tipc/benchmark_train.sh145-148):
Dynamic epoch calculation (test_tipc/benchmark_train.sh18-28) - Adjusts epoch count based on device number (e.g., 4 GPUs = 4× epochs)
Profiling execution (test_tipc/benchmark_train.sh204-220):
- Passes the `profile_option` parameter, e.g. `batch_range=[10,20];state=GPU;timer_only=True`
- Runs `timeout 5m bash test_tipc/test_train_inference_python.sh ...`
- Writes profiler output to `profiling_log/`

Speed measurement (test_tipc/benchmark_train.sh221-236) - Runs again without the profiler for accurate throughput numbers

Log parsing (test_tipc/benchmark_train.sh238-252) - Calls an external analysis script to extract metrics
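The dynamic epoch calculation above can be sketched as a one-liner; this is a simplified model of the shell arithmetic, assuming the GPU list is comma-separated as in the `.txt` configs.

```python
# Simplified sketch of benchmark_train.sh's dynamic epoch scaling: the
# configured epoch count is multiplied by the number of devices so each run
# processes a comparable per-device workload.
def scale_epochs(base_epoch, gpu_list):
    """gpu_list like "0" or "0,1,2,3"; the device count scales the epochs."""
    device_num = len(gpu_list.split(","))
    return base_epoch * device_num
```

So a config with `epoch=1` runs one epoch on a single GPU but four epochs on `gpu_list="0,1,2,3"`, matching the "4 GPUs = 4× epochs" behavior described above.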
Sources: test_tipc/prepare.sh1-202 test_tipc/test_train_inference_python.sh1-344 test_tipc/benchmark_train.sh1-295
PDF2Word is a Windows application developed by PaddleOCR community member whjdark that converts PDF documents to editable Word format using PP-StructureV2 layout analysis and recovery models.
PDF2Word Processing Pipeline
The diagram shows how PDF2Word processes documents through PP-StructureV2's layout analysis, OCR, and table recognition modules before reconstructing formatted Word output.
Three distribution versions are available (ppstructure/pdf2word/README.md8-9):

Windows Application (ppstructure/pdf2word/README.md8-12) - Download and run the packaged `pdf2word.exe`.

Python Script (ppstructure/pdf2word/README.md24-31) - Launches the GUI interface from a Python environment.

PaddleOCR whl Package - For Linux/Mac users or those with Python environments, directly use the paddleocr whl package, which includes PDF2Word functionality as documented in the PP-StructureV3 usage guide (ppstructure/pdf2word/README.md33-35).
PDF2Word is built on PP-StructureV2 (see PP-StructureV2 System for details):
Processing Pipeline: layout analysis, OCR, and table recognition results are reconstructed into a `.docx` file.

Packaging: Uses the QPT (Quick Python Tools) framework to package the Python application as a Windows executable.

Notes:
- Linux/Mac users should use the `paddleocr` whl package directly instead (ppstructure/pdf2word/README.md17-22)
- Models are downloaded from paddleocr.bj.bcebos.com on first run

Sources: ppstructure/pdf2word/README.md1-50
Running benchmark tests:
Running lite training and inference tests:
Inference-only testing:
Sources: test_tipc/docs/benchmark_train.md10-28
TIPC generates structured performance reports in JSON format. Example benchmark output from test_tipc/docs/benchmark_train.md38-40:
Log Directory Structure (test_tipc/docs/benchmark_train.md42-53):
```
benchmark_log/
├── index/            # Speed metric files
│   ├── PaddleOCR_det_mv3_db_v2_0_bs8_fp32_SingleP_DP_N1C1_speed
│   └── PaddleOCR_det_mv3_db_v2_0_bs8_fp32_SingleP_DP_N1C4_speed
├── profiling_log/    # Profiler output
│   └── PaddleOCR_det_mv3_db_v2_0_bs8_fp32_SingleP_DP_N1C1_profiling
└── train_log/        # Raw training logs
    ├── PaddleOCR_det_mv3_db_v2_0_bs8_fp32_SingleP_DP_N1C1_log
    └── PaddleOCR_det_mv3_db_v2_0_bs8_fp32_SingleP_DP_N1C4_log
```
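The index file names encode the model, batch size, precision, and device topology. A hedged parser sketch follows; the field meanings are inferred from the naming pattern shown above, not from TIPC source.

```python
# Illustrative parser for TIPC speed-metric file names such as
# "PaddleOCR_det_mv3_db_v2_0_bs8_fp32_SingleP_DP_N1C1_speed".
# Field positions are inferred from the naming pattern, e.g. "bs8" carries the
# batch size and "N1C1" encodes 1 node x 1 card.
def parse_speed_filename(name):
    parts = name.split("_")
    info = {}
    for i, part in enumerate(parts):
        if part.startswith("bs") and part[2:].isdigit():
            info["batch_size"] = int(part[2:])
            info["model"] = "_".join(parts[1:i])  # between repo prefix and bsN
            info["precision"] = parts[i + 1]
        if part.startswith("N") and "C" in part[1:]:
            nodes, cards = part[1:].split("C")
            info["nodes"] = int(nodes)
            info["cards"] = int(cards)
    return info
```

Parsing the single-card name above yields model `det_mv3_db_v2_0`, batch size 8, precision `fp32`, and a 1-node/1-card topology.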
Performance Benchmarks
Sample performance data for various models on single NVIDIA V100 16G GPU (test_tipc/docs/benchmark_train.md59-76):
| Model | Config File | Large Dataset FP32 FPS | Small Dataset FP32 FPS | Large Dataset FP16 FPS | Small Dataset FP16 FPS |
|---|---|---|---|---|---|
| ch_ppocr_mobile_v2.0_det | config | 53.836 | 53.343 / 53.914 / 52.785 | 45.574 | 45.57 / 46.292 / 46.213 |
| ch_ppocr_mobile_v2.0_rec | config | 2083.311 | 2043.194 / 2066.372 / 2093.317 | 2153.261 | 2167.561 / 2165.726 / 2155.614 |
| ch_PP-OCRv2_det | config | 13.87 | 13.386 / 13.529 / 13.428 | 17.847 | 17.746 / 17.908 / 17.96 |
| det_mv3_db_v2.0 | config | 61.802 | 62.078 / 61.802 / 62.008 | 82.947 | 84.294 / 84.457 / 84.005 |
Sources: test_tipc/docs/benchmark_train.md1-77
PPOCRLabel is a GUI-based data annotation tool specifically designed for OCR tasks. It provides visual annotation capabilities with keyboard shortcuts and auto-labeling features powered by pre-trained PaddleOCR models.
For detailed documentation on PPOCRLabel functionality, see PPOCRLabel Annotation Tool.
Key features:
Integration with Training: PPOCRLabel generates annotation files compatible with PaddleOCR's SimpleDataSet loader, allowing seamless transition from annotation to training (test_tipc/configs/det_r50_dcn_fce_ctw_v2_0/det_r50_vd_dcn_fce_ctw.yml64-68).
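A PPOCRLabel detection label line pairs an image path with a JSON list of `{"transcription", "points"}` dicts, separated by a tab; a minimal reader sketch (assuming that tab-separated format):

```python
import json


# Sketch of reading a PPOCRLabel/SimpleDataSet detection label line:
# "<image_path>\t<json list of {transcription, points} dicts>".
def parse_label_line(line):
    img_path, _, annotation = line.strip().partition("\t")
    boxes = json.loads(annotation)
    return img_path, boxes
```

Each `points` entry holds the four corner coordinates of one text box, and `transcription` holds its text content, which is what the detection and recognition data loaders consume.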
Sources: General knowledge from system overview
Beyond the primary tools, PaddleOCR provides additional utilities for document processing:
Tools for converting structured OCR results back into formatted documents.
Training and inference configurations use YAML files with specific structure (test_tipc/configs/layoutxlm_ser/ser_layoutxlm_xfund_zh.yml1-15):
These YAML files define the global settings, model architecture, loss, optimizer, post-processing, metrics, and dataset loaders.
For more details on model configuration structure, see Model Configuration Files.
Sources: test_tipc/configs/layoutxlm_ser/ser_layoutxlm_xfund_zh.yml1-123 test_tipc/configs/det_r50_dcn_fce_ctw_v2_0/det_r50_vd_dcn_fce_ctw.yml1-140
PaddleOCR's development tools provide comprehensive infrastructure for the complete development lifecycle:
These tools integrate seamlessly with the training system (see Model Training System) and deployment infrastructure (see Deployment and Inference) to form a complete development workflow from data annotation through training validation to deployment testing.