PP-ChatOCRv4 (also referred to as PP-ChatOCRv4-doc) is an intelligent document analysis pipeline that combines Large Language Models (LLMs), Multimodal LLMs (MLLMs), and OCR technology to extract structured information from complex documents. It addresses challenges such as layout analysis, rare characters, multi-page PDFs, tables, formulas, and seal text recognition by integrating ERNIE Bot (ERNIE 4.5) with document parsing capabilities. The pipeline enables question-answering interactions with documents.
Installation note: PP-ChatOCRv4 requires the `ie` optional dependency, installed via the extras syntax (e.g. `pip install "paddleocr[ie]"`). See `pyproject.toml` for the full dependency set: `ie = ["paddlex[ie]>=3.4.0,<3.5.0"]`.
For basic text recognition without LLM integration, see page 2.1 (PP-OCRv5 Universal Text Recognition). For document parsing without intelligent extraction, see page 2.3 (PP-StructureV3 Document Parsing). For multilingual document parsing using vision-language models, see page 2.2 (PaddleOCR-VL Vision-Language Model).
PP-ChatOCRv4 operates as a three-layer architecture that processes documents through parsing, analysis, and intelligent extraction stages.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-25 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md1-24 pyproject.toml57-61
Module-to-code mapping for PP-ChatOCRv4:
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md15-26 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md13-24 pyproject.toml58-61
The document parsing layer leverages PP-StructureV3 capabilities to convert raw documents into structured representations. This layer consists of nine configurable modules that can be enabled or disabled based on requirements.
The layout detection module identifies document structure using PP-DocLayout_plus-L, which recognizes 20 common layout categories including document titles, paragraph titles, text blocks, tables, formulas, images, seals, and charts.
| Model | mAP(0.5) | GPU Time (ms) | CPU Time (ms) | Size (MB) | Categories |
|---|---|---|---|---|---|
| PP-DocLayout_plus-L | 83.2% | 53.03 / 17.23 | 634.62 / 378.32 | 126.01 | 20 layout types |
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md86-112 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md87-111
The OCR subsystem comprises text detection and recognition components:
Text Detection: Uses PP-OCRv5_server_det (83.8% Hmean, 89.55ms GPU inference)
Text Recognition: Uses PP-OCRv5_server_rec (86.38% accuracy, 8.46ms GPU inference)
These models support five text types: Simplified Chinese, Traditional Chinese, English, Japanese, and Pinyin in a single model.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md424-458 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md426-460
Table Structure Recognition: SLANeXt_wired and SLANeXt_wireless models handle wired and wireless tables separately (69.65% accuracy, 85.92ms GPU inference).
Formula Recognition: Multiple models supported including LatexOCR (76.9% BLEU) and UniMERNet_base (74.7% CDM, 78.7% ExpRate).
Seal Text Detection: PP-OCRv4_server_seal_det model specialized for curved seal text (98.0% detection Hmean).
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md608-664 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md608-664
The LLM integration layer connects document parsing results with ERNIE 4.5 to enable intelligent information extraction and question-answering.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md866-975 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md866-975
The extraction layer processes LLM responses to provide precise answers to user queries about document content.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md866-1047 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md866-1047
The pipeline executes modules in a specific sequence to optimize processing efficiency:
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md15-26
PP-ChatOCRv4 uses a hierarchical configuration system inherited from PaddleX. Key options:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `layout_model_name` | str | `"PP-DocLayout_plus-L"` | Layout detection model |
| `text_detection_model_name` | str | `"PP-OCRv5_server_det"` | Text detection model |
| `text_recognition_model_name` | str | `"PP-OCRv5_server_rec"` | Text recognition model |
| `table_structure_model_name` | str | `"SLANeXt_wired"` | Table structure model |
| `formula_recognition_model_name` | str | `"LatexOCR"` | Formula recognition model |
| `use_doc_orientation_classify` | bool | `False` | Enable document orientation classification |
| `use_doc_unwarping` | bool | `False` | Enable document unwarping |
| `use_textline_orientation` | bool | `False` | Enable text-line orientation classification |
| `use_seal_text_detection` | bool | `False` | Enable seal text detection |
| `use_table_recognition` | bool | `True` | Enable table recognition |
| `use_formula_recognition` | bool | `True` | Enable formula recognition |
| `llm_name` | str | `"ernie-4.5-8k-preview"` | LLM model selection |
| `api_type` | str | `"qianfan"` | API service type |
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md866-975 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md866-975
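The hierarchical override behavior can be sketched in plain Python. This is a toy illustration only (the real merging lives inside PaddleX's configuration system); the default values come from the parameter table above, but `resolve_config` is a hypothetical helper, not a PaddleX API.

```python
# Toy sketch of hierarchical configuration: built-in defaults overlaid by
# user-supplied keyword arguments. The merge logic is illustrative, not
# PaddleX's actual implementation.
DEFAULTS = {
    "layout_model_name": "PP-DocLayout_plus-L",
    "text_detection_model_name": "PP-OCRv5_server_det",
    "text_recognition_model_name": "PP-OCRv5_server_rec",
    "use_table_recognition": True,
    "use_seal_text_detection": False,
    "llm_name": "ernie-4.5-8k-preview",
    "api_type": "qianfan",
}

def resolve_config(**overrides):
    """Return the effective configuration: defaults overlaid by user overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}

cfg = resolve_config(use_seal_text_detection=True)
```

Unrecognized keys are rejected eagerly here, which mirrors the usual behavior of a validated configuration layer: a typo in a parameter name fails at initialization rather than being silently ignored.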
Configuration flow through the PaddleX/PaddleOCR layers:
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md803-865 docs/version3.x/paddleocr_and_paddlex.en.md1-40
The PP-ChatOCRv4 pipeline is accessed through the PaddleX pipeline system. The PaddleX registration name is PP-ChatOCRv4. The pipeline wrapper class is located in paddleocr/_pipelines/pp_chatocrv4.py.
Key initialization parameters and an example invocation pattern:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `layout_model_name` | str | `"PP-DocLayout_plus-L"` | Layout detection model |
| `text_detection_model_name` | str | `"PP-OCRv5_server_det"` | Text detection model |
| `text_recognition_model_name` | str | `"PP-OCRv5_server_rec"` | Text recognition model |
| `use_table_recognition` | bool | `True` | Enable table recognition |
| `use_formula_recognition` | bool | `True` | Enable formula recognition |
| `use_seal_text_detection` | bool | `False` | Enable seal text detection |
| `llm_name` | str | `"ernie-4.5-8k-preview"` | LLM model name |
| `api_type` | str | `"qianfan"` | API service type |
| `api_key` | str | — | Qianfan API key |
| `secret_key` | str | — | Qianfan secret key |
Refer to docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md for full parameter lists and usage examples.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md976-1047 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md976-1047
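A minimal invocation sketch follows. The class name `PPChatOCRv4Doc` is assumed from the wrapper module noted above (`paddleocr/_pipelines/pp_chatocrv4.py`); actually running it requires the `ie` extra installed and valid Qianfan credentials, so the import is kept inside an uncalled function.

```python
# Hypothetical invocation pattern for the Python API; parameter names and
# defaults follow the table above.
def build_init_kwargs(api_key: str, secret_key: str) -> dict:
    """Collect initialization keyword arguments for the pipeline wrapper."""
    return {
        "text_detection_model_name": "PP-OCRv5_server_det",
        "text_recognition_model_name": "PP-OCRv5_server_rec",
        "use_table_recognition": True,
        "llm_name": "ernie-4.5-8k-preview",
        "api_type": "qianfan",
        "api_key": api_key,
        "secret_key": secret_key,
    }

def run_example():
    # Not executed here: needs paddleocr[ie] installed plus network access.
    from paddleocr import PPChatOCRv4Doc  # assumed export name
    return PPChatOCRv4Doc(**build_init_kwargs("<API_KEY>", "<SECRET_KEY>"))
```

Credentials are passed at construction time; keeping them out of source control (e.g. read from environment variables) is the usual practice.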
The paddleocr CLI entry point (defined in pyproject.toml) exposes the pipeline as pp_chatocrv4:
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md803-865 pyproject.toml54-55
| Method | Signature | Description |
|---|---|---|
| `predict()` | `predict(input, query, vector_path=None, ...)` | Main prediction method combining OCR and LLM |
| `save_vector()` | `save_vector(vector_path)` | Persist document embeddings to disk |
| `load_vector()` | `load_vector(vector_path)` | Load previously persisted embeddings |
| `save_visual_info_list()` | `save_visual_info_list(path)` | Save visual bounding-box/layout information |
| `load_visual_info_list()` | `load_visual_info_list(path)` | Load previously saved visual information |
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md866-1047
PP-ChatOCRv4 implements a vector database system for efficient document retrieval and context management when processing queries.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md866-1047
The vector database enables persistent storage of document embeddings for repeated queries without reprocessing:
- First-time processing: parse the document, build the embeddings, and persist them to `vector_path` with `save_vector()`.
- Subsequent queries: restore the embeddings with `load_vector(vector_path)` and answer queries without reparsing the document.

Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md976-1047
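The save/load round trip can be illustrated with a toy store. This is a stand-in only: the real pipeline computes LLM embeddings, whereas this sketch fakes an "embedding" as a chunk-length vector so the persistence workflow can be shown without any model.

```python
import json
import os
import tempfile

class VectorStoreSketch:
    """Toy stand-in for the pipeline's vector persistence (illustrative only)."""

    def __init__(self):
        self.vectors = {}

    def build(self, chunks):
        # Fake embedding: one number per chunk (the real pipeline uses an LLM).
        self.vectors = {chunk: [float(len(chunk))] for chunk in chunks}

    def save_vector(self, path):
        # First-time processing: persist embeddings to disk.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.vectors, f)

    def load_vector(self, path):
        # Subsequent queries: restore embeddings, skipping document parsing.
        with open(path, encoding="utf-8") as f:
            self.vectors = json.load(f)

store = VectorStoreSketch()
store.build(["Invoice No. 2024-001", "Total: $1,234.56"])
path = os.path.join(tempfile.mkdtemp(), "doc.vec.json")
store.save_vector(path)

fresh = VectorStoreSketch()
fresh.load_vector(path)
assert fresh.vectors == store.vectors  # round trip preserves the embeddings
```

The point of the pattern is amortization: embedding construction is the expensive step, so persisting it once makes every later query cheap.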
PP-ChatOCRv4 connects to ERNIE Bot services through the Qianfan API:
| Parameter | Required | Description |
|---|---|---|
| `api_type` | Yes | API service type (`"qianfan"`) |
| `api_key` | Yes | Qianfan API key |
| `secret_key` | Yes | Qianfan secret key |
| `llm_name` | No | Model name (default: `"ernie-4.5-8k-preview"`) |
| `llm_params` | No | Additional LLM parameters |
The authentication mechanism uses the API key and secret key to obtain access tokens for making LLM API calls.
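A sketch of that credential-to-token exchange is below. The endpoint URL and parameter names follow Baidu AI Cloud's common OAuth-style token flow and are assumptions here; no network call is made, only the request URL is constructed.

```python
from urllib.parse import urlencode

# Assumed token-exchange endpoint (OAuth 2.0 client_credentials style).
TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"

def build_token_request(api_key: str, secret_key: str) -> str:
    """Return the URL that would exchange api_key/secret_key for an access token."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,         # Qianfan API key
        "client_secret": secret_key,  # Qianfan secret key
    })
    return f"{TOKEN_ENDPOINT}?{query}"

url = build_token_request("my-api-key", "my-secret-key")
```

The returned access token (not shown) would then be attached to subsequent LLM API calls.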
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md866-975
PP-ChatOCRv4 supports multiple ERNIE model variants:
- `ernie-4.5-8k-preview`: default model with an 8K context window
- `ernie-4.5-turbo-8k`: faster variant for lower latency
- `ernie-4.5-128k-preview`: extended context for long documents
- `ernie-3.5`: previous-generation model

Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md866-975
The pipeline constructs prompts by combining the parsed document content (OCR text, tables, and formulas), context retrieved from the vector database, and the user's query.
This structured prompt enables ERNIE 4.5 to understand document structure and provide accurate, contextually relevant answers.
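The assembly step can be sketched as a simple template. The template wording here is a hypothetical example, not the pipeline's actual prompt; it only shows how the parsed content, retrieved context, and query are combined into one LLM input.

```python
# Illustrative prompt assembly; the template text is invented for this sketch.
def build_prompt(ocr_text: str, table_markdown: str,
                 retrieved: list, query: str) -> str:
    context = "\n".join(retrieved)
    return (
        "You are extracting information from a document.\n"
        f"Document text:\n{ocr_text}\n"
        f"Tables:\n{table_markdown}\n"
        f"Relevant context:\n{context}\n"
        f"Question: {query}\n"
        "Answer using only the document content."
    )

prompt = build_prompt(
    ocr_text="Invoice No. 2024-001  Total: $1,234.56",
    table_markdown="| Item | Price |\n| Widget | $1,234.56 |",
    retrieved=["Invoice issued 2024-03-01"],
    query="What is the invoice total?",
)
```

Constraining the answer to the supplied document content is what keeps the LLM's response grounded rather than hallucinated.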
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-25
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md15-26 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md13-24
The table below shows inference times for core models in PP-ChatOCRv4:
| Module | Model | GPU Time (ms) | CPU Time (ms) | Accuracy |
|---|---|---|---|---|
| Layout Detection | PP-DocLayout_plus-L | 53.03 / 17.23 | 634.62 / 378.32 | 83.2% mAP |
| Text Detection | PP-OCRv5_server_det | 89.55 / 70.19 | 383.15 / 383.15 | 83.8% Hmean |
| Text Recognition | PP-OCRv5_server_rec | 8.46 / 2.36 | 31.21 / 31.21 | 86.38% Avg |
| Table Structure | SLANeXt_wired | 85.92 / 85.92 | - / 501.66 | 69.65% |
| Formula | LatexOCR | 358.34 / 358.34 | - / 1620.76 | 76.9% BLEU |
Times shown as [Standard Mode / High-Performance Mode]
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md86-664
Key improvements in PP-ChatOCRv4 over prior versions:
- Layout detection: `PP-DocLayout_plus-L` (20 layout categories vs. fewer in older models)
- OCR: `PP-OCRv5_server_det` / `PP-OCRv5_server_rec` (PP-OCRv5 is ~13% better than PP-OCRv4 on varied scenes)
- Table structure: `SLANeXt_wired` / `SLANeXt_wireless` (separate wired and wireless table models)
- Vector database persistence (`save_vector` / `load_vector`) for efficient document chunking and retrieval

Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-9 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md1-9
PP-ChatOCRv4 can be deployed on various hardware configurations:
Minimum (CPU-only):
Recommended (GPU):
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1048-1227
Choose models based on deployment constraints:
- High accuracy: use server models (`PP-OCRv5_server`, `PP-DocLayout_plus-L`)
- Fast inference: use mobile models (`PP-OCRv5_mobile`, `PP-DocLayout-S`)
- Memory constrained: disable optional modules (formula recognition, seal detection)
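That guidance can be turned into a small selection helper. The model names come from the tables on this page; the profile-mapping function itself is a hypothetical convenience, not part of the PaddleOCR API.

```python
# Hypothetical helper mapping deployment constraints to model choices.
PROFILES = {
    "accuracy": {  # server models: slower, more accurate
        "text_detection_model_name": "PP-OCRv5_server_det",
        "text_recognition_model_name": "PP-OCRv5_server_rec",
        "layout_model_name": "PP-DocLayout_plus-L",
    },
    "speed": {  # mobile models: faster, lighter
        "text_detection_model_name": "PP-OCRv5_mobile_det",
        "text_recognition_model_name": "PP-OCRv5_mobile_rec",
        "layout_model_name": "PP-DocLayout-S",
    },
}

def select_models(profile: str, memory_constrained: bool = False) -> dict:
    """Pick model names for a profile; optionally drop optional modules."""
    config = dict(PROFILES[profile])
    if memory_constrained:
        # Formula and seal modules are optional and can be disabled to save memory.
        config.update(use_formula_recognition=False,
                      use_seal_text_detection=False)
    return config

speed_cfg = select_models("speed", memory_constrained=True)
```

The resulting dictionary can be passed straight through as pipeline initialization keyword arguments.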
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md86-664
PP-ChatOCRv4 supports multiple deployment modes:
- High-performance inference: `enable_hpi=True` with TensorRT/ONNX backends

The pipeline integrates with PaddleX for production deployment; refer to docs/version3.x/paddleocr_and_paddlex.en.md for the PaddleOCR–PaddleX relationship and pipeline registration names.
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-9 docs/version3.x/paddleocr_and_paddlex.en.md1-40
| Aspect | PP-ChatOCRv4 | PP-StructureV3 |
|---|---|---|
| Primary Goal | Intelligent Q&A + Extraction | Document Parsing |
| LLM Integration | Yes (ERNIE 4.5) | No |
| Vector Database | Yes | No |
| Query Support | Yes | No |
| Output Format | Answers + JSON/Markdown | JSON/Markdown only |
| Use Case | Information extraction from docs | Document structure conversion |
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-9 docs/version3.x/pipeline_usage/PP-StructureV3.md1-9
| Aspect | PP-ChatOCRv4 | PaddleOCR-VL |
|---|---|---|
| Architecture | Pipeline-based | Vision-Language Model |
| Language Support | Via translation | Native 109+ languages |
| Customization | Modular components | End-to-end VLM |
| Training | Per-module fine-tuning | Unified VLM training |
| LLM Dependency | External (ERNIE) | Integrated (ERNIE-4.5-0.3B) |
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md1-9 README.md61-66