Overview

This page introduces PaddleOCR — its purpose, major capabilities, primary pipelines, and the architecture that relates them. For details on system architecture and inter-component wiring, see System Architecture and Major Pipelines. For installation and setup, see Installation and Environment Setup. For breaking changes from 2.x, see Version Migration Guide.

What Is PaddleOCR

PaddleOCR is an OCR and document intelligence toolkit built on the PaddlePaddle deep learning framework. It converts document images and PDFs into structured, machine-readable output (JSON, Markdown) with a focus on production deployment.

The project is distributed as the paddleocr Python package and is available under the Apache 2.0 license. The current major version is 3.x (latest: 3.4.0, released January 2026). PaddleOCR 3.x is not backward-compatible with 2.x — see Version Migration Guide.

Property	Value
Python versions	3.8 – 3.12
Platforms	Linux, Windows, macOS
Hardware	CPU, NVIDIA GPU, Kunlunxin XPU, Ascend NPU
Languages supported	100+
Framework dependency	PaddlePaddle ≥ 3.0
License	Apache 2.0

Sources: README.md:18-22

Major Pipelines

PaddleOCR 3.x organizes its capabilities into pipelines — end-to-end processing units that combine multiple models. Each pipeline is accessible through a Python class, a CLI subcommand, and optionally a service endpoint.

Pipeline	Python Class	CLI Subcommand	Primary Output
PP-OCRv5	`PaddleOCR`	`paddleocr ocr`	Text with bounding boxes (JSON)
PP-StructureV3	`PPStructureV3`	`paddleocr pp_structurev3`	Markdown, JSON
PP-ChatOCRv4	`PPChatOCRv4`	—	Key-value extraction via LLM
PaddleOCR-VL	`PaddleOCRVL`	`paddleocr paddleocr_vl`	Markdown, JSON (VLM-based)
PP-DocTranslation	`PPDocTranslation`	—	Translated Markdown

Sources: docs/quick_start.en.md:39-87, README.md:61-76, mkdocs.yml:297-353

Pipeline Summaries

PP-OCRv5 — Universal text recognition. A single model handles Simplified Chinese, Traditional Chinese, English, Japanese, and Pinyin. Accuracy is reported as 13 percentage points higher than PP-OCRv4. See PP-OCRv5 Universal Text Recognition.

PP-StructureV3 — Complex document parsing. Extracts layout, tables, formulas, seals, and charts, then outputs structured Markdown and JSON. See PP-StructureV3 Document Parsing.

PP-ChatOCRv4 — LLM-integrated information extraction. Uses ERNIE 4.5 to answer natural-language queries over document content. Accuracy is reported as 15 percentage points higher than PP-ChatOCRv3. See PP-ChatOCRv4 Intelligent Document Understanding.

PaddleOCR-VL — Vision-language model pipeline. The core model, PaddleOCR-VL-0.9B (PaddleOCR-VL-1.5-0.9B in the 1.5 release), combines a NaViT-style visual encoder with the ERNIE-4.5-0.3B language model. The 1.5 release scores 94.5% on OmniDocBench v1.5 and supports 111 languages. See PaddleOCR-VL Vision-Language Model.

PP-DocTranslation — Document translation. Combines PP-StructureV3 and ERNIE 4.5 to translate PDFs and document images, outputting translated Markdown. See PP-DocTranslation Pipeline.

Sources: README.md:61-76, docs/index.en.md:27-43

Architecture Overview

Diagram: PaddleOCR Layered Architecture

Sources: README.md:77-81, docs/quick_start.en.md:39-196, mkdocs.yml:317-352

Python API Entry Points

The top-level paddleocr package exposes the pipeline classes for direct import. The primary usage pattern is: instantiate the class, call predict(), then iterate over the results.

Diagram: Python API Classes to Pipeline Mapping

Example calling pattern (from docs/quick_start.en.md:66-87):

PaddleOCR(use_doc_orientation_classify=False, ...) — constructs the OCR pipeline
.predict("image.png") — runs inference, returns an iterable of result objects
res.save_to_json("output") / res.save_to_img("output") — persist results
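The three steps above can be put together in a short Python sketch. The helper name `run_ocr` and its `pipeline` parameter are illustrative assumptions, not part of the paddleocr package; only `PaddleOCR`, `use_doc_orientation_classify`, `predict()`, `save_to_json()` and `save_to_img()` come from the calling pattern described here. The import is done lazily so the helper can be tried with a stand-in object when the package is not installed.

```python
from pathlib import Path


def run_ocr(image_path, output_dir="output", pipeline=None):
    """Run an OCR pipeline on one input and persist every result.

    `pipeline` is any object with a predict() method. When it is omitted,
    the real PaddleOCR pipeline is constructed, which requires the
    paddleocr package to be installed.
    """
    if pipeline is None:
        # Lazy import: only needed when no pipeline object is supplied.
        from paddleocr import PaddleOCR

        # Only the argument named in the text is set; the rest keep defaults.
        pipeline = PaddleOCR(use_doc_orientation_classify=False)

    Path(output_dir).mkdir(parents=True, exist_ok=True)

    count = 0
    # predict() returns an iterable of result objects.
    for res in pipeline.predict(image_path):
        res.save_to_json(output_dir)  # structured output
        res.save_to_img(output_dir)   # visualization image
        count += 1
    return count
```

Calling `run_ocr("image.png")` with the package installed would write the JSON and image results under `output/` and return the number of result objects processed.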

Sources: docs/quick_start.en.md:63-196

CLI Structure

The paddleocr CLI dispatches to individual pipeline or module runners. Key subcommands:

Subcommand	Equivalent Python class	Description
`paddleocr ocr`	`PaddleOCR`	Run full PP-OCRv5 OCR pipeline
`paddleocr pp_structurev3`	`PPStructureV3`	Run PP-StructureV3 document parsing
`paddleocr text_detection`	`TextDetection`	Run text detection module only
`paddleocr text_recognition`	`TextRecognition`	Run text recognition module only
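As a rough illustration of how these subcommands might be driven from a script, the sketch below builds the argument list for a CLI call. The helper names `build_command` and `run_command` are hypothetical. The `-i` input flag and the `--name value` option style follow the quick-start examples rather than anything stated in this section, so treat them as assumptions and check `paddleocr <subcommand> --help` before relying on them.

```python
import subprocess

# Subcommands listed in the table above.
SUBCOMMANDS = {"ocr", "pp_structurev3", "text_detection", "text_recognition"}


def build_command(subcommand, input_path, **options):
    """Return the argv list for a paddleocr CLI invocation.

    Assumption: the input is passed with -i and every option is passed
    as `--name value`.
    """
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["paddleocr", subcommand, "-i", str(input_path)]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


def run_command(argv):
    """Execute the command; requires the paddleocr CLI to be installed."""
    return subprocess.run(argv, check=True)
```

For example, `build_command("ocr", "image.png", use_doc_orientation_classify=False)` yields the argument list for `paddleocr ocr -i image.png --use_doc_orientation_classify False`, which can then be passed to `run_command`.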

Sources: docs/quick_start.en.md:39-60

Hardware and Deployment Options

PaddleOCR 3.x supports multiple inference backends and hardware targets. See Deployment and Inference for full details.

Deployment Mode	Description
Python inference	Default; uses Paddle Inference backend
High-performance inference	TensorRT or ONNX Runtime; CUDA 12 supported
C++ inference	CMake-based build; Linux and Windows
Service deployment	HTTP/gRPC with Docker image support
MCP server	Stdio and Streamable HTTP modes for LLM agent integration
Mobile / edge	Paddle-Lite for Android and embedded devices
Parallel inference	Multi-device pipeline instances

Hardware accelerators: NVIDIA GPU (CUDA 11.8, CUDA 12), Kunlunxin XPU, Huawei Ascend NPU, Hygon DCU, MetaX GPU, Iluvatar GPU, Apple Silicon.

Sources: README.md:119-146, mkdocs.yml:317-328

PaddleX Integration

PaddleOCR 3.x is built on top of PaddleX, Baidu's unified low-code AI development platform. PaddleX provides:

The pipeline execution engine used by all paddleocr pipelines
YAML-based configuration system (exportable via export_paddlex_config_to_yaml)
Model download and management (default source: HuggingFace; configurable via PADDLE_PDX_MODEL_SOURCE)
Training orchestration and data tools

The relationship between the two: paddleocr is a domain-specific layer on top of PaddleX that packages OCR-specific pipelines, pre-trained models, and documentation. See Package Structure and PaddleX Integration.
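Two of the PaddleX-provided features listed above, the configurable model source and the YAML export, can be sketched as small helpers. The function names `configure_model_source` and `export_config` are hypothetical; `PADDLE_PDX_MODEL_SOURCE` and `export_paddlex_config_to_yaml` are the names given in this section. The sketch assumes the environment variable is read when models are downloaded, so it should be set before a pipeline is created, and that the export method takes the target file path as its argument.

```python
import os


def configure_model_source(source):
    """Select where models are downloaded from.

    Assumption: the variable must be set before the pipeline object is
    constructed for it to take effect.
    """
    os.environ["PADDLE_PDX_MODEL_SOURCE"] = source
    return os.environ["PADDLE_PDX_MODEL_SOURCE"]


def export_config(pipeline, yaml_path):
    """Write a pipeline's PaddleX configuration to a YAML file.

    `pipeline` is expected to be a paddleocr pipeline instance such as
    PaddleOCR() or PPStructureV3().
    """
    pipeline.export_paddlex_config_to_yaml(yaml_path)
    return yaml_path
```

A typical sequence would be to call `configure_model_source(...)` with a source name accepted by PaddleX (the accepted values are not listed on this page), then construct the pipeline, then call `export_config(pipeline, "pipeline.yaml")` to obtain a configuration file that can be edited and reused.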

Sources: mkdocs.yml:358-365, README.md:192-199

Notable Integrations

PaddleOCR is used as a dependency or integrated component in several external projects:

MinerU — document extraction
RAGFlow — retrieval-augmented generation
OmniParser — UI parsing
Umi-OCR — desktop OCR application
cherry-studio — LLM client

The MCP server introduced in 3.1.0 (docs/version3.x/deployment/mcp_server.md) allows LLM agents such as Claude Desktop to call PaddleOCR pipelines directly. It supports three modes: local Python library, AIStudio cloud service, and self-hosted service.

Sources: README.md:49, docs/index.en.md:16, README.md:175-178

Version History Summary

Version	Date	Key Additions
3.4.0	2026-01-29	PaddleOCR-VL-1.5 (111 languages, OmniDocBench 94.5%)
3.3.0	2025-10-16	PaddleOCR-VL (109 languages, 0.9B VLM)
3.2.0	2025-08-21	C++ deployment upgrade, CUDA 12, ONNX Runtime backend
3.1.0	2025-06-29	PP-OCRv5 multilingual (37 langs), PP-DocTranslation, MCP server
3.0.0	2025-05-20	PP-OCRv5, PP-StructureV3, PP-ChatOCRv4, PaddlePaddle 3.0 base

For migration guidance from 2.x to 3.x, see Version Migration Guide.

Sources: README.md:85-183, docs/update/update.en.md:1-65
