This page introduces PaddleOCR — its purpose, major capabilities, primary pipelines, and the architecture that relates them. For details on system architecture and inter-component wiring, see System Architecture and Major Pipelines. For installation and setup, see Installation and Environment Setup. For breaking changes from 2.x, see Version Migration Guide.
PaddleOCR is an OCR and document intelligence toolkit built on the PaddlePaddle deep learning framework. It converts document images and PDFs into structured, machine-readable output (JSON, Markdown) with a focus on production deployment.
The project is distributed as the paddleocr Python package and is available under the Apache 2.0 license. The current major version is 3.x (latest: 3.4.0, released January 2026). PaddleOCR 3.x is not backward-compatible with 2.x — see Version Migration Guide.
| Property | Value |
|---|---|
| Python versions | 3.8 – 3.12 |
| Platforms | Linux, Windows, macOS |
| Hardware | CPU, NVIDIA GPU, Kunlunxin XPU, Ascend NPU |
| Languages supported | 100+ |
| Framework dependency | PaddlePaddle ≥ 3.0 |
| License | Apache 2.0 |
Sources: README.md18-22
PaddleOCR 3.x organizes its capabilities into pipelines — end-to-end processing units that combine multiple models. Each pipeline is accessible through a Python class, a CLI subcommand, and optionally a service endpoint.
| Pipeline | Python Class | CLI Subcommand | Primary Output |
|---|---|---|---|
| PP-OCRv5 | PaddleOCR | paddleocr ocr | Text with bounding boxes (JSON) |
| PP-StructureV3 | PPStructureV3 | paddleocr pp_structurev3 | Markdown, JSON |
| PP-ChatOCRv4 | PPChatOCRv4 | — | Key-value extraction via LLM |
| PaddleOCR-VL | PaddleOCRVL | paddleocr paddleocr_vl | Markdown, JSON (VLM-based) |
| PP-DocTranslation | PPDocTranslation | — | Translated Markdown |
Sources: docs/quick_start.en.md39-87 README.md61-76 mkdocs.yml297-353
PP-OCRv5 — Universal text recognition. A single model handles Simplified Chinese, Traditional Chinese, English, Japanese, and Pinyin. Accuracy improved 13% over PP-OCRv4. See PP-OCRv5 Universal Text Recognition.
PP-StructureV3 — Complex document parsing. Extracts layout, tables, formulas, seals, and charts, then outputs structured Markdown and JSON. See PP-StructureV3 Document Parsing.
PP-ChatOCRv4 — LLM-integrated information extraction. Uses ERNIE 4.5 to answer natural-language queries over document content. Accuracy improved 15% over v3. See PP-ChatOCRv4 Intelligent Document Understanding.
PaddleOCR-VL — Vision-language model pipeline. The core model, PaddleOCR-VL-0.9B (PaddleOCR-VL-1.5-0.9B in the 1.5 release), combines a NaViT-style visual encoder with ERNIE-4.5-0.3B. The 1.5 release scores 94.5% on OmniDocBench v1.5 and supports 111 languages. See PaddleOCR-VL Vision-Language Model.
PP-DocTranslation — Document translation. Combines PP-StructureV3 and ERNIE 4.5 to translate PDFs and document images, outputting translated Markdown. See PP-DocTranslation Pipeline.
Sources: README.md61-76 docs/index.en.md27-43
Diagram: PaddleOCR Layered Architecture
Sources: README.md77-81 docs/quick_start.en.md39-196 mkdocs.yml317-352
The top-level paddleocr package exposes pipeline classes that can be imported directly. The primary usage pattern is instantiate → call predict() → iterate results.
Diagram: Python API Classes to Pipeline Mapping
Example calling pattern (from docs/quick_start.en.md66-87):
- PaddleOCR(use_doc_orientation_classify=False, ...): constructs the OCR pipeline
- .predict("image.png"): runs inference and returns an iterable of result objects
- res.save_to_json("output") / res.save_to_img("output"): persist results

Sources: docs/quick_start.en.md63-196
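The instantiate, predict, iterate pattern can be sketched as a small helper. This is a minimal illustration and not code from the quick start: `run_ocr` and its `pipeline` parameter are hypothetical names introduced here, and the only constructor argument used is the one shown in the calling pattern.

```python
from pathlib import Path


def run_ocr(image_path, output_dir="output", pipeline=None):
    """Run OCR on one input and persist every result.

    Returns the number of result objects that were saved.
    `pipeline` is a hypothetical hook for passing in an already
    constructed pipeline (or a stand-in object for testing).
    """
    if pipeline is None:
        # Imported lazily so the helper itself has no import-time
        # dependency on PaddleOCR.
        from paddleocr import PaddleOCR

        pipeline = PaddleOCR(use_doc_orientation_classify=False)

    Path(output_dir).mkdir(parents=True, exist_ok=True)

    saved = 0
    # predict() returns an iterable of result objects.
    for res in pipeline.predict(image_path):
        res.save_to_json(output_dir)  # structured output
        res.save_to_img(output_dir)   # visualization
        saved += 1
    return saved
```

With PaddleOCR installed, `run_ocr("image.png")` would construct the pipeline on first use and write its results under `output/`.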
The paddleocr CLI dispatches to individual pipeline or module runners. Key subcommands:
| Subcommand | Equivalent Python class | Description |
|---|---|---|
| paddleocr ocr | PaddleOCR | Run full PP-OCRv5 OCR pipeline |
| paddleocr pp_structurev3 | PPStructureV3 | Run PP-StructureV3 document parsing |
| paddleocr text_detection | TextDetection | Run text detection module only |
| paddleocr text_recognition | TextRecognition | Run text recognition module only |
Sources: docs/quick_start.en.md39-60
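For scripting, the subcommand table can be mirrored in a small lookup that builds an argument list for `subprocess`. This is a sketch only: `SUBCOMMANDS` and `build_command` are hypothetical names, and the `-i` input flag is assumed from the quick start examples, so check `paddleocr <subcommand> --help` before relying on it.

```python
# Subcommand -> equivalent Python class, as listed in the table above.
SUBCOMMANDS = {
    "ocr": "PaddleOCR",
    "pp_structurev3": "PPStructureV3",
    "text_detection": "TextDetection",
    "text_recognition": "TextRecognition",
}


def build_command(subcommand, input_path):
    """Return an argv list for one of the listed paddleocr subcommands."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown paddleocr subcommand: {subcommand!r}")
    # "-i" (input path) is an assumption taken from the quick start.
    return ["paddleocr", subcommand, "-i", input_path]
```

The returned list can be passed to `subprocess.run(build_command("ocr", "image.png"), check=True)`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.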
PaddleOCR 3.x supports multiple inference backends and hardware targets. See Deployment and Inference for full details.
| Deployment Mode | Description |
|---|---|
| Python inference | Default; uses Paddle Inference backend |
| High-performance inference | TensorRT or ONNX Runtime; CUDA 12 supported |
| C++ inference | CMake-based build; Linux and Windows |
| Service deployment | HTTP/gRPC with Docker image support |
| MCP server | Stdio and Streamable HTTP modes for LLM agent integration |
| Mobile / edge | Paddle-Lite for Android and embedded devices |
| Parallel inference | Multi-device pipeline instances |
Hardware accelerators: NVIDIA GPU (CUDA 11.8, CUDA 12), Kunlunxin XPU, Huawei Ascend NPU, Hygon DCU, MetaX GPU, Iluvatar GPU, Apple Silicon.
Sources: README.md119-146 mkdocs.yml317-328
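As a rough illustration of how a deployment choice might translate into pipeline constructor arguments, the helper below builds a keyword dictionary. The helper name `inference_options` is hypothetical, and the `device` and `enable_hpi` keyword arguments are assumptions that are not stated on this page. Verify them against Deployment and Inference before use.

```python
def inference_options(device="cpu", high_performance=False):
    """Build keyword arguments for a pipeline constructor.

    Assumptions (not confirmed by this page):
      - `device` selects the hardware target, e.g. "cpu" or "gpu:0".
      - `enable_hpi` switches on high-performance inference.
    """
    options = {"device": device}
    if high_performance:
        options["enable_hpi"] = True
    return options
```

A call such as `PaddleOCR(**inference_options("gpu:0", high_performance=True))` would then request the first GPU with the high-performance backend, assuming both arguments are accepted by the installed version.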
PaddleOCR 3.x is built on top of PaddleX, Baidu's unified low-code AI development platform. PaddleX provides:
- […] paddleocr pipelines
- […] (export_paddlex_config_to_yaml)
- […] (PADDLE_PDX_MODEL_SOURCE)

The relationship between the two: paddleocr is a domain-specific layer on top of PaddleX that packages OCR-specific pipelines, pre-trained models, and documentation. See Package Structure and PaddleX Integration.
Sources: mkdocs.yml358-365 README.md192-199
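The two PaddleX touchpoints named in this section, `export_paddlex_config_to_yaml` and `PADDLE_PDX_MODEL_SOURCE`, might be used roughly as follows. This is a sketch under several assumptions: the helper names are hypothetical, the value `"BOS"` is an assumed example of a model source, the variable is assumed to be read when models are first resolved, and the export method is assumed to take a target YAML path.

```python
import os


def configure_model_source(source="BOS"):
    """Select where models are downloaded from.

    Assumption: the variable must be set before the pipeline is
    constructed, and "BOS" is one accepted value.
    """
    os.environ["PADDLE_PDX_MODEL_SOURCE"] = source
    return os.environ["PADDLE_PDX_MODEL_SOURCE"]


def export_pipeline_config(pipeline, yaml_path="ocr_config.yaml"):
    """Write the pipeline's underlying PaddleX configuration to YAML.

    Assumption: the method accepts the output path as its argument.
    """
    pipeline.export_paddlex_config_to_yaml(yaml_path)
    return yaml_path
```

Exporting the configuration gives a file that can be inspected or edited to see which PaddleX settings a given paddleocr pipeline wraps.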
PaddleOCR is used as a dependency or integrated component in several external projects.
The MCP server introduced in 3.1.0 (docs/version3.x/deployment/mcp_server.md) allows LLM agents such as Claude Desktop to call PaddleOCR pipelines directly. It supports three modes: local Python library, AIStudio cloud service, and self-hosted service.
Sources: README.md49 docs/index.en.md16 README.md175-178
| Version | Date | Key Additions |
|---|---|---|
| 3.4.0 | 2026-01-29 | PaddleOCR-VL-1.5 (111 languages, OmniDocBench 94.5%) |
| 3.3.0 | 2025-10-16 | PaddleOCR-VL (109 languages, 0.9B VLM) |
| 3.2.0 | 2025-08-21 | C++ deployment upgrade, CUDA 12, ONNX Runtime backend |
| 3.1.0 | 2025-06-29 | PP-OCRv5 multilingual (37 langs), PP-DocTranslation, MCP server |
| 3.0.0 | 2025-05-20 | PP-OCRv5, PP-StructureV3, PP-ChatOCRv4, PaddlePaddle 3.0 base |
For migration guidance from 2.x to 3.x, see Version Migration Guide.