This page covers the paddleocr-mcp package, which exposes PaddleOCR pipeline capabilities as Model Context Protocol (MCP) tools for integration with LLM agent applications such as Claude for Desktop and VS Code Copilot. It is a separate, independently installable package located in mcp_server/.
For information about the underlying PaddleOCR Python API used in local mode, see 3.3 Python API Usage. For service deployment that the self-hosted mode connects to, see 5.4 Service Deployment.
The paddleocr-mcp package (mcp_server/) is a lightweight server built on FastMCP v2. It bridges PaddleOCR inference capabilities and LLM agent frameworks by registering PaddleOCR pipelines as callable MCP tools.
The package's entry point is paddleocr_mcp.__main__:main (see mcp_server/pyproject.toml28-29) and the CLI command is paddleocr_mcp.
Exposed MCP tools by pipeline:
| Pipeline Name | MCP Tool | Output |
|---|---|---|
| OCR | ocr | Plain text or JSON with bounding boxes |
| PP-StructureV3 | pp_structure_v3 | Markdown document |
| PaddleOCR-VL | paddleocr_vl | Markdown document |
| PaddleOCR-VL-1.5 | paddleocr_vl (v1.5) | Markdown document |
Sources: mcp_server/pyproject.toml mcp_server/paddleocr_mcp/__main__.py mcp_server/paddleocr_mcp/pipelines.py docs/version3.x/deployment/mcp_server.en.md
Diagram: MCP Server Component Architecture
Sources: mcp_server/paddleocr_mcp/__main__.py mcp_server/paddleocr_mcp/pipelines.py mcp_server/pyproject.toml
Diagram: Pipeline Handler Class Hierarchy
Sources: mcp_server/paddleocr_mcp/pipelines.py130-340 mcp_server/paddleocr_mcp/pipelines.py524-663 mcp_server/paddleocr_mcp/pipelines.py665-900
Diagram: Request Processing Flow
Sources: mcp_server/paddleocr_mcp/pipelines.py344-398 mcp_server/paddleocr_mcp/pipelines.py89-128
The server supports four ppocr_source values, each routing inference differently:
| Mode | ppocr_source value | Description | Auth Required |
|---|---|---|---|
| Local Python Library | local | Runs PaddleOCR in-process | None |
| PaddleOCR Official Service | aistudio | Calls Baidu AI Studio hosted API | PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN |
| Qianfan Platform | qianfan | Calls Baidu Qianfan cloud API | PADDLEOCR_MCP_QIANFAN_API_KEY |
| Self-hosted Service | self_hosted | Calls user-deployed PaddleOCR HTTP server | None (URL required) |
In local mode, PipelineHandler.__init__ calls _create_local_engine() at construction time, which instantiates either PaddleOCR, PPStructureV3, or PaddleOCRVL from the paddleocr package. The engine is then wrapped in _EngineWrapper, which runs inference in a dedicated background thread to avoid blocking the asyncio event loop.
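The dedicated-thread pattern described above can be sketched with the standard library alone. This is a minimal illustration of the idea, not the actual `_EngineWrapper` implementation:

```python
# Sketch of running a blocking engine on one dedicated thread so that
# awaiting callers never block the asyncio event loop. Illustrative only.
import asyncio
import queue
import threading

class EngineWrapper:
    def __init__(self, engine):
        self._engine = engine
        self._jobs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self):
        # All inference runs on this one thread, serially.
        while True:
            args, fut, loop = self._jobs.get()
            try:
                result = self._engine.predict(*args)
                loop.call_soon_threadsafe(fut.set_result, result)
            except Exception as exc:
                loop.call_soon_threadsafe(fut.set_exception, exc)

    async def infer(self, *args):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self._jobs.put((args, fut, loop))
        return await fut
```

Serializing inference on one thread also avoids concurrent calls into an engine that may not be thread-safe.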
In service modes, _call_service() sends HTTP POST requests via httpx.AsyncClient to a base URL plus an endpoint path (e.g., /ocr or /layout-parsing).
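The shape of such a request can be sketched as follows, assuming the standard PaddleOCR serving API (a JSON body carrying the file as Base64 plus a fileType flag); the field names and endpoint mapping below are assumptions for illustration:

```python
# Sketch of building a service-mode request body. Field names ("file",
# "fileType") follow the PaddleOCR serving convention but are assumptions here.
import base64
import json

ENDPOINTS = {"OCR": "/ocr", "PP-StructureV3": "/layout-parsing"}

def build_request(base_url: str, pipeline: str, file_bytes: bytes,
                  file_type: int = 1) -> tuple[str, str]:
    """Return (url, json_body) for a POST to a serving endpoint.
    fileType 1 = image, 0 = PDF in the serving convention assumed here."""
    url = base_url.rstrip("/") + ENDPOINTS[pipeline]
    body = json.dumps({
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "fileType": file_type,
    })
    return url, body
```

The actual `_call_service()` sends this kind of body with `httpx.AsyncClient.post()` and parses the JSON response.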
Qianfan platform support is limited to the PP-StructureV3 and PaddleOCR-VL pipelines; this is enforced by _validate_args() in mcp_server/paddleocr_mcp/__main__.py140-145.
Sources: mcp_server/paddleocr_mcp/pipelines.py156-180 mcp_server/paddleocr_mcp/pipelines.py467-505 mcp_server/paddleocr_mcp/__main__.py104-145
The server supports two MCP transport protocols:
stdio (the default) and Streamable HTTP, enabled via --http (default host 127.0.0.1, default port 8000). The FastMCP instance is configured in async_main() and started with either mcp.run_async() (stdio) or mcp.run_async(transport="streamable-http", ...) (HTTP).
Sources: mcp_server/paddleocr_mcp/__main__.py184-194
Optional extras are defined in mcp_server/pyproject.toml19-26. The local extra pulls in paddleocr[doc-parser]>=3.4; local-cpu additionally adds paddlepaddle>=3.2.1.
Verify installation:
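A typical install-and-check sequence, using the extras named above (the extra you choose depends on your mode):

```shell
# Install from PyPI; the "local" extra bundles the PaddleOCR Python library
pip install "paddleocr-mcp[local]"

# Verify that the CLI entry point is on PATH
paddleocr_mcp --help
```

For service modes (aistudio, qianfan, self_hosted), `pip install paddleocr-mcp` without extras is sufficient.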
Sources: mcp_server/pyproject.toml docs/version3.x/deployment/mcp_server.en.md90-123
Locate claude_desktop_config.json:
| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| Linux | ~/.config/Claude/claude_desktop_config.json |
Local mode:
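A minimal claude_desktop_config.json entry for local mode might look like the following; the server name ("paddleocr-ocr") is illustrative, and the official docs carry the authoritative snippet:

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```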
PaddleOCR official service (aistudio) mode:
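For aistudio mode, the entry additionally needs the service base URL and an access token; placeholder values below must be replaced with your own, and the server name is illustrative:

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<aistudio-service-base-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```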
Self-hosted service mode:
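For self_hosted mode, only the URL of your deployed PaddleOCR server is needed (no token); the URL below is an example:

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "http://127.0.0.1:8080"
      }
    }
  }
}
```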
uvx (no manual install): uvx can launch the server without a prior pip install. Use --from paddleocr-mcp paddleocr_mcp or --from paddleocr-mcp[local-cpu] paddleocr_mcp as the command/args pair.
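Translated into a claude_desktop_config.json entry, the uvx invocation might look like this (server name and URL are illustrative):

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "uvx",
      "args": ["--from", "paddleocr-mcp", "paddleocr_mcp"],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "http://127.0.0.1:8080"
      }
    }
  }
}
```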
Sources: docs/version3.x/deployment/mcp_server.en.md139-380
All parameters can be set as environment variables or CLI arguments. Environment variables take the form PADDLEOCR_MCP_<NAME>.
| Environment Variable | CLI Argument | Type | Default | Description |
|---|---|---|---|---|
| PADDLEOCR_MCP_PIPELINE | --pipeline | str | "OCR" | Pipeline to run. Values: "OCR", "PP-StructureV3", "PaddleOCR-VL", "PaddleOCR-VL-1.5" |
| PADDLEOCR_MCP_PPOCR_SOURCE | --ppocr_source | str | "local" | Source backend. Values: "local", "aistudio", "qianfan", "self_hosted" |
| PADDLEOCR_MCP_SERVER_URL | --server_url | str | None | Base URL (required for non-local modes) |
| PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN | --aistudio_access_token | str | None | AI Studio token (required for aistudio mode) |
| PADDLEOCR_MCP_QIANFAN_API_KEY | --qianfan_api_key | str | None | Qianfan API key (required for qianfan mode) |
| PADDLEOCR_MCP_TIMEOUT | --timeout | int | 60 | HTTP read timeout (seconds) |
| PADDLEOCR_MCP_DEVICE | --device | str | None | Inference device (local mode only) |
| PADDLEOCR_MCP_PIPELINE_CONFIG | --pipeline_config | str | None | Path to PaddleOCR pipeline YAML config (local mode only) |
| — | --http | bool | False | Use Streamable HTTP transport instead of stdio |
| — | --host | str | "127.0.0.1" | HTTP host address |
| — | --port | int | 8000 | HTTP port |
| — | --verbose | bool | False | Enable verbose logging |
Argument parsing and validation are implemented in _parse_args() and _validate_args() in mcp_server/paddleocr_mcp/__main__.py27-145.
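One plausible precedence for resolving these settings (CLI argument, then environment variable, then default) can be sketched as follows; the helper name and the exact precedence are assumptions for illustration, not taken from the real parser:

```python
# Illustrative resolution of PADDLEOCR_MCP_<NAME> settings; resolve_option
# is a hypothetical helper, and the CLI-over-env precedence is an assumption.
import os

ENV_PREFIX = "PADDLEOCR_MCP_"

def resolve_option(name: str, cli_value, default=None):
    """Return the CLI value if given, else the environment variable
    PADDLEOCR_MCP_<NAME>, else the default."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(ENV_PREFIX + name.upper())
    if env_value is not None:
        return env_value
    return default
```

Note that values read from the environment arrive as strings and would still need type conversion (e.g., int for --timeout).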
Sources: mcp_server/paddleocr_mcp/__main__.py27-101 docs/version3.x/deployment/mcp_server.en.md408-423
The ocr tool (registered in OCRHandler.register_tools) accepts an output_mode parameter:
| output_mode | Format | Use case |
|---|---|---|
| "simple" (default) | Plain text string + confidence summary | General text extraction |
| "detailed" | JSON with text, confidence, bbox per line | When coordinates are needed |
_format_output() in OCRHandler (mcp_server/paddleocr_mcp/pipelines.py638-663) produces the final string or JSON. Bounding boxes come from the rec_boxes field in local mode, and from rec_boxes inside the service response's prunedResult.
Layout parsing tools (PP-StructureV3, PaddleOCR-VL) produce Markdown output (text) or a mixed list of TextContent and ImageContent objects when return_images=True. The Markdown-to-image interleaving is handled by _parse_markdown_with_images() in _LayoutParsingHandler (mcp_server/paddleocr_mcp/pipelines.py808-840).
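The interleaving idea can be sketched as splitting the Markdown at image references and substituting the image payloads in order. This is a simplified stand-in for _parse_markdown_with_images(): it returns plain tuples rather than TextContent/ImageContent objects, and assumes images are keyed by the URI inside each Markdown image reference:

```python
# Sketch of Markdown/image interleaving; the real helper emits MCP
# TextContent/ImageContent objects instead of plain tuples.
import re

IMG_RE = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")

def parse_markdown_with_images(markdown: str,
                               images: dict[str, str]) -> list[tuple[str, str]]:
    """Yield ("text", chunk) and ("image", base64_data) items in order."""
    parts: list[tuple[str, str]] = []
    pos = 0
    for m in IMG_RE.finditer(markdown):
        if m.start() > pos:
            parts.append(("text", markdown[pos:m.start()]))
        uri = m.group(1)
        if uri in images:
            parts.append(("image", images[uri]))
        pos = m.end()
    if pos < len(markdown):
        parts.append(("text", markdown[pos:]))
    return parts
```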
Sources: mcp_server/paddleocr_mcp/pipelines.py524-663 mcp_server/paddleocr_mcp/pipelines.py767-806
For the PP-StructureV3 pipeline in local mode, disabling unused sub-modules substantially reduces memory and latency. The following shows the approach using PPStructureV3 with lightweight models, then exporting the config:
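A sketch of that workflow, following the pattern in the official docs; the specific model names and flags below are examples and may need adjusting for your PaddleOCR version:

```python
# Requires the paddleocr package (local mode). Model names and the set of
# use_* flags are illustrative; check your PaddleOCR version's docs.
from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    # Swap in lightweight detection/recognition models
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    # Disable sub-modules this deployment does not need
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    use_formula_recognition=False,
    use_seal_recognition=False,
    use_table_recognition=False,
    use_chart_recognition=False,
)

# Export the effective configuration to a YAML file
pipeline.export_paddlex_config_to_yaml("PP-StructureV3.yaml")
```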
Set PADDLEOCR_MCP_PIPELINE_CONFIG to the exported YAML path to apply this configuration.
Note: PaddleOCR-VL series pipelines are not recommended for CPU inference due to large VLM model sizes.
Sources: docs/version3.x/deployment/mcp_server.en.md173-203
Sources: docs/version3.x/deployment/mcp_server.en.md382-405
In local mode, the tools cannot process Base64-encoded PDF inputs; only Base64-encoded images are supported. This limitation is noted in _process_input_for_local() (mcp_server/paddleocr_mcp/pipelines.py399-421). In local mode, the file type is also not inferred from the model's file_type hint, so some complex URLs may fail. For the PP-StructureV3 and PaddleOCR-VL pipelines, results containing embedded images significantly increase LLM token consumption; use prompt instructions to exclude image content when it is not needed.
Sources: docs/version3.x/deployment/mcp_server.en.md425-429 mcp_server/paddleocr_mcp/pipelines.py399-421