This page documents how to deploy PaddleOCR-VL as a network-accessible API service, covering the two deployment methods (Docker Compose and manual), service architecture, client invocation patterns, and pipeline configuration tuning.
For background on PaddleOCR-VL's internal architecture and configuration parameters, see 2.2.1. For setting up inference acceleration frameworks (vLLM, SGLang, FastDeploy) that act as the VLM backend for the service, see 2.2.2. For general service deployment patterns shared across PaddleOCR pipelines, see 5.4.
PaddleOCR-VL service deployment uses a two-tier architecture. The outer layer is the pipeline service (the PaddleOCR-VL API), which handles request routing, layout detection, preprocessing, and result aggregation. The inner layer is the VLM inference service, which performs the heavy Vision-Language Model inference and is typically backed by an acceleration framework.
Two-Tier Service Architecture
The PaddleOCRVL class in paddleocr/_pipelines/paddleocr_vl.py37-96 connects the two tiers via the vl_rec_server_url parameter. The pipeline sends image crops to the VLM service using the backend specified by vl_rec_backend.
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md183-217 paddleocr/_pipelines/paddleocr_vl.py27-96
Not all hardware variants support Docker Compose deployment. Manual deployment is available on any hardware that supports at least one inference backend.
| Hardware | Docker Compose | Manual Deployment | VLM Backend |
|---|---|---|---|
| NVIDIA GPU (standard) | ✅ | ✅ | vLLM, SGLang |
| NVIDIA Blackwell (sm120) | ✅ | ✅ | vLLM, SGLang |
| MetaX GPU | ✅ | ✅ | FastDeploy |
| Iluvatar GPU | ✅ | ✅ | FastDeploy |
| Hygon DCU | ✅ | ✅ | vLLM |
| Huawei Ascend NPU | ✅ | ✅ | vLLM |
| Apple Silicon | ❌ | ✅ | MLX-VLM |
| x64 CPU | ❌ | ✅ | llama.cpp |
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md90-92 docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md183-193 docs/version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.en.md1-20
Docker Compose is the recommended deployment method for hardware that supports it. It starts two containers automatically — one for the VLM inference service and one for the pipeline API — with no manual dependency installation required after pulling the images.
Docker Compose Startup Flow
Download compose.yaml and .env from the appropriate accelerator directory under deploy/paddleocr_vl_docker/accelerators/. For example, for standard NVIDIA GPU use nvidia-gpu/, for Blackwell use nvidia-gpu-sm120/.
In the directory containing both files, run:
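A minimal startup sequence using standard Docker Compose commands (the container names come from the downloaded compose.yaml):

```shell
docker compose up -d    # start the VLM and API containers in the background
docker compose logs -f  # follow the logs until startup completes
```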
The server will listen on port 8080 by default. A successful start produces:
```
paddleocr-vl-api | INFO: Application startup complete.
paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
```
Environment Variables (.env File)

| Variable | Description |
|---|---|
| API_IMAGE_TAG_SUFFIX | Image tag suffix for the pipeline service container |
| VLM_BACKEND | VLM inference backend (vllm, sglang, fastdeploy) |
| VLM_IMAGE_TAG_SUFFIX | Image tag suffix for the VLM inference service container |
Common Customizations (compose.yaml)

Change the API service port — edit the ports mapping of the paddleocr-vl-api service:
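For example, to expose the API on host port 9090 instead of 8080 (a sketch; the service name comes from the official compose file):

```yaml
services:
  paddleocr-vl-api:
    ports:
      - "9090:8080"  # host:container — only the host side needs to change
```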
Specify which GPU to use — edit the device_ids in deploy.resources.reservations.devices for both containers:
The same change must be applied to the paddleocr-vlm-server block.
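A sketch of the relevant fragment, to be repeated under both the paddleocr-vl-api and paddleocr-vlm-server services:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          capabilities: [gpu]
          device_ids: ["1"]  # pin this container to GPU 1
```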
Supply a custom VLM backend config — create a YAML config file (see 2.2.2 for VLM backend parameters), then mount it and reference it:
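A sketch of the mount; how the config path is handed to the backend (environment variable vs. command-line argument, and its exact name) depends on the image and is an assumption here:

```yaml
services:
  paddleocr-vlm-server:
    volumes:
      - ./vlm_config.yaml:/config/vlm_config.yaml  # mount the custom backend config
    environment:
      - VLM_CONFIG_PATH=/config/vlm_config.yaml    # hypothetical variable name
```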
To deploy in an environment without internet access:
1. On a machine with internet access, pull the images referenced in compose.yaml.
2. Export each image with docker save <image> -o <file>.tar.
3. Transfer the .tar files to the offline machine.
4. Import each image with docker load -i <file>.tar.
5. Run docker compose up normally.

Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md193-297 docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md32-35
Manual deployment requires a running environment (either the official Docker image from section 1.1, or a manually installed environment from section 1.2 of the usage tutorial). It also requires a VLM inference service already running (see 2.2.2).
The pipeline service is started using paddleocr serving:
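A sketch of the startup command. The flag names mirror the pipeline's Python parameter names and are assumptions — consult `paddleocr serving --help` for the exact interface:

```shell
# Assumed flags mirroring the pipeline's Python parameters
paddleocr serving \
  --pipeline PaddleOCR-VL \
  --host 0.0.0.0 \
  --port 8080 \
  --vl_rec_backend vllm-server \
  --vl_rec_server_url http://127.0.0.1:8118/v1
```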
This starts a FastAPI-based HTTP server that exposes the PaddleOCR-VL pipeline as a REST API. The vl_rec_server_url points at an already-running VLM inference service.
Connection Between Pipeline and VLM Service
The PaddleOCRVL.__init__ method in paddleocr/_pipelines/paddleocr_vl.py38-88 accepts all the service connection parameters, which are forwarded to the underlying PaddleX pipeline.
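The same connection can be made from Python. A minimal sketch — the server URL is a placeholder, and `save_to_markdown` is assumed to behave as in other PaddleOCR 3.x pipeline result objects:

```python
# Connection settings for a remote vLLM-backed PaddleOCR-VL pipeline.
# The URL is a placeholder for your own VLM inference service.
VL_KWARGS = {
    "vl_rec_backend": "vllm-server",
    "vl_rec_server_url": "http://127.0.0.1:8118/v1",
}

def make_pipeline():
    # Imported lazily so this module can be loaded without paddleocr installed.
    from paddleocr import PaddleOCRVL
    return PaddleOCRVL(**VL_KWARGS)

if __name__ == "__main__":
    pipeline = make_pipeline()
    for res in pipeline.predict("doc_page.png"):
        res.save_to_markdown("output/")
```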
Supported vl_rec_backend Values

| Backend Identifier | Description |
|---|---|
| native | Local PaddlePaddle inference (no separate service) |
| vllm-server | Remote vLLM inference service |
| sglang-server | Remote SGLang inference service |
| fastdeploy-server | Remote FastDeploy inference service |
| mlx-vlm-server | Remote MLX-VLM inference service (Apple Silicon) |
| llama-cpp-server | Remote llama.cpp inference service |
These values are defined in _SUPPORTED_VL_BACKENDS at paddleocr/_pipelines/paddleocr_vl.py27-34
Sources: paddleocr/_pipelines/paddleocr_vl.py27-96 docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md90-96
Once the service is running on port 8080, it can be called via HTTP from any client. The service accepts image files, PDF files, and URLs.
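A minimal Python client sketch. The endpoint path and the request schema (base64-encoded `file` plus a numeric `fileType`) follow the convention used by other PaddleX serving pipelines and are assumptions here — check the API documentation of your running service:

```python
import base64
import json
import urllib.request

# Assumed endpoint path; verify against your running service.
SERVER_URL = "http://localhost:8080/layout-parsing"

def build_payload(data: bytes, file_type: int = 1) -> dict:
    """Package a file for the service (assumed convention: 1 = image, 0 = PDF)."""
    return {"file": base64.b64encode(data).decode("ascii"), "fileType": file_type}

def parse_document(path: str, file_type: int = 1) -> dict:
    with open(path, "rb") as f:
        payload = build_payload(f.read(), file_type)
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```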
The service response is a JSON object matching the structured output of the PaddleOCRVL.predict() call. Key fields include:
| Field | Description |
|---|---|
| input_path | Path or URL of the input |
| page_index | Page number (for PDFs) |
| model_settings | Active pipeline flags |
| layout_det_res | Layout detection bounding boxes |
| markdown | Extracted content in Markdown format |
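To illustrate the fields above, a small sketch that walks one response page; the exact nesting of the JSON is an assumption based on the documented field names:

```python
# A mock page object using the documented field names; real responses may nest
# these fields differently.
sample_page = {
    "input_path": "report.pdf",
    "page_index": 0,
    "model_settings": {"use_layout_detection": True},
    "layout_det_res": {"boxes": [{"label": "text", "coordinate": [10, 10, 200, 40]}]},
    "markdown": {"text": "# Heading\n\nBody text."},
}

def summarize(page: dict) -> str:
    n_boxes = len(page["layout_det_res"]["boxes"])
    return f"{page['input_path']} page {page['page_index']}: {n_boxes} layout box(es)"

print(summarize(sample_page))  # report.pdf page 0: 1 layout box(es)
```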
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md647-656 docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md303-305
The pipeline service supports runtime configuration via a PaddleX YAML configuration file. This controls model paths, preprocessing toggles, and other pipeline-level settings.
| Parameter | Default | Purpose |
|---|---|---|
| use_doc_orientation_classify | False | Enable document orientation detection |
| use_doc_unwarping | False | Enable geometric correction |
| use_layout_detection | True | Enable layout region detection |
| use_chart_recognition | False | Enable chart parsing |
| use_seal_recognition | False | Enable stamp/seal recognition |
| use_ocr_for_image_block | False | Run OCR on image regions |
| layout_threshold | 0.5 | Confidence threshold for layout boxes |
| merge_layout_blocks | True | Merge cross-column detection boxes |
The exported YAML can be passed to the service at startup using the --paddlex_config parameter.
Mount the config file and supply it as an environment variable in compose.yaml:
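A sketch of the compose fragment; the environment variable name is an assumption — check the image documentation for the exact name it reads:

```yaml
services:
  paddleocr-vl-api:
    volumes:
      - ./PaddleOCR-VL.yaml:/config/PaddleOCR-VL.yaml   # mount the exported config
    environment:
      - PIPELINE_CONFIG_PATH=/config/PaddleOCR-VL.yaml  # hypothetical variable name
```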
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md268-644 docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md293-297 paddleocr/_pipelines/paddleocr_vl.py38-88
The PaddleOCR-VL service can act as a backend for the MCP (Model Context Protocol) server, enabling LLM agents to call document parsing via the self-hosted mode.
The PipelineHandler base class in mcp_server/paddleocr_mcp/pipelines.py130-244 handles both local and service-based execution. When ppocr_source is set to self_hosted, the MCP server forwards requests to the PaddleOCR-VL HTTP service via server_url.
MCP configuration for self-hosted PaddleOCR-VL:
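A sketch of an MCP client configuration; the launcher command and the environment-variable names follow the ppocr_source and server_url settings described above, but should be verified against the MCP server documentation:

```json
{
  "mcpServers": {
    "paddleocr-vl": {
      "command": "paddleocr_mcp",
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "PaddleOCR-VL",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "http://localhost:8080"
      }
    }
  }
}
```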
For MCP server setup and additional operating modes, see 3.4.
Sources: docs/version3.x/deployment/mcp_server.en.md298-326 mcp_server/paddleocr_mcp/pipelines.py130-180 mcp_server/paddleocr_mcp/__main__.py34-46
Each hardware variant has its own Docker image tag. The table below lists the relevant image names and compose file paths.
| Hardware | Pipeline Image Tag | VLM Image Tag | Compose Path |
|---|---|---|---|
| NVIDIA GPU (standard) | latest-nvidia-gpu | latest-nvidia-gpu (vllm/sglang) | deploy/paddleocr_vl_docker/accelerators/nvidia-gpu/ |
| NVIDIA Blackwell (sm120) | latest-nvidia-gpu-sm120 | latest-nvidia-gpu-sm120 | deploy/paddleocr_vl_docker/accelerators/nvidia-gpu-sm120/ |
| MetaX GPU | latest-metax-gpu | — (fastdeploy) | deploy/paddleocr_vl_docker/accelerators/metax-gpu/ |
| Iluvatar GPU | latest-iluvatar-gpu | — (fastdeploy) | deploy/paddleocr_vl_docker/accelerators/iluvatar-gpu/ |
| Huawei Ascend NPU | latest-huawei-ascend-npu | — (vllm) | hardware-specific compose |
| Apple Silicon | Not available | mlx-vlm-server (manual) | manual only |
To use a specific PaddleOCR version instead of latest, substitute the tag prefix: paddleocr<major>.<minor>, e.g., paddleocr3.3.
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md143-157 docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md40-52 docs/version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.en.md33-47 docs/version3.x/pipeline_usage/PaddleOCR-VL-MetaX-GPU.en.md20-43 docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md90-92