This page documents support for non-NVIDIA GPU accelerators in PaddleOCR, including Chinese domestic AI chips and other hardware platforms. It covers device-specific requirements, installation procedures, inference backend compatibility, and deployment patterns for Kunlunxin XPU, Huawei Ascend NPU, Cambricon MLU, Hygon DCU, MetaX GPU, Iluvatar GPU, and Apple Silicon.
For NVIDIA GPU support and TensorRT optimization, see 7.1. For CPU-specific optimizations, see 7.3.
PaddleOCR supports inference and deployment on the following alternative accelerator platforms:
| Accelerator Type | Device Identifier | PaddlePaddle Plugin | Docker Support | Production Status |
|---|---|---|---|---|
| Kunlunxin XPU | xpu | paddle-kunlunxin-xpu | ✅ | ✅ Verified on XPU R200 |
| Huawei Ascend NPU | npu | paddle-custom-npu | ✅ | ✅ Verified on 910B |
| Cambricon MLU | mlu | paddle-custom-mlu | ✅ | ✅ Verified on MLU370 |
| Hygon DCU | dcu | paddle-custom-dcu | ✅ | ✅ Verified on Z100 |
| MetaX GPU | metax_gpu | paddle-metax-gpu | ✅ | ✅ Verified on C550 |
| Iluvatar GPU | iluvatar_gpu | paddle-iluvatar-gpu | ✅ | ✅ Verified on BI-V150 |
| Apple Silicon | cpu | Native PaddlePaddle | ❌ | ✅ Verified on M4 |
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md41-155 docs/version3.x/pipeline_usage/PaddleOCR-VL.md41-155
Different accelerators support different inference acceleration frameworks. The following table shows compatibility for PaddleOCR-VL (similar patterns apply to other pipelines):
Inference Backend Compatibility for PaddleOCR-VL
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md45-127 docs/version3.x/pipeline_usage/PaddleOCR-VL.md45-127
Hardware: Kunlunxin XPU R200 and compatible devices
Installation:
Device Specification:
Docker Images:
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-kunlunxin-xpu
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-kunlunxin-xpu
Supported Backends: PaddlePaddle, FastDeploy (experimental vLLM support in progress)
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md148-149 Installation commands from XPU tutorial pattern
Hardware: Huawei Ascend 910B and compatible NPUs
Installation:
Docker Run Requirements:
Important Note: Native PaddlePaddle inference is limited on NPU. Use vLLM backend for production:
Supported Backends: vLLM (recommended), limited PaddlePaddle support
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-Huawei-Ascend-NPU.en.md1-96 docs/version3.x/pipeline_usage/PaddleOCR-VL-Huawei-Ascend-NPU.md1-96
Hardware: Hygon DCU Z100 and compatible devices
Installation:
Device Specification:
Docker Images:
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-hygon-dcu
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-hygon-dcu
Supported Backends: PaddlePaddle, vLLM
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md150
Hardware: MetaX C550 and compatible GPUs
Installation:
Docker Run Requirements:
Device Specification:
Supported Backends: PaddlePaddle, FastDeploy
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-MetaX-GPU.en.md1-92 docs/version3.x/pipeline_usage/PaddleOCR-VL-MetaX-GPU.md1-92
Hardware: Iluvatar BI-V150 (Tiangai 150) and compatible GPUs
Installation:
Docker Run Requirements:
Device Specification:
Supported Backends: PaddlePaddle, FastDeploy
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.en.md1-96 docs/version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.md1-96
Hardware: Apple M1, M2, M3, M4 chips
Installation:
Device Specification:
MLX-VLM Acceleration (Recommended):
Supported Backends: PaddlePaddle (CPU mode), MLX-VLM (recommended for VLM models)
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md1-109 docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.md1-109
Device Initialization and Backend Selection Flow
The device parameter flows through the system as follows:
CLI Specification: --device parameter in command line
Python API Specification: device parameter in pipeline constructor
Internal Processing: Device string is parsed and validated
Device String Format: {device_type} or {device_type}:{device_id} (e.g., xpu, xpu:0, dcu:1, iluvatar_gpu:0)
PaddlePaddle Device Setting: Internally calls paddle.set_device()
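The parsing step above can be sketched as follows. This is an illustrative re-implementation of the `{device_type}` / `{device_type}:{device_id}` grammar, not PaddleOCR's actual internal parser:

```python
# Illustrative parser for the device string grammar described above;
# PaddleOCR's internal implementation may differ.
VALID_DEVICE_TYPES = {"cpu", "gpu", "xpu", "npu", "mlu", "dcu",
                      "metax_gpu", "iluvatar_gpu"}

def parse_device(device: str) -> tuple[str, int]:
    """Split 'xpu:0' into ('xpu', 0); a bare type defaults to device id 0."""
    dev_type, _, dev_id = device.partition(":")
    if dev_type not in VALID_DEVICE_TYPES:
        raise ValueError(f"unsupported device type: {dev_type!r}")
    return dev_type, (int(dev_id) if dev_id else 0)

# e.g. parse_device("iluvatar_gpu:1") -> ("iluvatar_gpu", 1)
```

The validated `(type, id)` pair is what ultimately reaches `paddle.set_device()`.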
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md580-595 docs/version3.x/pipeline_usage/PaddleOCR-VL.md552-569
Alternative accelerators support parallel inference across multiple devices:
Parallel Execution Pattern:
Multi-Device Parallel Processing Architecture
When multiple devices are specified:
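As a sketch of that pattern, inputs can be partitioned round-robin across the device list before launching one worker per device. This is a hypothetical helper for illustration; PaddleOCR's parallel-inference utility handles the distribution internally:

```python
# Hypothetical round-robin sharding helper; PaddleOCR's parallel
# inference utility performs the equivalent distribution internally.
def shard_inputs(inputs: list[str], devices: list[str]) -> dict[str, list[str]]:
    """Assign input files to devices round-robin, e.g. across ['xpu:0', 'xpu:1']."""
    shards = {dev: [] for dev in devices}
    for i, item in enumerate(inputs):
        shards[devices[i % len(devices)]].append(item)
    return shards
```

For example, `shard_inputs(["a.png", "b.png", "c.png"], ["xpu:0", "xpu:1"])` sends two images to `xpu:0` and one to `xpu:1`, with each shard processed by its own pipeline instance.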
Sources: docs/version3.x/pipeline_usage/instructions/parallel_inference.en.md1-32
For production deployment on alternative accelerators, PaddleOCR provides Docker Compose configurations that combine:
Docker Compose Deployment Architecture for Alternative Accelerators
Compose File Location: deploy/paddleocr_vl_docker/accelerators/kunlunxin-xpu/
Key Environment Variables (.env):
Starting the Service:
Service Endpoints:
http://localhost:8080/predict (pipeline service)
http://localhost:8118/v1/chat/completions (VLM server)
Compose File Location: deploy/paddleocr_vl_docker/accelerators/hygon-dcu/
Key Difference: Uses vLLM backend instead of FastDeploy
Device Selection in compose.yaml:
Change Service Port:
Adjust Device Assignment:
Mount Custom Configuration:
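As an illustrative sketch of those three customizations, a `compose.yaml` override might look like the following (service names, device paths, and mount targets are assumptions for illustration; the actual files under `deploy/paddleocr_vl_docker/` may differ):

```yaml
# Illustrative compose.yaml override; actual service and device names
# in the shipped compose files may differ.
services:
  paddleocr-vl:
    ports:
      - "9090:8080"          # change the host-side service port
    devices:
      - /dev/xpu1:/dev/xpu1  # adjust which accelerator device is passed through
    volumes:
      - ./my_config.yaml:/app/config.yaml:ro  # mount a custom configuration
```

After editing, restart the stack with `docker compose up -d` so the overrides take effect.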
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md193-292 docs/version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.en.md135-233
Docker images for alternative accelerators follow a consistent naming pattern:
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/{image-type}:{version}-{accelerator}-{offline}
Components:
{image-type}: paddleocr-vl, paddleocr-genai-vllm-server, paddleocr-genai-fastdeploy-server
{version}: latest or paddleocr{major}.{minor} (e.g., paddleocr3.3)
{accelerator}: kunlunxin-xpu, hygon-dcu, metax-gpu, iluvatar-gpu, huawei-npu
{offline}: Optional -offline suffix for images with bundled models
Examples:
paddleocr-vl:latest-kunlunxin-xpu (online, requires internet for model download)
paddleocr-vl:latest-kunlunxin-xpu-offline (offline, models included)
paddleocr-vl:paddleocr3.3-hygon-dcu (specific version)
paddleocr-genai-vllm-server:latest-hygon-dcu-offline (vLLM server, offline)
Image Size Reference:
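The naming pattern can be expressed as a small formatting helper, shown here only to make the tag grammar concrete (a hypothetical function, not a PaddleOCR utility):

```python
# Hypothetical helper that builds image references following the
# naming pattern documented above.
REGISTRY = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle"

def image_ref(image_type: str, accelerator: str,
              version: str = "latest", offline: bool = False) -> str:
    """Build a full image reference: {registry}/{image-type}:{version}-{accelerator}[-offline]."""
    tag = f"{version}-{accelerator}" + ("-offline" if offline else "")
    return f"{REGISTRY}/{image_type}:{tag}"
```

For example, `image_ref("paddleocr-genai-vllm-server", "hygon-dcu", offline=True)` reproduces the last example tag above.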
| Accelerator | Base Image Size | Offline Image Size | VLM Server (Offline) |
|---|---|---|---|
| Kunlunxin XPU | ~12 GB | ~14 GB | ~15 GB |
| Hygon DCU | ~10 GB | ~12 GB | ~15 GB |
| MetaX GPU | ~32 GB | ~34 GB | ~39 GB |
| Iluvatar GPU | ~37 GB | ~39 GB | ~40 GB |
| Huawei NPU | ~28 GB | ~30 GB | ~20 GB |
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md46-51 Various accelerator tutorial files
For Training:
For Production Inference:
For Development/Testing:
Backend Selection Impact:
| Accelerator | Native PaddlePaddle | Acceleration Backend | Speedup Factor |
|---|---|---|---|
| Kunlunxin XPU | Baseline | FastDeploy | ~2-3x |
| Hygon DCU | Baseline | vLLM | ~3-5x |
| Huawei NPU | Limited | vLLM | Required |
| MetaX GPU | Baseline | FastDeploy | ~2-4x |
| Iluvatar GPU | Baseline | FastDeploy | ~2-4x |
| Apple Silicon | Baseline | MLX-VLM | ~5-10x |
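The table above reduces to a simple selection rule; this sketch encodes it as a lookup (illustrative only, summarizing the rows rather than any PaddleOCR API):

```python
# Summary of the backend table above as a lookup (illustrative only).
RECOMMENDED_BACKEND = {
    "kunlunxin-xpu": "fastdeploy",
    "hygon-dcu": "vllm",
    "huawei-npu": "vllm",      # required: native PaddlePaddle is limited on NPU
    "metax-gpu": "fastdeploy",
    "iluvatar-gpu": "fastdeploy",
    "apple-silicon": "mlx-vlm",
}

def pick_backend(accelerator: str, prefer_native: bool = False) -> str:
    """Prefer native PaddlePaddle only where it is a viable baseline."""
    if prefer_native and accelerator != "huawei-npu":
        return "paddle"
    return RECOMMENDED_BACKEND[accelerator]
```

Note the NPU special case: even with `prefer_native=True`, the vLLM backend is returned because native inference is not a usable baseline there.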
Memory Optimization:
Set --shm-size appropriately (64g is recommended for VLM models)

Issue: Device Plugin Not Found
Issue: Docker Container Cannot Access Device
Issue: vLLM/FastDeploy Version Incompatibility
Issue: Out of Memory on Accelerator
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md139-141 docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md109-120
PaddlePaddle Version Requirements:
Checking Compatibility:
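Checking compatibility amounts to comparing the installed PaddlePaddle version against the minimum the plugin requires. This sketch assumes a plain `major.minor.patch` version string; the actual minimum versions are listed in each accelerator's tutorial:

```python
# Illustrative version check; assumes plain "major.minor.patch" strings.
def version_tuple(v: str) -> tuple[int, ...]:
    """Parse '3.0.0' into (3, 0, 0) for tuple comparison."""
    return tuple(int(part) for part in v.split("."))

def is_compatible(installed: str, minimum: str) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

# In practice: import paddle; is_compatible(paddle.__version__, minimum)
```

A pre-release or dev-suffixed version string (e.g. `3.0.0rc1`) would need extra handling; the `packaging.version` module covers those cases robustly.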
Image Tag Versioning:
latest-*: Tracks the latest PaddleOCR release
paddleocr{major}.{minor}-*: Pinned to a specific PaddleOCR version
Sources: docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md36 docs/version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.en.md69