This page documents MinerU's hardware acceleration infrastructure, which enables GPU/NPU-accelerated inference for Vision-Language Models (VLM) and other deep learning components. The system provides a unified abstraction layer supporting NVIDIA GPUs, 11 domestic Chinese accelerator cards, Apple Silicon, and CPU-only execution.
For information about the VLM backend that uses these accelerators, see VLM Backend. For deployment patterns with acceleration cards, see Deployment.
Hardware acceleration in MinerU serves two primary functions: accelerating VLM inference and accelerating the deep learning models used by the pipeline backend. The system is designed to detect available hardware automatically and fall back to CPU execution when no accelerator is present.
MinerU implements a priority-based device detection system in mineru/utils/config_reader.py75-108:
Detection Priority:
1. `MINERU_DEVICE_MODE` environment variable (highest priority, manual override)
2. CUDA (`torch.cuda.is_available()`)
3. MPS (`torch.backends.mps.is_available()`)
4. NPU (`torch_npu.npu.is_available()`)
5. GCU (`torch.gcu.is_available()`)
6. MUSA (`torch.musa.is_available()`)
7. MLU (`torch.mlu.is_available()`)
8. SDAA (`torch.sdaa.is_available()`)

If no accelerator is detected, the system falls back to CPU.

Sources: mineru/utils/config_reader.py75-108
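The priority order above can be sketched as follows. This is an illustrative reconstruction, not MinerU's exact code: the function name and the try/except guards are assumptions, but the probe order matches the list, and the guards reflect that vendor torch plugins (`torch_npu`, `torch.gcu`, ...) only exist when the matching driver stack is installed.

```python
import os

def detect_device() -> str:
    """Priority-based device detection sketch (see config_reader.py)."""
    override = os.getenv("MINERU_DEVICE_MODE")
    if override:
        return override  # highest priority: manual override

    probes = [
        ("cuda", lambda: __import__("torch").cuda.is_available()),
        ("mps",  lambda: __import__("torch").backends.mps.is_available()),
        ("npu",  lambda: __import__("torch_npu").npu.is_available()),
        ("gcu",  lambda: __import__("torch").gcu.is_available()),
        ("musa", lambda: __import__("torch").musa.is_available()),
        ("mlu",  lambda: __import__("torch").mlu.is_available()),
        ("sdaa", lambda: __import__("torch").sdaa.is_available()),
    ]
    for name, available in probes:
        try:
            if available():
                return name
        except (ImportError, AttributeError):
            continue  # plugin not installed on this machine
    return "cpu"  # no accelerator found
```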
The get_vram() function in mineru/utils/model_utils.py450-486 provides unified VRAM querying:
| Device Type | Detection Method | Unit |
|---|---|---|
| CUDA | torch.cuda.get_device_properties(device).total_memory | GB (converted from bytes) |
| NPU (Ascend) | torch_npu.npu.get_device_properties(device).total_memory | GB |
| GCU (Enflame) | torch.gcu.get_device_properties(device).total_memory | GB |
| MUSA (MooreThreads) | torch.musa.get_device_properties(device).total_memory | GB |
| MLU (Cambricon) | torch.mlu.get_device_properties(device).total_memory | GB |
| SDAA (Tecorigin) | torch.sdaa.get_device_properties(device).total_memory | GB |
| Manual Override | MINERU_VIRTUAL_VRAM_SIZE env var | GB |
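A minimal sketch of the unified query pattern from the table, with the `MINERU_VIRTUAL_VRAM_SIZE` override checked first. The function name and structure are illustrative, not MinerU's exact API; only the CUDA branch is spelled out, since the other backends expose the same `get_device_properties` shape on their respective torch modules.

```python
import os

def get_vram_gb(device="cuda", index=0):
    """Return total VRAM in GB, or None if it cannot be determined."""
    override = os.getenv("MINERU_VIRTUAL_VRAM_SIZE")
    if override:
        return float(override)  # already expressed in GB

    try:
        import torch
        if device == "cuda" and torch.cuda.is_available():
            props = torch.cuda.get_device_properties(index)
            return props.total_memory / (1024 ** 3)  # bytes -> GB
        # npu/gcu/musa/mlu/sdaa: same call on torch_npu.npu, torch.gcu, etc.
    except ImportError:
        pass
    return None  # unknown (e.g. mps or cpu)
```

Mocking the VRAM size via the environment variable lets memory-dependent heuristics (batch size, GPU memory utilization) be tested without the physical card.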
Memory Management:
- The `clean_memory()` function (mineru/utils/model_utils.py416-438) provides device-specific cache clearing
- Each device backend is cleared via its `empty_cache()` method

Sources: mineru/utils/model_utils.py416-486
| Vendor | Product | Device Type | vLLM | LMDeploy | Notes |
|---|---|---|---|---|---|
| NVIDIA | Volta+, CUDA 12.8+ | cuda | ✅ | ✅ | Full support, preferred platform |
| Huawei | Ascend 910B2, Atlas A2/A3 | npu | ✅ | ✅ | Full support, data parallel supported |
| METAX | C500 | maca | ✅ | ✅ | Full support |
| T-Head | ZW810E | ppu | ✅ | ✅ | Full support |
| Enflame | S60 | gcu | ✅ | ❌ | vLLM only |
| Kunlunxin | P800 | kxpu | ✅ | ❌ | vLLM only |
| Hygon | BW200 | dcu | ✅ | ❌ | vLLM only |
| IluvatarCorex | BI-V150 | corex | ✅ | ❌ | vLLM only, memory leak issue |
| Cambricon | MLU590 | mlu | ⚠️ | ⚠️ | vLLM v0 limitations, async engine issues |
| MooreThreads | MTT S4000 | musa | ⚠️ | ❌ | vLLM v0 limitations, async engine issues |
| Tecorigin | T100 | sdaa | ✅ | ❌ | vLLM only |
| Apple | M1/M2/M3 | mps | ❌ | ❌ | MLX engine for VLM, MPS for pipeline |
Legend: ✅ Full Support, ⚠️ Partial Support, ❌ Not Supported
Sources: docs/zh/usage/acceleration_cards/Ascend.md97-175 docs/zh/usage/acceleration_cards/Cambricon.md73-155 docs/zh/usage/acceleration_cards/METAX.md70-148
Requirements:
Inference Engine Support:
Device Selection:
- Select GPUs via `CUDA_VISIBLE_DEVICES`
- Example: `CUDA_VISIBLE_DEVICES=0,1 mineru --backend vlm-auto-engine pdf/sample.pdf`

Sources: mineru/backend/vlm/utils.py11-56 mineru/backend/vlm/utils.py59-79
Supported Models:
Docker Configuration:
Device Selection:
- Select NPUs via `ASCEND_RT_VISIBLE_DEVICES`
- Monitor with `npu-smi info`

Special Considerations for Atlas 300I Duo:

- Add `--enforce-eager --dtype float16` to all vLLM commands

Sources: docs/zh/usage/acceleration_cards/Ascend.md1-179
Cambricon MLU:

- Device selection: `MLU_VISIBLE_DEVICES`
- Monitoring: `cnmon`

METAX GPU (MACA):

- Device selection: `CUDA_VISIBLE_DEVICES` (uses CUDA-like API)
- Monitoring: `mx-smi`

T-Head PPU:

- Device mapping: `--device=/dev/alixpu`, `--device=/dev/alixpu_ctl`
- Monitoring: `ppu-smi`

Enflame GCU:

- Device selection: `TOPS_VISIBLE_DEVICES`
- Monitoring: `efsmi`

Kunlunxin XPU:

- Device selection: `XPU_VISIBLE_DEVICES`
- Monitoring: `xpu-smi`

Sources: docs/zh/usage/acceleration_cards/Cambricon.md1-160 docs/zh/usage/acceleration_cards/METAX.md1-152 docs/zh/usage/acceleration_cards/THead.md1-143 docs/zh/usage/acceleration_cards/Enflame.md1-110 docs/zh/usage/acceleration_cards/Kunlunxin.md1-124
Supported Models: M1, M2, M3 (macOS 13.5+)
Backend Selection:
MLX Engine Requirements:
- macOS version 13.5+ verified via `platform.mac_ver()`
- Models loaded via `mlx_vlm.load()`

Sources: mineru/backend/vlm/vlm_analyze.py85-93 mineru/utils/config_reader.py82-83
The ModelSingleton class in mineru/backend/vlm/vlm_analyze.py23-219 initializes models with device-specific configurations:
Sources: mineru/backend/vlm/vlm_analyze.py23-219
The mod_kwargs_by_device_type() function mineru/backend/vlm/utils.py172-233 applies device-specific optimizations:
Device Configurations:
| Device | Configuration Parameters |
|---|---|
| `corex` (IluvatarCorex) | `compilation_config_dict`: `{"cudagraph_mode": "FULL_DECODE_ONLY", "level": 0}` |
| `kxpu` (Kunlunxin) | `compilation_config_dict`: custom splitting ops; `block_size`: 128; `dtype`: `"float16"`; `distributed_executor_backend`: `"mp"`; `enable_chunked_prefill`: False; `enable_prefix_caching`: False |
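The per-device tuning can be sketched as a kwargs filter. This is an illustrative reconstruction based on the table above, not MinerU's exact implementation; the `compilation_config_dict` key name is taken from the table, and `setdefault` is assumed so that caller-supplied values win over the defaults.

```python
def mod_kwargs_by_device_type(device_type: str, kwargs: dict) -> dict:
    """Apply device-specific vLLM kwargs defaults (sketch)."""
    if device_type == "corex":
        # IluvatarCorex: restrict CUDA graphs to decode-only capture.
        kwargs.setdefault(
            "compilation_config_dict",
            {"cudagraph_mode": "FULL_DECODE_ONLY", "level": 0},
        )
    elif device_type == "kxpu":
        # Kunlunxin: force float16 and disable features the card lacks.
        kwargs.setdefault("block_size", 128)
        kwargs.setdefault("dtype", "float16")
        kwargs.setdefault("distributed_executor_backend", "mp")
        kwargs.setdefault("enable_chunked_prefill", False)
        kwargs.setdefault("enable_prefix_caching", False)
    return kwargs
```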
Sources: mineru/backend/vlm/utils.py111-233
The set_default_gpu_memory_utilization() function mineru/backend/vlm/utils.py82-89 adapts to available VRAM:
| vLLM Version | VRAM Size | Memory Utilization |
|---|---|---|
| >= 0.11.0 | <= 8 GB | 0.7 (70%) |
| >= 0.11.0 | > 8 GB | 0.5 (50%) |
| < 0.11.0 | Any | 0.5 (50%) |
Override: Set gpu_memory_utilization in kwargs explicitly to override defaults.
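The table above reduces to a small decision function. The signature is illustrative (the real `set_default_gpu_memory_utilization()` lives in mineru/backend/vlm/utils.py82-89); the thresholds match the table: small cards need a larger fraction so the model weights still fit.

```python
def default_gpu_memory_utilization(vllm_version: tuple, vram_gb: float) -> float:
    """Pick the default vLLM gpu_memory_utilization (sketch)."""
    if vllm_version >= (0, 11, 0) and vram_gb <= 8:
        return 0.7  # <=8 GB cards on vLLM 0.11+: use 70% of VRAM
    return 0.5      # everything else: 50%
```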
Sources: mineru/backend/vlm/utils.py82-89
The enable_custom_logits_processors() function mineru/backend/vlm/utils.py11-56 determines eligibility:
Requirements (all must be met):
- `VLLM_USE_V1` != 0 (v1 engine enabled)

Processor Registration:
Sources: mineru/backend/vlm/utils.py11-56 mineru/backend/vlm/vlm_analyze.py118-120
The set_lmdeploy_backend() function mineru/backend/vlm/utils.py59-79 chooses between pytorch and turbomind:
| Device Type | Preferred Backend | Fallback Conditions |
|---|---|---|
| `ascend`, `maca`, `camb` | pytorch | Always pytorch |
| `cuda` (Windows) | turbomind | Always turbomind |
| `cuda` (Linux, CC >= 8.0) | pytorch | Ampere+ architecture |
| `cuda` (Linux, CC < 8.0) | turbomind | Volta/Turing architecture |
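The backend choice in the table can be sketched as follows. The function name and the `compute_capability` parameter are assumptions for illustration; the branch structure mirrors the table.

```python
import platform

def choose_lmdeploy_backend(device_type: str, compute_capability=None) -> str:
    """Pick 'pytorch' or 'turbomind' per the support table (sketch)."""
    if device_type in ("ascend", "maca", "camb"):
        return "pytorch"  # turbomind has no kernels for these devices
    # device_type == "cuda" from here on
    if platform.system() == "Windows":
        return "turbomind"
    if compute_capability is not None and compute_capability >= 8.0:
        return "pytorch"   # Ampere and newer
    return "turbomind"     # Volta/Turing
```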
Device Type Configuration:
Sources: mineru/backend/vlm/vlm_analyze.py156-202 mineru/backend/vlm/utils.py59-79
Sources: mineru/backend/vlm/vlm_analyze.py156-201
Each acceleration card has a dedicated Dockerfile in docker/china/:
Example: Ascend NPU Dockerfile docker/china/npu.Dockerfile
Example: Cambricon MLU Dockerfile docker/china/mlu.Dockerfile1-43
- Patches applied via `sed` commands

Example: Enflame GCU Dockerfile docker/china/gcu.Dockerfile1-30
Sources: docker/china/npu.Dockerfile docker/china/mlu.Dockerfile1-43 docker/china/gcu.Dockerfile1-30
| Accelerator | Device Mappings | Volume Mounts |
|---|---|---|
| Ascend NPU | /dev/davinci*, /dev/davinci_manager, /dev/devmm_svm, /dev/hisi_hdc | /var/log/npu/, /usr/local/dcmi, /usr/local/Ascend/driver, /usr/local/bin/npu-smi |
| Cambricon MLU | -v /dev:/dev | /lib/modules, /usr/bin/cnmon |
| METAX GPU | /dev/mem, /dev/dri, /dev/mxcd, /dev/infiniband | /datapool (workspace) |
| T-Head PPU | /dev/alixpu, /dev/alixpu_ctl | /mnt, /datapool |
| IluvatarCorex | -v /dev:/dev | /usr/src, /lib/modules |
| Kunlunxin XPU | /dev/xpu*, /dev/xpuctrl | /usr/local/bin/xpu-smi |
| Hygon DCU | /dev/kfd, /dev/mkfd, /dev/dri | /opt/hyhal |
Common Flags:
- `--privileged`: Required for most accelerators
- `--ipc=host` / `--network=host`: Shared memory and networking
- `--shm-size`: Large shared memory (often 100-500GB for multi-card setups)
- `--ulimit memlock=-1`: Unlimited locked memory

Sources: docs/zh/usage/acceleration_cards/Ascend.md54-94 docs/zh/usage/acceleration_cards/Cambricon.md32-63 docs/zh/usage/acceleration_cards/METAX.md38-66
| Variable | Purpose | Example Values |
|---|---|---|
| `MINERU_MODEL_SOURCE` | Skip model downloads | `local` |
| `MINERU_LMDEPLOY_DEVICE` | Set LMDeploy device | `ascend`, `maca`, `camb` |
| `MINERU_VLLM_DEVICE` | Set vLLM device-specific config | `corex`, `kxpu`, `musa` |
| `MINERU_DEVICE_MODE` | Override device detection | `cuda`, `npu`, `mps` |
| `MINERU_VIRTUAL_VRAM_SIZE` | Mock VRAM size for testing | `8`, `16`, `32` (GB) |
| `VLLM_WORKER_MULTIPROC_METHOD` | vLLM multiprocessing method | `spawn` (required for Ascend) |
| `VLLM_USE_V1` | Enable vLLM v1 engine | `0`, `1` |
| `VLLM_ENFORCE_CUDA_GRAPH` | Force CUDA graph mode | `1` (IluvatarCorex) |
| `OMP_NUM_THREADS` | OpenMP thread count | `1` (set by vLLM/LMDeploy init) |
Sources: mineru/backend/vlm/vlm_analyze.py95-96 mineru/model/vlm/vllm_server.py59-60
The set_default_batch_size() function mineru/backend/vlm/utils.py92-108 for transformers backend:
| VRAM Size | Batch Size | Rationale |
|---|---|---|
| >= 16 GB | 8 | Full batch processing |
| >= 8 GB, < 16 GB | 4 | Moderate batching |
| < 8 GB | 1 | Sequential processing |
Override: Pass batch_size in kwargs to ModelSingleton.get_model().
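The VRAM tiers above map directly to a small helper. The function name mirrors the one cited in the text; the body is a sketch, and treating unknown VRAM as the conservative tier is an assumption.

```python
def set_default_batch_size(vram_gb) -> int:
    """Pick the transformers-backend batch size from VRAM (sketch)."""
    if vram_gb is None:
        return 1        # VRAM unknown: be conservative
    if vram_gb >= 16:
        return 8        # full batch processing
    if vram_gb >= 8:
        return 4        # moderate batching
    return 1            # sequential processing
```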
Sources: mineru/backend/vlm/utils.py92-108
The clean_memory() function mineru/utils/model_utils.py416-438 provides device-specific cleanup:
The clean_vram() wrapper mineru/utils/model_utils.py441-447 applies cleanup selectively:
- Cleanup runs only when `total_memory` <= `vram_threshold` (default 8 GB)

Sources: mineru/utils/model_utils.py416-447
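The device-specific cleanup pattern can be sketched as below. The structure is assumed, not copied from MinerU: Python garbage collection runs first so dead tensors are actually released, then the device backend's `empty_cache()` returns cached blocks to the driver. Only CUDA and MPS branches are spelled out; the other backends expose the same entry point on their torch modules.

```python
import gc

def clean_memory(device: str = "cuda") -> None:
    """Free cached accelerator memory for the given device (sketch)."""
    gc.collect()  # release dead tensors before clearing the cache
    try:
        import torch
        if device == "cuda" and torch.cuda.is_available():
            torch.cuda.empty_cache()
        elif device == "mps" and torch.backends.mps.is_available():
            torch.mps.empty_cache()
        # npu/gcu/musa/mlu/sdaa: same empty_cache() on their torch modules
    except ImportError:
        pass  # torch not installed: GC alone
```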
Layout sorting uses the LayoutLMv3 model on GPU when available (mineru/utils/block_sort.py179-231):
BF16 Support Detection:
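The exact check is not reproduced here; a common pattern, sketched below under the assumption that it resembles MinerU's, probes `torch.cuda.is_bf16_supported()`, which reports True on Ampere and newer CUDA devices.

```python
def supports_bf16() -> bool:
    """Probe whether the current CUDA device can run in bfloat16 (sketch)."""
    try:
        import torch
        return torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    except ImportError:
        return False  # no torch: assume no BF16
```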
Sources: mineru/utils/block_sort.py179-231
- IluvatarCorex: set `VLLM_ENFORCE_CUDA_GRAPH=1`
- Ascend Atlas 300I Duo: add `--enforce-eager --dtype float16` to all vLLM commands

Sources: docs/zh/usage/acceleration_cards/Cambricon.md66-78 docs/zh/usage/acceleration_cards/MooreThreads.md44-47 docs/zh/usage/acceleration_cards/IluvatarCorex.md50-51 docs/zh/usage/acceleration_cards/Ascend.md88-93
Explicit Device Specification: Use environment variables to force specific devices when auto-detection fails
Multi-Device Systems: Use vendor-specific visibility variables
Memory Monitoring: Check accelerator usage before launching
- NVIDIA: `nvidia-smi`
- Ascend: `npu-smi info`
- Cambricon: `cnmon`
- METAX: `mx-smi`
- Enflame: `efsmi`
- Kunlunxin: `xpu-smi`
- Hygon: `hy-smi`
- IluvatarCorex: `ixsmi`
- T-Head: `ppu-smi`
- Tecorigin: `teco-smi -c`
- MooreThreads: `mthreads-gmi`

Sources: docs/zh/usage/acceleration_cards/Ascend.md177-179 docs/zh/usage/acceleration_cards/Cambricon.md157-160
Models in the pipeline backend use the device from get_device():
Device Assignment: Automatic via `device_map={"": device}` in model loading
Sources: mineru/utils/config_reader.py75-108
The VLM backend's model initialization in ModelSingleton mineru/backend/vlm/vlm_analyze.py23-219 uses device-specific paths:
HTTP Client Mode: No local acceleration needed - connects to remote GPU server
Sources: mineru/backend/vlm/vlm_analyze.py23-219
Combines both acceleration strategies:
- Device detection via `get_device()`

Sources: See Hybrid Backend documentation
| Component | File | Key Functions |
|---|---|---|
| Device Detection | mineru/utils/config_reader.py75-108 | get_device() |
| VRAM Detection | mineru/utils/model_utils.py450-486 | get_vram(device) |
| Memory Cleanup | mineru/utils/model_utils.py416-447 | clean_memory(device), clean_vram(device, threshold) |
| vLLM Configuration | mineru/backend/vlm/utils.py11-233 | enable_custom_logits_processors(), set_default_gpu_memory_utilization(), mod_kwargs_by_device_type() |
| LMDeploy Configuration | mineru/backend/vlm/utils.py59-79 | set_lmdeploy_backend(device_type) |
| Model Initialization | mineru/backend/vlm/vlm_analyze.py23-219 | ModelSingleton.get_model() |
| vLLM Server | mineru/model/vlm/vllm_server.py1-69 | main() (server startup wrapper) |
| Batch Size Config | mineru/backend/vlm/utils.py92-108 | set_default_batch_size() |
| Reading Order | mineru/utils/block_sort.py179-231 | model_init(model_name) |
Sources: Multiple files as listed above