This page documents MinerU's hardware acceleration infrastructure, which enables GPU/NPU-accelerated inference for Vision-Language Models (VLM) and other deep learning components. The system provides a unified abstraction layer supporting NVIDIA GPUs, 11 domestic Chinese accelerator cards, Apple Silicon, and CPU-only execution.
For information about the VLM backend that uses these accelerators, see VLM Backend. For deployment patterns with acceleration cards, see Deployment.
Hardware acceleration in MinerU serves two primary functions: accelerating VLM inference and accelerating the deep learning models used by the pipeline backend. The system is designed to detect available hardware automatically and fall back to CPU execution when no accelerator is present.
MinerU implements a priority-based device detection system in mineru/utils/config_reader.py75-108:
Detection Priority:
1. `MINERU_DEVICE_MODE` environment variable (highest priority, manual override)
2. CUDA (`torch.cuda.is_available()`)
3. MPS (`torch.backends.mps.is_available()`)
4. NPU (`torch_npu.npu.is_available()`)
5. GCU (`torch.gcu.is_available()`)
6. MUSA (`torch.musa.is_available()`)
7. MLU (`torch.mlu.is_available()`)
8. SDAA (`torch.sdaa.is_available()`)

If no accelerator is detected, the system falls back to CPU.

Sources: mineru/utils/config_reader.py75-108
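The priority order above can be sketched as follows. This is an illustrative reconstruction, not MinerU's exact code: the function name and the try/except guards are assumptions, but the probe order matches the list, and the guards reflect that vendor torch plugins (`torch_npu`, `torch.gcu`, ...) only exist when the matching driver stack is installed.

```python
import os

def detect_device() -> str:
    """Priority-based device detection sketch (see config_reader.py)."""
    override = os.getenv("MINERU_DEVICE_MODE")
    if override:
        return override  # highest priority: manual override

    probes = [
        ("cuda", lambda: __import__("torch").cuda.is_available()),
        ("mps",  lambda: __import__("torch").backends.mps.is_available()),
        ("npu",  lambda: __import__("torch_npu").npu.is_available()),
        ("gcu",  lambda: __import__("torch").gcu.is_available()),
        ("musa", lambda: __import__("torch").musa.is_available()),
        ("mlu",  lambda: __import__("torch").mlu.is_available()),
        ("sdaa", lambda: __import__("torch").sdaa.is_available()),
    ]
    for name, available in probes:
        try:
            if available():
                return name
        except (ImportError, AttributeError):
            continue  # plugin not installed on this machine
    return "cpu"  # no accelerator found
```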
The get_vram() function in mineru/utils/model_utils.py450-486 provides unified VRAM querying:
| Device Type | Detection Method | Unit |
|---|---|---|
| CUDA | torch.cuda.get_device_properties(device).total_memory | GB (converted from bytes) |
| NPU (Ascend) | torch_npu.npu.get_device_properties(device).total_memory | GB |
| GCU (Enflame) | torch.gcu.get_device_properties(device).total_memory | GB |
| MUSA (MooreThreads) | torch.musa.get_device_properties(device).total_memory | GB |
| MLU (Cambricon) | torch.mlu.get_device_properties(device).total_memory | GB |
| SDAA (Tecorigin) | torch.sdaa.get_device_properties(device).total_memory | GB |
| Manual Override | MINERU_VIRTUAL_VRAM_SIZE env var | GB |
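A minimal sketch of the unified query pattern from the table, with the `MINERU_VIRTUAL_VRAM_SIZE` override checked first. The function name and structure are illustrative, not MinerU's exact API; only the CUDA branch is spelled out, since the other backends expose the same `get_device_properties` shape on their respective torch modules.

```python
import os

def get_vram_gb(device="cuda", index=0):
    """Return total VRAM in GB, or None if it cannot be determined."""
    override = os.getenv("MINERU_VIRTUAL_VRAM_SIZE")
    if override:
        return float(override)  # already expressed in GB

    try:
        import torch
        if device == "cuda" and torch.cuda.is_available():
            props = torch.cuda.get_device_properties(index)
            return props.total_memory / (1024 ** 3)  # bytes -> GB
        # npu/gcu/musa/mlu/sdaa: same call on torch_npu.npu, torch.gcu, etc.
    except ImportError:
        pass
    return None  # unknown (e.g. mps or cpu)
```

Mocking the VRAM size via the environment variable lets memory-dependent heuristics (batch size, GPU memory utilization) be tested without the physical card.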
Memory Management:
- The `clean_memory()` function (mineru/utils/model_utils.py416-438) provides device-specific cache clearing
- Each device backend is cleared via its `empty_cache()` method

Sources: mineru/utils/model_utils.py416-486
| Vendor | Product | Device Type | vLLM | LMDeploy | Notes |
|---|---|---|---|---|---|
| NVIDIA | Volta+, CUDA 12.8+ | cuda | ✅ | ✅ | Full support, preferred platform |
| Huawei | Ascend 910B2, Atlas A2/A3 | npu | ✅ | ✅ | Full support, data parallel supported |
| METAX | C500 | maca | ✅ | ✅ | Full support |
| T-Head | ZW810E | ppu | ✅ | ✅ | Full support |
| Enflame | S60 | gcu | ✅ | ❌ | vLLM only |
| Kunlunxin | P800 | kxpu | ✅ | ❌ | vLLM only |
| Hygon | BW200 | dcu | ✅ | ❌ | vLLM only |
| IluvatarCorex | BI-V150 | corex | ✅ | ❌ | vLLM only, memory leak issue |
| Cambricon | MLU590 | mlu | ⚠️ | ⚠️ | vLLM v0 limitations, async engine issues |
| MooreThreads | MTT S4000 | musa | ⚠️ | ❌ | vLLM v0 limitations, async engine issues |
| Tecorigin | T100 | sdaa | ✅ | ❌ | vLLM only |
| Apple | M1/M2/M3 | mps | ❌ | ❌ | MLX engine for VLM, MPS for pipeline |
Legend: ✅ Full Support, ⚠️ Partial Support, ❌ Not Supported
Sources: docs/zh/usage/acceleration_cards/Ascend.md97-175 docs/zh/usage/acceleration_cards/Cambricon.md73-155 docs/zh/usage/acceleration_cards/METAX.md70-148
Requirements:
Inference Engine Support:
Device Selection:
- Select GPUs via `CUDA_VISIBLE_DEVICES`
- Example: `CUDA_VISIBLE_DEVICES=0,1 mineru --backend vlm-auto-engine pdf/sample.pdf`

Sources: mineru/backend/vlm/utils.py11-56 mineru/backend/vlm/utils.py59-79
Supported Models:
Docker Configuration:
Device Selection:
- Select NPUs via `ASCEND_RT_VISIBLE_DEVICES`
- Monitor with `npu-smi info`

Special Considerations for Atlas 300I Duo:

- Add `--enforce-eager --dtype float16` to all vLLM commands

Sources: docs/zh/usage/acceleration_cards/Ascend.md1-179
Cambricon MLU:

- Device selection: `MLU_VISIBLE_DEVICES`
- Monitoring: `cnmon`

METAX GPU (MACA):

- Device selection: `CUDA_VISIBLE_DEVICES` (uses CUDA-like API)
- Monitoring: `mx-smi`

T-Head PPU:

- Device mapping: `--device=/dev/alixpu`, `--device=/dev/alixpu_ctl`
- Monitoring: `ppu-smi`

Enflame GCU:

- Device selection: `TOPS_VISIBLE_DEVICES`
- Monitoring: `efsmi`

Kunlunxin XPU:

- Device selection: `XPU_VISIBLE_DEVICES`
- Monitoring: `xpu-smi`

Sources: docs/zh/usage/acceleration_cards/Cambricon.md1-160 docs/zh/usage/acceleration_cards/METAX.md1-152 docs/zh/usage/acceleration_cards/THead.md1-143 docs/zh/usage/acceleration_cards/Enflame.md1-110 docs/zh/usage/acceleration_cards/Kunlunxin.md1-124
Supported Models: M1, M2, M3 (macOS 13.5+)
Backend Selection:
MLX Engine Requirements:
- macOS version 13.5+ verified via `platform.mac_ver()`
- Models loaded via `mlx_vlm.load()`

Sources: mineru/backend/vlm/vlm_analyze.py85-93 mineru/utils/config_reader.py82-83
The ModelSingleton class in mineru/backend/vlm/vlm_analyze.py23-219 initializes models with device-specific configurations:
Sources: mineru/backend/vlm/vlm_analyze.py23-219
The mod_kwargs_by_device_type() function mineru/backend/vlm/utils.py172-233 applies device-specific optimizations:
Device Configurations:
| Device | Configuration Parameters |
|---|---|
| `corex` (IluvatarCorex) | `compilation_config_dict`: `{"cudagraph_mode": "FULL_DECODE_ONLY", "level": 0}` |
| `kxpu` (Kunlunxin) | `compilation_config_dict`: custom splitting ops; `block_size`: 128; `dtype`: `"float16"`; `distributed_executor_backend`: `"mp"`; `enable_chunked_prefill`: False; `enable_prefix_caching`: False |
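The per-device tuning can be sketched as a kwargs filter. This is an illustrative reconstruction based on the table above, not MinerU's exact implementation; the `compilation_config_dict` key name is taken from the table, and `setdefault` is assumed so that caller-supplied values win over the defaults.

```python
def mod_kwargs_by_device_type(device_type: str, kwargs: dict) -> dict:
    """Apply device-specific vLLM kwargs defaults (sketch)."""
    if device_type == "corex":
        # IluvatarCorex: restrict CUDA graphs to decode-only capture.
        kwargs.setdefault(
            "compilation_config_dict",
            {"cudagraph_mode": "FULL_DECODE_ONLY", "level": 0},
        )
    elif device_type == "kxpu":
        # Kunlunxin: force float16 and disable features the card lacks.
        kwargs.setdefault("block_size", 128)
        kwargs.setdefault("dtype", "float16")
        kwargs.setdefault("distributed_executor_backend", "mp")
        kwargs.setdefault("enable_chunked_prefill", False)
        kwargs.setdefault("enable_prefix_caching", False)
    return kwargs
```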
Sources: mineru/backend/vlm/utils.py111-233
The set_default_gpu_memory_utilization() function mineru/backend/vlm/utils.py82-89 adapts to available VRAM:
| vLLM Version | VRAM Size | Memory Utilization |
|---|---|---|
| >= 0.11.0 | <= 8 GB | 0.7 (70%) |
| >= 0.11.0 | > 8 GB | 0.5 (50%) |
| < 0.11.0 | Any | 0.5 (50%) |
Override: Set gpu_memory_utilization in kwargs explicitly to override defaults.
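The table above reduces to a small decision function. The signature is illustrative (the real `set_default_gpu_memory_utilization()` lives in mineru/backend/vlm/utils.py82-89); the thresholds match the table: small cards need a larger fraction so the model weights still fit.

```python
def default_gpu_memory_utilization(vllm_version: tuple, vram_gb: float) -> float:
    """Pick the default vLLM gpu_memory_utilization (sketch)."""
    if vllm_version >= (0, 11, 0) and vram_gb <= 8:
        return 0.7  # <=8 GB cards on vLLM 0.11+: use 70% of VRAM
    return 0.5      # everything else: 50%
```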
Sources: mineru/backend/vlm/utils.py82-89
The enable_custom_logits_processors() function mineru/backend/vlm/utils.py11-56 determines eligibility:
Requirements (all must be met):
- `VLLM_USE_V1` != 0 (v1 engine enabled)

Processor Registration:
Sources: mineru/backend/vlm/utils.py11-56 mineru/backend/vlm/vlm_analyze.py118-120
The set_lmdeploy_backend() function mineru/backend/vlm/utils.py59-79 chooses between pytorch and turbomind:
| Device Type | Preferred Backend | Fallback Conditions |
|---|---|---|
| `ascend`, `maca`, `camb` | pytorch | Always pytorch |
| `cuda` (Windows) | turbomind | Always turbomind |
| `cuda` (Linux, CC >= 8.0) | pytorch | Ampere+ architecture |
| `cuda` (Linux, CC < 8.0) | turbomind | Volta/Turing architecture |
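The backend choice in the table can be sketched as follows. The function name and the `compute_capability` parameter are assumptions for illustration; the branch structure mirrors the table.

```python
import platform

def choose_lmdeploy_backend(device_type: str, compute_capability=None) -> str:
    """Pick 'pytorch' or 'turbomind' per the support table (sketch)."""
    if device_type in ("ascend", "maca", "camb"):
        return "pytorch"  # turbomind has no kernels for these devices
    # device_type == "cuda" from here on
    if platform.system() == "Windows":
        return "turbomind"
    if compute_capability is not None and compute_capability >= 8.0:
        return "pytorch"   # Ampere and newer
    return "turbomind"     # Volta/Turing
```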
Device Type Configuration:
Sources: mineru/backend/vlm/vlm_analyze.py156-202 mineru/backend/vlm/utils.py59-79
Sources: mineru/backend/vlm/vlm_analyze.py156-201
Each acceleration card has a dedicated Dockerfile in docker/china/:
Example: Ascend NPU Dockerfile docker/china/npu.Dockerfile
Example: Cambricon MLU Dockerfile docker/china/mlu.Dockerfile1-43
- Patches applied via `sed` commands

Example: Enflame GCU Dockerfile docker/china/gcu.Dockerfile1-30
Sources: docker/china/npu.Dockerfile docker/china/mlu.Dockerfile1-43 docker/china/gcu.Dockerfile1-30
| Accelerator | Device Mappings | Volume Mounts |
|---|---|---|
| Ascend NPU | /dev/davinci*, /dev/davinci_manager, /dev/devmm_svm, /dev/hisi_hdc | /var/log/npu/, /usr/local/dcmi, /usr/local/Ascend/driver, /usr/local/bin/npu-smi |
| Cambricon MLU | -v /dev:/dev | /lib/modules, /usr/bin/cnmon |
| METAX GPU | /dev/mem, /dev/dri, /dev/mxcd, /dev/infiniband | /datapool (workspace) |
| T-Head PPU | /dev/alixpu, /dev/alixpu_ctl | /mnt, /datapool |
| IluvatarCorex | -v /dev:/dev | /usr/src, /lib/modules |
| Kunlunxin XPU | /dev/xpu*, /dev/xpuctrl | /usr/local/bin/xpu-smi |
| Hygon DCU | /dev/kfd, /dev/mkfd, /dev/dri | /opt/hyhal |
Common Flags:
- `--privileged`: Required for most accelerators
- `--ipc=host` / `--network=host`: Shared memory and networking
- `--shm-size`: Large shared memory (often 100-500GB for multi-card setups)
- `--ulimit memlock=-1`: Unlimited locked memory

Sources: docs/zh/usage/acceleration_cards/Ascend.md54-94 docs/zh/usage/acceleration_cards/Cambricon.md32-63 docs/zh/usage/acceleration_cards/METAX.md38-66
| Variable | Purpose | Example Values |
|---|---|---|
| `MINERU_MODEL_SOURCE` | Skip model downloads | `local` |
| `MINERU_LMDEPLOY_DEVICE` | Set LMDeploy device | `ascend`, `maca`, `camb` |
| `MINERU_VLLM_DEVICE` | Set vLLM device-specific config | `corex`, `kxpu`, `musa` |
| `MINERU_DEVICE_MODE` | Override device detection | `cuda`, `npu`, `mps` |
| `MINERU_VIRTUAL_VRAM_SIZE` | Mock VRAM size for testing | `8`, `16`, `32` (GB) |
| `VLLM_WORKER_MULTIPROC_METHOD` | vLLM multiprocessing method | `spawn` (required for Ascend) |
| `VLLM_USE_V1` | Enable vLLM v1 engine | `0`, `1` |
| `VLLM_ENFORCE_CUDA_GRAPH` | Force CUDA graph mode | `1` (IluvatarCorex) |
| `OMP_NUM_THREADS` | OpenMP thread count | `1` (set by vLLM/LMDeploy init) |
Sources: mineru/backend/vlm/vlm_analyze.py95-96 mineru/model/vlm/vllm_server.py59-60
The set_default_batch_size() function mineru/backend/vlm/utils.py92-108 for transformers backend:
| VRAM Size | Batch Size | Rationale |
|---|---|---|
| >= 16 GB | 8 | Full batch processing |
| >= 8 GB, < 16 GB | 4 | Moderate batching |
| < 8 GB | 1 | Sequential processing |
Override: Pass batch_size in kwargs to ModelSingleton.get_model().
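The VRAM tiers above map directly to a small helper. The function name mirrors the one cited in the text; the body is a sketch, and treating unknown VRAM as the conservative tier is an assumption.

```python
def set_default_batch_size(vram_gb) -> int:
    """Pick the transformers-backend batch size from VRAM (sketch)."""
    if vram_gb is None:
        return 1        # VRAM unknown: be conservative
    if vram_gb >= 16:
        return 8        # full batch processing
    if vram_gb >= 8:
        return 4        # moderate batching
    return 1            # sequential processing
```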
Sources: mineru/backend/vlm/utils.py92-108
The clean_memory() function mineru/utils/model_utils.py416-438 provides device-specific cleanup:
The clean_vram() wrapper mineru/utils/model_utils.py441-447 applies cleanup selectively:
- Cleanup runs only when `total_memory` <= `vram_threshold` (default 8 GB)

Sources: mineru/utils/model_utils.py416-447
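The device-specific cleanup pattern can be sketched as below. The structure is assumed, not copied from MinerU: Python garbage collection runs first so dead tensors are actually released, then the device backend's `empty_cache()` returns cached blocks to the driver. Only CUDA and MPS branches are spelled out; the other backends expose the same entry point on their torch modules.

```python
import gc

def clean_memory(device: str = "cuda") -> None:
    """Free cached accelerator memory for the given device (sketch)."""
    gc.collect()  # release dead tensors before clearing the cache
    try:
        import torch
        if device == "cuda" and torch.cuda.is_available():
            torch.cuda.empty_cache()
        elif device == "mps" and torch.backends.mps.is_available():
            torch.mps.empty_cache()
        # npu/gcu/musa/mlu/sdaa: same empty_cache() on their torch modules
    except ImportError:
        pass  # torch not installed: GC alone
```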
Layout sorting uses the LayoutLMv3 model on GPU when available (mineru/utils/block_sort.py179-231):
BF16 Support Detection:
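The exact check is not reproduced here; a common pattern, sketched below under the assumption that it resembles MinerU's, probes `torch.cuda.is_bf16_supported()`, which reports True on Ampere and newer CUDA devices.

```python
def supports_bf16() -> bool:
    """Probe whether the current CUDA device can run in bfloat16 (sketch)."""
    try:
        import torch
        return torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    except ImportError:
        return False  # no torch: assume no BF16
```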
Sources: mineru/utils/block_sort.py179-231
- IluvatarCorex: set `VLLM_ENFORCE_CUDA_GRAPH=1`
- Ascend Atlas 300I Duo: add `--enforce-eager --dtype float16` to all vLLM commands

Sources: docs/zh/usage/acceleration_cards/Cambricon.md66-78 docs/zh/usage/acceleration_cards/MooreThreads.md44-47 docs/zh/usage/acceleration_cards/IluvatarCorex.md50-51 docs/zh/usage/acceleration_cards/Ascend.md88-93
Explicit Device Specification: Use environment variables to force specific devices when auto-detection fails
Multi-Device Systems: Use vendor-specific visibility variables
Memory Monitoring: Check accelerator usage before launching
- NVIDIA: `nvidia-smi`
- Ascend: `npu-smi info`
- Cambricon: `cnmon`
- METAX: `mx-smi`
- Enflame: `efsmi`
- Kunlunxin: `xpu-smi`
- Hygon: `hy-smi`
- IluvatarCorex: `ixsmi`
- T-Head: `ppu-smi`
- Tecorigin: `teco-smi -c`
- MooreThreads: `mthreads-gmi`

Sources: docs/zh/usage/acceleration_cards/Ascend.md177-179 docs/zh/usage/acceleration_cards/Cambricon.md157-160
Models in the pipeline backend use the device from get_device():
Device Assignment: Automatic via `device_map={"": device}` in model loading
Sources: mineru/utils/config_reader.py75-108
The VLM backend's model initialization in ModelSingleton mineru/backend/vlm/vlm_analyze.py23-219 uses device-specific paths:
HTTP Client Mode: No local acceleration needed - connects to remote GPU server
Sources: mineru/backend/vlm/vlm_analyze.py23-219
Combines both acceleration strategies:
- Device detection via `get_device()`

Sources: See Hybrid Backend documentation
| Component | File | Key Functions |
|---|---|---|
| Device Detection | mineru/utils/config_reader.py75-108 | get_device() |
| VRAM Detection | mineru/utils/model_utils.py450-486 | get_vram(device) |
| Memory Cleanup | mineru/utils/model_utils.py416-447 | clean_memory(device), clean_vram(device, threshold) |
| vLLM Configuration | mineru/backend/vlm/utils.py11-233 | enable_custom_logits_processors(), set_default_gpu_memory_utilization(), mod_kwargs_by_device_type() |
| LMDeploy Configuration | mineru/backend/vlm/utils.py59-79 | set_lmdeploy_backend(device_type) |
| Model Initialization | mineru/backend/vlm/vlm_analyze.py23-219 | ModelSingleton.get_model() |
| vLLM Server | mineru/model/vlm/vllm_server.py1-69 | main() (server startup wrapper) |
| Batch Size Config | mineru/backend/vlm/utils.py92-108 | set_default_batch_size() |
| Reading Order | mineru/utils/block_sort.py179-231 | model_init(model_name) |
Sources: Multiple files as listed above