This document describes MinerU's OpenAI-compatible server infrastructure and HTTP client backends, which enable distributed document parsing by separating GPU-intensive VLM inference from lightweight client operations. The server (mineru-openai-server) exposes VLM models through an OpenAI-compatible API, while client backends (vlm-http-client, hybrid-http-client) connect to remote inference servers over HTTP.
For general backend selection and routing logic, see Core Orchestration. For local VLM inference without networking, see VLM Backend. For hybrid processing details, see Hybrid Backend.
The OpenAI-compatible server and HTTP client system provides a client-server architecture for document parsing in which GPU-intensive VLM inference runs on a central server while lightweight clients submit documents over HTTP.
Sources: mineru/cli/common.py240-305 mineru/cli/client.py60-70 pyproject.toml113-115 docker/compose.yaml2-29
MinerU provides three server command entry points defined in pyproject.toml113-115:
| Command | Function | Engine | Description |
|---|---|---|---|
| mineru-openai-server | mineru.cli.vlm_server:openai_server | Auto-selected | Automatically chooses between vLLM and LMDeploy |
| mineru-vllm-server | mineru.cli.vlm_server:vllm_server | vLLM | Explicitly uses the vLLM engine |
| mineru-lmdeploy-server | mineru.cli.vlm_server:lmdeploy_server | LMDeploy | Explicitly uses the LMDeploy engine |
The openai_server function is the recommended entry point as it auto-detects the best available engine based on the environment.
Basic usage:
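A minimal invocation might look like the following; the listen address and port are illustrative, not required values:

```shell
# Start the OpenAI-compatible VLM server; the engine (vLLM or LMDeploy)
# is auto-detected. Host and port are example values.
mineru-openai-server --host 0.0.0.0 --port 30000
```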
With vLLM-specific parameters:
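For example, vLLM engine flags can be appended directly; the values below are illustrative, not recommendations:

```shell
# vLLM-specific flags are passed through to the underlying engine.
mineru-vllm-server --host 0.0.0.0 --port 30000 \
  --gpu-memory-utilization 0.5 \
  --data-parallel-size 2
```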
All vLLM and LMDeploy parameters can be passed through to the underlying engine. See Advanced CLI Parameters for engine-specific parameter tuning.
Sources: docs/zh/quick_start/docker_deployment.md56-66 docker/compose.yaml10-16
Environment variables:
- MINERU_MODEL_SOURCE: Set to local, huggingface, or modelscope to control the model loading source
- MINERU_DEVICE_MODE: Override automatic device detection (e.g., cuda, cuda:0, npu)

Sources: docker/compose.yaml9-16 mineru/cli/client.py169-181
The Docker images based on vllm/vllm-openai include the server pre-configured:
Start with Docker Compose:
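A sketch of the compose invocation; the profile name "openai-server" is an assumption, so check the profiles actually defined in docker/compose.yaml:

```shell
# Bring up the server profile defined in docker/compose.yaml.
docker compose -f docker/compose.yaml --profile openai-server up -d
```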
Sources: docker/compose.yaml2-29 docs/zh/quick_start/docker_deployment.md56-66
The backend routing logic in mineru/cli/common.py414-484 determines whether to use local or remote inference:
In _process_vlm and _process_hybrid functions mineru/cli/common.py283-358:
Sources: mineru/cli/common.py414-484 mineru/cli/common.py267-305
The vlm-http-client backend connects to a remote VLM inference server for high-accuracy parsing of Chinese and English documents.
Usage from CLI:
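A minimal sketch, assuming a server already running at the URL shown (the address is a placeholder):

```shell
# Parse a PDF against a remote VLM server via the OpenAI-compatible API.
mineru -p demo.pdf -o ./output -b vlm-http-client -u http://<server-ip>:30000
```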
Usage from Python:
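A hedged sketch of the Python path, assuming the do_parse() and read_fn() helpers in mineru/cli/common.py; keyword names may differ across versions:

```python
# Sketch only: argument names follow mineru/cli/common.py as of this
# writing and are not guaranteed stable across MinerU versions.
from pathlib import Path
from mineru.cli.common import do_parse, read_fn

pdf_path = Path("demo.pdf")
pdf_bytes = read_fn(pdf_path)  # reads the file into bytes

do_parse(
    output_dir="./output",
    pdf_file_names=[pdf_path.stem],
    pdf_bytes_list=[pdf_bytes],
    p_lang_list=["ch"],
    backend="vlm-http-client",
    server_url="http://<server-ip>:30000",  # placeholder address
)
```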
Client requirements:
Sources: mineru/cli/client.py59-94 README.md162 docs/zh/quick_start/docker_deployment.md62-66
The hybrid-http-client backend combines remote VLM inference with local lightweight processing:
This allows for high accuracy while requiring less GPU memory on the server side (3GB VRAM vs 8GB for pure VLM).
Usage from CLI:
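The hybrid client is invoked the same way as the pure VLM client, only with a different backend name (server address is a placeholder):

```shell
# VLM inference happens on the server; OCR and formula recognition run locally.
mineru -p demo.pdf -o ./output -b hybrid-http-client -u http://<server-ip>:30000
```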
Client requirements:
The hybrid backend decision logic is in mineru/backend/hybrid/hybrid_analyze.py which determines whether to use VLM OCR or fall back to PaddleOCR for multi-language support.
Sources: mineru/cli/common.py308-358 mineru/cli/client.py68 README.md161
Key steps:
1. Client preparation: PDF bytes are prepared by _prepare_pdf_bytes(), which handles page range extraction using convert_pdf_bytes_to_bytes_by_pypdfium2() mineru/cli/common.py54-82
2. Backend routing: The backend string is checked for a "client" suffix to determine whether server_url should be used mineru/cli/common.py286-287
3. Model initialization: ModelSingleton.get_model() creates an HTTP client when the backend is "http-client" mineru/backend/vlm/vlm_analyze.py
4. Batch inference: Images are sent to the server in batches, with the server handling VLM inference
5. Response processing: Server responses are parsed into MinerU's internal format and converted to middle JSON
Sources: mineru/cli/common.py267-305 mineru/cli/common.py54-82 mineru/backend/hybrid/hybrid_analyze.py1-50
Advantages:
Configuration:
- --data-parallel-size for multi-GPU parallelism docker/compose.yaml15
- server_url pointing to the central server

Sources: docs/zh/quick_start/docker_deployment.md56-84 docker/compose.yaml10-29
The docker/compose.yaml1-73 file defines three service profiles:
Services can be configured to either use local VLM inference or connect to the openai-server as clients.
Sources: docker/compose.yaml1-73 docs/zh/quick_start/docker_deployment.md42-84
For clients with limited GPU resources (e.g., 6-10GB VRAM), the hybrid-http-client backend offloads VLM inference to a server while performing OCR and formula recognition locally:
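Sketched end to end, assuming the server is reachable at gpu-server:30000 (hostname and port are placeholders):

```shell
# On the GPU server:
mineru-openai-server --host 0.0.0.0 --port 30000

# On the low-VRAM client: VLM inference is offloaded to the server,
# while OCR and formula recognition run locally.
mineru -p demo.pdf -o ./output -b hybrid-http-client -u http://gpu-server:30000
```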
This pattern is optimal when:
Sources: mineru/backend/hybrid/hybrid_analyze.py1-50 README.md161
Priority order (highest to lowest):
- -u / --url CLI option
- server_url=... parameter in the Python API

Example in Python API:
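A hypothetical sketch of passing server_url explicitly through the Python API, assuming the do_parse() helper in mineru/cli/common.py; keyword names may differ across versions:

```python
# Sketch only: an explicit server_url keyword applies to this call,
# regardless of any URL configured elsewhere.
from mineru.cli.common import do_parse

do_parse(
    output_dir="./output",
    pdf_file_names=["demo"],
    pdf_bytes_list=[open("demo.pdf", "rb").read()],
    p_lang_list=["ch"],
    backend="vlm-http-client",
    server_url="http://127.0.0.1:30000",
)
```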
Sources: mineru/cli/client.py85-94 mineru/cli/common.py486-558
The Gradio interface mineru/cli/gradio_app.py199-257 includes an --enable-http-client flag to show HTTP client backend options:
UI elements dynamically show/hide based on backend selection mineru/cli/gradio_app.py356-369:
- When vlm-http-client or hybrid-http-client is selected, a "Server URL" input field appears
- The field is labeled via i18n("server_url") with info text about OpenAI-compatible servers

Sources: mineru/cli/gradio_app.py199-257 mineru/cli/gradio_app.py356-369
The FastAPI server mineru/cli/fast_api.py125-199 accepts backend and server_url form parameters:
Example API call:
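An illustrative call with curl; the endpoint path, form field names, and port are assumptions to be checked against mineru/cli/fast_api.py:

```shell
# Submit a PDF to the FastAPI server, routing inference to a remote
# VLM server. All addresses are placeholders.
curl -X POST http://127.0.0.1:8000/file_parse \
  -F "files=@demo.pdf" \
  -F "backend=vlm-http-client" \
  -F "server_url=http://gpu-server:30000"
```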
Sources: mineru/cli/fast_api.py125-199 mineru/cli/fast_api.py173-176
Server-side optimization:
For multi-GPU servers, use data parallelism to increase throughput:
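For instance (GPU count is illustrative):

```shell
# Shard requests across two GPUs; this is a vLLM engine flag and is
# only effective on the server side.
mineru-vllm-server --port 30000 --data-parallel-size 2
```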
For single GPU with limited VRAM, reduce KV cache size:
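For example (the fraction shown is illustrative, not a recommendation):

```shell
# Cap vLLM's GPU memory fraction so the KV cache fits a smaller card.
mineru-vllm-server --port 30000 --gpu-memory-utilization 0.5
```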
Client-side optimization:
For hybrid-http-client, adjust batch sizes for local processing:
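A hypothetical sketch; the environment variable name MINERU_MIN_BATCH_INFERENCE_SIZE and its value are assumptions and should be verified against the MinerU documentation:

```shell
# Assumed knob for local batching on the client; verify the variable
# name before relying on it.
export MINERU_MIN_BATCH_INFERENCE_SIZE=128
mineru -p demo.pdf -o ./output -b hybrid-http-client -u http://gpu-server:30000
```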
Note that inference engine parameters (like --gpu-memory-utilization) are only effective on the server side, not on HTTP clients.
Sources: docker/compose.yaml15-16 docs/zh/usage/advanced_cli_parameters.md1-35
Key environment variables for server and client:
| Variable | Scope | Purpose | Default |
|---|---|---|---|
| MINERU_MODEL_SOURCE | Server | Model download source (huggingface/modelscope/local) | huggingface |
| MINERU_DEVICE_MODE | Server | Force device type (cuda/cuda:0/npu) | auto-detected |
| MINERU_LOG_LEVEL | Both | Logging verbosity (DEBUG/INFO/WARNING) | INFO |
| MINERU_VLM_FORMULA_ENABLE | Server | Enable formula recognition in VLM | True |
| MINERU_VLM_TABLE_ENABLE | Server | Enable table recognition in VLM | True |
These are set in the backend processing functions mineru/cli/common.py456-457 mineru/cli/common.py475-476 before calling the VLM/hybrid analyze functions.
Sources: mineru/cli/common.py456-476 docker/compose.yaml9-10 mineru/cli/client.py169-181