The Hybrid Backend is MinerU's default and recommended processing backend. It combines Vision-Language Model (VLM) capabilities with specialized computer vision models to achieve high accuracy across multiple languages, and it chooses between two processing modes based on document characteristics.
This page covers the hybrid backend's architecture, decision logic, processing modes, and integration points.
Sources: mineru/backend/hybrid/hybrid_analyze.py1-527
Figure 1: Hybrid Backend Decision Tree and Processing Flow
Sources: mineru/backend/hybrid/hybrid_analyze.py384-454 mineru/backend/hybrid/hybrid_analyze.py456-526
_should_enable_vlm_ocr Function
The core decision function determines whether to use VLM-only processing or hybrid mode:
Figure 2: VLM-OCR Decision Logic
The function evaluates three primary conditions:
| Condition | Requirement | Rationale |
|---|---|---|
| ocr_enable | Must be True | Document is image-based, needs OCR |
| language | Must be "ch" or "en" | VLM OCR currently optimized for Chinese/English |
| inline_formula_enable | Must be True | Formula processing requested |
Environment Variable Overrides:
- MINERU_FORCE_VLM_OCR_ENABLE=1: Forces VLM-OCR mode regardless of conditions
- MINERU_HYBRID_FORCE_PIPELINE_ENABLE=1: Forces hybrid mode regardless of conditions

Sources: mineru/backend/hybrid/hybrid_analyze.py369-381
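The decision described above can be sketched in a few lines. This is a minimal illustration, not the code in hybrid_analyze.py: the function name `should_enable_vlm_ocr_sketch`, the `env` parameter, and the precedence between the two force flags (force-VLM checked first) are assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Truthy strings accepted for the force flags.
TRUTHY = ("1", "true", "yes")


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Return True when the named variable holds one of the truthy strings."""
    return env.get(name, "").strip().lower() in TRUTHY


def should_enable_vlm_ocr_sketch(
    ocr_enable: bool,
    language: str,
    inline_formula_enable: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical stand-in for the VLM-OCR decision."""
    env = os.environ if env is None else env
    # Assumption: the force-VLM flag is checked first and wins if both are set.
    if _flag(env, "MINERU_FORCE_VLM_OCR_ENABLE"):
        return True
    if _flag(env, "MINERU_HYBRID_FORCE_PIPELINE_ENABLE"):
        return False
    # All three conditions from the table must hold.
    return ocr_enable and language in ("ch", "en") and inline_formula_enable
```

Passing the environment as a mapping keeps the sketch easy to exercise without touching the real process environment.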
VLM-OCR mode is triggered when all conditions are met (or forced via environment variable). In this mode, the VLM performs complete end-to-end processing:
Figure 3: VLM-OCR Mode Data Flow
In this mode:
- inline_formula_list and ocr_res_list are set to empty lists
- hybrid_pipeline_model is set to None

Sources: mineru/backend/hybrid/hybrid_analyze.py416-420 mineru/backend/hybrid/hybrid_analyze.py488-492
Hybrid mode leverages VLM for layout detection while delegating specialized tasks to dedicated models:
Figure 4: Hybrid Mode Processing Pipeline
Sources: mineru/backend/hybrid/hybrid_analyze.py422-435 mineru/backend/hybrid/hybrid_analyze.py494-507
The mask_image_regions function prevents formula detection models from incorrectly identifying formulas within images, tables, or existing equation blocks:
Sources: mineru/backend/hybrid/hybrid_analyze.py169-189
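As a rough illustration of the masking idea, the sketch below paints rectangular regions with a constant value before a detector would see the page. The function name, the `(x0, y0, x1, y1)` pixel box format, and the white fill value are assumptions for the example; the real mask_image_regions may use a different signature and fill.

```python
import numpy as np


def mask_regions_sketch(image: np.ndarray, boxes, fill: int = 255) -> np.ndarray:
    """Return a copy of `image` with every box painted over.

    `boxes` is an iterable of (x0, y0, x1, y1) pixel coordinates
    (a hypothetical format chosen for this sketch).
    """
    masked = image.copy()  # leave the caller's array untouched
    height, width = masked.shape[:2]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before slicing.
        x0, y0 = max(0, int(x0)), max(0, int(y0))
        x1, y1 = min(width, int(x1)), min(height, int(y1))
        if x1 > x0 and y1 > y0:
            masked[y0:y1, x0:x1] = fill
    return masked
```

Working on a copy means the unmasked page stays available for later stages such as OCR.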
Formula processing uses the MFD (Mathematical Formula Detection) and MFR (Mathematical Formula Recognition) models:
Figure 5: Formula Processing Flow
The MFD results are converted to a format compatible with OCR processing:
Sources: mineru/backend/hybrid/hybrid_analyze.py226-250
The ocr_det function handles text detection with two modes:
Non-batch mode processes each text block individually:
Figure 6: Non-Batch OCR Detection Flow
Sources: mineru/backend/hybrid/hybrid_analyze.py52-86
Batch mode groups images by resolution so that they can be processed together:
Figure 7: Batch OCR Detection Flow
Key optimization: Images are grouped by resolution and padded to common dimensions to maximize GPU utilization.
Sources: mineru/backend/hybrid/hybrid_analyze.py87-167
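The grouping step can be pictured with the following sketch, which works on image sizes only. It assumes that "grouped by resolution" means rounding each side up to a multiple of a stride and batching images that share the rounded shape; the stride of 64, the default batch size, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_resolution_sketch(
    sizes: List[Tuple[int, int]], stride: int = 64, batch_size: int = 16
) -> List[Tuple[Tuple[int, int], List[int]]]:
    """Group image indices by padded (height, width) and split into batches."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for index, (height, width) in enumerate(sizes):
        # Round each side up to the next multiple of `stride`; every image in a
        # group would be padded to this common shape before inference.
        target = (-(-height // stride) * stride, -(-width // stride) * stride)
        groups[target].append(index)

    batches = []
    for target in sorted(groups):
        members = groups[target]
        for start in range(0, len(members), batch_size):
            batches.append((target, members[start:start + batch_size]))
    return batches
```

Each returned entry pairs a padded shape with the indices to stack into one tensor, so the detector runs once per batch rather than once per crop.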
If _ocr_enable is True, cropped text regions undergo recognition:
Low-confidence results (score < OcrConfidence.min_confidence) are marked with category_id = 16.
Sources: mineru/backend/hybrid/hybrid_analyze.py263-300
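The low-confidence rule above might look like the sketch below. The span dictionary layout and the function name are assumptions for illustration; the threshold is passed in as a parameter standing in for OcrConfidence.min_confidence, whose actual value is not stated here.

```python
from typing import Dict, List

LOW_CONFIDENCE_CATEGORY = 16  # category value stated in the text above


def tag_low_confidence_sketch(spans: List[Dict], min_confidence: float) -> List[Dict]:
    """Return copies of `spans`, re-tagging those scored below the threshold."""
    tagged = []
    for span in spans:
        item = dict(span)  # shallow copy so the input list is not mutated
        if item["score"] < min_confidence:
            item["category_id"] = LOW_CONFIDENCE_CATEGORY
        tagged.append(item)
    return tagged
```

Downstream code can then filter on category_id without re-checking scores.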
The _normalize_bbox function converts polygon coordinates to normalized bounding boxes:
Sources: mineru/backend/hybrid/hybrid_analyze.py191-200 mineru/backend/hybrid/hybrid_analyze.py303-321
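One plausible reading of this conversion is sketched below: take the axis-aligned extent of the polygon and divide by the page size so the coordinates fall in [0, 1]. The signature, the point format, and the clamping are assumptions; the actual _normalize_bbox may differ.

```python
from typing import List, Sequence, Tuple


def normalize_polygon_sketch(
    polygon: Sequence[Tuple[float, float]], width: float, height: float
) -> List[float]:
    """Convert polygon points to a [x0, y0, x1, y1] box scaled to the 0..1 range."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]

    def clamp(value: float) -> float:
        # Keep coordinates inside the page even if the polygon overshoots.
        return min(1.0, max(0.0, value))

    return [
        clamp(min(xs) / width),
        clamp(min(ys) / height),
        clamp(max(xs) / width),
        clamp(max(ys) / height),
    ]
```

Normalized boxes are independent of the rendering resolution, which makes them easier to merge with results produced at a different scale.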
The get_batch_ratio function determines batch sizes based on GPU memory:
| GPU Memory (GB) | Batch Ratio | Formula Batch | OCR Det Batch |
|---|---|---|---|
| ≥ 32 GB | 16 | 256 | 256 |
| ≥ 16 GB | 8 | 128 | 128 |
| ≥ 12 GB | 4 | 64 | 64 |
| ≥ 8 GB | 2 | 32 | 32 |
| < 8 GB | 1 | 16 | 16 |
Formula batch size: batch_ratio * MFR_BASE_BATCH_SIZE (where MFR_BASE_BATCH_SIZE = 16)
OCR detection batch size: batch_ratio * OCR_DET_BASE_BATCH_SIZE (where OCR_DET_BASE_BATCH_SIZE = 16)
When MINERU_HYBRID_BATCH_RATIO is set, its value is used directly instead of auto-detection. This is recommended for client-server deployments, where the ratio must account for multiple concurrent clients.
Client-Server Deployment Guidelines:
| Single Client VRAM | Recommended MINERU_HYBRID_BATCH_RATIO |
|---|---|
| ≤ 2.5 GB | 1 |
| ≤ 3 GB | 2 |
| ≤ 4.5 GB | 4 |
| ≤ 6 GB | 8 |
Note: Reserve one client's worth of VRAM as overall redundancy when deploying multiple concurrent clients.
Sources: mineru/backend/hybrid/hybrid_analyze.py323-366
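The auto-detection table and the override can be summarized in a short sketch. The function name, its parameters, and the choice to read the override from a mapping are assumptions for illustration; only the thresholds, the base batch sizes, and the variable name come from the text above.

```python
import os
from typing import Mapping, Optional

MFR_BASE_BATCH_SIZE = 16
OCR_DET_BASE_BATCH_SIZE = 16

# (minimum VRAM in GB, batch ratio), checked from largest to smallest.
_VRAM_TIERS = ((32, 16), (16, 8), (12, 4), (8, 2))


def pick_batch_ratio_sketch(
    vram_gb: float, env: Optional[Mapping[str, str]] = None
) -> int:
    """Return the override if present, otherwise map VRAM to a batch ratio."""
    env = os.environ if env is None else env
    override = env.get("MINERU_HYBRID_BATCH_RATIO")
    if override:
        return int(override)  # used directly, skipping auto-detection
    for minimum_gb, ratio in _VRAM_TIERS:
        if vram_gb >= minimum_gb:
            return ratio
    return 1  # below 8 GB


ratio = pick_batch_ratio_sketch(24, env={})
formula_batch = ratio * MFR_BASE_BATCH_SIZE
ocr_det_batch = ratio * OCR_DET_BASE_BATCH_SIZE
```

With 24 GB and no override, the ratio lands in the 16 GB tier, so both batch sizes come out as 8 * 16 = 128, matching the table.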
The HybridModelSingleton class manages hybrid-specific models with lazy initialization:
Figure 8: HybridModelSingleton Architecture
The singleton ensures models are loaded only once per configuration, reducing memory overhead and initialization time.
Sources: Referenced in mineru/backend/hybrid/hybrid_analyze.py220-224
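The "load once per configuration" behaviour can be sketched as a singleton that caches models by their configuration. The class name, the `get_model` method, and the loader callback are hypothetical; the real HybridModelSingleton may expose a different interface.

```python
import threading


class ModelSingletonSketch:
    """Process-wide cache that builds each model configuration at most once."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._models = {}
        return cls._instance

    def get_model(self, loader, **config):
        # Configuration values must be hashable so they can form a cache key.
        key = tuple(sorted(config.items()))
        with self._lock:
            if key not in self._models:
                # Lazy initialization: the loader runs only on the first request.
                self._models[key] = loader(**config)
            return self._models[key]
```

Repeated calls with the same keyword arguments return the cached object, while a new configuration triggers exactly one additional load.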
result_to_middle_json Function
This function orchestrates the transformation of model outputs into the standardized middle.json format:
Figure 9: Middle JSON Generation Pipeline
- np_img field, performs OCR recognition

Sources: mineru/backend/hybrid/hybrid_model_output_to_middle_json.py135-212
doc_analyze
Figure 10: Synchronous doc_analyze Flow
Sources: mineru/backend/hybrid/hybrid_analyze.py384-454
aio_doc_analyze
Mirrors doc_analyze but uses async operations:
- await predictor.aio_batch_two_step_extract() instead of predictor.batch_two_step_extract()

Sources: mineru/backend/hybrid/hybrid_analyze.py456-526
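The sync/async pairing can be illustrated with a stub. Only the two method names are taken from the text above; the `StubPredictor` class, its argument list, and its return values are placeholders invented for this sketch.

```python
import asyncio
from typing import List


class StubPredictor:
    """Stand-in for the real predictor; the signatures here are assumptions."""

    def batch_two_step_extract(self, images: List[str]) -> List[str]:
        return [f"layout:{image}" for image in images]

    async def aio_batch_two_step_extract(self, images: List[str]) -> List[str]:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self.batch_two_step_extract(images)


def analyze_sync(predictor: StubPredictor, images: List[str]) -> List[str]:
    # Synchronous path: a plain blocking call.
    return predictor.batch_two_step_extract(images)


async def analyze_async(predictor: StubPredictor, images: List[str]) -> List[str]:
    # Asynchronous path: same steps, but the extraction call is awaited.
    return await predictor.aio_batch_two_step_extract(images)


sync_result = analyze_sync(StubPredictor(), ["page-1"])
async_result = asyncio.run(analyze_async(StubPredictor(), ["page-1"]))
```

Keeping the two entry points structurally identical means the post-processing after the extraction call can be shared between them.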
do_parse
The orchestration layer in common.py handles backend routing:
Figure 11: Orchestration Layer Backend Selection
Key differences from VLM backend:
- MINERU_VLM_FORMULA_ENABLE is always set to "true" (formulas handled by specialized models)
- MINERU_VLM_TABLE_ENABLE is user-configurable via table_enable parameter
- inline_formula_enable parameter controls inline formula processing in hybrid mode

Sources: mineru/cli/common.py465-483 mineru/cli/common.py538-555
| Variable | Values | Default | Effect |
|---|---|---|---|
| MINERU_FORCE_VLM_OCR_ENABLE | "1", "true", "yes" | Not set | Forces VLM-OCR mode |
| MINERU_HYBRID_FORCE_PIPELINE_ENABLE | "1", "true", "yes" | Not set | Forces hybrid mode |
| MINERU_HYBRID_BATCH_RATIO | Integer | Auto-detect | Overrides batch ratio calculation |
| MINERU_VLM_TABLE_ENABLE | "true", "false" | "true" | Enables table recognition |
| MINERU_VLM_FORMULA_ENABLE | "true", "false" | "true" | Always true in hybrid backend |
parse_method values:
- "auto": Classify document automatically (text-based vs image-based)
- "txt": Assume text-based PDF, disable OCR
- "ocr": Force OCR processing

language values: See Multi-Language Support for complete list (e.g., "ch", "en", "latin", "arabic")
Sources: mineru/backend/hybrid/hybrid_analyze.py384-395
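The three parse_method values map naturally onto an OCR on/off decision, as in the sketch below. The function name and the `is_image_based` callback (a stand-in for whatever classifier the backend uses in "auto" mode) are hypothetical.

```python
from typing import Callable


def resolve_ocr_enable_sketch(
    parse_method: str, is_image_based: Callable[[], bool]
) -> bool:
    """Translate a parse_method value into an OCR on/off decision."""
    if parse_method == "txt":
        return False  # assume a text-based PDF, so OCR is disabled
    if parse_method == "ocr":
        return True  # OCR is forced
    if parse_method == "auto":
        # Defer to the classifier only when the caller did not decide.
        return bool(is_image_based())
    raise ValueError(f"unknown parse_method: {parse_method!r}")
```

Taking the classifier as a callback means it runs only in "auto" mode, so the explicit modes skip the classification cost.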
Advantages:
Limitations:
Advantages:
Overhead:
From system overview:
Sources: Referenced in Diagram 2 of system overview
Async/Sync Mismatch:
- vllm-async-engine cannot be used in do_parse (sync mode)
- vllm-engine cannot be used in aio_do_parse (async mode)
- Use auto-engine for automatic selection

Model Loading Failures:
- MINERU_MODEL_SOURCE environment variable

GPU Memory Exhaustion:
- Set MINERU_HYBRID_BATCH_RATIO to a lower value
- get_vram(device)

Sources: mineru/cli/common.py468-470 mineru/cli/common.py541-542
Sources: mineru/cli/client.py60-69
The API automatically:
- backend="hybrid-auto-engine" to hybrid processing
- {output_dir}/{pdf_name}/hybrid_{parse_method}/middle.json, Markdown, and optional debug outputs

Sources: mineru/cli/fast_api.py154-162 mineru/cli/fast_api.py286-289
The Gradio interface exposes hybrid backend selection:
Dynamic UI Updates:
Sources: mineru/cli/gradio_app.py411-415 mineru/cli/gradio_app.py326-344
The Hybrid Backend achieves high accuracy by:
- middle.json compatible with both VLM and pipeline backends

Default Choice: The hybrid backend is MinerU's recommended option for production use, balancing accuracy, language support, and performance.
Sources: mineru/backend/hybrid/hybrid_analyze.py1-527 mineru/backend/hybrid/hybrid_model_output_to_middle_json.py1-212