The Pipeline Backend is MinerU's traditional computer vision-based document parsing system that processes PDFs through a sequential pipeline of specialized models. It provides CPU-friendly processing with competitive accuracy (82+) and is suitable for deployments where GPU resources are limited or unavailable. This backend uses batch processing techniques to optimize throughput across multiple specialized models including layout detection, OCR, formula recognition, and table structure recognition.
For the VLM-based approach using vision-language models, see VLM Backend. For the hybrid approach that combines VLM with Pipeline components, see Hybrid Backend. For the overall backend selection logic, see Core Orchestration.
The Pipeline Backend implements a sequential batch processing architecture where documents flow through multiple specialized models in a fixed order. Each stage processes all images in batch before passing results to the next stage, optimizing for throughput and memory efficiency.
Sources: mineru/backend/pipeline/batch_analyze.py25-436 mineru/backend/pipeline/model_init.py201-271
The BatchAnalyze class serves as the main orchestrator, managing the flow of image batches through each specialized model stage. It receives a list of images with metadata and returns layout results containing all detected elements (text, tables, formulas, images) with their bounding boxes, types, and content.
The Pipeline Backend uses a singleton pattern to manage model instances, so that repeated parsing operations reuse already-loaded models instead of loading them again.
Sources: mineru/backend/pipeline/model_init.py121-198 mineru/backend/pipeline/model_init.py201-271
The AtomModelSingleton class mineru/backend/pipeline/model_init.py121-152 implements a cache keyed by model name and parameters:
| Model Type | Cache Key Components | Purpose |
|---|---|---|
| Layout, MFD, MFR, TableCls, ImgOrientationCls | atom_model_name only | Single global instance |
| OCR | (atom_model_name, det_db_box_thresh, lang, det_db_unclip_ratio, enable_merge_det_boxes) | Per-language and threshold instances |
| WiredTable, WirelessTable | (atom_model_name, lang) | Per-language instances |
This design ensures that:
Sources: mineru/backend/pipeline/model_init.py130-152
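The cache-key scheme in the table can be illustrated with a minimal sketch. This is not the actual AtomModelSingleton code: the class name `ModelCacheSketch`, the `factory` argument, and the model name strings `"ocr"`, `"wired_table"` and `"wireless_table"` are hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict, Hashable


class ModelCacheSketch:
    """Illustrative keyed model cache; NOT MinerU's AtomModelSingleton."""

    _instance = None

    def __new__(cls):
        # Classic singleton: every construction returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models = {}
        return cls._instance

    @staticmethod
    def _make_key(atom_model_name: str, kwargs: Dict[str, Any]) -> Hashable:
        # Hypothetical model names; the key components follow the table above.
        if atom_model_name == "ocr":
            return (
                atom_model_name,
                kwargs.get("det_db_box_thresh"),
                kwargs.get("lang"),
                kwargs.get("det_db_unclip_ratio"),
                kwargs.get("enable_merge_det_boxes"),
            )
        if atom_model_name in ("wired_table", "wireless_table"):
            return (atom_model_name, kwargs.get("lang"))
        # All other models share one global instance per name.
        return atom_model_name

    def get(self, atom_model_name: str, factory: Callable[..., Any], **kwargs) -> Any:
        key = self._make_key(atom_model_name, kwargs)
        if key not in self._models:
            # Build the model only on the first request for this key.
            self._models[key] = factory(**kwargs)
        return self._models[key]
```

Under these assumptions, two requests for an OCR model with the same language and thresholds return the same object, while a request for a different language builds a second instance.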
The Pipeline Backend uses configurable batch ratios to optimize throughput across different hardware configurations:
BatchAnalyze.__init__() accepts a batch_ratio parameter that scales the base batch sizes of the adjustable stages mineru/backend/pipeline/batch_analyze.py26-31. For example, with batch_ratio=2, formula recognition processes 32 images per batch (2 × 16).
Sources: mineru/backend/pipeline/batch_analyze.py17-22
The Pipeline Backend employs several optimization strategies:
| Model Stage | Base Batch Size | Adjustable via batch_ratio |
|---|---|---|
| Layout Detection | 1 | No |
| Formula Detection | 1 | No |
| Formula Recognition | 16 | Yes |
| OCR Detection | 16 | Yes |
| Table Orientation | 16 | No |
| Table Classification | 16 | No |
Sources: mineru/backend/pipeline/batch_analyze.py17-22
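The relationship between the base batch sizes and batch_ratio can be sketched as follows. The stage keys and helper names (`effective_batch_size`, `chunk`) are hypothetical; only the numeric values and the "adjustable" flags are taken from the table in this section.

```python
from typing import Iterator, List, Sequence

# Base batch sizes from the table; the dictionary keys are illustrative names.
BASE_BATCH_SIZES = {
    "layout": 1,
    "formula_detection": 1,
    "formula_recognition": 16,
    "ocr_detection": 16,
    "table_orientation": 16,
    "table_classification": 16,
}

# Stages the table marks as adjustable via batch_ratio.
SCALED_STAGES = {"formula_recognition", "ocr_detection"}


def effective_batch_size(stage: str, batch_ratio: int = 1) -> int:
    """Return the batch size for a stage, scaled only if the stage is adjustable."""
    base = BASE_BATCH_SIZES[stage]
    return base * batch_ratio if stage in SCALED_STAGES else base


def chunk(items: Sequence, size: int) -> Iterator[List]:
    """Split items into consecutive batches of at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With batch_ratio=2, `effective_batch_size("formula_recognition", 2)` gives 32, matching the example in the text, while table classification stays at 16.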
The pipeline includes explicit memory cleanup at critical points mineru/backend/pipeline/batch_analyze.py74:
This prevents GPU memory accumulation during long processing runs by clearing caches when free VRAM falls below 8GB.
OCR detection groups images by resolution (rounded to 64-pixel increments) before batching, enabling efficient GPU kernel compilation and execution mineru/backend/pipeline/batch_analyze.py283-310
Both table OCR and text OCR group processing by language mineru/backend/pipeline/batch_analyze.py170-189 mineru/backend/pipeline/batch_analyze.py396-434 ensuring language-specific models are loaded once and reused across all images.
Sources: mineru/backend/pipeline/batch_analyze.py74 mineru/backend/pipeline/batch_analyze.py238-331
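Language grouping can be pictured with a small sketch. It assumes each crop is a dictionary with hypothetical `lang` and `image` keys and that `load_model` is a caller-supplied function returning a batch recognizer; none of these names come from the MinerU source.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence


def recognize_by_language(
    crops: Sequence[Dict[str, Any]],
    load_model: Callable[[str], Callable[[List[Any]], List[str]]],
) -> List[str]:
    """Run recognition once per language while preserving the input order."""
    # Collect the indices of the crops belonging to each language.
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, crop in enumerate(crops):
        groups[crop["lang"]].append(index)

    results: List[Any] = [None] * len(crops)
    for lang, indices in groups.items():
        model = load_model(lang)  # loaded once per language, not per image
        texts = model([crops[i]["image"] for i in indices])
        # Write each result back to the position of its source crop.
        for i, text in zip(indices, texts):
            results[i] = text
    return results
```

Grouping first means the number of model loads depends on the number of distinct languages rather than on the number of images.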
Layout detection identifies the structural elements in document pages using the DocLayout-YOLO model.
Sources: mineru/backend/pipeline/batch_analyze.py50-54
The layout model processes all images in a single batch call:
Each detected element includes:
- poly: 8-element list [x1, y1, x2, y1, x2, y2, x1, y2] defining the bounding box
- category_id: Integer identifying the element type
- score: Detection confidence score

The layout results serve as the foundation for subsequent processing stages, determining which regions require OCR, which are formulas, and which are tables.
Sources: mineru/backend/pipeline/batch_analyze.py52-54 mineru/model/layout/doclayoutyolo.py (referenced)
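The poly layout lists the four corners of an axis-aligned box in the order top-left, top-right, bottom-right, bottom-left. A minimal sketch of converting between this layout and a plain (x1, y1, x2, y2) box is shown below; the helper names are hypothetical and not part of MinerU.

```python
from typing import List, Sequence, Tuple


def bbox_to_poly(x1: float, y1: float, x2: float, y2: float) -> List[float]:
    """Expand a corner pair into the 8-element [x1, y1, x2, y1, x2, y2, x1, y2] form."""
    return [x1, y1, x2, y1, x2, y2, x1, y2]


def poly_to_bbox(poly: Sequence[float]) -> Tuple[float, float, float, float]:
    """Collapse an 8-element poly back into (x1, y1, x2, y2)."""
    xs = poly[0::2]  # x coordinates sit at even positions
    ys = poly[1::2]  # y coordinates sit at odd positions
    return min(xs), min(ys), max(xs), max(ys)
```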
After table processing, the pipeline performs OCR on text regions identified by layout detection.
Sources: mineru/backend/pipeline/batch_analyze.py238-434
The batch detection mode groups images by resolution before batching mineru/backend/pipeline/batch_analyze.py239-331:
This approach enables efficient GPU utilization by processing same-sized images together, avoiding dynamic shape compilation overhead.
Batch detection is disabled on certain platforms mineru/backend/pipeline/model_init.py296-311:
Sources: mineru/backend/pipeline/batch_analyze.py239-331 mineru/backend/pipeline/model_init.py296-311
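Resolution grouping can be sketched as bucketing each image by its height and width rounded to a multiple of 64 pixels. Rounding upward (so that images could be padded to the bucket size) is an assumption here, since the text only says sizes are rounded to 64-pixel increments; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

STRIDE = 64  # bucket granularity in pixels, as stated in the text


def bucket_size(height: int, width: int, stride: int = STRIDE) -> Tuple[int, int]:
    """Round each dimension up to the next multiple of `stride` (assumed direction)."""
    def round_up(value: int) -> int:
        return max(stride, -(-value // stride) * stride)

    return round_up(height), round_up(width)


def group_by_resolution(
    shapes: Sequence[Tuple[int, int]],
) -> Dict[Tuple[int, int], List[int]]:
    """Map each bucket (height, width) to the indices of the images that fall in it."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for index, (height, width) in enumerate(shapes):
        groups[bucket_size(height, width)].append(index)
    return dict(groups)
```

Images of 100×130 and 120×150 pixels both land in the 128×192 bucket under this scheme, so they can share one fixed-shape batch.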
After detection, text boxes undergo several processing steps mineru/backend/pipeline/batch_analyze.py316-331:
- sorted_boxes() orders boxes top-to-bottom, left-to-right
- merge_det_boxes() combines adjacent boxes
- update_det_boxes() adjusts boxes to avoid overlapping with formulas
- get_ocr_result_list() creates final OCR result entries

Recognition groups text crops by language and processes them in batches mineru/backend/pipeline/batch_analyze.py366-434:
Low-confidence results (score < OcrConfidence.min_confidence) are marked with category_id=16 to indicate potential OCR errors mineru/backend/pipeline/batch_analyze.py416-432
Sources: mineru/backend/pipeline/batch_analyze.py366-434
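The low-confidence marking can be sketched as below. The threshold value 0.5 is a placeholder, not the actual value of OcrConfidence.min_confidence, and the function name is hypothetical.

```python
from typing import Any, Dict

LOW_CONFIDENCE_CATEGORY = 16  # category_id used for low-confidence OCR results
MIN_CONFIDENCE = 0.5          # placeholder; the real threshold lives in OcrConfidence


def apply_recognition(
    entry: Dict[str, Any],
    text: str,
    score: float,
    min_confidence: float = MIN_CONFIDENCE,
) -> Dict[str, Any]:
    """Attach recognized text to a layout entry and flag it if the score is low."""
    entry["text"] = text
    entry["score"] = score
    if score < min_confidence:
        # Re-tag the entry so later stages can treat it as a possible OCR error.
        entry["category_id"] = LOW_CONFIDENCE_CATEGORY
    return entry
```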
Formula processing uses a two-stage approach: detection followed by recognition.
Sources: mineru/backend/pipeline/batch_analyze.py56-74
The formula detection model (MFD - Mathematical Formula Detection) identifies formula regions across all images mineru/backend/pipeline/batch_analyze.py58-60:
The formula recognition model (MFR - Mathematical Formula Recognition) then processes these regions in larger batches mineru/backend/pipeline/batch_analyze.py63-67:
MinerU supports two MFR models, selected via the MINERU_FORMULA_CH_SUPPORT environment variable mineru/backend/pipeline/model_init.py21-28:
- unimernet_small: Default, optimized for English/Western formulas
- pp_formulanet_plus_m: Supports Chinese formulas

After formula processing, the system clears GPU memory to prevent OOM errors during subsequent stages mineru/backend/pipeline/batch_analyze.py74
Sources: mineru/backend/pipeline/batch_analyze.py56-74 mineru/backend/pipeline/model_init.py21-28
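Environment-based model selection might look like the sketch below. The set of values treated as "enabled" (`1`, `true`, `yes`) is an assumption, because the text does not state which values MINERU_FORMULA_CH_SUPPORT accepts; the function name is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed truthy spellings; the values MinerU actually accepts may differ.
_TRUTHY = {"1", "true", "yes"}


def select_mfr_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the formula recognition model name from the environment."""
    env = os.environ if env is None else env
    raw = str(env.get("MINERU_FORMULA_CH_SUPPORT", "")).strip().lower()
    if raw in _TRUTHY:
        return "pp_formulanet_plus_m"  # variant with Chinese formula support
    return "unimernet_small"           # default model
```

Passing a plain dictionary instead of reading os.environ directly keeps the selection logic easy to test.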
Table recognition is the most complex pipeline stage, involving orientation detection, wired/wireless classification, OCR extraction, and HTML generation.
Sources: mineru/backend/pipeline/batch_analyze.py111-236
Tables may be rotated in scanned documents. The orientation classifier detects portrait-oriented tables (aspect ratio > 1.2) and uses OCR detection to identify vertical text patterns mineru/model/ori_cls/paddle_ori_cls.py66-115:
The batch orientation prediction groups tables by resolution for efficient processing mineru/model/ori_cls/paddle_ori_cls.py174-260
Sources: mineru/backend/pipeline/batch_analyze.py114-130 mineru/model/ori_cls/paddle_ori_cls.py16-283
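The portrait pre-check can be sketched as a simple ratio test. It assumes that "aspect ratio" means height divided by width, which the text does not spell out; the function name is hypothetical, and the later OCR-based check for vertical text is not modeled here.

```python
PORTRAIT_RATIO_THRESHOLD = 1.2  # threshold stated in the text


def is_portrait_candidate(
    height: int,
    width: int,
    threshold: float = PORTRAIT_RATIO_THRESHOLD,
) -> bool:
    """Return True if the table crop is tall enough to warrant a rotation check."""
    if width <= 0:
        return False  # degenerate crop; nothing to classify
    # Assumption: aspect ratio is height / width.
    return height / width > threshold
```

Only crops that pass this cheap test would need the more expensive text-direction analysis.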
Tables are classified as either "wired" (with visible grid lines) or "wireless" (borderless) mineru/backend/pipeline/batch_analyze.py132-142:
The classification uses a 224×224 ONNX model trained to distinguish table types mineru/model/table/cls/paddle_table_cls.py15-149
Sources: mineru/backend/pipeline/batch_analyze.py132-142 mineru/model/table/cls/paddle_table_cls.py15-25
Tables require dedicated OCR processing to extract cell text mineru/backend/pipeline/batch_analyze.py144-189:
This approach optimizes for accuracy by using language-specific OCR models while maintaining reasonable performance through batch recognition.
Sources: mineru/backend/pipeline/batch_analyze.py144-189
The wireless table model uses SLANet+ architecture to predict table structure without relying on grid lines mineru/model/table/rec/slanet_plus/main.py149-212:
The model:
Sources: mineru/backend/pipeline/batch_analyze.py194-197 mineru/model/table/rec/slanet_plus/main.py36-212
The wired table model uses UNet architecture to detect table grid lines mineru/model/table/rec/unet_table/main.py257-350:
The wired model:
The selection logic compares:
Tables are processed with wired model if mineru/backend/pipeline/batch_analyze.py203-209:
Sources: mineru/backend/pipeline/batch_analyze.py199-226 mineru/model/table/rec/unet_table/main.py45-350
The Pipeline Backend returns images_layout_res, a list of layout results where each element corresponds to one input image mineru/backend/pipeline/batch_analyze.py436
Each image's layout result is a list of dictionaries with the following structure:
| Field | Type | Description |
|---|---|---|
| poly | List[float] | 8-element list [x1,y1,x2,y1,x2,y2,x1,y2] defining bounding box |
| category_id | int | Element type identifier (text, title, table, image, formula, etc.) |
| score | float | Detection/recognition confidence score |
| text | str | OCR text content (for text blocks) |
| latex | str | LaTeX formula (for formula blocks) |
| html | str | HTML table structure (for table blocks) |
Example categories:
- category_id=15: Text requiring OCR
- category_id=16: Low-confidence OCR result

The results from BatchAnalyze.__call__() are passed to the pipeline_result_to_middle_json() function for conversion to the standardized middle.json format. See middle.json Intermediate Format for details on the conversion process.
Sources: mineru/backend/pipeline/batch_analyze.py33-436
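A consumer of this structure could collect the content fields per page as in the sketch below. The sample data and the helper name are invented for illustration; only the field names come from the table above.

```python
from typing import Any, Dict, List


def collect_content(layout_res: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Gather text, LaTeX and HTML payloads from one image's layout result."""
    collected: Dict[str, List[str]] = {"text": [], "latex": [], "html": []}
    for item in layout_res:
        for field in collected:
            # Each element carries only the field matching its block type.
            if field in item:
                collected[field].append(item[field])
    return collected


# Invented sample shaped like one entry of images_layout_res.
sample_page = [
    {"poly": [0, 0, 50, 0, 50, 10, 0, 10], "category_id": 15, "score": 0.98, "text": "Hello"},
    {"poly": [0, 20, 50, 20, 50, 40, 0, 40], "category_id": 16, "score": 0.31, "text": "W0rld"},
    {"poly": [0, 50, 50, 50, 50, 70, 0, 70], "score": 0.95, "latex": "E = mc^2"},
]
summary = collect_content(sample_page)
```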
The Pipeline Backend adapts to different hardware platforms:
For Huawei Ascend NPUs, special handling ensures compatibility mineru/backend/pipeline/model_init.py329-338:
Model initialization also converts device strings to torch.device objects for NPU devices mineru/backend/pipeline/model_init.py75-76 mineru/backend/pipeline/model_init.py92-94
The ocr_det_batch_setting() function determines whether batch OCR detection is supported mineru/backend/pipeline/model_init.py296-311:
This ensures stable operation across diverse hardware environments.
Sources: mineru/backend/pipeline/model_init.py296-338
The Pipeline Backend integrates with MinerU's core orchestration layer through the MineruPipelineModel and HybridModelSingleton classes. See Core Orchestration for details on how backends are selected and invoked.
The BatchAnalyze class is instantiated by backend-specific code and called with image batches: