Pipeline Backend

Purpose and Scope

The Pipeline Backend is MinerU's traditional computer vision-based document parsing system that processes PDFs through a sequential pipeline of specialized models. It provides CPU-friendly processing with competitive accuracy (82+) and is suitable for deployments where GPU resources are limited or unavailable. This backend uses batch processing techniques to optimize throughput across multiple specialized models including layout detection, OCR, formula recognition, and table structure recognition.

For the VLM-based approach using vision-language models, see VLM Backend. For the hybrid approach that combines VLM with Pipeline components, see Hybrid Backend. For the overall backend selection logic, see Core Orchestration.

5.1 Pipeline Architecture and Flow

The Pipeline Backend implements a sequential batch processing architecture where documents flow through multiple specialized models in a fixed order. Each stage processes all images in batch before passing results to the next stage, optimizing for throughput and memory efficiency.

Pipeline Processing Flow

Sources: mineru/backend/pipeline/batch_analyze.py25-436 mineru/backend/pipeline/model_init.py201-271

The BatchAnalyze class serves as the main orchestrator, managing the flow of image batches through each specialized model stage. It receives a list of images with metadata and returns layout results containing all detected elements (text, tables, formulas, images) with their bounding boxes, types, and content.

5.2 Batch Processing and Model Orchestration

Model Management with Singleton Pattern

The Pipeline Backend uses a singleton pattern to manage model instances, so that models are not reloaded across multiple parsing operations.

Model Singleton Architecture

Sources: mineru/backend/pipeline/model_init.py121-198 mineru/backend/pipeline/model_init.py201-271

The AtomModelSingleton class mineru/backend/pipeline/model_init.py121-152 implements a cache keyed by model name and parameters:

| Model Type | Cache Key Components | Purpose |
| --- | --- | --- |
| Layout, MFD, MFR, TableCls, ImgOrientationCls | `atom_model_name` only | Single global instance |
| OCR | `(atom_model_name, det_db_box_thresh, lang, det_db_unclip_ratio, enable_merge_det_boxes)` | Per-language and threshold instances |
| WiredTable, WirelessTable | `(atom_model_name, lang)` | Per-language instances |

This design ensures that:

Models are loaded only once across multiple document parsing operations
Language-specific models (OCR, tables) are cached separately for different languages
Parameter variations (OCR thresholds) create distinct cached instances
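The caching behavior described above can be illustrated with a minimal sketch. This is not the AtomModelSingleton source: the class name `ModelCache`, the model-name strings, and the `loader` callback are hypothetical and chosen only to show how a cache key built from the model name plus selected parameters yields one instance per distinct configuration.

```python
from typing import Any, Callable, Dict, Hashable


class ModelCache:
    """Hypothetical keyed singleton; illustrates the idea, not MinerU's class."""

    _instance = None

    def __new__(cls):
        # Every ModelCache() call returns the same object, so the cache is shared.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models = {}
        return cls._instance

    @staticmethod
    def _make_key(name: str, params: Dict[str, Any]) -> Hashable:
        # Key components follow the table above; the name strings are assumptions.
        if name == "ocr":
            return (
                name,
                params.get("det_db_box_thresh"),
                params.get("lang"),
                params.get("det_db_unclip_ratio"),
                params.get("enable_merge_det_boxes"),
            )
        if name in ("wired_table", "wireless_table"):
            return (name, params.get("lang"))
        return name  # layout, MFD, MFR, classifiers: one global instance

    def get(self, name: str, loader: Callable[..., Any], **params: Any) -> Any:
        key = self._make_key(name, params)
        if key not in self._models:
            # The loader runs only on the first request for this key.
            self._models[key] = loader(**params)
        return self._models[key]
```

Requesting the OCR model twice with the same language and thresholds returns the cached object, while a different language triggers a second load.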

Sources: mineru/backend/pipeline/model_init.py130-152

Batch Processing Configuration

The Pipeline Backend uses a configurable batch ratio to tune throughput for different hardware configurations. Each model stage has a base batch size, listed under Batch Size Configuration below.

BatchAnalyze.__init__() accepts a batch_ratio parameter that scales the adjustable base values mineru/backend/pipeline/batch_analyze.py26-31. For example, with batch_ratio=2, formula recognition processes 32 images per batch (2 × 16).

Sources: mineru/backend/pipeline/batch_analyze.py17-22

Performance Optimization

The Pipeline Backend employs several optimization strategies:

Batch Size Configuration

| Model Stage | Base Batch Size | Adjustable via `batch_ratio` |
| --- | --- | --- |
| Layout Detection | 1 | No |
| Formula Detection | 1 | No |
| Formula Recognition | 16 | Yes |
| OCR Detection | 16 | Yes |
| Table Orientation | 16 | No |
| Table Classification | 16 | No |

Sources: mineru/backend/pipeline/batch_analyze.py17-22
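As a rough illustration of how batch_ratio interacts with the base sizes in the table above, the following sketch computes effective batch sizes. The dictionary keys and the function name `effective_batch_sizes` are hypothetical; only the numbers and the adjustable flags come from the table.

```python
from typing import Dict, Tuple

# (base batch size, adjustable via batch_ratio), copied from the table above.
# The stage names used as keys are illustrative, not identifiers from MinerU.
BASE_BATCH_SIZES: Dict[str, Tuple[int, bool]] = {
    "layout_detection": (1, False),
    "formula_detection": (1, False),
    "formula_recognition": (16, True),
    "ocr_detection": (16, True),
    "table_orientation": (16, False),
    "table_classification": (16, False),
}


def effective_batch_sizes(batch_ratio: int = 1) -> Dict[str, int]:
    """Scale only the stages marked as adjustable; leave the rest unchanged."""
    if batch_ratio < 1:
        raise ValueError("batch_ratio must be a positive integer")
    return {
        stage: base * batch_ratio if adjustable else base
        for stage, (base, adjustable) in BASE_BATCH_SIZES.items()
    }
```

With batch_ratio=2 this gives 32 for formula recognition, matching the example in the text, while table orientation stays at 16.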

Memory Management

The pipeline performs explicit memory cleanup at critical points mineru/backend/pipeline/batch_analyze.py74. This prevents GPU memory from accumulating during long processing runs by clearing caches when free VRAM falls below 8 GB.

Resolution Grouping

OCR detection groups images by resolution (rounded to 64-pixel increments) before batching, enabling efficient GPU kernel compilation and execution mineru/backend/pipeline/batch_analyze.py283-310.

Language Grouping

Both table OCR and text OCR group their work by language mineru/backend/pipeline/batch_analyze.py170-189 mineru/backend/pipeline/batch_analyze.py396-434, so each language-specific model is loaded once and reused across all images.

Sources: mineru/backend/pipeline/batch_analyze.py74 mineru/backend/pipeline/batch_analyze.py238-331

5.3 Layout Detection with DocLayout-YOLO

Layout detection identifies the structural elements in document pages using the DocLayout-YOLO model.

Layout Detection Process

Sources: mineru/backend/pipeline/batch_analyze.py50-54

The layout model processes all images in a single batch call.

Each detected element includes:

poly: 8-element list [x1, y1, x2, y1, x2, y2, x1, y2] defining bounding box
category_id: Integer identifying element type
score: Detection confidence score
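The poly layout above stores the four corners of an axis-aligned box clockwise from the top-left. A small sketch of converting between this form and a plain (x1, y1, x2, y2) box follows; the helper names `bbox_to_poly` and `poly_to_bbox` are hypothetical and are not taken from the MinerU codebase.

```python
from typing import List, Sequence, Tuple


def bbox_to_poly(x1: float, y1: float, x2: float, y2: float) -> List[float]:
    """Expand a box into [x1, y1, x2, y1, x2, y2, x1, y2]."""
    return [x1, y1, x2, y1, x2, y2, x1, y2]


def poly_to_bbox(poly: Sequence[float]) -> Tuple[float, float, float, float]:
    """Recover (x1, y1, x2, y2) from the top-left and bottom-right corners."""
    if len(poly) != 8:
        raise ValueError("poly must contain exactly 8 values")
    # Indices 0-1 hold the top-left corner, indices 4-5 the bottom-right corner.
    return poly[0], poly[1], poly[4], poly[5]
```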

The layout results serve as the foundation for subsequent processing stages, determining which regions require OCR, which are formulas, and which are tables.

Sources: mineru/backend/pipeline/batch_analyze.py52-54 mineru/model/layout/doclayoutyolo.py (referenced)

5.4 OCR Processing with PaddleOCR

After table processing, the pipeline performs OCR on text regions identified by layout detection.

OCR Processing Architecture

Sources: mineru/backend/pipeline/batch_analyze.py238-434

OCR Detection with Resolution Grouping

The batch detection mode groups images by resolution before running detection mineru/backend/pipeline/batch_analyze.py239-331.

This approach enables efficient GPU utilization by processing same-sized images together, avoiding dynamic shape compilation overhead.
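A minimal sketch of resolution grouping is shown here. It assumes that sizes are rounded up to the next multiple of 64 and that each group is then split into fixed-size batches; the actual rounding direction and padding strategy in batch_analyze.py may differ, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

STRIDE = 64  # increment mentioned in the text


def round_up(value: int, stride: int = STRIDE) -> int:
    """Round a dimension up to the next multiple of stride (assumed direction)."""
    return ((value + stride - 1) // stride) * stride


def group_by_resolution(
    sizes: List[Tuple[int, int]], batch_size: int = 16
) -> List[Tuple[Tuple[int, int], List[int]]]:
    """Return (target_size, image_indices) batches of same-sized images."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for index, (height, width) in enumerate(sizes):
        groups[(round_up(height), round_up(width))].append(index)

    batches = []
    for target, indices in groups.items():
        # Split each resolution group into chunks of at most batch_size images.
        for start in range(0, len(indices), batch_size):
            batches.append((target, indices[start:start + batch_size]))
    return batches
```

Images of 100×200 and 120×250 pixels both map to the 128×256 group here, so they can share one batch with a single input shape.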

Batch detection is disabled on certain platforms mineru/backend/pipeline/model_init.py296-311:

PyTorch >= 2.8.0 (compatibility issues)
Apple Silicon MPS devices
CoreX devices

Sources: mineru/backend/pipeline/batch_analyze.py239-331 mineru/backend/pipeline/model_init.py296-311

OCR Text Processing

After detection, text boxes undergo several processing steps mineru/backend/pipeline/batch_analyze.py316-331:

Sorting: sorted_boxes() orders boxes top-to-bottom, left-to-right
Merging: merge_det_boxes() combines adjacent boxes
Formula-aware updating: update_det_boxes() adjusts boxes to avoid overlapping with formulas
Result construction: get_ocr_result_list() creates final OCR result entries
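The sorting step can be pictured with a simplified reading-order sort. This sketch is an assumption about the general technique, not the body of sorted_boxes(): it treats boxes as (x1, y1, x2, y2) tuples and considers two boxes to be on the same row when their top edges differ by less than a hypothetical `row_tolerance`.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sort_boxes_reading_order(boxes: List[Box], row_tolerance: float = 10) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within a row."""
    # First pass: strict sort by top edge, then left edge.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    # Second pass: boxes on nearly the same row are reordered by x,
    # so slight vertical jitter does not break left-to-right order.
    for i in range(len(ordered) - 1):
        for j in range(i, -1, -1):
            same_row = abs(ordered[j + 1][1] - ordered[j][1]) < row_tolerance
            if same_row and ordered[j + 1][0] < ordered[j][0]:
                ordered[j], ordered[j + 1] = ordered[j + 1], ordered[j]
            else:
                break
    return ordered
```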

OCR Recognition by Language

Recognition groups text crops by language and processes them in batches mineru/backend/pipeline/batch_analyze.py366-434:

Low-confidence results (score < OcrConfidence.min_confidence) are marked with category_id=16 to indicate potential OCR errors mineru/backend/pipeline/batch_analyze.py416-432.
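The language grouping and low-confidence marking can be sketched as follows. The item fields `lang` and `crop`, the `recognizers` mapping, and the `min_confidence` argument are hypothetical stand-ins; only the category value 16 comes from the text.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LOW_CONFIDENCE_CATEGORY = 16  # value stated in the text

# A recognizer takes a list of crops and returns (text, score) per crop.
Recognizer = Callable[[list], List[Tuple[str, float]]]


def recognize_by_language(
    items: List[dict], recognizers: Dict[str, Recognizer], min_confidence: float
) -> List[dict]:
    """Run one batched recognition call per language and flag weak results."""
    by_lang: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        by_lang[item["lang"]].append(item)

    for lang, group in by_lang.items():
        # One call per language, covering every crop in that language.
        results = recognizers[lang]([item["crop"] for item in group])
        for item, (text, score) in zip(group, results):
            item["text"] = text
            item["score"] = score
            if score < min_confidence:
                item["category_id"] = LOW_CONFIDENCE_CATEGORY
    return items
```

Because items are updated in place, the original ordering of the layout results is preserved even though recognition runs language by language.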

Sources: mineru/backend/pipeline/batch_analyze.py366-434

5.5 Formula Detection and Recognition

Formula processing uses a two-stage approach: detection followed by recognition.

Formula Processing Pipeline

Sources: mineru/backend/pipeline/batch_analyze.py56-74

The formula detection model (MFD - Mathematical Formula Detection) identifies formula regions across all images mineru/backend/pipeline/batch_analyze.py58-60.

The formula recognition model (MFR - Mathematical Formula Recognition) then processes these regions in larger batches mineru/backend/pipeline/batch_analyze.py63-67.

MinerU supports two MFR models, selected via the MINERU_FORMULA_CH_SUPPORT environment variable mineru/backend/pipeline/model_init.py21-28:

unimernet_small: Default, optimized for English/Western formulas
pp_formulanet_plus_m: Supports Chinese formulas
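A sketch of environment-based model selection is given here. The two model names and the variable name come from the text; the set of values treated as "enabled" and the helper name `select_mfr_model` are assumptions, so the real parsing in model_init.py may accept different values.

```python
import os
from typing import Mapping, Optional


def select_mfr_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the MFR model name from MINERU_FORMULA_CH_SUPPORT."""
    env = os.environ if env is None else env
    flag = env.get("MINERU_FORMULA_CH_SUPPORT", "").strip().lower()
    # Assumption: these strings enable Chinese formula support.
    if flag in {"1", "true", "yes"}:
        return "pp_formulanet_plus_m"
    # Default model when the variable is unset or not enabled.
    return "unimernet_small"
```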

After formula processing, the system clears GPU memory to prevent OOM errors during subsequent stages mineru/backend/pipeline/batch_analyze.py74.

Sources: mineru/backend/pipeline/batch_analyze.py56-74 mineru/backend/pipeline/model_init.py21-28

5.6 Table Recognition (Wired and Wireless)

Table recognition is the most complex pipeline stage, involving orientation detection, wired/wireless classification, OCR extraction, and HTML generation.

Table Recognition Flow

Sources: mineru/backend/pipeline/batch_analyze.py111-236

Orientation Detection and Rotation

Tables may be rotated in scanned documents. The orientation classifier detects portrait-oriented tables (aspect ratio > 1.2) and uses OCR detection to identify vertical text patterns mineru/model/ori_cls/paddle_ori_cls.py66-115.

The batch orientation prediction groups tables by resolution for efficient processing mineru/model/ori_cls/paddle_ori_cls.py174-260.

Sources: mineru/backend/pipeline/batch_analyze.py114-130 mineru/model/ori_cls/paddle_ori_cls.py16-283

Wired vs Wireless Classification

Tables are classified as either "wired" (with visible grid lines) or "wireless" (borderless) mineru/backend/pipeline/batch_analyze.py132-142.

The classification uses a 224×224 ONNX model trained to distinguish table types mineru/model/table/cls/paddle_table_cls.py15-149.

Sources: mineru/backend/pipeline/batch_analyze.py132-142 mineru/model/table/cls/paddle_table_cls.py15-25

Table OCR Processing

Tables require dedicated OCR processing to extract cell text mineru/backend/pipeline/batch_analyze.py144-189:

Detection Phase: OCR detection runs on each table individually (no batching) mineru/backend/pipeline/batch_analyze.py152-167
Language Grouping: Detected text boxes are grouped by language mineru/backend/pipeline/batch_analyze.py157-167
Recognition Phase: OCR recognition runs in batches per language mineru/backend/pipeline/batch_analyze.py170-189

This approach optimizes for accuracy by using language-specific OCR models while maintaining reasonable performance through batch recognition.

Sources: mineru/backend/pipeline/batch_analyze.py144-189

Wireless Table Structure Recognition

The wireless table model uses the SLANet+ architecture to predict table structure without relying on grid lines mineru/model/table/rec/slanet_plus/main.py149-212.

The model:

Predicts logical grid structure from cell positions
Matches OCR boxes to predicted cells
Generates HTML with proper cell merging

Sources: mineru/backend/pipeline/batch_analyze.py194-197 mineru/model/table/rec/slanet_plus/main.py36-212

Wired Table Structure Recognition

The wired table model uses a UNet architecture to detect table grid lines mineru/model/table/rec/unet_table/main.py257-350.

The wired model:

Detects horizontal and vertical grid lines
Reconstructs logical cell structure from line intersections
Fills OCR results into detected cells, performing additional OCR for empty cells mineru/model/table/rec/unet_table/main.py168-237
Compares with wireless result and selects the better output mineru/model/table/rec/unet_table/main.py294-349

The selection logic compares:

Physical cell counts (merged cells count as one)
Non-blank cell counts
OCR text coverage

A table is processed with the wired model if either of the following holds mineru/backend/pipeline/batch_analyze.py203-209:

Classified as wired table, OR
Classified as wireless but with confidence < 0.9
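The routing rule above fits in a few lines. In this sketch the label strings "wired" and "wireless" and the function name are hypothetical; the 0.9 threshold comes from the text.

```python
def should_run_wired_model(label: str, score: float, threshold: float = 0.9) -> bool:
    """Return True when the wired model should also process the table."""
    if label == "wired":
        return True
    # A wireless prediction below the threshold is treated as uncertain,
    # so the wired model gets a chance to produce a better result.
    return label == "wireless" and score < threshold
```

A confident wireless prediction (for example 0.97) skips the wired model, while a borderline one (for example 0.6) is passed to it.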

Sources: mineru/backend/pipeline/batch_analyze.py199-226 mineru/model/table/rec/unet_table/main.py45-350

Output Format

The Pipeline Backend returns images_layout_res, a list of layout results where each element corresponds to one input image mineru/backend/pipeline/batch_analyze.py436.

Layout Result Structure

Each image's layout result is a list of dictionaries with the following structure:

| Field | Type | Description |
| --- | --- | --- |
| `poly` | `List[float]` | 8-element list `[x1,y1,x2,y1,x2,y2,x1,y2]` defining bounding box |
| `category_id` | `int` | Element type identifier (text, title, table, image, formula, etc.) |
| `score` | `float` | Detection/recognition confidence score |
| `text` | `str` | OCR text content (for text blocks) |
| `latex` | `str` | LaTeX formula (for formula blocks) |
| `html` | `str` | HTML table structure (for table blocks) |

Example categories:

category_id=15: Text requiring OCR
category_id=16: Low-confidence OCR result
Formula and table category IDs vary by layout model
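To make the structure concrete, here is a small hand-written example of one page's layout result and a helper that reports which content field each entry carries. The sample values are invented for illustration, and the formula and table category IDs are placeholders, since those vary by layout model.

```python
from collections import Counter
from typing import List, Optional

CONTENT_FIELDS = ("text", "latex", "html")

# Invented sample data; category IDs 101 and 102 are placeholders.
sample_page: List[dict] = [
    {"poly": [10, 10, 200, 10, 200, 40, 10, 40], "category_id": 15,
     "score": 0.98, "text": "Introduction"},
    {"poly": [10, 60, 200, 60, 200, 90, 10, 90], "category_id": 101,
     "score": 0.91, "latex": "E = mc^2"},
    {"poly": [10, 110, 300, 110, 300, 260, 10, 260], "category_id": 102,
     "score": 0.88, "html": "<table><tr><td>1</td></tr></table>"},
    {"poly": [10, 280, 300, 280, 300, 400, 10, 400], "category_id": 3,
     "score": 0.95},
]


def content_field(item: dict) -> Optional[str]:
    """Return the name of the content field present on an entry, if any."""
    for field in CONTENT_FIELDS:
        if field in item:
            return field
    return None


def summarize(page: List[dict]) -> Counter:
    """Count entries per content field; entries without content map to 'none'."""
    return Counter(content_field(item) or "none" for item in page)
```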

The results from BatchAnalyze.__call__() are passed to the pipeline_result_to_middle_json() function for conversion to the standardized middle.json format. See middle.json Intermediate Format for details on the conversion process.

Sources: mineru/backend/pipeline/batch_analyze.py33-436

Platform Compatibility

The Pipeline Backend adapts to different hardware platforms:

NPU Support

For Huawei Ascend NPUs, special handling ensures compatibility mineru/backend/pipeline/model_init.py329-338.

Model initialization also converts device strings to torch.device objects for NPU devices mineru/backend/pipeline/model_init.py75-76 mineru/backend/pipeline/model_init.py92-94.

Batch Processing Compatibility

The ocr_det_batch_setting() function determines whether batch OCR detection is supported mineru/backend/pipeline/model_init.py296-311:

Disabled for PyTorch >= 2.8.0
Disabled for Apple MPS devices
Disabled for CoreX devices
Enabled otherwise

This ensures stable operation across diverse hardware environments.
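The decision rules listed above could be expressed roughly as follows. This is a sketch under stated assumptions, not the body of ocr_det_batch_setting(): the function takes the PyTorch version and device as plain strings so that it runs without torch installed, and CoreX detection is reduced to a hypothetical `is_corex` flag because the text does not say how that platform is identified.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.7.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def batch_ocr_det_enabled(torch_version: str, device: str, is_corex: bool = False) -> bool:
    """Apply the three disable rules; batch detection is enabled otherwise."""
    if parse_version(torch_version) >= (2, 8, 0):
        return False
    if device.startswith("mps"):
        return False
    if is_corex:
        return False
    return True
```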

Sources: mineru/backend/pipeline/model_init.py296-338

Integration with Core Orchestration

The Pipeline Backend integrates with MinerU's core orchestration layer through the MineruPipelineModel and HybridModelSingleton classes. See Core Orchestration for details on how backends are selected and invoked.

The BatchAnalyze class is instantiated by backend-specific code and called with image batches.

Sources: mineru/backend/pipeline/batch_analyze.py25-31
