This page provides technical reference materials and specifications for the MinerU system. It documents the core type definitions, data structures, output formats, and configuration specifications that developers and integrators need when working with MinerU's APIs and output files.
For usage examples and quick-start guides, see Getting Started. For detailed architectural explanations, see System Architecture. For output file format specifications and examples, see Output File Formats.
This reference documentation covers:
MinerU uses a comprehensive type system defined in enumerations to classify document elements. These types flow through the entire processing pipeline from layout detection to final output generation.
The BlockType class defines all block-level structural elements recognized by MinerU. Blocks are rectangular regions on a page that contain cohesive content.
BlockType Definitions:
| BlockType | Value | Description | Introduced |
|---|---|---|---|
| TEXT | 'text' | Regular paragraph text | Core |
| TITLE | 'title' | Section headings (levels 1-4) | Core |
| LIST | 'list' | Ordered or unordered lists | Core |
| INDEX | 'index' | Index entries | Core |
| IMAGE | 'image' | Image container block | Core |
| IMAGE_BODY | 'image_body' | Actual image content | Core |
| IMAGE_CAPTION | 'image_caption' | Image caption text | Core |
| IMAGE_FOOTNOTE | 'image_footnote' | Image footnote text | Core |
| TABLE | 'table' | Table container block | Core |
| TABLE_BODY | 'table_body' | Actual table content | Core |
| TABLE_CAPTION | 'table_caption' | Table caption text | Core |
| TABLE_FOOTNOTE | 'table_footnote' | Table footnote text | Core |
| INTERLINE_EQUATION | 'interline_equation' | Display (block) equations | Core |
| CODE | 'code' | Code block container | VLM 2.5 |
| CODE_BODY | 'code_body' | Code content | VLM 2.5 |
| CODE_CAPTION | 'code_caption' | Code caption | VLM 2.5 |
| ALGORITHM | 'algorithm' | Algorithm pseudocode | VLM 2.5 |
| REF_TEXT | 'ref_text' | References/bibliography | VLM 2.5 |
| PHONETIC | 'phonetic' | Phonetic annotations | VLM 2.5 |
| HEADER | 'header' | Page header | VLM 2.5 |
| FOOTER | 'footer' | Page footer | VLM 2.5 |
| PAGE_NUMBER | 'page_number' | Page numbering | VLM 2.5 |
| ASIDE_TEXT | 'aside_text' | Sidebar/margin text | VLM 2.5 |
| PAGE_FOOTNOTE | 'page_footnote' | Page-level footnotes | VLM 2.5 |
| DISCARDED | 'discarded' | Filtered content (watermarks, etc.) | Core |
Sources: mineru/utils/enum_class.py3-31
The ContentType and ContentTypeV2 classes define span-level content types. Spans are inline elements within lines of text.
ContentType vs ContentTypeV2:
- ContentType: used by the content_list output format
- ContentTypeV2: used by the content_list_v2 output format, with finer-grained classifications

ContentTypeV2 Classifications:
| Category | Type | Value | Description |
|---|---|---|---|
| Block Types | PARAGRAPH | 'paragraph' | Text paragraphs |
| | TITLE | 'title' | Headings with level metadata |
| | LIST | 'list' | List container |
| | IMAGE | 'image' | Image with source and captions |
| | TABLE | 'table' | Table with HTML and metadata |
| | CODE | 'code' | Code block with language |
| | ALGORITHM | 'algorithm' | Algorithm pseudocode |
| | EQUATION_INTERLINE | 'equation_interline' | Display equations |
| Span Types | SPAN_TEXT | 'text' | Regular inline text |
| | SPAN_EQUATION_INLINE | 'equation_inline' | Inline math expressions |
| | SPAN_PHONETIC | 'phonetic' | Phonetic annotations |
| List Subtypes | LIST_TEXT | 'text_list' | Regular lists |
| | LIST_REF | 'reference_list' | Reference lists |
| Table Subtypes | TABLE_SIMPLE | 'simple_table' | Tables without colspan/rowspan |
| | TABLE_COMPLEX | 'complex_table' | Tables with merged cells |
| Page Elements | PAGE_HEADER | 'page_header' | Page headers |
| | PAGE_FOOTER | 'page_footer' | Page footers |
| | PAGE_NUMBER | 'page_number' | Page numbers |
| | PAGE_ASIDE_TEXT | 'page_aside_text' | Margin notes |
| | PAGE_FOOTNOTE | 'page_footnote' | Page footnotes |
Sources: mineru/utils/enum_class.py33-66
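The distinction between TABLE_SIMPLE and TABLE_COMPLEX in the table above can be illustrated with a small check on a table's HTML. This is only a sketch under the assumption that "merged cells" means a `colspan` or `rowspan` other than 1; the helper `classify_table` and the detector class are hypothetical and are not taken from MinerU's source.

```python
from html.parser import HTMLParser


class _MergedCellDetector(HTMLParser):
    """Records whether any <td>/<th> declares a colspan or rowspan other than 1."""

    def __init__(self):
        super().__init__()
        self.has_merged_cells = False

    def handle_starttag(self, tag, attrs):
        if tag not in ("td", "th"):
            return
        for name, value in attrs:
            # A missing attribute value is treated as the default span of 1.
            if name in ("colspan", "rowspan") and (value or "1").strip() != "1":
                self.has_merged_cells = True


def classify_table(html):
    """Return 'complex_table' if the HTML has merged cells, else 'simple_table'."""
    detector = _MergedCellDetector()
    detector.feed(html)
    detector.close()
    return "complex_table" if detector.has_merged_cells else "simple_table"


plain = classify_table("<table><tr><td>a</td><td>b</td></tr></table>")
merged = classify_table('<table><tr><td colspan="2">a</td></tr></table>')
```

The returned strings match the values listed for TABLE_SIMPLE and TABLE_COMPLEX; how MinerU itself decides between them may differ.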
The MakeMode enumeration defines the available output format modes when converting middle JSON to final outputs.
MakeMode Definitions:
| Mode | Value | Description | Use Case |
|---|---|---|---|
| MM_MD | 'mm_markdown' | Multimodal Markdown with all content types | Full document rendering with images, tables, equations |
| NLP_MD | 'nlp_markdown' | Text-only Markdown | NLP pipelines, text extraction, RAG systems |
| CONTENT_LIST | 'content_list' | Flat list of content items (legacy) | Simple structured output, backward compatibility |
| CONTENT_LIST_V2 | 'content_list_v2' | Hierarchical content with enhanced types | Advanced document analysis, fine-grained extraction |
Sources: mineru/utils/enum_class.py86-90 mineru/backend/vlm/vlm_middle_json_mkcontent.py609-648
The middle JSON format is the central intermediate representation used by all MinerU backends. It standardizes the output of Pipeline, VLM, and Hybrid backends before conversion to final formats.
Middle JSON Schema:
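As a rough illustration, a middle JSON document with one page and one text block might look like the Python dictionary below. The field names are those described under Key Fields; the sample values, the `type` keys, and the `content` key on spans are assumptions made for this sketch, not a complete schema.

```python
# Hypothetical sample: field names follow the Key Fields list, values are invented.
middle_json = {
    "pdf_info": [
        {
            "page_idx": 0,
            "page_size": [612, 792],  # [width, height] in points
            "para_blocks": [
                {
                    "type": "text",
                    "bbox": [72, 90, 540, 120],  # [x0, y0, x1, y1], origin at top-left
                    "index": 0,  # reading order index
                    "lines": [
                        {
                            "bbox": [72, 90, 540, 120],
                            "spans": [
                                {
                                    "type": "text",
                                    "bbox": [72, 90, 540, 120],
                                    "content": "Hello MinerU",  # assumed key name
                                }
                            ],
                        }
                    ],
                }
            ],
            "discarded_blocks": [],
        }
    ]
}


def iter_span_text(doc):
    """Yield (page_idx, text) for every span that carries a 'content' value."""
    for page in doc["pdf_info"]:
        for block in page.get("para_blocks", []):
            for line in block.get("lines", []):
                for span in line.get("spans", []):
                    if "content" in span:
                        yield page["page_idx"], span["content"]


texts = list(iter_span_text(middle_json))
```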
Key Fields:
- pdf_info: List of page objects, one per page
- page_idx: Zero-based page index
- page_size: [width, height] in points
- para_blocks: List of paragraph-level blocks after sorting and merging
- discarded_blocks: Filtered content (watermarks, headers, footers)
- preproc_blocks: Internal field used during processing, contains blocks before paragraph splitting
- bbox: Bounding box [x0, y0, x1, y1] where (0,0) is top-left corner
- lines: List of text lines within a block
- spans: List of inline content elements within a line
- index: Reading order index assigned during block sorting

Hierarchical Block Structure:
Some block types like IMAGE, TABLE, and CODE contain nested blocks arrays:
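A minimal sketch of walking such nested structures, assuming container blocks store their children under a `blocks` key while leaf blocks carry `lines`. The helper name `iter_leaf_blocks` and the sample data are hypothetical.

```python
def iter_leaf_blocks(blocks):
    """Recursively yield blocks that have no nested 'blocks' array."""
    for block in blocks:
        children = block.get("blocks")
        if children:
            yield from iter_leaf_blocks(children)
        else:
            yield block


# Invented sample: a table container holding a caption and a body.
para_blocks = [
    {"type": "text", "lines": []},
    {
        "type": "table",
        "blocks": [
            {"type": "table_caption", "lines": []},
            {"type": "table_body", "lines": []},
        ],
    },
]

leaf_types = [block["type"] for block in iter_leaf_blocks(para_blocks)]
```

Flattening in this way keeps the order in which children appear inside their container, which is convenient when a consumer only cares about leaf content.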
Sources: mineru/backend/pipeline/model_json_to_middle_json.py256-263 mineru/backend/vlm/vlm_middle_json_mkcontent.py609-648
The content_list format is a flattened representation of document content, generated when make_mode='content_list'.
Content List V1 Schema Example:
Bbox Normalization:
Bounding boxes in content list are normalized to a 1000x1000 coordinate space:
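The normalization can be sketched as a simple rescaling of each coordinate by the page dimensions. The function name `normalize_bbox` is hypothetical, and truncating with `int()` is an assumption; the actual rounding used by MinerU may differ.

```python
def normalize_bbox(bbox, page_size, scale=1000):
    """Map [x0, y0, x1, y1] in page points onto a scale x scale grid."""
    page_width, page_height = page_size
    x0, y0, x1, y1 = bbox
    return [
        int(x0 * scale / page_width),   # assumed: truncate toward zero
        int(y0 * scale / page_height),
        int(x1 * scale / page_width),
        int(y1 * scale / page_height),
    ]


# A box covering the second quarter of a US Letter page in each dimension.
normalized = normalize_bbox([153, 198, 306, 396], [612, 792])
```

Because the result no longer depends on the page size, boxes from pages with different dimensions can be compared directly.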
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py187-283 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py182-261
The content_list_v2 format provides enhanced hierarchical structure with page-level grouping and span-level detail.
Content List V2 Schema Example:
Key Differences from V1:
- table_type (simple/complex) and table_nest_level

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py285-484 mineru/backend/vlm/vlm_middle_json_mkcontent.py527-606
LaTeX equation delimiters are configurable through mineru.json or use defaults:
Default Delimiters:
Usage in Code:
Custom delimiters can be configured via get_latex_delimiter_config() to support different rendering engines (e.g., MathJax vs KaTeX).
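The fallback from custom to default delimiters can be sketched as follows. The dictionary shape (`display`/`inline`, each with `left`/`right`) and the `$` and `$$` defaults are assumptions for illustration, and `wrap_latex` is a hypothetical helper rather than a MinerU function.

```python
# Assumed defaults: '$...$' for inline math and '$$...$$' for display math.
DEFAULT_DELIMITERS = {
    "display": {"left": "$$", "right": "$$"},
    "inline": {"left": "$", "right": "$"},
}


def wrap_latex(latex, kind, config=None):
    """Wrap a LaTeX string with custom delimiters if given, else the defaults."""
    delimiters = config or DEFAULT_DELIMITERS
    pair = delimiters[kind]
    return f"{pair['left']}{latex}{pair['right']}"


inline_default = wrap_latex("E=mc^2", "inline")

# A custom configuration using backslash-style delimiters.
bracket_style = {
    "display": {"left": "\\[", "right": "\\]"},
    "inline": {"left": "\\(", "right": "\\)"},
}
display_custom = wrap_latex("a^2+b^2=c^2", "display", bracket_style)
```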
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py10-22 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py92-104
The SplitFlag class defines special markers used during document processing:
| Flag | Value | Description |
|---|---|---|
| CROSS_PAGE | 'cross_page' | Marks blocks/elements that span multiple pages (e.g., table footnotes from merged tables) |
| LINES_DELETED | 'lines_deleted' | Marks blocks whose lines have been deleted during cross-page merging |
Cross-Page Table Merging Example:
When tables are merged across pages, the merged table's footnote on the second page is marked:
This flag is checked during visualization to skip rendering duplicate content:
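A minimal sketch of such a check, assuming the flag is stored as a boolean under the `'cross_page'` key of a block dictionary. The helper `blocks_to_draw` and the sample blocks are hypothetical.

```python
CROSS_PAGE = "cross_page"  # value from the SplitFlag table


def blocks_to_draw(blocks):
    """Return only the blocks that are not flagged as cross-page duplicates."""
    return [block for block in blocks if not block.get(CROSS_PAGE, False)]


# Invented sample: the footnote was already merged into the previous page's table.
page_blocks = [
    {"type": "table_body", "bbox": [50, 60, 500, 300]},
    {"type": "table_footnote", "bbox": [50, 310, 500, 330], CROSS_PAGE: True},
]

drawable = blocks_to_draw(page_blocks)
```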
Sources: mineru/utils/enum_class.py110-112 mineru/utils/table_merge.py530-532 mineru/utils/draw_bbox.py157-159
The ModelPath class centralizes model repository paths for both HuggingFace and ModelScope:
Complete ModelPath Definitions:
| Constant | HuggingFace Path | ModelScope Path |
|---|---|---|
| vlm_root_hf | opendatalab/MinerU2.5-2509-1.2B | - |
| vlm_root_modelscope | - | OpenDataLab/MinerU2.5-2509-1.2B |
| pipeline_root_hf | opendatalab/PDF-Extract-Kit-1.0 | - |
| pipeline_root_modelscope | - | OpenDataLab/PDF-Extract-Kit-1.0 |
Component Model Subpaths:
| Component | Relative Path |
|---|---|
| Layout Detection | models/Layout/YOLO/doclayout_yolo_docstructbench_imgsz1280_2501.pt |
| Formula Detection | models/MFD/YOLO/yolo_v8_ft.pt |
| Formula Recognition (UniMERNet) | models/MFR/unimernet_hf_small_2503 |
| Formula Recognition (PP) | models/MFR/pp_formulanet_plus_m |
| OCR | models/OCR/paddleocr_torch |
| Reading Order | models/ReadingOrder/layout_reader |
| Table Recognition | models/TabRec/SlanetPlus/slanet-plus.onnx |
| Table Structure | models/TabRec/UnetStructure/unet.onnx |
| Table Classification | models/TabCls/paddle_table_cls/PP-LCNet_x1_0_table_cls.onnx |
| Orientation | models/OriCls/paddle_orientation_classification/PP-LCNet_x1_0_doc_ori.onnx |
Sources: mineru/utils/enum_class.py93-107
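To show how the two tables above fit together, the sketch below pairs a repository root with a component subpath. The lookup table copies values from this page; the function `resolve_model`, the backend names `'pipeline'` and `'vlm'`, and the source names `'huggingface'` and `'modelscope'` are hypothetical and not part of MinerU's API.

```python
# Repository roots copied from the ModelPath table; the keys are illustrative.
MODEL_ROOTS = {
    ("pipeline", "huggingface"): "opendatalab/PDF-Extract-Kit-1.0",
    ("pipeline", "modelscope"): "OpenDataLab/PDF-Extract-Kit-1.0",
    ("vlm", "huggingface"): "opendatalab/MinerU2.5-2509-1.2B",
    ("vlm", "modelscope"): "OpenDataLab/MinerU2.5-2509-1.2B",
}

LAYOUT_SUBPATH = "models/Layout/YOLO/doclayout_yolo_docstructbench_imgsz1280_2501.pt"


def resolve_model(backend, source, relative_path=""):
    """Return (repository id, relative path) for a backend and model hub."""
    try:
        repo_id = MODEL_ROOTS[(backend, source)]
    except KeyError:
        raise ValueError(f"unknown backend/source: {backend}/{source}") from None
    return repo_id, relative_path


layout_ref = resolve_model("pipeline", "modelscope", LAYOUT_SUBPATH)
```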
This diagram shows how middle JSON is transformed into different output formats using the union_make function:
union_make Function Signature:
Parameters:
- pdf_info_dict: The pdf_info list from middle JSON
- make_mode: One of MakeMode.MM_MD, MakeMode.NLP_MD, MakeMode.CONTENT_LIST, MakeMode.CONTENT_LIST_V2
- img_buket_path: Base path for image references (e.g., 'auto' → 'auto/image_0_1.jpg')

Returns:

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py609-649
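The idea of dispatching on the output mode can be illustrated with a toy stand-in. This is not MinerU's `union_make`: the function `toy_union_make`, the `text` and `image_path` block keys, and the assumption that Markdown modes return a single string while `content_list` returns a list are all illustrative. Only three of the four modes are covered.

```python
MARKDOWN_MODES = {"mm_markdown", "nlp_markdown"}


def toy_union_make(pdf_info, make_mode, img_path_prefix=""):
    """Toy mode dispatch over para_blocks; NOT the real union_make."""
    if make_mode not in MARKDOWN_MODES and make_mode != "content_list":
        raise ValueError(f"mode not covered by this sketch: {make_mode}")
    pieces = []
    for page in pdf_info:
        for block in page.get("para_blocks", []):
            if make_mode == "content_list":
                pieces.append({"type": block["type"], "page_idx": page["page_idx"]})
            elif block["type"] == "image":
                if make_mode == "mm_markdown":
                    pieces.append(f"![]({img_path_prefix}/{block['image_path']})")
                # nlp_markdown is text-only, so images are skipped
            else:
                pieces.append(block["text"])
    # Assumed: Markdown modes yield one string, list modes yield a list.
    return "\n\n".join(pieces) if make_mode in MARKDOWN_MODES else pieces


# Invented input with simplified blocks ('text' and 'image_path' are assumed keys).
sample_pages = [
    {
        "page_idx": 0,
        "para_blocks": [
            {"type": "text", "text": "Intro"},
            {"type": "image", "image_path": "image_0_1.jpg"},
        ],
    }
]

mm_output = toy_union_make(sample_pages, "mm_markdown", "auto")
nlp_output = toy_union_make(sample_pages, "nlp_markdown")
list_output = toy_union_make(sample_pages, "content_list")
```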
MinerU's text merging adapts to the language of the content, treating line-end hyphens and CJK versus Western spacing differently:
Hyphen Handling Example:
CJK vs Western Spacing:
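Both behaviours can be sketched in a few lines. The rules below are assumptions chosen for illustration (drop a line-end hyphen when the next line starts with a lowercase letter, insert no space next to CJK characters, otherwise join with one space); the helpers `is_cjk` and `join_lines` are hypothetical and simpler than MinerU's own logic.

```python
def is_cjk(ch):
    """Rough CJK test covering common Han, Kana and Hangul ranges."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7AF   # Hangul syllables
    )


def join_lines(lines):
    """Join line strings into one paragraph string."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not out:
            out = line
        elif out.endswith("-") and line[0].islower():
            out = out[:-1] + line      # re-join a word split across lines
        elif out.endswith("-"):
            out += line                # keep the hyphen, add no space
        elif is_cjk(out[-1]) or is_cjk(line[0]):
            out += line                # no space around CJK text
        else:
            out += " " + line          # Western text: single space
    return out


hyphenated = join_lines(["docu-", "ment parsing"])
western = join_lines(["Hello", "world"])
cjk = join_lines(["文档", "解析"])
```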
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py25-91 mineru/utils/char_utils.py5-15 mineru/utils/char_utils.py18-35
MinerU generates debug PDFs for quality inspection:
Shows block-level structure with color-coded types and reading order numbers.
Color Legend (RGB values used in code):
| Block Type | RGB Color | Fill/Stroke |
|---|---|---|
| Dropped blocks | [158, 158, 158] | Fill |
| Table body | [204, 204, 0] | Fill |
| Table caption | [255, 255, 102] | Fill |
| Table footnote | [229, 255, 204] | Fill |
| Image body | [153, 255, 51] | Fill |
| Image caption | [102, 178, 255] | Fill |
| Image footnote | [255, 178, 102] | Fill |
| Code body | [102, 0, 204] | Fill |
| Code caption | [204, 153, 255] | Fill |
| Title | [102, 102, 255] | Fill |
| Text | [153, 0, 76] | Fill |
| Interline equation | [0, 255, 0] | Fill |
| List | [40, 169, 92] | Fill |
| List items | [40, 169, 92] | Stroke only |
| Index | [40, 169, 92] | Fill |
| Reading order | [255, 0, 0] | Numbers only |
Drawing Functions:
- draw_bbox_without_number(): Draws filled rectangles for block types
- draw_bbox_with_number(): Draws reading order numbers at block corners
- cal_canvas_rect(): Handles PDF rotation (0°, 90°, 180°, 270°) for correct placement

Sources: mineru/utils/draw_bbox.py120-289
Shows span-level elements (inline content) with different colors.
Span Color Legend:
| Span Type | RGB Color | Description |
|---|---|---|
| Text | [255, 0, 0] | Regular text spans |
| Inline equation | [0, 255, 0] | Inline math expressions |
| Interline equation | [0, 0, 255] | Display equations |
| Image | [255, 204, 0] | Image regions |
| Table | [204, 0, 255] | Table regions |
| Dropped | [158, 158, 158] | Filtered spans |
Sources: mineru/utils/draw_bbox.py292-392
Shows the sequential reading order of blocks with numbered annotations:
The reading order is determined by:
Each block and line is assigned an index field during sorting that determines rendering sequence.
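A minimal sketch of recovering the reading sequence from that field, assuming every block carries an integer `index`. The helper `in_reading_order` and the sample blocks are hypothetical.

```python
def in_reading_order(blocks):
    """Return blocks sorted by their reading order index."""
    return sorted(blocks, key=lambda block: block["index"])


# Invented sample: blocks listed in detection order, not reading order.
detected = [
    {"type": "text", "index": 2},
    {"type": "title", "index": 0},
    {"type": "image", "index": 1},
]

ordered_types = [block["type"] for block in in_reading_order(detected)]
```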
Sources: mineru/utils/draw_bbox.py395-473
MinerU uses MkDocs with Material theme and i18n plugin for multilingual documentation.
Navigation Structure:
The nav section defines the documentation hierarchy, which appears twice in the config (once in the Home section with descriptions, once as top-level for actual navigation):
i18n Configuration:
This creates separate builds for English (en/) and Chinese (zh/) with translated navigation items.
Markdown Extensions:
| Extension | Purpose |
|---|---|
| admonition | Note/warning/info boxes |
| pymdownx.details | Collapsible sections |
| attr_list | Custom HTML attributes |
| def_list | Definition lists |
| gfm_admonition | GitHub-style admonitions |
| pymdownx.highlight | Syntax highlighting with Pygments |
| pymdownx.superfences | Fenced code blocks with syntax |
| pymdownx.tasklist | Task lists with checkboxes |
Sources: mkdocs.yml1-158
This reference documentation has covered:
- BlockType, ContentType, and ContentTypeV2 enumerations
- MakeMode options for different output format requirements

For implementation details of specific backends, see:
For output file format examples and usage, see Output File Formats.