Reference Documentation

Purpose and Scope

This page provides technical reference materials and specifications for the MinerU system. It documents the core type definitions, data structures, output formats, and configuration specifications that developers and integrators need when working with MinerU's APIs and output files.

For usage examples and quick-start guides, see Getting Started. For detailed architectural explanations, see System Architecture. For output file format specifications and examples, see Output File Formats.

This reference documentation covers:

Core Enumerations: BlockType, ContentType, and related type systems
Data Structure Specifications: Middle JSON format, content list schemas
Configuration Constants: Model paths, delimiters, flags
Documentation System: MkDocs setup and i18n structure

Core Type System and Enumerations

MinerU uses a comprehensive type system defined in enumerations to classify document elements. These types flow through the entire processing pipeline from layout detection to final output generation.

BlockType Taxonomy

The BlockType class defines all block-level structural elements recognized by MinerU. Blocks are rectangular regions on a page that contain cohesive content.

BlockType Definitions:

BlockType	Value	Description	Introduced
`TEXT`	`'text'`	Regular paragraph text	Core
`TITLE`	`'title'`	Section headings (levels 1-4)	Core
`LIST`	`'list'`	Ordered or unordered lists	Core
`INDEX`	`'index'`	Index entries	Core
`IMAGE`	`'image'`	Image container block	Core
`IMAGE_BODY`	`'image_body'`	Actual image content	Core
`IMAGE_CAPTION`	`'image_caption'`	Image caption text	Core
`IMAGE_FOOTNOTE`	`'image_footnote'`	Image footnote text	Core
`TABLE`	`'table'`	Table container block	Core
`TABLE_BODY`	`'table_body'`	Actual table content	Core
`TABLE_CAPTION`	`'table_caption'`	Table caption text	Core
`TABLE_FOOTNOTE`	`'table_footnote'`	Table footnote text	Core
`INTERLINE_EQUATION`	`'interline_equation'`	Display (block) equations	Core
`CODE`	`'code'`	Code block container	VLM 2.5
`CODE_BODY`	`'code_body'`	Code content	VLM 2.5
`CODE_CAPTION`	`'code_caption'`	Code caption	VLM 2.5
`ALGORITHM`	`'algorithm'`	Algorithm pseudocode	VLM 2.5
`REF_TEXT`	`'ref_text'`	References/bibliography	VLM 2.5
`PHONETIC`	`'phonetic'`	Phonetic annotations	VLM 2.5
`HEADER`	`'header'`	Page header	VLM 2.5
`FOOTER`	`'footer'`	Page footer	VLM 2.5
`PAGE_NUMBER`	`'page_number'`	Page numbering	VLM 2.5
`ASIDE_TEXT`	`'aside_text'`	Sidebar/margin text	VLM 2.5
`PAGE_FOOTNOTE`	`'page_footnote'`	Page-level footnotes	VLM 2.5
`DISCARDED`	`'discarded'`	Filtered content (watermarks, etc.)	Core

Sources: mineru/utils/enum_class.py:3-31
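
As a minimal sketch of how these values can be used, the snippet below groups the container types with their child types. It assumes BlockType is a plain class of string constants and reproduces only a subset of the table. The `CONTAINER_CHILDREN` mapping and the `children_of` helper are hypothetical names introduced for illustration; they are not part of MinerU.

```python
class BlockType:
    """Illustrative subset of the values listed in the table above."""
    TEXT = 'text'
    IMAGE = 'image'
    IMAGE_BODY = 'image_body'
    IMAGE_CAPTION = 'image_caption'
    IMAGE_FOOTNOTE = 'image_footnote'
    TABLE = 'table'
    TABLE_BODY = 'table_body'
    TABLE_CAPTION = 'table_caption'
    TABLE_FOOTNOTE = 'table_footnote'
    CODE = 'code'
    CODE_BODY = 'code_body'
    CODE_CAPTION = 'code_caption'


# Hypothetical mapping (not part of MinerU): container type -> child types.
CONTAINER_CHILDREN = {
    BlockType.IMAGE: (
        BlockType.IMAGE_BODY,
        BlockType.IMAGE_CAPTION,
        BlockType.IMAGE_FOOTNOTE,
    ),
    BlockType.TABLE: (
        BlockType.TABLE_BODY,
        BlockType.TABLE_CAPTION,
        BlockType.TABLE_FOOTNOTE,
    ),
    BlockType.CODE: (BlockType.CODE_BODY, BlockType.CODE_CAPTION),
}


def children_of(block_type):
    """Return the child block types of a container, or () for leaf types."""
    return CONTAINER_CHILDREN.get(block_type, ())
```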

ContentType Hierarchy

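The ContentType and ContentTypeV2 classes define the content types used in output items. ContentType covers span-level classification (spans are inline elements within lines of text), while ContentTypeV2 also includes block-level, list, table, and page-element types.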

ContentType vs ContentTypeV2:

ContentType: Original type system used for basic span classification and content_list output format
ContentTypeV2: Enhanced type system introduced for content_list_v2 output format with finer-grained classifications

ContentTypeV2 Classifications:

Category	Type	Value	Description
Block Types	PARAGRAPH	`'paragraph'`	Text paragraphs
	TITLE	`'title'`	Headings with level metadata
	LIST	`'list'`	List container
	IMAGE	`'image'`	Image with source and captions
	TABLE	`'table'`	Table with HTML and metadata
	CODE	`'code'`	Code block with language
	ALGORITHM	`'algorithm'`	Algorithm pseudocode
	EQUATION_INTERLINE	`'equation_interline'`	Display equations
Span Types	SPAN_TEXT	`'text'`	Regular inline text
	SPAN_EQUATION_INLINE	`'equation_inline'`	Inline math expressions
	SPAN_PHONETIC	`'phonetic'`	Phonetic annotations
List Subtypes	LIST_TEXT	`'text_list'`	Regular lists
	LIST_REF	`'reference_list'`	Reference lists
Table Subtypes	TABLE_SIMPLE	`'simple_table'`	Tables without colspan/rowspan
	TABLE_COMPLEX	`'complex_table'`	Tables with merged cells
Page Elements	PAGE_HEADER	`'page_header'`	Page headers
	PAGE_FOOTER	`'page_footer'`	Page footers
	PAGE_NUMBER	`'page_number'`	Page numbers
	PAGE_ASIDE_TEXT	`'page_aside_text'`	Margin notes
	PAGE_FOOTNOTE	`'page_footnote'`	Page footnotes

Sources: mineru/utils/enum_class.py:33-66
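
The table subtypes above distinguish tables by whether cells are merged. One way to make that distinction is sketched below. The `classify_table` helper and its regular expression are assumptions made for illustration; the rule MinerU actually applies may differ.

```python
import re

TABLE_SIMPLE = 'simple_table'
TABLE_COMPLEX = 'complex_table'

# Matches a colspan or rowspan attribute and captures its numeric value.
_SPAN_ATTR = re.compile(r'\b(?:colspan|rowspan)\s*=\s*["\']?(\d+)', re.IGNORECASE)


def classify_table(html):
    """Label a table as complex when any cell spans more than one row or column."""
    spans = (int(value) for value in _SPAN_ATTR.findall(html))
    return TABLE_COMPLEX if any(n > 1 for n in spans) else TABLE_SIMPLE
```

Treating a span value of 1 as "not merged" is a design choice of this sketch: an explicit `colspan="1"` does not change the table layout.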

MakeMode Output Format Options

The MakeMode enumeration defines the available output format modes when converting middle JSON to final outputs.

MakeMode Definitions:

Mode	Value	Description	Use Case
`MM_MD`	`'mm_markdown'`	Multimodal Markdown with all content types	Full document rendering with images, tables, equations
`NLP_MD`	`'nlp_markdown'`	Text-only Markdown	NLP pipelines, text extraction, RAG systems
`CONTENT_LIST`	`'content_list'`	Flat list of content items (legacy)	Simple structured output, backward compatibility
`CONTENT_LIST_V2`	`'content_list_v2'`	Hierarchical content with enhanced types	Advanced document analysis, fine-grained extraction

Sources: mineru/utils/enum_class.py:86-90, mineru/backend/vlm/vlm_middle_json_mkcontent.py:609-648

Data Structure Specifications

Middle JSON Format

The middle JSON format is the central intermediate representation used by all MinerU backends. It standardizes the output of Pipeline, VLM, and Hybrid backends before conversion to final formats.

Middle JSON Schema:

Key Fields:

pdf_info: List of page objects, one per page
page_idx: Zero-based page index
page_size: [width, height] in points
para_blocks: List of paragraph-level blocks after sorting and merging
discarded_blocks: Filtered content (watermarks, headers, footers)
preproc_blocks: Internal field used during processing, contains blocks before paragraph splitting
bbox: Bounding box [x0, y0, x1, y1] where (0,0) is top-left corner
lines: List of text lines within a block
spans: List of inline content elements within a line
index: Reading order index assigned during block sorting
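
The fields above can be sketched as a small Python dictionary, together with a helper that walks pages, blocks, lines, and spans. This is an illustrative shape only: the `type` and `content` keys and all sample values are assumptions, and `iter_span_contents` is a hypothetical helper rather than a MinerU function.

```python
# Illustrative middle JSON with one page and two blocks (sample values only).
middle_json = {
    'pdf_info': [
        {
            'page_idx': 0,
            'page_size': [612, 792],
            'para_blocks': [
                {
                    'type': 'title',
                    'bbox': [72, 72, 540, 100],
                    'index': 0,
                    'lines': [
                        {
                            'bbox': [72, 72, 540, 100],
                            'spans': [{'type': 'text', 'content': 'Introduction'}],
                        }
                    ],
                },
                {
                    'type': 'text',
                    'bbox': [72, 110, 540, 160],
                    'index': 1,
                    'lines': [
                        {
                            'bbox': [72, 110, 540, 160],
                            'spans': [{'type': 'text', 'content': 'First paragraph.'}],
                        }
                    ],
                },
            ],
            'discarded_blocks': [],
        }
    ]
}


def iter_span_contents(middle):
    """Yield (page_idx, block type, span content) in stored order."""
    for page in middle['pdf_info']:
        for block in page['para_blocks']:
            for line in block.get('lines', []):
                for span in line.get('spans', []):
                    yield page['page_idx'], block['type'], span.get('content', '')
```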

Hierarchical Block Structure:

Some block types like IMAGE, TABLE, and CODE contain nested blocks arrays:
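
A nested container might look like the sketch below, in which a table block holds its caption, body, and footnote as child blocks. The sample coordinates are invented, and `child_blocks` is a hypothetical helper added for illustration.

```python
# Illustrative table container with nested child blocks (sample values only).
table_block = {
    'type': 'table',
    'bbox': [72, 200, 540, 420],
    'index': 3,
    'blocks': [
        {'type': 'table_caption', 'bbox': [72, 200, 540, 215], 'lines': []},
        {'type': 'table_body', 'bbox': [72, 220, 540, 400], 'lines': []},
        {'type': 'table_footnote', 'bbox': [72, 405, 540, 420], 'lines': []},
    ],
}


def child_blocks(block, child_type):
    """Return the nested blocks of the given type, or [] if there are none."""
    return [child for child in block.get('blocks', []) if child['type'] == child_type]
```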

Sources: mineru/backend/pipeline/model_json_to_middle_json.py:256-263, mineru/backend/vlm/vlm_middle_json_mkcontent.py:609-648

Content List Format (V1)

The content_list format is a flattened representation of document content, generated when make_mode='content_list'.

Content List V1 Schema Example:

Bbox Normalization:

Bounding boxes in content list are normalized to a 1000x1000 coordinate space:
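
A minimal sketch of this normalization follows. It assumes the scaling is linear in each axis and that results are truncated to integers; `normalize_bbox` is a hypothetical name, and the rounding MinerU uses may differ.

```python
def normalize_bbox(bbox, page_size, scale=1000):
    """Map [x0, y0, x1, y1] in page units onto a scale x scale grid."""
    x0, y0, x1, y1 = bbox
    page_w, page_h = page_size
    return [
        int(x0 * scale / page_w),
        int(y0 * scale / page_h),
        int(x1 * scale / page_w),
        int(y1 * scale / page_h),
    ]
```

For a 612 x 792 point page, `normalize_bbox([153, 198, 459, 594], [612, 792])` returns `[250, 250, 750, 750]`, so consumers can compare boxes across pages of different sizes.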

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:187-283, mineru/backend/pipeline/pipeline_middle_json_mkcontent.py:182-261

Content List Format V2

The content_list_v2 format provides enhanced hierarchical structure with page-level grouping and span-level detail.

Content List V2 Schema Example:

Key Differences from V1:

Page-level grouping: Outer list represents pages, inner lists represent blocks on each page
Span arrays: Text content is represented as arrays of typed spans for fine-grained extraction
Enhanced metadata: Tables include table_type (simple/complex) and table_nest_level
Unified structure: Captions and footnotes are consistently represented as span arrays
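
The page-level grouping and span arrays described above can be sketched as follows. The type values come from the ContentTypeV2 table, but the `content` key and the overall item layout are assumptions made for illustration, as is the `page_texts` helper.

```python
# Illustrative V2 structure: outer list = pages, inner lists = blocks.
content_list_v2 = [
    [  # page 0
        {'type': 'title', 'content': [{'type': 'text', 'content': 'Results'}]},
        {
            'type': 'paragraph',
            'content': [
                {'type': 'text', 'content': 'The loss is '},
                {'type': 'equation_inline', 'content': 'L = 0'},
            ],
        },
    ],
    [  # page 1
        {'type': 'paragraph', 'content': [{'type': 'text', 'content': 'Next page.'}]},
    ],
]


def page_texts(pages):
    """Concatenate span contents per block while keeping the page grouping."""
    return [
        [''.join(span['content'] for span in block['content']) for block in page]
        for page in pages
    ]
```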

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:285-484, mineru/backend/vlm/vlm_middle_json_mkcontent.py:527-606

Configuration Constants and Flags

LaTeX Delimiter Configuration

LaTeX equation delimiters are configurable through mineru.json or use defaults:

Default Delimiters:

Usage in Code:

Custom delimiters can be configured via get_latex_delimiter_config() to support different rendering engines (e.g., MathJax vs KaTeX).
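
The sketch below shows how a delimiter configuration could be applied when emitting equations. The dictionary layout and the default values (`$` for inline, `$$` for display) are assumptions, since the default listing is not reproduced here; `wrap_latex` is a hypothetical helper and does not call get_latex_delimiter_config().

```python
# Assumed defaults (not confirmed by this page): $...$ inline, $$...$$ display.
DEFAULT_DELIMITERS = {
    'display': {'left': '$$', 'right': '$$'},
    'inline': {'left': '$', 'right': '$'},
}


def wrap_latex(latex, kind, delimiters=None):
    """Wrap a LaTeX string in the 'inline' or 'display' delimiter pair."""
    pair = (delimiters or DEFAULT_DELIMITERS)[kind]
    return f"{pair['left']}{latex}{pair['right']}"
```

Passing a custom dictionary, for example one that uses `\(`/`\)` and `\[`/`\]`, changes the output without touching the calling code.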

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:10-22, mineru/backend/pipeline/pipeline_middle_json_mkcontent.py:92-104

SplitFlag Annotations

The SplitFlag class defines special markers used during document processing:

Flag	Value	Description
`CROSS_PAGE`	`'cross_page'`	Marks blocks/elements that span multiple pages (e.g., table footnotes from merged tables)
`LINES_DELETED`	`'lines_deleted'`	Marks blocks whose lines have been deleted during cross-page merging

Cross-Page Table Merging Example:

When tables are merged across pages, the merged table's footnote on the second page is marked:

This flag is checked during visualization to skip rendering duplicate content:
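
Both steps, marking and skipping, can be sketched as below. Storing the flag as a boolean key on the block dictionary is an assumption, and `mark_cross_page` and `blocks_to_draw` are hypothetical helpers, not the functions in table_merge.py or draw_bbox.py.

```python
class SplitFlag:
    """Values from the table above."""
    CROSS_PAGE = 'cross_page'
    LINES_DELETED = 'lines_deleted'


def mark_cross_page(block):
    """Flag a block whose content was merged into a table on an earlier page."""
    block[SplitFlag.CROSS_PAGE] = True
    return block


def blocks_to_draw(blocks):
    """Drop flagged blocks so merged content is not rendered twice."""
    return [b for b in blocks if not b.get(SplitFlag.CROSS_PAGE, False)]
```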

Sources: mineru/utils/enum_class.py:110-112, mineru/utils/table_merge.py:530-532, mineru/utils/draw_bbox.py:157-159

ModelPath Constants

The ModelPath class centralizes model repository paths for both HuggingFace and ModelScope:

Complete ModelPath Definitions:

Constant	HuggingFace Path	ModelScope Path
`vlm_root_hf`	`opendatalab/MinerU2.5-2509-1.2B`	-
`vlm_root_modelscope`	-	`OpenDataLab/MinerU2.5-2509-1.2B`
`pipeline_root_hf`	`opendatalab/PDF-Extract-Kit-1.0`	-
`pipeline_root_modelscope`	-	`OpenDataLab/PDF-Extract-Kit-1.0`

Component Model Subpaths:

Component	Relative Path
Layout Detection	`models/Layout/YOLO/doclayout_yolo_docstructbench_imgsz1280_2501.pt`
Formula Detection	`models/MFD/YOLO/yolo_v8_ft.pt`
Formula Recognition (UniMERNet)	`models/MFR/unimernet_hf_small_2503`
Formula Recognition (PP)	`models/MFR/pp_formulanet_plus_m`
OCR	`models/OCR/paddleocr_torch`
Reading Order	`models/ReadingOrder/layout_reader`
Table Recognition	`models/TabRec/SlanetPlus/slanet-plus.onnx`
Table Structure	`models/TabRec/UnetStructure/unet.onnx`
Table Classification	`models/TabCls/paddle_table_cls/PP-LCNet_x1_0_table_cls.onnx`
Orientation	`models/OriCls/paddle_orientation_classification/PP-LCNet_x1_0_doc_ori.onnx`

Sources: mineru/utils/enum_class.py:93-107
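
Component subpaths are relative to a downloaded repository root. The sketch below joins the two. The `/cache` directory in the usage note and the `local_model_path` helper are hypothetical; only the repository name and the relative path are taken from the tables above.

```python
from pathlib import PurePosixPath

PIPELINE_ROOT_HF = 'opendatalab/PDF-Extract-Kit-1.0'
FORMULA_DETECTION = 'models/MFD/YOLO/yolo_v8_ft.pt'


def local_model_path(download_root, relative_path):
    """Join a local copy of the repository with a component's relative path."""
    return PurePosixPath(download_root) / relative_path
```

For example, `local_model_path('/cache/PDF-Extract-Kit-1.0', FORMULA_DETECTION)` yields `/cache/PDF-Extract-Kit-1.0/models/MFD/YOLO/yolo_v8_ft.pt`.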

Output Format Generation Flow

The union_make function transforms middle JSON into each of the output formats defined by MakeMode.

union_make Function Signature:

Parameters:

pdf_info_dict: The pdf_info list from middle JSON
make_mode: One of MakeMode.MM_MD, MakeMode.NLP_MD, MakeMode.CONTENT_LIST, MakeMode.CONTENT_LIST_V2
img_buket_path: Base path for image references (e.g., 'auto' → 'auto/image_0_1.jpg')

Returns:

Markdown modes: Single string with pages joined by \n\n
Content list modes: List of content items (V1) or list of page arrays (V2)
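
The return shapes described above are illustrated by the toy stand-in below. It is not the real union_make: it reuses the documented parameter names but ignores `img_buket_path`, and the item keys (`text`, `page_idx`, `content`) are assumptions.

```python
MM_MD = 'mm_markdown'
NLP_MD = 'nlp_markdown'
CONTENT_LIST = 'content_list'
CONTENT_LIST_V2 = 'content_list_v2'


def block_text(block):
    """Concatenate the span contents of one block."""
    return ''.join(
        span.get('content', '')
        for line in block.get('lines', [])
        for span in line.get('spans', [])
    )


def union_make_sketch(pdf_info_dict, make_mode, img_buket_path=''):
    """Toy stand-in that mirrors the documented return shapes only."""
    pages = [
        [block_text(block) for block in page.get('para_blocks', [])]
        for page in pdf_info_dict
    ]
    if make_mode in (MM_MD, NLP_MD):
        # Markdown modes: one string, pages joined by a blank line.
        return '\n\n'.join('\n\n'.join(texts) for texts in pages)
    if make_mode == CONTENT_LIST:
        # V1: a single flat list of items across all pages.
        return [
            {'type': 'text', 'text': text, 'page_idx': page_idx}
            for page_idx, texts in enumerate(pages)
            for text in texts
        ]
    if make_mode == CONTENT_LIST_V2:
        # V2: one inner list of blocks per page.
        return [
            [
                {'type': 'paragraph', 'content': [{'type': 'text', 'content': text}]}
                for text in texts
            ]
            for texts in pages
        ]
    raise ValueError(f'unsupported make_mode: {make_mode}')
```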

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:609-649

Text Processing and Language Handling

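When MinerU joins lines and spans into paragraphs, it adapts to the language of the text: words hyphenated across line breaks are rejoined, and spacing between fragments differs for CJK and Western scripts.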

Hyphen Handling Example:

CJK vs Western Spacing:
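
Both behaviours can be sketched in one helper. The rules below are assumptions made for illustration (rejoin when a line ends in a hyphen and the next starts with a lowercase letter; no space when either neighbouring character is CJK); the conditions MinerU applies may differ, and `is_cjk` and `join_lines` are hypothetical names.

```python
def is_cjk(ch):
    """True for common CJK ideographs, hiragana, and katakana."""
    return '\u4e00' <= ch <= '\u9fff' or '\u3040' <= ch <= '\u30ff'


def join_lines(lines):
    """Join line texts with hyphen repair and language-aware spacing."""
    merged = ''
    for text in lines:
        text = text.strip()
        if not text:
            continue
        if not merged:
            merged = text
        elif (
            merged.endswith('-')
            and len(merged) > 1
            and merged[-2].isalpha()
            and text[0].islower()
        ):
            # A word split across lines: drop the hyphen and rejoin.
            merged = merged[:-1] + text
        elif is_cjk(merged[-1]) or is_cjk(text[0]):
            # CJK text is written without spaces between characters.
            merged += text
        else:
            # Western text: separate line fragments with one space.
            merged += ' ' + text
    return merged
```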

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:25-91, mineru/utils/char_utils.py:5-15, mineru/utils/char_utils.py:18-35

Debug and Visualization Outputs

MinerU generates debug PDFs for quality inspection:

Layout Visualization (layout.pdf)

Shows block-level structure with color-coded types and reading order numbers.

Color Legend (RGB values used in code):

Block Type	RGB Color	Fill/Stroke
Dropped blocks	`[158, 158, 158]`	Fill
Table body	`[204, 204, 0]`	Fill
Table caption	`[255, 255, 102]`	Fill
Table footnote	`[229, 255, 204]`	Fill
Image body	`[153, 255, 51]`	Fill
Image caption	`[102, 178, 255]`	Fill
Image footnote	`[255, 178, 102]`	Fill
Code body	`[102, 0, 204]`	Fill
Code caption	`[204, 153, 255]`	Fill
Title	`[102, 102, 255]`	Fill
Text	`[153, 0, 76]`	Fill
Interline equation	`[0, 255, 0]`	Fill
List	`[40, 169, 92]`	Fill
List items	`[40, 169, 92]`	Stroke only
Index	`[40, 169, 92]`	Fill
Reading order	`[255, 0, 0]`	Numbers only

Drawing Functions:

draw_bbox_without_number(): Draws filled rectangles for block types
draw_bbox_with_number(): Draws reading order numbers at block corners
cal_canvas_rect(): Handles PDF rotation (0°, 90°, 180°, 270°) for correct placement

Sources: mineru/utils/draw_bbox.py:120-289

Span Visualization (span.pdf)

Shows span-level elements (inline content) with different colors.

Span Color Legend:

Span Type	RGB Color	Description
Text	`[255, 0, 0]`	Regular text spans
Inline equation	`[0, 255, 0]`	Inline math expressions
Interline equation	`[0, 0, 255]`	Display equations
Image	`[255, 204, 0]`	Image regions
Table	`[204, 0, 255]`	Table regions
Dropped	`[158, 158, 158]`	Filtered spans

Sources: mineru/utils/draw_bbox.py:292-392

Reading Order Visualization

Shows the sequential reading order of blocks with numbered annotations:

The reading order is determined by:

LayoutLMv3 model (for VLM backend)
XY-cut algorithm (fallback for pipeline backend)

Each block and line is assigned an index field during sorting that determines rendering sequence.
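
A minimal sketch of consuming that field is shown below. It assumes every block and line dictionary carries an integer `index`; `in_reading_order` is a hypothetical helper.

```python
def in_reading_order(blocks):
    """Return copies of the blocks sorted by 'index', with lines sorted too."""
    ordered = sorted(blocks, key=lambda block: block['index'])
    return [
        dict(block, lines=sorted(block.get('lines', []), key=lambda line: line['index']))
        for block in ordered
    ]
```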

Sources: mineru/utils/draw_bbox.py:395-473

Documentation System (MkDocs)

MinerU uses MkDocs with Material theme and i18n plugin for multilingual documentation.

Configuration Structure

Navigation Structure:

The nav section defines the documentation hierarchy, which appears twice in the config (once in the Home section with descriptions, once as top-level for actual navigation):

i18n Configuration:

This creates separate builds for English (en/) and Chinese (zh/) with translated navigation items.

Markdown Extensions:

Extension	Purpose
`admonition`	Note/warning/info boxes
`pymdownx.details`	Collapsible sections
`attr_list`	Custom HTML attributes
`def_list`	Definition lists
`gfm_admonition`	GitHub-style admonitions
`pymdownx.highlight`	Syntax highlighting with Pygments
`pymdownx.superfences`	Fenced code blocks with syntax
`pymdownx.tasklist`	Task lists with checkboxes

Sources: mkdocs.yml:1-158

Summary

This reference documentation has covered:

Type System: Complete taxonomy of BlockType, ContentType, and ContentTypeV2 enumerations
Data Structures: Middle JSON format, content list V1/V2 schemas with examples
Output Modes: MakeMode options for different output format requirements
Configuration: LaTeX delimiters, split flags, model path constants
Text Processing: Language-aware spacing and hyphen handling
Debug Outputs: Color-coded visualization PDFs for quality inspection
Documentation System: MkDocs configuration with i18n support

For implementation details of specific backends, see the Pipeline Backend, VLM Backend, and Hybrid Backend pages.

For output file format examples and usage, see Output File Formats.
