BlockType and ContentType Enums


Purpose and Scope

This page documents the enumeration types that define the type taxonomy used throughout MinerU for categorizing document elements. These enums are fundamental to the document processing pipeline, appearing in the middle.json intermediate representation and all output formats.

The three main enum classes serve distinct purposes:

BlockType: Defines structural layout units (blocks) in document hierarchy. Used in MagicModel classes across all backends to categorize detected regions.
ContentType: Defines content-level elements (spans) within blocks. Used for atomic content units with bounding boxes.
ContentTypeV2: Enhanced type system for content_list_v2.json output format with richer metadata.

Additional enumerations in the same module:

CategoryId: Internal YOLO model detection categories (pipeline backend)
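NotExtractType: Block types that bypass VLM content extraction in hybrid mode and are filled from OCR/formula spans instead (hybrid backend)
MakeMode: Output format selection enum
SplitFlag: Processing state flags
ImageType: Image representation format enum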

For information about the middle.json structure that uses these types, see page 9.2. For output format generation, see page 9.5.

Sources: mineru/utils/enum_class.py:1-132, mineru/backend/vlm/vlm_magic_model.py:1-11, mineru/backend/hybrid/hybrid_magic_model.py:1-13

BlockType Enumeration

BlockType defines structural units in document layout. These are hierarchical layout elements that correspond to bounding boxes detected by layout analysis models or synthesized during processing.

Definition and Location

The BlockType class is defined in mineru/utils/enum_class.py as a collection of string constants (not a Python Enum subclass, but used as an enumeration pattern throughout the codebase). Each constant is a string literal representing a document element type.
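The following sketch illustrates the constants-class pattern. It mirrors the string values listed in the taxonomy table below. The grouping comments are ours, and the real class (enum_class.py lines 3-31) may order or annotate its members differently.

```python
class BlockType:
    """Plain class used as a namespace of string constants (not an Enum)."""
    # Composite blocks assembled from body/caption/footnote parts
    IMAGE = "image"
    TABLE = "table"
    CODE = "code"
    # Components of composite blocks
    IMAGE_BODY = "image_body"
    IMAGE_CAPTION = "image_caption"
    IMAGE_FOOTNOTE = "image_footnote"
    TABLE_BODY = "table_body"
    TABLE_CAPTION = "table_caption"
    TABLE_FOOTNOTE = "table_footnote"
    CODE_BODY = "code_body"
    CODE_CAPTION = "code_caption"
    # Text-like blocks and display equations
    TEXT = "text"
    TITLE = "title"
    LIST = "list"
    REF_TEXT = "ref_text"
    PHONETIC = "phonetic"
    INTERLINE_EQUATION = "interline_equation"
    # Meta blocks
    INDEX = "index"
    DISCARDED = "discarded"
    HEADER = "header"
    FOOTER = "footer"
    PAGE_NUMBER = "page_number"
    ASIDE_TEXT = "aside_text"
    PAGE_FOOTNOTE = "page_footnote"


# The constants are plain strings, so they compare directly with JSON data.
block = {"type": "title", "bbox": [50, 40, 300, 60]}
print(block["type"] == BlockType.TITLE)  # True
```

Because every value is a plain str, you can compare a block loaded from middle.json with == without converting it first. The trade-off is that no type check catches a misspelled literal such as "titel".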

Sources: mineru/utils/enum_class.py:3-31

BlockType Taxonomy

The BlockType values are organized into several categories:

Category	BlockType Value	String Constant	Backend Usage (line numbers refer to vlm_magic_model.py)
Composite Blocks	`IMAGE`	`"image"`	VLM/Hybrid: Created by `fix_two_layer_blocks()`
	`TABLE`	`"table"`	VLM/Hybrid: Created by `fix_two_layer_blocks()`
	`CODE`	`"code"`	VLM/Hybrid: Created by `fix_two_layer_blocks()`
Image Components	`IMAGE_BODY`	`"image_body"`	VLM: Mapped from `"image"` (lines 70-71)
	`IMAGE_CAPTION`	`"image_caption"`	VLM: Parsed from VLM output
	`IMAGE_FOOTNOTE`	`"image_footnote"`	VLM: Parsed from VLM output
Table Components	`TABLE_BODY`	`"table_body"`	VLM: Mapped from `"table"` (lines 73-74)
	`TABLE_CAPTION`	`"table_caption"`	VLM: Parsed from VLM output
	`TABLE_FOOTNOTE`	`"table_footnote"`	VLM: Parsed from VLM output
Code Components	`CODE_BODY`	`"code_body"`	VLM: Mapped from `"code"` or `"algorithm"`
	`CODE_CAPTION`	`"code_caption"`	VLM: Parsed from VLM output
Text Blocks	`TEXT`	`"text"`	All backends: Default text block
	`TITLE`	`"title"`	All backends: Section headings
	`LIST`	`"list"`	VLM/Hybrid: Container for list items
	`REF_TEXT`	`"ref_text"`	VLM: Reference text blocks
	`PHONETIC`	`"phonetic"`	VLM: Phonetic annotation blocks
Equations	`INTERLINE_EQUATION`	`"interline_equation"`	VLM: Mapped from `"equation"` (lines 82-83)
Meta Blocks	`INDEX`	`"index"`	Pipeline: Index entries
	`DISCARDED`	`"discarded"`	Pipeline: Filtered content
	`HEADER`	`"header"`	VLM: Page headers (line 211)
	`FOOTER`	`"footer"`	VLM: Page footers (line 211)
	`PAGE_NUMBER`	`"page_number"`	VLM: Page numbers (line 211)
	`ASIDE_TEXT`	`"aside_text"`	VLM: Sidebar text (line 211)
	`PAGE_FOOTNOTE`	`"page_footnote"`	VLM: Page footnotes (line 211)

Sources: mineru/utils/enum_class.py:3-31, mineru/backend/vlm/vlm_magic_model.py:51-83, mineru/backend/vlm/vlm_magic_model.py:194-216, mineru/backend/hybrid/hybrid_magic_model.py:84-116

BlockType Hierarchy

Sources: mineru/utils/enum_class.py:3-31, mineru/backend/vlm/vlm_middle_json_mkcontent.py:99-177

BlockType Usage in Code

BlockTypes are used throughout the codebase in three primary patterns:

Pattern 1: MagicModel Block Categorization

The MagicModel class in each backend sorts detected blocks into type-specific lists.
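Here is a minimal sketch of the idea. It assumes each raw block is a dict whose type key holds a BlockType string. The helper name categorize_blocks is hypothetical. Routing page furniture (headers, footers, page numbers, aside text, page footnotes) into a separate discarded list is also an assumption, not a copy of the MagicModel's code.

```python
from collections import defaultdict


class BlockType:
    # Subset of the string constants in mineru/utils/enum_class.py
    TEXT = "text"
    TITLE = "title"
    HEADER = "header"
    FOOTER = "footer"
    PAGE_NUMBER = "page_number"
    ASIDE_TEXT = "aside_text"
    PAGE_FOOTNOTE = "page_footnote"


# Assumption: page furniture is collected apart from body content.
PAGE_FURNITURE = {
    BlockType.HEADER,
    BlockType.FOOTER,
    BlockType.PAGE_NUMBER,
    BlockType.ASIDE_TEXT,
    BlockType.PAGE_FOOTNOTE,
}


def categorize_blocks(blocks):
    """Hypothetical helper: bucket blocks into per-type lists plus a discarded list."""
    by_type = defaultdict(list)
    discarded = []
    for block in blocks:
        if block["type"] in PAGE_FURNITURE:
            discarded.append(block)
        else:
            by_type[block["type"]].append(block)
    return dict(by_type), discarded


blocks = [
    {"type": "title", "bbox": [0, 0, 100, 20]},
    {"type": "text", "bbox": [0, 30, 100, 80]},
    {"type": "page_number", "bbox": [45, 780, 55, 790]},
]
by_type, discarded = categorize_blocks(blocks)
print(sorted(by_type))  # ['text', 'title']
print(len(discarded))   # 1
```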

Pattern 2: Block Type Mapping from VLM Output

The MagicModel.__init__() method maps raw VLM output labels to BlockType constants.
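The sketch below shows this renaming step. It uses only the renames listed in the taxonomy table: "image" becomes image_body, "table" becomes table_body, "code" and "algorithm" become code_body, and "equation" becomes interline_equation. The following are assumptions for illustration:

- the helper name to_block_type;
- the whitespace and case normalization;
- the pass-through default for every other label.

```python
class BlockType:
    IMAGE_BODY = "image_body"
    TABLE_BODY = "table_body"
    CODE_BODY = "code_body"
    INTERLINE_EQUATION = "interline_equation"


# Raw VLM labels that get renamed; any other label is assumed to already be
# a BlockType string (e.g. "text", "title", "image_caption").
VLM_LABEL_TO_BLOCK_TYPE = {
    "image": BlockType.IMAGE_BODY,
    "table": BlockType.TABLE_BODY,
    "code": BlockType.CODE_BODY,
    "algorithm": BlockType.CODE_BODY,
    "equation": BlockType.INTERLINE_EQUATION,
}


def to_block_type(raw_label: str) -> str:
    """Hypothetical helper: normalize a raw VLM label to a BlockType string."""
    label = raw_label.strip().lower()
    return VLM_LABEL_TO_BLOCK_TYPE.get(label, label)


print(to_block_type("image"))      # image_body
print(to_block_type("algorithm"))  # code_body
print(to_block_type("title"))      # title
```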

Pattern 3: Block Preprocessing and Filtering

The prepare_block_bboxes() function uses BlockType values to filter and validate block bounding boxes.
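This page does not list the exact checks in prepare_block_bboxes(). The hypothetical simplification below makes two assumptions:

- filtering means splitting DISCARDED blocks from the rest;
- validation means dropping boxes with zero or negative area.

filter_blocks and is_valid_bbox are illustrative names only.

```python
class BlockType:
    TEXT = "text"
    DISCARDED = "discarded"


def is_valid_bbox(bbox):
    """A bbox [x0, y0, x1, y1] is kept only if it has positive width and height."""
    x0, y0, x1, y1 = bbox
    return x1 > x0 and y1 > y0


def filter_blocks(blocks):
    """Hypothetical sketch: validate bboxes, then split off discarded blocks."""
    kept, discarded = [], []
    for block in blocks:
        if not is_valid_bbox(block["bbox"]):
            continue  # drop degenerate regions entirely
        if block["type"] == BlockType.DISCARDED:
            discarded.append(block)
        else:
            kept.append(block)
    return kept, discarded


blocks = [
    {"type": "text", "bbox": [10, 10, 200, 40]},
    {"type": "text", "bbox": [10, 50, 10, 90]},  # zero width
    {"type": "discarded", "bbox": [0, 780, 595, 800]},
]
kept, discarded = filter_blocks(blocks)
print(len(kept), len(discarded))  # 1 1
```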

Sources: mineru/backend/vlm/vlm_magic_model.py:51-83, mineru/backend/vlm/vlm_magic_model.py:194-216, mineru/backend/hybrid/hybrid_magic_model.py:255-277, mineru/utils/block_pre_proc.py:50-58

ContentType Enumeration

ContentType defines content-level elements (spans) that exist within blocks. Spans are the atomic content units, and each one carries its own bounding box and payload.

Definition
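This sketch of the definition is reconstructed from the values table below. The real class at enum_class.py lines 33-41 may differ in order or contain additional members.

```python
class ContentType:
    """String constants for span types (plain class, not an Enum)."""
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    INLINE_EQUATION = "inline_equation"
    INTERLINE_EQUATION = "interline_equation"
    # The next two appear only in content_list output
    EQUATION = "equation"
    CODE = "code"


# Several values coincide with BlockType strings ("text", "image", "table",
# "code", "interline_equation"), so a bare "type" value does not say whether
# it belongs to a block or a span; the position in the hierarchy does.
print(ContentType.INTERLINE_EQUATION)  # interline_equation
```

The overlapping values are worth remembering. A "type": "image" field means a composite block at block level but an image span inside a line.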

Sources: mineru/utils/enum_class.py:33-41

ContentType Values

ContentType Value	String Constant	Container Block	Span Fields	Assignment Logic (line numbers refer to vlm_magic_model.py)
`TEXT`	`"text"`	TEXT, TITLE, LIST, etc.	`content`: string, `bbox`, `score`	VLM: lines 68, 101
`IMAGE`	`"image"`	IMAGE_BODY	`bbox`, `score`, optional `image_path`	VLM: lines 71, 89-92
`TABLE`	`"table"`	TABLE_BODY	`bbox`, `html`: HTML string, `score`	VLM: lines 74, 93-94
`INLINE_EQUATION`	`"inline_equation"`	Any text block	`content`: LaTeX, `bbox`, `score`	VLM: line 132 (within 130-134)
`INTERLINE_EQUATION`	`"interline_equation"`	INTERLINE_EQUATION	`content`: LaTeX, `bbox`	VLM: lines 83, 95-100
`EQUATION`	`"equation"`	content_list output only	`content`: LaTeX	Output format enum
`CODE`	`"code"`	content_list output only	`content`: string	Output format enum

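In the VLM MagicModel, each block's span type follows from its BlockType, as listed in the Container Block and Assignment Logic columns above.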
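Here is a hypothetical sketch of that assignment, following the Container Block and Span Fields columns of the table. make_span is an illustrative name, not a function from the codebase. Splitting text into inline-equation spans is a separate step and is left out here.

```python
class BlockType:
    IMAGE_BODY = "image_body"
    TABLE_BODY = "table_body"
    INTERLINE_EQUATION = "interline_equation"


class ContentType:
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    INTERLINE_EQUATION = "interline_equation"


def make_span(block_type, bbox, content, score=1.0):
    """Hypothetical sketch: derive one span from a block's type and raw content."""
    if block_type == BlockType.IMAGE_BODY:
        # Image spans carry no text; an image_path may be attached later.
        return {"type": ContentType.IMAGE, "bbox": bbox, "score": score}
    if block_type == BlockType.TABLE_BODY:
        # Table structure travels as an HTML string.
        return {"type": ContentType.TABLE, "bbox": bbox, "html": content, "score": score}
    if block_type == BlockType.INTERLINE_EQUATION:
        # Display equation: LaTeX source, no score (per the table above).
        return {"type": ContentType.INTERLINE_EQUATION, "bbox": bbox, "content": content}
    # Every other block becomes a plain text span.
    return {"type": ContentType.TEXT, "bbox": bbox, "content": content, "score": score}


span = make_span("table_body", [0, 0, 100, 50], "<table><tr><td>1</td></tr></table>")
print(span["type"], "html" in span)  # table True
```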

Sources: mineru/utils/enum_class.py:33-41, mineru/backend/vlm/vlm_magic_model.py:68-83, mineru/backend/vlm/vlm_magic_model.py:88-154

Span Structure with ContentType

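Spans in middle.json share a common shape. Each span has a bbox, a type field holding a ContentType value, and type-specific payload fields such as content, html, or image_path (see the Span Fields column above).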
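The span shape can be written down as typing.TypedDict definitions. The field names come from the Span Fields column of the ContentType table. Two details are assumptions: the Line wrapper carrying its own bbox, and page-coordinate floats. middle.json itself is plain JSON, so these types only document the shape.

```python
from typing import List, TypedDict


class _SpanRequired(TypedDict):
    bbox: List[float]  # [x0, y0, x1, y1]
    type: str          # a ContentType string, e.g. "text"


class Span(_SpanRequired, total=False):
    content: str       # text or LaTeX, for text and equation spans
    html: str          # table spans
    image_path: str    # image spans, when the crop has been saved
    score: float       # detection/recognition confidence


class Line(TypedDict):
    bbox: List[float]  # assumption: lines carry their own bbox
    spans: List[Span]


span: Span = {"bbox": [72.0, 100.0, 180.0, 114.0], "type": "text",
              "content": "Hello", "score": 0.98}
line: Line = {"bbox": [72.0, 100.0, 180.0, 114.0], "spans": [span]}
print(line["spans"][0]["type"])  # text
```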

Sources: mineru/backend/pipeline/model_json_to_middle_json.py:99-104

ContentType Usage Examples

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:28-49, mineru/backend/pipeline/pipeline_middle_json_mkcontent.py:106-178

ContentType in Markdown Generation

ContentType values control how spans are rendered in Markdown output.
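The dispatch can be sketched as a function from a span to a Markdown string. The following are assumptions about the renderer's conventions, not confirmed by the source:

- $...$ delimiters for inline math and $$...$$ for display math;
- the ![](path) image link form;
- passing table HTML through unchanged.

span_to_markdown is a hypothetical name.

```python
class ContentType:
    TEXT = "text"
    INLINE_EQUATION = "inline_equation"
    INTERLINE_EQUATION = "interline_equation"
    IMAGE = "image"
    TABLE = "table"


def span_to_markdown(span):
    """Hypothetical renderer: choose the Markdown form from the span's ContentType."""
    span_type = span["type"]
    if span_type == ContentType.TEXT:
        return span["content"]
    if span_type == ContentType.INLINE_EQUATION:
        return f"${span['content']}$"            # inline math
    if span_type == ContentType.INTERLINE_EQUATION:
        return f"\n$$\n{span['content']}\n$$\n"  # display math on its own lines
    if span_type == ContentType.IMAGE:
        return f"![]({span.get('image_path', '')})"
    if span_type == ContentType.TABLE:
        return span.get("html", "")              # raw HTML is allowed in Markdown
    return ""                                    # unknown types are skipped


line = [
    {"type": "text", "content": "Energy is "},
    {"type": "inline_equation", "content": "E=mc^2"},
    {"type": "text", "content": "."},
]
print("".join(span_to_markdown(s) for s in line))  # Energy is $E=mc^2$.
```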

Sources: mineru/backend/pipeline/pipeline_middle_json_mkcontent.py:124-133

ContentTypeV2 Enumeration

ContentTypeV2 is an enhanced type system introduced for the content_list_v2.json output format, providing more granular categorization and supporting complex nested structures.

Definition

Sources: mineru/utils/enum_class.py:43-66

ContentTypeV2 Categories

Block-Level Types

ContentTypeV2 Value	Description	Maps from BlockType
`PARAGRAPH`	Regular paragraph	TEXT, PHONETIC
`TITLE`	Section heading	TITLE
`CODE`	Code block	CODE (when sub_type=CODE)
`ALGORITHM`	Algorithm block	CODE (when sub_type=ALGORITHM)
`EQUATION_INTERLINE`	Display equation	INTERLINE_EQUATION
`IMAGE`	Image with metadata	IMAGE
`TABLE`	Table with metadata	TABLE
`TABLE_SIMPLE`	Simple table type	TABLE (no colspan/rowspan)
`TABLE_COMPLEX`	Complex table type	TABLE (with colspan/rowspan)
`LIST`	List container	LIST, REF_TEXT
`LIST_TEXT`	Text list	LIST (sub_type=TEXT)
`LIST_REF`	Reference list	LIST (sub_type=REF_TEXT)

Page-Level Types

ContentTypeV2 Value	Description
`PAGE_HEADER`	Page header content
`PAGE_FOOTER`	Page footer content
`PAGE_NUMBER`	Page numbering
`PAGE_ASIDE_TEXT`	Sidebar/margin content
`PAGE_FOOTNOTE`	Page footnote

Span-Level Types

ContentTypeV2 Value	Description	Usage
`SPAN_TEXT`	Text span	Basic text content
`SPAN_EQUATION_INLINE`	Inline equation span	Math in paragraphs
`SPAN_PHONETIC`	Phonetic notation span	Pronunciation guides
`SPAN_MD`	Markdown span	Pre-formatted markdown
`SPAN_CODE_INLINE`	Inline code span	Code within text

Sources: mineru/utils/enum_class.py:43-66

ContentTypeV2 Structured Output

The v2 format uses nested structures with richer metadata.

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:362-369

Table Type Classification

ContentTypeV2 classifies each table automatically as TABLE_SIMPLE or TABLE_COMPLEX, depending on whether any cell uses colspan or rowspan.
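Here is one way to implement that check using the standard-library html.parser. It assumes a table is complex when any td or th has a colspan or rowspan greater than 1; the real code may only test whether the attributes are present. The function returns the ContentTypeV2 member names as plain strings, not their actual values.

```python
from html.parser import HTMLParser


class _MergedCellDetector(HTMLParser):
    """Flags any <td>/<th> whose colspan or rowspan covers more than one cell."""

    def __init__(self):
        super().__init__()
        self.has_merged_cells = False

    def handle_starttag(self, tag, attrs):
        if tag not in ("td", "th"):
            return
        for name, value in attrs:  # attribute names arrive lowercased
            if name in ("colspan", "rowspan") and value and value.strip().isdigit():
                if int(value) > 1:
                    self.has_merged_cells = True


def classify_table(html: str) -> str:
    """Return "TABLE_COMPLEX" if any cell spans rows/columns, else "TABLE_SIMPLE"."""
    detector = _MergedCellDetector()
    detector.feed(html)
    detector.close()
    return "TABLE_COMPLEX" if detector.has_merged_cells else "TABLE_SIMPLE"


simple = "<table><tr><td>a</td><td>b</td></tr></table>"
merged = '<table><tr><td colspan="2">a</td></tr><tr><td>b</td><td>c</td></tr></table>'
print(classify_table(simple))  # TABLE_SIMPLE
print(classify_table(merged))  # TABLE_COMPLEX
```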

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:377-388

BlockType vs ContentType Conceptual Model

Key Distinctions:

BlockType = Structural layout unit with bounding box
- Represents detected/synthesized layout regions
- Can contain child blocks (e.g., IMAGE → IMAGE_BODY + IMAGE_CAPTION)
- Hierarchical organization for reading order
ContentType = Atomic content unit within blocks
- Represents actual content spans with precise locations
- Always contained within a block's lines
- Multiple spans per line, multiple lines per block
ContentTypeV2 = Enhanced type system for structured output
- Maps BlockType → richer categories with metadata
- Preserves span-level structure with content arrays
- Adds classification (e.g., TABLE_SIMPLE vs TABLE_COMPLEX)

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:25-91, mineru/backend/pipeline/model_json_to_middle_json.py:156-172

Enum Usage Flow Through Processing Pipeline

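In the VLM MagicModel, BlockType and ContentType values flow through the document processing pipeline in three stages:

- raw VLM labels are mapped to BlockType constants;
- blocks are sorted into type-specific lists;
- each block's spans receive a ContentType.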

Sources: mineru/backend/vlm/vlm_magic_model.py:12-238, mineru/backend/vlm/vlm_magic_model.py:51-83, mineru/backend/vlm/vlm_magic_model.py:194-221

Hybrid Backend Enum Usage with Span Extraction

The hybrid backend combines two content sources. Some blocks keep the content extracted by the VLM, while others are filled with spans from OCR and formula recognition. The NotExtractType enum controls which path each block takes.
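This sketch follows the NotExtractType description later on this page: VLM content is skipped for NotExtractType members only when _vlm_ocr_enable is False. The function name, the return labels, and the example membership set are hypothetical. The set does not reproduce the real NotExtractType members.

```python
def choose_content_source(block_type, not_extract_types, vlm_ocr_enable):
    """Hypothetical sketch of the per-block decision in the hybrid backend.

    Returns "vlm" to keep the VLM's extracted content, or "span_fill" to fill
    the block from OCR / formula-recognition spans.
    """
    if vlm_ocr_enable:
        return "vlm"        # VLM OCR on: no block type is excluded
    if block_type not in not_extract_types:
        return "vlm"        # mirrors the "block_type not in not_extract_list" check
    return "span_fill"      # listed in NotExtractType: use span filling


# Hypothetical membership for illustration only; the real list is the
# NotExtractType definition in mineru/utils/enum_class.py.
NOT_EXTRACT = {"text", "title"}

print(choose_content_source("text", NOT_EXTRACT, vlm_ocr_enable=False))   # span_fill
print(choose_content_source("table", NOT_EXTRACT, vlm_ocr_enable=False))  # vlm
print(choose_content_source("text", NOT_EXTRACT, vlm_ocr_enable=True))    # vlm
```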

Sources: mineru/backend/hybrid/hybrid_magic_model.py:15-299, mineru/backend/hybrid/hybrid_magic_model.py:38-62, mineru/backend/hybrid/hybrid_magic_model.py:135-241, mineru/utils/enum_class.py:69-76

Pattern 3: BlockType to ContentTypeV2 Transformation
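The sketch below covers the block-level mapping from the ContentTypeV2 tables above. The V2 enum reuses the documented member names but assigns auto() values, because this page does not list the real string values. Two parts are assumptions: the sub_type spelling "algorithm", and the page-level mappings, which are inferred only from matching names. The table sub-type split (TABLE_SIMPLE vs TABLE_COMPLEX) is a separate step and is not shown.

```python
from enum import Enum, auto
from typing import Optional


class V2(Enum):
    """Stand-in for ContentTypeV2: documented member names, placeholder values."""
    PARAGRAPH = auto()
    TITLE = auto()
    CODE = auto()
    ALGORITHM = auto()
    EQUATION_INTERLINE = auto()
    IMAGE = auto()
    TABLE = auto()
    LIST = auto()
    PAGE_HEADER = auto()
    PAGE_FOOTER = auto()
    PAGE_NUMBER = auto()
    PAGE_ASIDE_TEXT = auto()
    PAGE_FOOTNOTE = auto()


# BlockType string -> V2 member, per the Block-Level Types table.
BLOCK_TO_V2 = {
    "text": V2.PARAGRAPH,
    "phonetic": V2.PARAGRAPH,
    "title": V2.TITLE,
    "interline_equation": V2.EQUATION_INTERLINE,
    "image": V2.IMAGE,
    "table": V2.TABLE,
    "list": V2.LIST,
    "ref_text": V2.LIST,
    # Page-level mappings inferred from matching names (assumption).
    "header": V2.PAGE_HEADER,
    "footer": V2.PAGE_FOOTER,
    "page_number": V2.PAGE_NUMBER,
    "aside_text": V2.PAGE_ASIDE_TEXT,
    "page_footnote": V2.PAGE_FOOTNOTE,
}


def to_v2(block_type: str, sub_type: Optional[str] = None) -> V2:
    """Map a BlockType string (plus a code block's sub_type) to a V2 member."""
    if block_type == "code":
        # Code blocks split on sub_type; "algorithm" is an assumed spelling.
        return V2.ALGORITHM if sub_type == "algorithm" else V2.CODE
    try:
        return BLOCK_TO_V2[block_type]
    except KeyError:
        raise ValueError(f"no ContentTypeV2 mapping sketched for {block_type!r}") from None


print(to_v2("phonetic"))            # V2.PARAGRAPH
print(to_v2("code", "algorithm"))   # V2.ALGORITHM
```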

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:285-484

Common Code Patterns and Implementation Details

Pattern 1: fix_two_layer_blocks() - Composite Block Assembly

The fix_two_layer_blocks() function in both VLM and hybrid backends creates composite blocks (IMAGE, TABLE, CODE) by associating body, caption, and footnote components using index-based matching.
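Here is a simplified sketch of index-based matching. It assumes every body, caption, and footnote block carries an integer reading-order index, and that each caption or footnote belongs to the nearest body by index. The real function may add constraints, such as where a caption sits relative to its body. assemble_composites is a hypothetical name.

```python
def assemble_composites(bodies, children, composite_type):
    """Hypothetical sketch: attach captions/footnotes to the nearest body by index.

    bodies:   body blocks (e.g. image_body), each with an integer "index"
    children: caption/footnote blocks, each with an integer "index"
    """
    composites = {b["index"]: {"type": composite_type, "index": b["index"], "blocks": [b]}
                  for b in bodies}
    if not composites:
        return []
    for child in children:
        # Nearest body in reading order; ties go to the earlier body.
        nearest = min(composites, key=lambda i: (abs(i - child["index"]), i))
        composites[nearest]["blocks"].append(child)
    result = []
    for key in sorted(composites):
        comp = composites[key]
        comp["blocks"].sort(key=lambda blk: blk["index"])  # reading order inside
        result.append(comp)
    return result


bodies = [{"type": "image_body", "index": 3}, {"type": "image_body", "index": 7}]
children = [
    {"type": "image_caption", "index": 4},
    {"type": "image_caption", "index": 8},
    {"type": "image_footnote", "index": 9},
]
for comp in assemble_composites(bodies, children, "image"):
    print(comp["index"], [blk["type"] for blk in comp["blocks"]])
# 3 ['image_body', 'image_caption']
# 7 ['image_body', 'image_caption', 'image_footnote']
```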

Sources: mineru/backend/vlm/vlm_magic_model.py:373-502, mineru/backend/hybrid/hybrid_magic_model.py:449-577

Pattern 2: inline_equation Recognition via Regex

Both VLM and hybrid backends detect inline equations within text using LaTeX delimiters \( and \).
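Here is a minimal sketch of the splitting step. It finds each \( ... \) pair with a non-greedy regex and emits alternating text and inline_equation spans. The exact pattern used in the codebase may differ, split_inline_equations is a hypothetical name, and bounding boxes are omitted.

```python
import re

# Non-greedy match of LaTeX inline math delimited by \( ... \);
# DOTALL lets an equation continue across a line break.
INLINE_EQ = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def split_inline_equations(text):
    """Split text into text and inline_equation spans (content only, no bboxes)."""
    spans, pos = [], 0
    for match in INLINE_EQ.finditer(text):
        if match.start() > pos:
            spans.append({"type": "text", "content": text[pos:match.start()]})
        spans.append({"type": "inline_equation", "content": match.group(1).strip()})
        pos = match.end()
    if pos < len(text):
        spans.append({"type": "text", "content": text[pos:]})
    return spans


for span in split_inline_equations(r"Let \( x^2 \) be positive."):
    print(span)
```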

Sources: mineru/backend/vlm/vlm_magic_model.py:106-146, mineru/backend/hybrid/hybrid_magic_model.py:141-181

Pattern 3: MagicModel Block List Getters

The MagicModel class provides a getter method for each block type category, and downstream processing uses these getters.
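A hypothetical stand-in shows the getter idea. Blocks are stored by type, each getter returns a copy of one list, and a consumer merges the results back into reading order. The class name, the getter names, and the integer "index" key are illustrative assumptions. The real MagicModel getters may be named differently or return different shapes.

```python
class MagicModelSketch:
    """Hypothetical stand-in: stores blocks by type and exposes per-type getters."""

    def __init__(self, blocks):
        self._by_type = {}
        for block in blocks:
            self._by_type.setdefault(block["type"], []).append(block)

    def _get(self, block_type):
        return list(self._by_type.get(block_type, []))  # defensive copy

    def get_title_blocks(self):
        return self._get("title")

    def get_text_blocks(self):
        return self._get("text")

    def get_image_blocks(self):
        return self._get("image")

    def get_table_blocks(self):
        return self._get("table")


def build_page_blocks(model):
    """Merge getter results into one reading-order list (assumed "index" key)."""
    merged = (model.get_title_blocks() + model.get_text_blocks()
              + model.get_image_blocks() + model.get_table_blocks())
    return sorted(merged, key=lambda blk: blk["index"])


model = MagicModelSketch([
    {"type": "text", "index": 2},
    {"type": "title", "index": 1},
    {"type": "table", "index": 3},
])
print([blk["type"] for blk in build_page_blocks(model)])  # ['title', 'text', 'table']
```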

These are called during middle.json construction to organize blocks by type.

Sources: mineru/backend/vlm/vlm_magic_model.py:240-271, mineru/backend/hybrid/hybrid_magic_model.py:315-346

Pattern 4: Pipeline Backend CategoryId to ContentType Mapping

The pipeline backend uses YOLO CategoryId values internally and maps them to ContentType in the get_all_spans() method.
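Here is a sketch of the lookup-table approach. Detections whose category has a span type become spans; every other category is skipped because it is handled at block level. The integer CategoryId values and member names below are placeholders, not the real IDs from enum_class.py. The detection dict shape (category_id, bbox, score) and the name get_all_spans_sketch are also assumptions.

```python
class CategoryId:
    # Placeholder integers for illustration; real IDs live in enum_class.py.
    IMAGE_BODY = 1
    TABLE_BODY = 2
    INLINE_EQUATION = 3
    INTERLINE_EQUATION = 4
    OCR_TEXT = 5


class ContentType:
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    INLINE_EQUATION = "inline_equation"
    INTERLINE_EQUATION = "interline_equation"


CATEGORY_TO_CONTENT = {
    CategoryId.IMAGE_BODY: ContentType.IMAGE,
    CategoryId.TABLE_BODY: ContentType.TABLE,
    CategoryId.INLINE_EQUATION: ContentType.INLINE_EQUATION,
    CategoryId.INTERLINE_EQUATION: ContentType.INTERLINE_EQUATION,
    CategoryId.OCR_TEXT: ContentType.TEXT,
}


def get_all_spans_sketch(layout_dets):
    """Turn detections into spans; categories without a span type are skipped."""
    spans = []
    for det in layout_dets:
        span_type = CATEGORY_TO_CONTENT.get(det["category_id"])
        if span_type is None:
            continue  # layout-only category, handled as a block elsewhere
        spans.append({"type": span_type, "bbox": det["bbox"],
                      "score": det.get("score", 1.0)})
    return spans


dets = [
    {"category_id": CategoryId.OCR_TEXT, "bbox": [0, 0, 50, 10], "score": 0.9},
    {"category_id": CategoryId.TABLE_BODY, "bbox": [0, 20, 200, 120], "score": 0.95},
    {"category_id": 99, "bbox": [0, 0, 1, 1]},
]
print([s["type"] for s in get_all_spans_sketch(dets)])  # ['text', 'table']
```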

Sources: mineru/backend/pipeline/pipeline_magic_model.py:308-352, mineru/utils/enum_class.py:77-99

Pattern 5: Table HTML Generation with OTSL

Tables are represented internally in OTSL (Optimized Table Structure Language) and converted to HTML by convert_otsl_to_html().
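OTSL as described in the literature uses six tokens:

- fcel: filled cell;
- ecel: empty cell;
- lcel: merge with the cell to the left;
- ucel: merge with the cell above;
- xcel: merge in both directions;
- nl: end of row.

The sketch below converts such a token stream into HTML. It computes colspan from runs of lcel to the right of an origin cell and rowspan from runs of ucel below it. Two parts are assumptions: the <token> text syntax and the function name. This is not the implementation in format_utils.py.

```python
import re
from html import escape

# Assumed textual form: "<fcel>cell text<lcel><nl>..."
TOKEN_RE = re.compile(r"<(fcel|ecel|lcel|ucel|xcel|nl)>")


def parse_otsl(otsl):
    """Return rows of (tag, text) cells."""
    parts = TOKEN_RE.split(otsl)  # ['', tag, text, tag, text, ...]
    rows, row = [], []
    for i in range(1, len(parts), 2):
        tag, text = parts[i], parts[i + 1].strip()
        if tag == "nl":
            rows.append(row)
            row = []
        else:
            row.append((tag, text))
    if row:
        rows.append(row)
    return rows


def convert_otsl_to_html_sketch(otsl):
    grid = parse_otsl(otsl)
    html_rows = []
    for r, row in enumerate(grid):
        cells = []
        for c, (tag, text) in enumerate(row):
            if tag not in ("fcel", "ecel"):
                continue  # lcel/ucel/xcel are covered by an origin cell's span
            colspan = 1
            while c + colspan < len(row) and row[c + colspan][0] == "lcel":
                colspan += 1
            rowspan = 1
            while (r + rowspan < len(grid) and c < len(grid[r + rowspan])
                   and grid[r + rowspan][c][0] == "ucel"):
                rowspan += 1
            attrs = f' colspan="{colspan}"' if colspan > 1 else ""
            attrs += f' rowspan="{rowspan}"' if rowspan > 1 else ""
            cells.append(f"<td{attrs}>{escape(text)}</td>")
        html_rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(html_rows) + "</table>"


print(convert_otsl_to_html_sketch("<fcel>A<lcel><nl><fcel>B<fcel>C<nl>"))
# <table><tr><td colspan="2">A</td></tr><tr><td>B</td><td>C</td></tr></table>
```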

The resulting HTML is stored in the span's html field, and the span's type is set to ContentType.TABLE.

Sources: mineru/utils/format_utils.py:307-319, mineru/utils/format_utils.py:256-304

Related Enumerations in enum_class.py

CategoryId (Pipeline Backend)

CategoryId defines YOLO model detection categories used exclusively in the pipeline backend. These are integer constants mapped to BlockType during processing.

Usage: Pipeline backend's MagicModel class filters layout_dets by CategoryId values.

NotExtractType (Hybrid Backend)

NotExtractType defines BlockType values that should not use VLM content extraction in hybrid mode when _vlm_ocr_enable=False. These blocks use span filling instead.

Usage: Checked in hybrid_magic_model.py line 135: block_type not in not_extract_list

MakeMode (Output Format)

SplitFlag (Processing State)

ImageType (Image Representation)

Sources: mineru/utils/enum_class.py:69-116, mineru/backend/hybrid/hybrid_magic_model.py:13, mineru/backend/hybrid/hybrid_magic_model.py:135

Best Practices

Use BlockType for structural decisions
- Block routing in processing pipelines
- Layout visualization and debugging
- Reading order determination
Use ContentType for content extraction
- Text extraction from spans
- Equation detection and processing
- Image/table span identification
Use ContentTypeV2 for structured output
- When generating content_list_v2.json
- When richer metadata is needed
- For downstream NLP/RAG applications
Type Safety
- These are string constants, not true Python enums
- Use equality checks: block['type'] == BlockType.TEXT
- Use a membership test for multiple types: if para_type in [BlockType.TEXT, BlockType.LIST]
Hierarchical Processing
- Process composite blocks (IMAGE, TABLE, CODE) by iterating their blocks array
- Extract spans from blocks by iterating lines then spans
- Respect the hierarchy: Document → Page → Block → Line → Span
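The traversal advice above can be sketched as a generator that walks Document → Page → Block → Line → Span and descends into composite blocks. The following key names are assumptions about the middle.json layout (see page 9.2 for the actual structure):

- "pdf_info" for the page list;
- "page_idx" and "para_blocks" on each page;
- "blocks" for composite children and "lines" → "spans" for leaf blocks.

```python
def iter_spans(middle):
    """Yield (page_idx, block_type, span) for every span, descending into composites."""

    def walk(block, page_idx):
        if "blocks" in block:  # composite: IMAGE, TABLE, CODE
            for child in block["blocks"]:
                yield from walk(child, page_idx)
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield page_idx, block["type"], span

    for page in middle.get("pdf_info", []):
        for block in page.get("para_blocks", []):
            yield from walk(block, page["page_idx"])


def _leaf(block_type, span):
    # Small helper for building example leaf blocks with one line and one span.
    return {"type": block_type, "lines": [{"spans": [span]}]}


middle = {"pdf_info": [{
    "page_idx": 0,
    "para_blocks": [
        _leaf("title", {"type": "text", "content": "Intro"}),
        {"type": "image", "blocks": [
            _leaf("image_body", {"type": "image", "image_path": "a.jpg"}),
            _leaf("image_caption", {"type": "text", "content": "Fig. 1"}),
        ]},
    ],
}]}
for page_idx, block_type, span in iter_spans(middle):
    print(page_idx, block_type, span["type"])
# 0 title text
# 0 image_body image
# 0 image_caption text
```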

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py:94-184, mineru/backend/pipeline/pipeline_middle_json_mkcontent.py:10-90
