This page documents the enumeration types that define the type taxonomy used throughout MinerU for categorizing document elements. These enums are fundamental to the document processing pipeline, appearing in the middle.json intermediate representation and all output formats.
The three main enum classes serve distinct purposes:
BlockType: Defines structural layout units (blocks) in the document hierarchy. Used in MagicModel classes across all backends to categorize detected regions.
ContentType: Defines content-level elements (spans) within blocks. Used for atomic content units with bounding boxes.
ContentTypeV2: Enhanced type system for the content_list_v2.json output format with richer metadata.

Additional enumerations in the same module:
CategoryId: Internal YOLO model detection categories (pipeline backend)
NotExtractType: Block types excluded from span extraction (hybrid backend)
MakeMode: Output format selection enum
ImageType: Image representation format enum

For information about the middle.json structure that uses these types, see page 9.2. For output format generation, see page 9.5.
Sources: mineru/utils/enum_class.py1-132 mineru/backend/vlm/vlm_magic_model.py1-11 mineru/backend/hybrid/hybrid_magic_model.py1-13
BlockType defines structural units in document layout. These are hierarchical layout elements that correspond to bounding boxes detected by layout analysis models or synthesized during processing.
The BlockType class is defined in mineru/utils/enum_class.py as a collection of string constants (not a Python Enum subclass, but used as an enumeration pattern throughout the codebase). Each constant is a string literal representing a document element type.
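A minimal sketch of this constants-class pattern, using a stand-in class that copies only a few of the documented values rather than importing the real module. Because the attributes are plain strings, they compare equal to the raw type values stored in JSON.

```python
class BlockType:
    """Stand-in for the constants class; only a few members are shown."""
    IMAGE = "image"
    TABLE = "table"
    TEXT = "text"
    TITLE = "title"
    INTERLINE_EQUATION = "interline_equation"


# Plain strings compare equal to the "type" values stored in JSON blocks.
block = {"type": "title", "bbox": [72, 90, 540, 120]}
is_title = block["type"] == BlockType.TITLE

# Unlike enum.Enum members, the constants need no .value to serialize.
is_plain_str = isinstance(BlockType.TEXT, str)
```

The trade-off of this pattern is that nothing prevents a typo in a raw string from silently failing a comparison, which is why comparing against the named constants is preferable to repeating literals.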
Sources: mineru/utils/enum_class.py3-31
The BlockType values are organized into several categories:
| Category | BlockType Value | String Constant | Backend Usage |
|---|---|---|---|
| Composite Blocks | IMAGE | "image" | VLM/Hybrid: Created by fix_two_layer_blocks() |
| | TABLE | "table" | VLM/Hybrid: Created by fix_two_layer_blocks() |
| | CODE | "code" | VLM/Hybrid: Created by fix_two_layer_blocks() |
| Image Components | IMAGE_BODY | "image_body" | VLM: Mapped from "image" in line 70-71 |
| | IMAGE_CAPTION | "image_caption" | VLM: Parsed from VLM output |
| | IMAGE_FOOTNOTE | "image_footnote" | VLM: Parsed from VLM output |
| Table Components | TABLE_BODY | "table_body" | VLM: Mapped from "table" in line 73-74 |
| | TABLE_CAPTION | "table_caption" | VLM: Parsed from VLM output |
| | TABLE_FOOTNOTE | "table_footnote" | VLM: Parsed from VLM output |
| Code Components | CODE_BODY | "code_body" | VLM: Mapped from "code"/"algorithm" |
| | CODE_CAPTION | "code_caption" | VLM: Parsed from VLM output |
| Text Blocks | TEXT | "text" | All backends: Default text block |
| | TITLE | "title" | All backends: Section headings |
| | LIST | "list" | VLM/Hybrid: Container for list items |
| | REF_TEXT | "ref_text" | VLM: Reference text blocks |
| | PHONETIC | "phonetic" | VLM: Phonetic annotation blocks |
| Equations | INTERLINE_EQUATION | "interline_equation" | VLM: Mapped from "equation" in line 82-83 |
| Meta Blocks | INDEX | "index" | Pipeline: Index entries |
| | DISCARDED | "discarded" | Pipeline: Filtered content |
| | HEADER | "header" | VLM: Page headers (line 211) |
| | FOOTER | "footer" | VLM: Page footers (line 211) |
| | PAGE_NUMBER | "page_number" | VLM: Page numbers (line 211) |
| | ASIDE_TEXT | "aside_text" | VLM: Sidebar text (line 211) |
| | PAGE_FOOTNOTE | "page_footnote" | VLM: Page footnotes (line 211) |
Sources: mineru/utils/enum_class.py3-31 mineru/backend/vlm/vlm_magic_model.py51-83 mineru/backend/vlm/vlm_magic_model.py194-216 mineru/backend/hybrid/hybrid_magic_model.py84-116
Sources: mineru/utils/enum_class.py3-31 mineru/backend/vlm/vlm_middle_json_mkcontent.py99-177
BlockType constants are used throughout the codebase in three primary patterns:
The MagicModel class in each backend categorizes blocks into type-specific lists:
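The grouping step can be sketched as follows. This is an illustration only: the function name `group_blocks_by_type` is hypothetical, the string constants are stand-ins for the corresponding BlockType values, and the real MagicModel classes may store their lists differently.

```python
from collections import defaultdict

# Stand-in string constants mirroring a few BlockType values.
TEXT, TITLE, IMAGE_BODY, TABLE_BODY = "text", "title", "image_body", "table_body"


def group_blocks_by_type(blocks):
    """Return {block type: [blocks of that type]}, preserving input order."""
    grouped = defaultdict(list)
    for block in blocks:
        grouped[block["type"]].append(block)
    return dict(grouped)


blocks = [
    {"type": TITLE, "bbox": [50, 40, 500, 70]},
    {"type": TEXT, "bbox": [50, 80, 500, 200]},
    {"type": IMAGE_BODY, "bbox": [50, 210, 500, 400]},
    {"type": TEXT, "bbox": [50, 410, 500, 520]},
]
grouped = group_blocks_by_type(blocks)
text_blocks = grouped.get(TEXT, [])
```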
The MagicModel.__init__() method maps raw VLM output strings to BlockType constants:
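A sketch of such a mapping, built from the remappings listed in the BlockType table ("image", "table", "code"/"algorithm", "equation"). The dictionary and the function name `normalize_block_type` are hypothetical, and letting unlisted labels pass through unchanged is an assumption of this example.

```python
# Remappings taken from the BlockType table; other labels pass through
# unchanged, which is an assumption of this sketch.
RAW_TO_BLOCK_TYPE = {
    "image": "image_body",
    "table": "table_body",
    "code": "code_body",
    "algorithm": "code_body",
    "equation": "interline_equation",
}


def normalize_block_type(raw_type: str) -> str:
    """Map a raw VLM label to its BlockType string constant."""
    return RAW_TO_BLOCK_TYPE.get(raw_type, raw_type)


normalized = [
    normalize_block_type(t) for t in ("image", "algorithm", "equation", "text")
]
```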
The prepare_block_bboxes() function uses BlockType for filtering and validation:
Sources: mineru/backend/vlm/vlm_magic_model.py51-83 mineru/backend/vlm/vlm_magic_model.py194-216 mineru/backend/hybrid/hybrid_magic_model.py255-277 mineru/utils/block_pre_proc.py50-58
ContentType defines content-level elements (spans) that exist within blocks. Spans are the atomic content units: each carries its own bounding box and a payload such as text, LaTeX, or table HTML.
Sources: mineru/utils/enum_class.py33-41
| ContentType Value | String Constant | Container Block | Span Fields | Assignment Logic |
|---|---|---|---|---|
| TEXT | "text" | TEXT, TITLE, LIST, etc. | content: string, bbox, score | VLM: line 68, 101 |
| IMAGE | "image" | IMAGE_BODY | bbox, score, optional image_path | VLM: line 71, 89-92 |
| TABLE | "table" | TABLE_BODY | bbox, html: HTML string, score | VLM: line 74, 93-94 |
| INLINE_EQUATION | "inline_equation" | Any text block | content: LaTeX, bbox, score | VLM: line 132, 130-134 |
| INTERLINE_EQUATION | "interline_equation" | INTERLINE_EQUATION | content: LaTeX, bbox | VLM: line 83, 95-100 |
| EQUATION | "equation" | content_list output only | content: LaTeX | Output format enum |
| CODE | "code" | content_list output only | content: string | Output format enum |
MagicModel Span Assignment Pattern:
Sources: mineru/utils/enum_class.py33-41 mineru/backend/vlm/vlm_magic_model.py68-83 mineru/backend/vlm/vlm_magic_model.py88-154
Spans in middle.json have this general structure:
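As an illustration, the span fields listed in the ContentType table can be written out as plain dictionaries. The field names come from that table; all values are invented, and the enclosing line dictionary (with its own bbox) is an assumption about how spans are grouped.

```python
import json

# Field names follow the ContentType table; the values are invented.
text_span = {
    "type": "text",
    "bbox": [72, 100, 300, 118],
    "content": "Energy is conserved.",
    "score": 0.98,
}
equation_span = {
    "type": "inline_equation",
    "bbox": [304, 100, 360, 118],
    "content": "E = mc^2",
    "score": 0.95,
}
table_span = {
    "type": "table",
    "bbox": [72, 140, 540, 320],
    "html": "<table><tr><td>1</td></tr></table>",
    "score": 0.91,
}

# Assumed container: a line holds the spans that share a text row.
line = {"bbox": [72, 100, 360, 118], "spans": [text_span, equation_span]}
serialized = json.dumps(line)
```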
Sources: mineru/backend/pipeline/model_json_to_middle_json.py99-104
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py28-49 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py106-178
ContentType values control how spans are rendered in markdown output:
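A minimal sketch of type-driven rendering. The function name `render_span` is hypothetical, and the `$`/`$$` math delimiters and the empty image alt text are assumptions of this example, not necessarily what the pipeline emits.

```python
def render_span(span: dict) -> str:
    """Render one span as a markdown fragment (delimiters are assumptions)."""
    span_type = span["type"]
    if span_type == "text":
        return span["content"]
    if span_type == "inline_equation":
        return f"${span['content']}$"
    if span_type == "interline_equation":
        return f"\n$$\n{span['content']}\n$$\n"
    if span_type == "table":
        return span["html"]
    if span_type == "image":
        return f"![]({span.get('image_path', '')})"
    raise ValueError(f"unknown span type: {span_type!r}")


spans = [
    {"type": "text", "content": "The identity "},
    {"type": "inline_equation", "content": "e^{i\\pi} + 1 = 0"},
    {"type": "text", "content": " is due to Euler."},
]
markdown = "".join(render_span(s) for s in spans)
```

Raising on an unknown type, rather than returning an empty string, makes a missing branch visible as soon as a new span type is introduced.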
Sources: mineru/backend/pipeline/pipeline_middle_json_mkcontent.py124-133
ContentTypeV2 is an enhanced type system introduced for the content_list_v2.json output format, providing more granular categorization and supporting complex nested structures.
Sources: mineru/utils/enum_class.py43-66
| ContentTypeV2 Value | Description | Maps from BlockType |
|---|---|---|
| PARAGRAPH | Regular paragraph | TEXT, PHONETIC |
| TITLE | Section heading | TITLE |
| CODE | Code block | CODE (when sub_type=CODE) |
| ALGORITHM | Algorithm block | CODE (when sub_type=ALGORITHM) |
| EQUATION_INTERLINE | Display equation | INTERLINE_EQUATION |
| IMAGE | Image with metadata | IMAGE |
| TABLE | Table with metadata | TABLE |
| TABLE_SIMPLE | Simple table type | TABLE (no colspan/rowspan) |
| TABLE_COMPLEX | Complex table type | TABLE (with colspan/rowspan) |
| LIST | List container | LIST, REF_TEXT |
| LIST_TEXT | Text list | LIST (sub_type=TEXT) |
| LIST_REF | Reference list | LIST (sub_type=REF_TEXT) |
| ContentTypeV2 Value | Description |
|---|---|
| PAGE_HEADER | Page header content |
| PAGE_FOOTER | Page footer content |
| PAGE_NUMBER | Page numbering |
| PAGE_ASIDE_TEXT | Sidebar/margin content |
| PAGE_FOOTNOTE | Page footnote |
| ContentTypeV2 Value | Description | Usage |
|---|---|---|
| SPAN_TEXT | Text span | Basic text content |
| SPAN_EQUATION_INLINE | Inline equation span | Math in paragraphs |
| SPAN_PHONETIC | Phonetic notation span | Pronunciation guides |
| SPAN_MD | Markdown span | Pre-formatted markdown |
| SPAN_CODE_INLINE | Inline code span | Code within text |
Sources: mineru/utils/enum_class.py43-66
The v2 format uses nested structures with richer metadata:
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py362-369
ContentTypeV2 includes automatic table complexity classification:
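One way to sketch this classification, following the rule in the table above (a table counts as complex when it uses colspan or rowspan). The function name `classify_table` and the two label strings are assumptions standing in for ContentTypeV2.TABLE_SIMPLE and ContentTypeV2.TABLE_COMPLEX; the actual check in the codebase may differ.

```python
import re

# Assumed label strings standing in for ContentTypeV2.TABLE_SIMPLE
# and ContentTypeV2.TABLE_COMPLEX.
TABLE_SIMPLE = "simple_table"
TABLE_COMPLEX = "complex_table"

# Matches a colspan= or rowspan= attribute anywhere in the markup.
_SPAN_ATTR = re.compile(r"\b(?:colspan|rowspan)\s*=", re.IGNORECASE)


def classify_table(html: str) -> str:
    """Label a table complex if any cell declares colspan or rowspan."""
    return TABLE_COMPLEX if _SPAN_ATTR.search(html) else TABLE_SIMPLE


plain = classify_table("<table><tr><td>a</td><td>b</td></tr></table>")
merged = classify_table('<table><tr><td colspan="2">a</td></tr></table>')
```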
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py377-388
Key Distinctions:
BlockType = Structural layout unit with bounding box
ContentType = Atomic content unit within blocks
ContentTypeV2 = Enhanced type system for structured output
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py25-91 mineru/backend/pipeline/model_json_to_middle_json.py156-172
This diagram shows how BlockType and ContentType enums flow through the document processing pipeline with actual class and function names from the codebase.
Sources: mineru/backend/vlm/vlm_magic_model.py12-238 mineru/backend/vlm/vlm_magic_model.py51-83 mineru/backend/vlm/vlm_magic_model.py194-221
The hybrid backend has a unique pattern where some blocks use VLM-extracted content while others use span filling from OCR and formula recognition. The NotExtractType enum controls this behavior.
Sources: mineru/backend/hybrid/hybrid_magic_model.py15-299 mineru/backend/hybrid/hybrid_magic_model.py38-62 mineru/backend/hybrid/hybrid_magic_model.py135-241 mineru/utils/enum_class.py69-76
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py285-484
The fix_two_layer_blocks() function in both VLM and hybrid backends creates composite blocks (IMAGE, TABLE, CODE) by associating body, caption, and footnote components using index-based matching.
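The idea of index-based association can be sketched as below. This is not the function's actual algorithm: the name `group_two_layer_blocks`, the per-block `index` field, and the nearest-index rule are all assumptions chosen to illustrate how body, caption, and footnote blocks might be gathered into one composite.

```python
def group_two_layer_blocks(blocks, kind):
    """Attach each caption/footnote of `kind` to the body nearest in index."""
    bodies = [b for b in blocks if b["type"] == f"{kind}_body"]
    # One composite per body; the body is always the first child.
    composites = [
        {"type": kind, "index": body["index"], "blocks": [body]} for body in bodies
    ]
    for block in blocks:
        if block["type"] in (f"{kind}_caption", f"{kind}_footnote") and composites:
            nearest = min(composites, key=lambda c: abs(c["index"] - block["index"]))
            nearest["blocks"].append(block)
    return composites


blocks = [
    {"type": "image_caption", "index": 2},
    {"type": "image_body", "index": 3},
    {"type": "text", "index": 4},
    {"type": "image_body", "index": 7},
    {"type": "image_footnote", "index": 8},
]
composites = group_two_layer_blocks(blocks, "image")
```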
Sources: mineru/backend/vlm/vlm_magic_model.py373-502 mineru/backend/hybrid/hybrid_magic_model.py449-577
Both VLM and hybrid backends detect inline equations within text using LaTeX delimiters \( and \).
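Delimiter-based detection of this kind can be sketched with a regular expression. The function name `split_inline_equations` is hypothetical, and the sketch makes no attempt to handle escaped or nested delimiters.

```python
import re

# Non-greedy match of \( ... \); DOTALL lets a formula span line breaks.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def split_inline_equations(text: str) -> list[dict]:
    """Split text into alternating text and inline_equation spans."""
    spans = []
    cursor = 0
    for match in INLINE_MATH.finditer(text):
        if match.start() > cursor:
            # Plain text between the previous formula and this one.
            spans.append({"type": "text", "content": text[cursor:match.start()]})
        spans.append({"type": "inline_equation", "content": match.group(1).strip()})
        cursor = match.end()
    if cursor < len(text):
        spans.append({"type": "text", "content": text[cursor:]})
    return spans


spans = split_inline_equations(r"Given \(a^2 + b^2 = c^2\), solve for \(c\).")
```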
Sources: mineru/backend/vlm/vlm_magic_model.py106-146 mineru/backend/hybrid/hybrid_magic_model.py141-181
The MagicModel class provides getter methods for each block type category, used by downstream processing:
These are called during middle.json construction to organize blocks by type.
Sources: mineru/backend/vlm/vlm_magic_model.py240-271 mineru/backend/hybrid/hybrid_magic_model.py315-346
The pipeline backend uses YOLO CategoryId enums internally, which are mapped to ContentType in the get_all_spans() method:
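A sketch of this kind of mapping. The member names and integer ids in the stand-in CategoryId class, the `category_id` key, and the function name `collect_spans` are assumptions for illustration; consult enum_class.py for the real values.

```python
class CategoryId:
    """Stand-in class; names and integer ids are illustrative assumptions."""
    Text = 1
    ImageBody = 3
    TableBody = 5
    InlineEquation = 13
    InterlineEquation_YOLO = 14
    OcrText = 15


# Span-level detections and the ContentType string each one becomes.
CATEGORY_TO_CONTENT_TYPE = {
    CategoryId.ImageBody: "image",
    CategoryId.TableBody: "table",
    CategoryId.InlineEquation: "inline_equation",
    CategoryId.InterlineEquation_YOLO: "interline_equation",
    CategoryId.OcrText: "text",
}


def collect_spans(layout_dets):
    """Turn span-level detections into spans; skip layout-only categories."""
    spans = []
    for det in layout_dets:
        content_type = CATEGORY_TO_CONTENT_TYPE.get(det["category_id"])
        if content_type is None:
            continue  # a layout-level detection, not a span
        spans.append({"type": content_type, "bbox": det["bbox"], "score": det["score"]})
    return spans


layout_dets = [
    {"category_id": CategoryId.Text, "bbox": [0, 0, 100, 50], "score": 0.99},
    {"category_id": CategoryId.OcrText, "bbox": [2, 2, 98, 20], "score": 0.97},
    {"category_id": CategoryId.InlineEquation, "bbox": [40, 22, 80, 40], "score": 0.93},
]
spans = collect_spans(layout_dets)
```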
Sources: mineru/backend/pipeline/pipeline_magic_model.py308-352 mineru/utils/enum_class.py77-99
Tables use the OTSL (Optimized Table Structure Language) format internally, converted to HTML using convert_otsl_to_html():
The resulting HTML is stored in the span's html field (span['html']), and the span's type is set to ContentType.TABLE.
Sources: mineru/utils/format_utils.py307-319 mineru/utils/format_utils.py256-304
CategoryId defines YOLO model detection categories used exclusively in the pipeline backend. These are integer constants mapped to BlockType during processing.
Usage: Pipeline backend's MagicModel class filters layout_dets by CategoryId values.
NotExtractType defines BlockType values that should not use VLM content extraction in hybrid mode when _vlm_ocr_enable=False. These blocks use span filling instead.
Usage: Checked in hybrid_magic_model.py line 135: block_type not in not_extract_list
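The membership check can be sketched as follows. The members shown in the stand-in NotExtractType, the way `not_extract_list` is built from it, and the helper name `uses_vlm_content` are assumptions for illustration; only the `block_type not in not_extract_list` test and the `_vlm_ocr_enable` condition come from the description above.

```python
from enum import Enum


class NotExtractType(Enum):
    # Membership here is assumed for illustration.
    TEXT = "text"
    TITLE = "title"
    LIST = "list"


# Flatten the enum to plain strings so block["type"] can be tested directly.
not_extract_list = [item.value for item in NotExtractType]


def uses_vlm_content(block_type: str, vlm_ocr_enable: bool) -> bool:
    """True when the block keeps VLM-extracted content, not span filling."""
    return vlm_ocr_enable or block_type not in not_extract_list


text_without_vlm_ocr = uses_vlm_content("text", vlm_ocr_enable=False)
table_without_vlm_ocr = uses_vlm_content("table_body", vlm_ocr_enable=False)
```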
Sources: mineru/utils/enum_class.py69-116 mineru/backend/hybrid/hybrid_magic_model.py13 mineru/backend/hybrid/hybrid_magic_model.py135
Use BlockType for structural decisions
Use ContentType for content extraction
Use ContentTypeV2 for structured output
Applies when generating content_list_v2.json
Type Safety
Compare types against the constants: block['type'] == BlockType.TEXT
Use in for multiple types: if para_type in [BlockType.TEXT, BlockType.LIST]
Hierarchical Processing
Iterate over the blocks array, then process lines, then spans
Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py94-184 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py10-90