MagicModel Block Processing

Purpose and Scope

This page documents the MagicModel class implementations that transform raw model inference outputs into structured, hierarchical block representations. MagicModel is a critical component in the data transformation pipeline, responsible for extracting blocks and spans from backend-specific inference results.

Related Pages:

For the complete data transformation flow from input to output, see Document Processing Flow
For the middle.json structure that MagicModel produces, see middle.json Structure
For backend-specific processing details, see Pipeline Backend, VLM Backend, and Hybrid Backend
For text merging after block processing, see Text Merging and Language Handling

Architecture Overview

MagicModel has three backend-specific implementations, each processing different input formats but producing a common output structure:

Sources: mineru/backend/pipeline/pipeline_magic_model.py:6-8, mineru/backend/vlm/vlm_magic_model.py:12-14, mineru/backend/hybrid/hybrid_magic_model.py:15-27

All three implementations expose the same getter methods:

Method	Returns	Description
`get_image_blocks()`	list	Image blocks with captions/footnotes
`get_table_blocks()`	list	Table blocks with captions/footnotes
`get_text_blocks()`	list	Regular text blocks
`get_title_blocks()`	list	Title/heading blocks
`get_interline_equation_blocks()`	list	Display equation blocks
`get_code_blocks()`	list	Code and algorithm blocks
`get_list_blocks()`	list	List item blocks
`get_ref_text_blocks()`	list	Reference text blocks
`get_phonetic_blocks()`	list	Phonetic annotation blocks
`get_discarded_blocks()`	list	Discarded blocks (headers, footers)
`get_all_spans()`	list	All atomic content spans

Sources: mineru/backend/vlm/vlm_magic_model.py:240-271, mineru/backend/hybrid/hybrid_magic_model.py:315-346, mineru/backend/pipeline/pipeline_magic_model.py:246-318

Block Types and Content Types

MagicModel works with two hierarchical enumerations that define the structure of parsed documents:

BlockType Enumeration

Sources: mineru/utils/enum_class.py:3-31

ContentType Enumeration

ContentType defines the atomic span types that appear within blocks:

ContentType	Description	Example Usage
`TEXT`	Regular text content	Paragraphs, captions
`INLINE_EQUATION`	Inline math formula	`$x^2 + y^2$`
`INTERLINE_EQUATION`	Display math formula	`$$\int_0^\infty e^{-x}dx$$`
`IMAGE`	Image content	Cropped image references
`TABLE`	Table content	HTML table structure

Sources: mineru/utils/enum_class.py:33-40

Pipeline MagicModel Implementation

The pipeline backend's MagicModel processes layout detection results with CategoryId labels:

Input Structure

Sources: mineru/backend/pipeline/pipeline_magic_model.py:8-11

Processing Pipeline

Sources: mineru/backend/pipeline/pipeline_magic_model.py:8-21

Caption and Footnote Association

The pipeline MagicModel uses distance-based association to link captions and footnotes with their parent elements:

The __tie_up_category_by_distance_v3 method matches each caption and footnote to its parent block by spatial proximity, so that every caption or footnote ends up attached to the image or table it describes.

Sources: mineru/backend/pipeline/pipeline_magic_model.py:246-283, mineru/utils/magic_model_utils.py:31-171

CategoryId to BlockType Mapping

The pipeline backend maps CategoryId values to BlockType through the extraction process:

CategoryId	BlockType Output
`CategoryId.Title`	`BlockType.TITLE`
`CategoryId.Text`	`BlockType.TEXT`
`CategoryId.ImageBody`	`BlockType.IMAGE_BODY`
`CategoryId.ImageCaption`	`BlockType.IMAGE_CAPTION`
`CategoryId.ImageFootnote`	`BlockType.IMAGE_FOOTNOTE`
`CategoryId.TableBody`	`BlockType.TABLE_BODY`
`CategoryId.TableCaption`	`BlockType.TABLE_CAPTION`
`CategoryId.TableFootnote`	`BlockType.TABLE_FOOTNOTE`
`CategoryId.InterlineEquation_Layout`	`BlockType.INTERLINE_EQUATION`
`CategoryId.InlineEquation`	`ContentType.INLINE_EQUATION` span
`CategoryId.Abandon`	`BlockType.DISCARDED`
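
The table above can be read as a simple lookup from layout category to output type. The sketch below illustrates that reading only: it uses plain strings as stand-ins for the `CategoryId` and `BlockType` members, and `CATEGORY_TO_BLOCK_TYPE` and `block_type_for` are hypothetical names that do not come from the MinerU source.

```python
from typing import Optional

# Stand-in names only: the real CategoryId/BlockType members live in
# mineru/utils/enum_class.py, and their underlying values are not assumed here.
CATEGORY_TO_BLOCK_TYPE = {
    "Title": "TITLE",
    "Text": "TEXT",
    "ImageBody": "IMAGE_BODY",
    "ImageCaption": "IMAGE_CAPTION",
    "ImageFootnote": "IMAGE_FOOTNOTE",
    "TableBody": "TABLE_BODY",
    "TableCaption": "TABLE_CAPTION",
    "TableFootnote": "TABLE_FOOTNOTE",
    "InterlineEquation_Layout": "INTERLINE_EQUATION",
    "Abandon": "DISCARDED",
}


def block_type_for(category_name: str) -> Optional[str]:
    """Return the block type for a layout category, or None if it yields no block.

    InlineEquation is deliberately absent: per the table it becomes a
    ContentType.INLINE_EQUATION span rather than a block.
    """
    return CATEGORY_TO_BLOCK_TYPE.get(category_name)
```

Keeping the inline-equation category out of the lookup makes the block/span distinction explicit at the call site.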

Sources: mineru/utils/enum_class.py:68-84, mineru/backend/pipeline/pipeline_magic_model.py:284-318

VLM MagicModel Implementation

The VLM backend's MagicModel receives fully parsed blocks with content from the vision-language model:

Input Structure

Sources: mineru/backend/vlm/vlm_magic_model.py:12-45

Block Processing Logic

The VLM MagicModel performs content-based processing:

Sources: mineru/backend/vlm/vlm_magic_model.py:47-183

Inline Equation Detection

The VLM MagicModel extracts inline equations from text content using regex:
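
As a rough illustration of regex-based extraction, the sketch below splits a text string into alternating text and inline-equation spans. The pattern (single-dollar delimiters, no newline inside), the function name `split_inline_equations`, and the lowercase type labels are assumptions made for this example; the pattern and span layout MinerU actually uses may differ.

```python
import re

# Assumed pattern: single-dollar delimiters with no newline inside.
INLINE_MATH = re.compile(r"\$([^$\n]+)\$")


def split_inline_equations(text: str) -> list:
    """Split text into text and inline-equation spans, in reading order."""
    spans = []
    pos = 0
    for match in INLINE_MATH.finditer(text):
        if match.start() > pos:
            # Plain text that precedes this equation.
            spans.append({"type": "text", "content": text[pos:match.start()]})
        # Store the LaTeX body without its dollar delimiters.
        spans.append({"type": "inline_equation", "content": match.group(1)})
        pos = match.end()
    if pos < len(text):
        # Trailing text after the last equation.
        spans.append({"type": "text", "content": text[pos:]})
    return spans
```

For example, `split_inline_equations("Let $x^2$ be small.")` yields a text span `"Let "`, an inline-equation span `"x^2"`, and a text span `" be small."`.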

Sources: mineru/backend/vlm/vlm_magic_model.py:106-148

Code Block Handling

VLM MagicModel processes code blocks with special logic:

If a code block contains inline equations (indicated by $...$ patterns), it is automatically reclassified as an algorithm block, since algorithms often contain mathematical notation.
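
The reclassification rule can be sketched in a few lines. This is an illustration under assumptions, not the MinerU implementation: the pattern, the function name `classify_code_block`, and the `"code"`/`"algorithm"` labels are all hypothetical.

```python
import re

# Assumed pattern for $...$ math, for illustration only.
INLINE_MATH = re.compile(r"\$[^$\n]+\$")


def classify_code_block(content: str) -> str:
    """Return "algorithm" if the code body carries $...$ math, else "code"."""
    return "algorithm" if INLINE_MATH.search(content) else "code"
```

A pattern this loose would also fire on shell snippets such as `echo $HOME and $PATH`, so a production check probably needs a stricter test than the one shown here.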

Sources: mineru/backend/vlm/vlm_magic_model.py:76-80, mineru/backend/vlm/vlm_magic_model.py:282-295, mineru/backend/vlm/vlm_magic_model.py:86-108

Hybrid MagicModel Implementation

The hybrid backend's MagicModel combines VLM results with OCR-based span extraction:

Input Structure

Sources: mineru/backend/hybrid/hybrid_magic_model.py:15-27

Dual-Path Processing

The hybrid MagicModel uses two different processing paths based on the _vlm_ocr_enable flag:

Sources: mineru/backend/hybrid/hybrid_magic_model.py:38-242

NotExtractType Blocks

The hybrid backend defines certain block types that should NOT be extracted from VLM content when in span-fill mode:

These blocks are filled with OCR-derived spans instead of VLM content to ensure higher text extraction accuracy.

Sources: mineru/utils/enum_class.py:120-132, mineru/backend/hybrid/hybrid_magic_model.py:13, mineru/backend/hybrid/hybrid_magic_model.py:135-242

Span Filling Process

When not in VLM OCR mode, the hybrid backend fills text blocks with OCR-derived spans:

The fix_text_block function organizes loose spans into structured lines within the block.
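
One plausible way to turn loose spans into lines is to group them by vertical overlap and then order each group left to right. The sketch below shows that idea only; `group_spans_into_lines` is a hypothetical helper, the 50% overlap threshold is an assumption, and the real fix_text_block may use different rules.

```python
def group_spans_into_lines(spans):
    """Group spans into lines by vertical overlap, then order each line left to right.

    Each span is a dict with a "bbox" of [x0, y0, x1, y1].
    """
    lines = []
    for span in sorted(spans, key=lambda s: s["bbox"][1]):  # top to bottom
        y0, y1 = span["bbox"][1], span["bbox"][3]
        for line in lines:
            overlap = min(y1, line["y1"]) - max(y0, line["y0"])
            shorter = min(y1 - y0, line["y1"] - line["y0"])
            # Assumed rule: same line if the overlap exceeds half the shorter height.
            if overlap > 0.5 * shorter:
                line["spans"].append(span)
                line["y0"] = min(line["y0"], y0)
                line["y1"] = max(line["y1"], y1)
                break
        else:
            # No existing line overlaps enough, so start a new one.
            lines.append({"y0": y0, "y1": y1, "spans": [span]})
    return [sorted(line["spans"], key=lambda s: s["bbox"][0]) for line in lines]
```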

Sources: mineru/backend/hybrid/hybrid_magic_model.py:224-242, mineru/utils/span_block_fix.py:1-50

Two-Layer Block Construction

All three MagicModel implementations construct two-layer structures for images, tables, and code blocks:

Association Logic

The fix_two_layer_blocks function (used by the VLM and Hybrid backends) implements index-based association. This differs from the pipeline backend, which uses tie_up_category_by_distance_v3:

VLM/Hybrid Backend Association Flow (Index-Based)

Pipeline Backend Association (Distance-Based)

The pipeline backend uses tie_up_category_by_distance_v3, which matches components by spatial proximity rather than by index:

This spatial matching uses bbox_distance to find nearest neighbors rather than relying on reading order indices.
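
Nearest-neighbor matching on bounding boxes can be sketched as follows. `edge_distance` and `nearest_subject` are hypothetical stand-ins written for this example; they are not the bbox_distance or tie_up_category_by_distance_v3 functions from MinerU, which may measure distance and resolve conflicts differently.

```python
import math


def edge_distance(a, b):
    """Shortest gap between two [x0, y0, x1, y1] boxes; 0.0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def nearest_subject(obj_bbox, subject_bboxes):
    """Index of the subject box closest to obj_bbox, or None if there are none."""
    if not subject_bboxes:
        return None
    return min(
        range(len(subject_bboxes)),
        key=lambda i: edge_distance(obj_bbox, subject_bboxes[i]),
    )
```

With a caption at `[0, 110, 100, 120]` and two image bodies at `[0, 0, 100, 100]` and `[0, 300, 100, 400]`, the caption is assigned to the first body, whose lower edge is 10 pixels away.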

Sources: mineru/backend/vlm/vlm_magic_model.py:373-502, mineru/backend/hybrid/hybrid_magic_model.py:449-577, mineru/utils/magic_model_utils.py:173-299, mineru/backend/pipeline/pipeline_magic_model.py:212-244, mineru/utils/magic_model_utils.py:31-171

Caption and Footnote Position Validation

The fix_two_layer_blocks function validates position constraints and index continuity:

Position Constraints (Lines 461-472 in vlm_magic_model.py)

Index Continuity Validation (Lines 428-451 for captions, 454-466 for footnotes)

Captions must form a continuous sequence working backward from the body:

Sort by index descending (closest to body first)
Check each caption_idx == prev_idx - 1
Allow body_index as gap exception
Break on first real gap

Footnotes must form a continuous sequence working forward from the body:

Sort by index ascending (closest to body first)
Check each footnote_idx == prev_idx + 1
Break on first gap
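
The two continuity checks above can be sketched as a pair of small functions. The names `contiguous_captions` and `contiguous_footnotes` are hypothetical, and the sketch assumes that the caption nearest the body is always accepted and that "body_index as gap exception" means a caption run may step over the body's own index.

```python
def contiguous_captions(caption_indices, body_index):
    """Keep captions that form an unbroken run, scanning from the highest index down."""
    kept = []
    prev = None
    for idx in sorted(caption_indices, reverse=True):  # closest to body first
        if prev is not None:
            # The only tolerated gap is the one occupied by the body itself.
            gap_is_body = idx == prev - 2 and prev - 1 == body_index
            if idx != prev - 1 and not gap_is_body:
                break  # first real gap: drop this caption and all further ones
        kept.append(idx)
        prev = idx
    return kept


def contiguous_footnotes(footnote_indices):
    """Keep footnotes that form an unbroken ascending run."""
    kept = []
    prev = None
    for idx in sorted(footnote_indices):  # closest to body first
        if prev is not None and idx != prev + 1:
            break  # first gap ends the run
        kept.append(idx)
        prev = idx
    return kept
```

For a body at index 4 and caption candidates at 1, 3, 5, and 6, the sketch keeps 6, 5, and 3 (the step from 5 to 3 crosses the body) and rejects 1.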

tie_up_category_by_index Matching Logic

The underlying tie_up_category_by_index function uses a three-tier priority system:

Priority	Criterion	Code Reference
1 (Highest)	Effective index difference	`calc_effective_index_diff()` at line 219-237
2	Bbox edge distance	`bbox_distance()` at line 265
3 (Lowest)	Bbox center distance	`bbox_center_distance()` at line 285

Special rules when edge distance diff <= 2:

For table_caption: match to later subject (line 276-278)
For *_footnote: match to earlier subject (line 279-282)
Otherwise: use center distance as tiebreaker (line 284-288)
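
The priority tiers and near-tie rules can be sketched as a pairwise comparison over candidate parents. Everything here is illustrative: `Candidate` and `pick_subject` are hypothetical, a smaller index difference is assumed to be better, and "edge distance diff" is assumed to mean the absolute difference between two candidates' edge distances.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible parent for a caption or footnote (hypothetical structure)."""
    subject_index: int   # reading-order index of the parent body
    index_diff: int      # tier 1: effective index difference
    edge_dist: float     # tier 2: bbox edge distance
    center_dist: float   # tier 3: bbox center distance


def pick_subject(object_type, candidates):
    """Choose a parent using the three tiers plus the near-tie rules."""
    best = candidates[0]
    for cand in candidates[1:]:
        if cand.index_diff != best.index_diff:
            if cand.index_diff < best.index_diff:
                best = cand
        elif abs(cand.edge_dist - best.edge_dist) > 2:
            if cand.edge_dist < best.edge_dist:
                best = cand
        elif object_type == "table_caption":
            if cand.subject_index > best.subject_index:  # later subject wins
                best = cand
        elif object_type.endswith("_footnote"):
            if cand.subject_index < best.subject_index:  # earlier subject wins
                best = cand
        elif cand.center_dist < best.center_dist:  # center distance tiebreaker
            best = cand
    return best
```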

Sources: mineru/backend/vlm/vlm_magic_model.py:379-502, mineru/backend/hybrid/hybrid_magic_model.py:456-577, mineru/utils/magic_model_utils.py:173-299

List Block Processing

The fix_list_blocks function associates list items with their container using overlap detection:
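
Overlap-based containment can be illustrated with an area ratio: an item belongs to a list container if most of the item's box lies inside the container's box. `overlap_ratio` and `items_in_list` are hypothetical helpers, and the 0.8 threshold is an assumption chosen for the example rather than a value taken from the source.

```python
def overlap_ratio(inner, outer):
    """Fraction of inner's area that lies inside outer ([x0, y0, x1, y1] boxes)."""
    ix0, iy0 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix1, iy1 = min(inner[2], outer[2]), min(inner[3], outer[3])
    if ix1 <= ix0 or iy1 <= iy0:
        return 0.0  # no intersection
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (ix1 - ix0) * (iy1 - iy0) / inner_area if inner_area else 0.0


def items_in_list(list_bbox, blocks, threshold=0.8):
    """Blocks whose bbox is mostly covered by the list container."""
    return [b for b in blocks if overlap_ratio(b["bbox"], list_bbox) >= threshold]
```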

List Block Processing Flow

Sub-Type Determination (Lines 529-541)

The sub_type field is determined by counting block types within the list:

This allows distinguishing between text lists, reference lists, or mixed content lists.
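
Counting child types to pick a sub_type can be sketched with `collections.Counter`. The majority-vote rule, the `"text"` fallback for an empty list, and the name `list_sub_type` are assumptions for illustration; the source may weigh the counts differently.

```python
from collections import Counter


def list_sub_type(child_types):
    """Pick a list's sub_type from the most frequent child block type.

    Assumption: simple majority vote, with "text" as the fallback for an
    empty list.
    """
    if not child_types:
        return "text"
    return Counter(child_types).most_common(1)[0][0]
```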

Sources: mineru/backend/vlm/vlm_magic_model.py:505-543, mineru/backend/hybrid/hybrid_magic_model.py:580-618, mineru/utils/boxbase.py:174-191

Span Generation and Extraction

Span Structure

Spans are atomic content units stored in both individual blocks and the aggregated all_spans list:

Field	Type	Present When	Description
`bbox`	`[x0, y0, x1, y1]`	Always	Pixel coordinates
`type`	`ContentType` enum	Always	TEXT, INLINE_EQUATION, INTERLINE_EQUATION, IMAGE, TABLE
`content`	string	TEXT or EQUATION types	Extracted or recognized text/LaTeX
`html`	string	TABLE type	HTML table structure from VLM/pipeline
`latex`	string	TABLE type (pipeline)	OTSL or LaTeX table format
`score`	float	Pipeline/Hybrid	Model confidence (0.0-1.0)
`image_path`	string	IMAGE type (added later)	Path to cropped image file

Pipeline Backend Span Generation

The pipeline backend generates spans in its get_all_spans() method (lines 308-352):

Pipeline get_all_spans() Processing

Downstream Processing (model_json_to_middle_json.py)

Sources: mineru/backend/pipeline/pipeline_magic_model.py:308-352, mineru/backend/pipeline/model_json_to_middle_json.py:136-173, mineru/utils/span_pre_proc.py

VLM Backend Span Generation

The VLM backend generates spans directly from block content:

Sources: mineru/backend/vlm/vlm_magic_model.py:86-164

Hybrid Backend Span Generation

The hybrid backend uses a conditional dual-path approach for span generation:

Span Source Selection (Lines 38-62)

Block Content vs Span Filling (Lines 135-242)

For each block during processing:

The not_extract_list includes: TEXT, TITLE, HEADER, FOOTER, PAGE_NUMBER, PAGE_FOOTNOTE, REF_TEXT, and all caption/footnote types. These blocks use span-filling for better accuracy.
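
The choice between VLM content and OCR span filling can be summarized as a small decision function. This is a sketch under assumptions: the lowercase type labels, the exact membership of the caption/footnote entries, and the names `NOT_EXTRACT_TYPES` and `content_source` are illustrative and not taken from the MinerU source.

```python
# Illustrative labels for the not_extract_list described above. The exact
# caption/footnote members are assumed; the real list may contain more.
NOT_EXTRACT_TYPES = {
    "text", "title", "header", "footer", "page_number", "page_footnote",
    "ref_text", "image_caption", "image_footnote", "table_caption",
    "table_footnote",
}


def content_source(block_type: str, vlm_ocr_enable: bool) -> str:
    """Return where a block's text comes from: "vlm_content" or "ocr_spans"."""
    if vlm_ocr_enable:
        # VLM OCR mode: every block keeps the text produced by the VLM.
        return "vlm_content"
    # Span-fill mode: text-like blocks are filled from OCR-derived spans.
    return "ocr_spans" if block_type in NOT_EXTRACT_TYPES else "vlm_content"
```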

Sources: mineru/backend/hybrid/hybrid_magic_model.py:13, mineru/backend/hybrid/hybrid_magic_model.py:38-62, mineru/backend/hybrid/hybrid_magic_model.py:135-242, mineru/utils/enum_class.py:120-132, mineru/utils/span_block_fix.py

Block and Span Aggregation Flow

The following diagram shows how blocks and spans are collected after MagicModel processing:

Sources: mineru/backend/pipeline/model_json_to_middle_json.py:36-169, mineru/utils/block_pre_proc.py:11-31

Usage in Backend Processing

Pipeline Backend Usage

Sources: mineru/backend/pipeline/model_json_to_middle_json.py:28-56

VLM and Hybrid Backend Usage

VLM and Hybrid backends use MagicModel in their respective result_to_middle_json functions:

Sources: mineru/backend/vlm/vlm_analyze.py:1-200, mineru/backend/hybrid/hybrid_analyze.py:1-300

Integration with Block Processing Pipeline

After MagicModel extraction, blocks undergo further processing:

Sources: mineru/backend/pipeline/model_json_to_middle_json.py:176-253, mineru/backend/pipeline/para_split.py:1-50, mineru/utils/table_merge.py:537-589

Summary

MagicModel serves as the critical transformation layer that:

Normalizes backend outputs - Converts backend-specific inference results into a common block structure
Establishes hierarchy - Creates two-layer structures for complex elements (images, tables, code)
Associates components - Links captions and footnotes with their parent elements using spatial or index-based matching
Generates spans - Extracts atomic content units (text, equations, images) from blocks
Handles special cases - Processes inline equations, code blocks, and list structures

The three implementations (pipeline, VLM, hybrid) use different input formats and processing strategies but converge on a unified output structure that enables consistent downstream processing in the document parsing pipeline.

Sources: mineru/backend/pipeline/pipeline_magic_model.py:6-318, mineru/backend/vlm/vlm_magic_model.py:12-543, mineru/backend/hybrid/hybrid_magic_model.py:15-698

