Advanced Features

This document covers specialized features and utilities in MinerU that enhance document processing capabilities beyond basic parsing. These features include multi-language OCR support, span refinement, cross-page table merging, LLM-aided optimizations, PDF-to-image conversion, and configuration management.

For core document parsing workflows, see System Architecture. For backend-specific processing details, see Pipeline Backend, VLM Backend, and Hybrid Backend.

Multi-Language Support and Detection

MinerU supports 109 languages through PaddleOCR integration and automatic language detection using fasttext models. Language configuration affects OCR accuracy and text processing behavior.

Language Categories

The system organizes languages into predefined groups for OCR model selection:

Primary Language Sets:

Language Code	Supported Languages
`ch`	Chinese, English, Chinese Traditional
`ch_lite`	Chinese, English, Chinese Traditional, Japanese
`ch_server`	Chinese, English, Chinese Traditional, Japanese
`en`	English
`korean`	Korean, English
`japan`	Chinese, English, Chinese Traditional, Japanese
`chinese_cht`	Chinese, English, Chinese Traditional, Japanese
`ta`	Tamil, English
`te`	Telugu, English
`ka`	Kannada
`el`	Greek, English
`th`	Thai, English

Extended Language Sets:

Language Code	Region	Languages
`latin`	European/Latin America	French, German, Italian, Spanish, Portuguese, Dutch, Norwegian, Polish, Swedish, Turkish, Romanian, Finnish, etc. (40+ languages)
`arabic`	Middle East/Central Asia	Arabic, Persian, Uyghur, Urdu, Pashto, Kurdish, Sindhi, Balochi, English
`east_slavic`	Eastern Europe	Russian, Belarusian, Ukrainian, English
`cyrillic`	Cyrillic Script	Russian, Bulgarian, Mongolian, Kazakh, Kyrgyz, Tajik, Macedonian, Serbian (Cyrillic), etc. (30+ languages)
`devanagari`	South Asia	Hindi, Marathi, Nepali, Sanskrit, English

Sources: mineru/cli/gradio_app.py149-169 mineru/cli/client.py76-84 mineru/cli/fast_api.py134-152

Language Detection Flow

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py25-91 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py106-179 mineru/utils/language.py

Character Normalization

The system handles full-width and half-width character conversion:

Implementation:

full_to_half_exclude_marks(): Converts full-width letters (FF21-FF3A, FF41-FF5A) and numbers (FF10-FF19) only
full_to_half(): Converts all full-width characters (FF01-FF5E) including punctuation
Ligature replacement: Handles special characters like 'ﬁ' → 'fi', 'ﬂ' → 'fl'

Sources: mineru/utils/char_utils.py18-55 mineru/utils/span_pre_proc.py110-114
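The conversions listed above can be sketched as follows. This is a minimal re-implementation written from the code-point ranges in the list, not the code in mineru/utils/char_utils.py; the `replace_ligatures` helper and the `LIGATURES` table are illustrative names, and the real functions may differ in detail. It relies on the fact that each full-width form in FF01-FF5E sits exactly 0xFEE0 above its ASCII counterpart.

```python
# Sketch only: re-implemented from the ranges described above.
FULLWIDTH_OFFSET = 0xFEE0  # distance between a full-width form and its ASCII twin

# Hypothetical ligature table covering the two examples named in the text.
LIGATURES = {"\ufb01": "fi", "\ufb02": "fl"}


def full_to_half_exclude_marks(text: str) -> str:
    """Convert full-width letters and digits only; leave punctuation untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        is_upper = 0xFF21 <= cp <= 0xFF3A
        is_lower = 0xFF41 <= cp <= 0xFF5A
        is_digit = 0xFF10 <= cp <= 0xFF19
        out.append(chr(cp - FULLWIDTH_OFFSET) if (is_upper or is_lower or is_digit) else ch)
    return "".join(out)


def full_to_half(text: str) -> str:
    """Convert every full-width character in FF01-FF5E, punctuation included."""
    return "".join(
        chr(ord(ch) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )


def replace_ligatures(text: str) -> str:
    """Expand typographic ligatures into their component letters."""
    for ligature, expansion in LIGATURES.items():
        text = text.replace(ligature, expansion)
    return text
```

The only difference between the two converters is whether full-width punctuation such as U+FF01 is mapped to its ASCII form.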

CJK vs Western Text Handling

CJK Languages (Chinese, Japanese, Korean):

No space insertion between line breaks
Spaces added after inline formulas
No hyphenation logic

Western Languages:

Spaces between words at line breaks
Hyphen removal when word continues on next line
Logic checks next line's first character case

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py58-90 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py143-175
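A minimal sketch of the line-joining rules above, under stated assumptions: `join_lines` and its `is_cjk` parameter are hypothetical names, and the choice to keep a trailing hyphen when the next line does not start with a lowercase letter is one plausible reading of "checks next line's first character case", not confirmed behavior. Inline-formula spacing is left out.

```python
def join_lines(lines: list[str], is_cjk: bool) -> str:
    """Join the lines of a text block into one string (illustrative only)."""
    result = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines entirely
        if not result:
            result = line
        elif is_cjk:
            # CJK text: concatenate directly, no space at the line break.
            result += line
        elif result.endswith("-") and line[0].islower():
            # Western text: the word continues on the next line, drop the hyphen.
            result = result[:-1] + line
        elif result.endswith("-"):
            # Assumption: keep the hyphen when the next line starts with a
            # non-lowercase character (e.g. a hyphenated proper name).
            result += line
        else:
            # Western text: ordinary line break becomes a single space.
            result += " " + line
    return result
```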

Span Processing and Refinement

Span processing involves filtering, overlap removal, confidence-based selection, and text extraction from PDF layers.

Span Processing Pipeline

Sources: mineru/backend/pipeline/model_json_to_middle_json.py128-168 mineru/utils/span_pre_proc.py18-108

Overlap Removal Strategies

Strategy 1: Confidence-Based (IOU > 0.9)

Strategy 2: Size-Based (Ratio > 0.65)

Sources: mineru/utils/span_pre_proc.py60-107
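The two strategies can be illustrated with a small sketch. It assumes each span is a dict with a `bbox` of `[x0, y0, x1, y1]` and a `score`; the function names are hypothetical, and the rule that the size-based strategy drops the smaller span is an assumption, since the text above only gives the thresholds.

```python
def _area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def _intersection(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def overlap_ratio_of_smaller(a, b):
    """Share of the smaller box that is covered by the other box."""
    smaller = min(_area(a), _area(b))
    return _intersection(a, b) / smaller if smaller > 0 else 0.0


def _drop_pairs(spans, overlaps, pick_loser):
    dropped = set()
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            if i in dropped or j in dropped:
                continue
            if overlaps(spans[i]["bbox"], spans[j]["bbox"]):
                dropped.add(pick_loser(i, j))
    return [s for k, s in enumerate(spans) if k not in dropped]


def remove_by_confidence(spans, threshold=0.9):
    """Strategy 1: near-duplicate boxes (IoU > threshold), keep the higher score."""
    return _drop_pairs(
        spans,
        lambda a, b: iou(a, b) > threshold,
        lambda i, j: i if spans[i]["score"] < spans[j]["score"] else j,
    )


def remove_by_size(spans, threshold=0.65):
    """Strategy 2 (assumed): mostly-contained boxes, keep the larger one."""
    return _drop_pairs(
        spans,
        lambda a, b: overlap_ratio_of_smaller(a, b) > threshold,
        lambda i, j: i if _area(spans[i]["bbox"]) < _area(spans[j]["bbox"]) else j,
    )
```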

Confidence Thresholds

The OcrConfidence class defines minimum confidence thresholds:

Confidence Level	Value	Usage
`min_confidence`	0.3-0.5	Minimum acceptable OCR score
High confidence	> 0.8	Reliable text extraction
Low confidence	< min	Marked as `category_id=16` (low score text)

Special Cases:

Specific low-confidence patterns (e.g., "(cid:)", "cd:)") with score < 0.8 and vertical layout are marked as unreliable
Pattern list includes CID mappings and mathematical symbols that OCR often misrecognizes

Sources: mineru/backend/hybrid/hybrid_analyze.py283-299 mineru/utils/ocr_utils.py

Text Extraction from PDF

Implementation Details:

Uses pdftext library for PDF text layer extraction
Filters out rotations other than 0°, 90°, 180°, 270° (within 0.1° tolerance)
Calculates median span height for vertical span detection
Vertical spans criteria: height > median*2.3 AND height/width > 2.3

Sources: mineru/utils/span_pre_proc.py124-180 mineru/utils/pdf_text_tool.py9-40
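The rotation filter and the vertical-span criteria above can be sketched as below. The function names and the span layout (a dict with `bbox` as `[x0, y0, x1, y1]`) are assumptions made for illustration; only the 0.1° tolerance and the 2.3 factors come from the text.

```python
from statistics import median


def is_axis_aligned(rotation_deg: float, tolerance: float = 0.1) -> bool:
    """True when the rotation is within tolerance of 0, 90, 180 or 270 degrees."""
    r = rotation_deg % 360
    # 360 is included so that values just below a full turn count as 0 degrees.
    return any(abs(r - target) <= tolerance for target in (0, 90, 180, 270, 360))


def find_vertical_spans(spans: list[dict], factor: float = 2.3) -> list[dict]:
    """Return spans that are tall relative to the page median and to their own width."""
    heights = [s["bbox"][3] - s["bbox"][1] for s in spans]
    if not heights:
        return []
    median_height = median(heights)
    vertical = []
    for span in spans:
        x0, y0, x1, y1 = span["bbox"]
        width, height = x1 - x0, y1 - y0
        # Both conditions must hold: height > median * 2.3 AND height / width > 2.3.
        if width > 0 and height > median_height * factor and height / width > factor:
            vertical.append(span)
    return vertical
```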

Table Cross-Page Merging

MinerU detects and merges tables split across pages using header matching, column structure analysis, and caption markers.

Table Merge Decision Logic

Sources: mineru/utils/table_merge.py287-354 mineru/backend/utils.py

Continuation Markers

End-of-Caption Markers:

"(续)", "(续表)", "(续上表)", "(continued)", "(cont.)", "(cont'd)", "(…continued)", "续表"

Inline Caption Markers:

"(continued)"

These markers trigger relaxed merging rules, allowing up to 1 footnote in the previous table.

Sources: mineru/utils/table_merge.py12-26
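A minimal sketch of how a caption might be tested against these markers. The marker strings are copied from the lists above; the function name, the case-insensitive comparison, and the whitespace stripping are assumptions rather than confirmed behavior of table_merge.py.

```python
# Marker strings as listed above.
END_MARKERS = (
    "(续)", "(续表)", "(续上表)", "(continued)",
    "(cont.)", "(cont'd)", "(…continued)", "续表",
)
INLINE_MARKERS = ("(continued)",)


def has_continuation_marker(caption: str) -> bool:
    """True if the caption ends with an end marker or contains an inline marker."""
    text = caption.strip().lower()  # assumption: matching ignores case and outer whitespace
    if text.endswith(END_MARKERS):  # str.endswith accepts a tuple of suffixes
        return True
    return any(marker in text for marker in INLINE_MARKERS)
```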

Column Count Calculation

Implementation:

calculate_table_total_columns(): Handles colspan and rowspan by building occupation matrix
calculate_row_effective_columns(): Returns per-row effective columns considering rowspan
calculate_row_columns(): Sums colspan values (ignores rowspan)
calculate_visual_columns(): Counts actual td/th cells (ignores both colspan and rowspan)

Sources: mineru/utils/table_merge.py28-168
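The difference between these counts is easiest to see on a small example. The sketch below does not parse HTML; it assumes each row is already a list of `(colspan, rowspan)` tuples, and the function names are hypothetical stand-ins for the four functions listed above. Overlapping spans in malformed tables are not handled.

```python
def occupancy(rows):
    """Build an occupation matrix: for each row, the set of column indices in use."""
    occupied = [set() for _ in rows]
    for r, cells in enumerate(rows):
        col = 0
        for colspan, rowspan in cells:
            # Skip columns already taken by a rowspan from an earlier row.
            while col in occupied[r]:
                col += 1
            # Mark the cell's columns in this row and in the rows it spans into.
            for rr in range(r, min(r + rowspan, len(rows))):
                occupied[rr].update(range(col, col + colspan))
            col += colspan
    return occupied


def row_effective_columns(rows):
    """Per-row column count, including columns held by rowspans from above."""
    return [len(taken) for taken in occupancy(rows)]


def table_total_columns(rows):
    """Widest row of the table once colspan and rowspan are both applied."""
    return max(row_effective_columns(rows), default=0)


def row_columns(cells):
    """Sum of colspan values in one row; rowspan is ignored."""
    return sum(colspan for colspan, _ in cells)


def visual_columns(cells):
    """Number of td/th cells in one row; colspan and rowspan are both ignored."""
    return len(cells)
```

For a two-row table whose first row is `[(2, 1), (1, 2)]` and whose second row is `[(1, 1), (1, 1)]`, the second row has two cells and a colspan sum of 2, but its effective width is 3 because the last cell of the first row spans down into it.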

Header Detection

Strict vs Visual Matching:

Strict: Requires identical colspan, rowspan, effective columns, AND text
Visual: Only requires identical text content and effective columns, ignores colspan/rowspan differences

Sources: mineru/utils/table_merge.py170-284
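The two matching modes can be contrasted with a short sketch. It assumes a header row is a list of dicts with `text`, `colspan`, and `rowspan` keys, and it approximates "effective columns" by the colspan sum of the row, which ignores rowspans inherited from earlier rows. All names here are hypothetical.

```python
def _effective_columns(row):
    # Simplification: colspan sum only; inherited rowspans are not considered.
    return sum(cell["colspan"] for cell in row)


def strict_match(row_a, row_b) -> bool:
    """Identical colspan, rowspan, effective columns, and text, cell by cell."""
    return (
        len(row_a) == len(row_b)
        and _effective_columns(row_a) == _effective_columns(row_b)
        and all(
            a["text"] == b["text"]
            and a["colspan"] == b["colspan"]
            and a["rowspan"] == b["rowspan"]
            for a, b in zip(row_a, row_b)
        )
    )


def visual_match(row_a, row_b) -> bool:
    """Identical text and effective columns; colspan/rowspan layout may differ."""
    return (
        _effective_columns(row_a) == _effective_columns(row_b)
        and [c["text"] for c in row_a] == [c["text"] for c in row_b]
    )
```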

HTML Merging Process

colspan Adjustment Logic:

If row matches reference structure (same visual columns and colspan pattern), apply reference colspan values
Otherwise, extend last cell's colspan to fill column difference
Uses effective columns (considering rowspan) to calculate differences

Sources: mineru/utils/table_merge.py471-567 mineru/utils/table_merge.py419-469
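The fallback branch (extending the last cell) is simple enough to show directly. This sketch works on a plain list of colspan values rather than HTML cells, covers only the fallback and not the reference-structure branch, and uses a hypothetical function name.

```python
def fill_columns(colspans: list[int], target_columns: int) -> list[int]:
    """Widen the last cell so the row spans target_columns in total."""
    adjusted = list(colspans)  # do not mutate the caller's list
    missing = target_columns - sum(adjusted)
    if adjusted and missing > 0:
        adjusted[-1] += missing
    return adjusted
```

For example, a three-cell row merged into a five-column table becomes `[1, 1, 3]`; rows that are already wide enough are returned unchanged.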

LLM-Aided Title Optimization

MinerU optionally uses LLM models to optimize title hierarchy levels in parsed documents.

Configuration and Activation

Sources: mineru/backend/pipeline/model_json_to_middle_json.py237-247 mineru/backend/hybrid/hybrid_model_output_to_middle_json.py22-106

Line Height Calculation

For VLM Backend (no line information):

Extract title block from page image using get_crop_img()
Add 50-pixel white border padding
Run OCR detection only (ocr_model.ocr(title_img, rec=False))
Calculate average height: avg_height = mean([box[2][1] - box[0][1] for box in ocr_det_res])
Scale back: title_block['line_avg_height'] = round(avg_height/scale)

For Pipeline Backend (has line information):

Get lines from title_block['lines']
Calculate: avg_height = sum(line['bbox'][3] - line['bbox'][1] for line in lines) / len(lines)
Store: title_block['line_avg_height'] = round(avg_height)

Fallback: If no lines, use block height: bbox[3] - bbox[1]

Sources: mineru/backend/hybrid/hybrid_model_output_to_middle_json.py75-105
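Both calculations reduce to averaging box heights. The sketch below takes already-computed inputs (a list of four-point detection boxes, or a title block dict) instead of calling the OCR model, so the function names and argument shapes are assumptions; the formulas follow the steps listed above.

```python
def line_avg_height_from_detection(det_boxes, scale: float) -> int:
    """VLM case: average detected box height, scaled back to page units.

    Each box is assumed to be four [x, y] points ordered top-left, top-right,
    bottom-right, bottom-left, so box[2][1] - box[0][1] is its height.
    """
    heights = [box[2][1] - box[0][1] for box in det_boxes]
    avg_height = sum(heights) / len(heights)
    return round(avg_height / scale)


def line_avg_height_from_lines(title_block: dict) -> int:
    """Pipeline case: average line height, falling back to the block height."""
    lines = title_block.get("lines") or []
    if lines:
        avg_height = sum(l["bbox"][3] - l["bbox"][1] for l in lines) / len(lines)
    else:
        # Fallback: no line information, use the height of the block itself.
        avg_height = title_block["bbox"][3] - title_block["bbox"][1]
    return round(avg_height)
```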

Title Level Assignment

The get_title_level() function extracts the title level from a block:

Levels are used in Markdown generation:

Level 1: # Title
Level 2: ## Title
Level N: "#" repeated N times, followed by a space and the title (for example, Level 3: ### Title)

Sources: mineru/backend/vlm/vlm_middle_json_mkcontent.py106-107 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py21-22
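The level-to-heading mapping can be written in one line. The helper below is a hypothetical illustration, not the code in the mkcontent modules, and clamping levels below 1 up to 1 is an assumption.

```python
def title_to_markdown(text: str, level: int) -> str:
    """Render a title as a Markdown heading with `level` leading '#' characters."""
    level = max(1, level)  # assumption: treat missing or invalid levels as level 1
    return "#" * level + " " + text
```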

PDF to Image Conversion

MinerU converts PDF pages to images using pypdfium2 with multiprocessing support and timeout controls.

Conversion Pipeline

Sources: mineru/utils/pdf_image_tools.py mineru/cli/common.py54-91

Page Range Handling

Error Handling:

Page-level failure: Skip page, log warning, continue processing
Document-level failure: Fall back to original PDF bytes
Pages that fail to import are dropped from the output PDF before processing continues

Sources: mineru/cli/common.py54-82 mineru/utils/pdf_page_id.py

Image Type Options

Currently, all backends use PIL image format (ImageType.PIL). The BASE64 option exists for potential future use cases.

Sources: mineru/utils/enum_class.py115-117 mineru/backend/pipeline/pipeline_analyze.py103 mineru/backend/vlm/vlm_analyze.py mineru/backend/hybrid/hybrid_analyze.py402

Configuration Files and Environment Variables

MinerU supports configuration through mineru.json files and environment variables with precedence rules.

Configuration Priority

Sources: mineru/utils/cli_parser.py4-38 mineru/cli/client.py154-182

Key Environment Variables

Variable	Purpose	Values	Default
`MINERU_DEVICE_MODE`	Device selection	`cpu`, `cuda`, `cuda:0`, `npu`, `npu:0`, `mps`	Auto-detect
`MINERU_VIRTUAL_VRAM_SIZE`	GPU memory limit (GB)	Integer	Auto-detect
`MINERU_MODEL_SOURCE`	Model repository	`huggingface`, `modelscope`, `local`	`huggingface`
`MINERU_LOG_LEVEL`	Logging verbosity	`DEBUG`, `INFO`, `WARNING`, `ERROR`	`INFO`
`MINERU_MIN_BATCH_INFERENCE_SIZE`	Batch size threshold	Integer	`384`
`MINERU_HYBRID_BATCH_RATIO`	Hybrid backend batch ratio	Integer	Auto-calculated
`MINERU_VLM_FORMULA_ENABLE`	VLM formula recognition	`true`, `false`	`true`
`MINERU_VLM_TABLE_ENABLE`	VLM table recognition	`true`, `false`	`true`
`MINERU_FORCE_VLM_OCR_ENABLE`	Force VLM OCR mode	`0`, `1`, `true`, `false`	`false`
`MINERU_HYBRID_FORCE_PIPELINE_ENABLE`	Force pipeline OCR	`0`, `1`, `true`, `false`	`false`
`MINERU_API_MAX_CONCURRENT_REQUESTS`	FastAPI concurrency limit	Integer	`0` (unlimited)
`MINERU_API_ENABLE_FASTAPI_DOCS`	Enable API docs	`0`, `1`, `true`, `false`	`true`
`MINERU_DONOT_CLEAN_MEM`	Disable memory cleanup	Any value	Not set
`MINERU_LMDEPLOY_DEVICE`	LMDeploy device override	`maca`, `corex`, etc.	Not set
`TOKENIZERS_PARALLELISM`	Tokenizer parallel mode	`true`, `false`	`false`
`PYTORCH_ENABLE_MPS_FALLBACK`	MPS fallback	`0`, `1`	`1`
`NO_ALBUMENTATIONS_UPDATE`	Disable albumentations check	`0`, `1`	`1`

Sources: mineru/cli/common.py22-30 mineru/backend/pipeline/pipeline_analyze.py81 mineru/backend/hybrid/hybrid_analyze.py25-26 mineru/backend/hybrid/hybrid_analyze.py340-347 mineru/backend/hybrid/hybrid_analyze.py369-381 mineru/cli/fast_api.py52-75
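Several of the variables above accept `0`, `1`, `true`, or `false`. A minimal sketch of reading such flags with a default is shown below; `env_flag` and `env_int` are hypothetical helpers, and falling back to the default on unrecognized values is an assumption, not a description of how MinerU parses them.

```python
import os

_TRUE_VALUES = {"1", "true"}
_FALSE_VALUES = {"0", "false"}


def env_flag(name: str, default: bool, environ=None) -> bool:
    """Read a boolean-style variable; unset or unrecognized values give the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    return default


def env_int(name: str, default: int, environ=None) -> int:
    """Read an integer variable; unset or non-numeric values give the default."""
    environ = os.environ if environ is None else environ
    try:
        return int(environ[name])
    except (KeyError, ValueError):
        return default
```

Passing a plain dict as `environ` keeps the helpers easy to test without touching the real process environment.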

mineru.json Structure

Configuration Readers:

get_device(): Returns device mode (cpu/cuda/npu/mps)
get_vram(): Returns available VRAM in GB
get_formula_enable(): Returns formula processing flag
get_table_enable(): Returns table processing flag
get_llm_aided_config(): Returns LLM optimization settings
get_latex_delimiter_config(): Returns LaTeX delimiter configuration

Sources: mineru/utils/config_reader.py mineru/backend/vlm/vlm_middle_json_mkcontent.py10-22

Batch Ratio Calculation (Hybrid Backend)

Recommended Settings for Client-Server Architecture:

Client VRAM	MINERU_HYBRID_BATCH_RATIO
≤ 6 GB	8
≤ 4.5 GB	4
≤ 3 GB	2
≤ 2.5 GB	1

Note: These values already include headroom for multi-client deployments. Reserve roughly one client's worth of VRAM as overall redundancy.

Sources: mineru/backend/hybrid/hybrid_analyze.py323-366
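The recommendation table can be read as a lookup from smallest threshold to largest. The sketch below encodes only the four rows shown above; the function name is hypothetical, and returning `None` above 6 GB reflects that the table gives no recommendation there, not the auto-calculation in hybrid_analyze.py.

```python
# (upper VRAM bound in GB, recommended MINERU_HYBRID_BATCH_RATIO), smallest first.
RECOMMENDED_RATIOS = [(2.5, 1), (3.0, 2), (4.5, 4), (6.0, 8)]


def recommended_batch_ratio(client_vram_gb: float):
    """Return the table's recommendation, or None if VRAM exceeds the last row."""
    for upper_bound, ratio in RECOMMENDED_RATIOS:
        if client_vram_gb <= upper_bound:
            return ratio
    return None  # not covered by the table
```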

Debugging and Visualization Flags

Output control flags in do_parse() and aio_do_parse():

Flag	Default	Output File	Purpose
`f_draw_layout_bbox`	`True`	`*_layout.pdf`	Colored bounding boxes for layout blocks
`f_draw_span_bbox`	`True`	`*_span.pdf`	Bounding boxes for text spans
`f_dump_md`	`True`	`*.md`	Markdown output
`f_dump_middle_json`	`True`	`*_middle.json`	Intermediate JSON format
`f_dump_model_output`	`True`	`*_model.json`	Raw model predictions
`f_dump_orig_pdf`	`True`	`*_origin.pdf`	Original PDF copy
`f_dump_content_list`	`True`	`*_content_list.json`	Flat content structure
`f_make_md_mode`	`MakeMode.MM_MD`	N/A	Markdown generation mode

MakeMode Options:

MM_MD: Multimodal Markdown (includes images/tables)
NLP_MD: NLP-focused Markdown (text-only)
CONTENT_LIST: Flat JSON list
CONTENT_LIST_V2: Enhanced JSON structure

Sources: mineru/cli/common.py94-169 mineru/utils/enum_class.py86-90 mineru/utils/draw_bbox.py120-258
