This page documents the post-processing and enrichment stages that occur after the core pipeline stages (preprocessing, OCR, layout detection, table structure) have completed. These stages refine the extracted structure and add semantic annotations to enhance the output document.
For information about the core pipeline stages that precede these steps, see Standard PDF Pipeline. For details on the threaded execution model, see Threaded Pipeline Architecture.
The post-processing and enrichment sequence is as follows. After the core pipeline stages complete (preprocess, OCR, layout, table structure), the StandardPdfPipeline performs layout postprocessing to clean up overlapping clusters, then applies reading order prediction, and finally constructs the DoclingDocument. The enrichment phase operates on the fully assembled document, sequentially applying picture classification, picture description, chart extraction, and optional code/formula detection models. Each enrichment model processes elements in batches via the BasePipeline._enrich_document() orchestrator.
Sources: docling/pipeline/base_pipeline.py102-124 docling/pipeline/standard_pdf_pipeline.py
The ReadingOrderModel reorders document elements according to natural reading flow, merges related elements, and associates captions and footnotes with their parent elements.
Key Operations:
| Operation | Description | Code Reference |
|---|---|---|
| `predict_reading_order()` | Sorts elements spatially (top-to-bottom, left-to-right within pages) | docling/models/stages/reading_order/readingorder_model.py410-412 |
| `predict_to_captions()` | Associates captions with figures, tables, code blocks | docling/models/stages/reading_order/readingorder_model.py413-415 |
| `predict_to_footnotes()` | Associates footnotes with parent elements | docling/models/stages/reading_order/readingorder_model.py416-418 |
| `predict_merges()` | Identifies text elements that should be merged (e.g., hyphenation) | docling/models/stages/reading_order/readingorder_model.py419-421 |
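As an illustration of the spatial sorting step only, the following sketch shows a naive top-to-bottom, left-to-right ordering within pages. Docling's actual `predict_reading_order()` is a learned model, not a plain sort; this is just the baseline intuition.

```python
# Illustrative sketch, NOT docling's implementation: order elements by page,
# then top-to-bottom, then left-to-right (top-left coordinate origin).
def naive_reading_order(elements):
    """elements: dicts with 'page', 'x', 'y' keys."""
    return sorted(elements, key=lambda e: (e["page"], e["y"], e["x"]))
```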
The model handles several special cases during document construction:
- Creation of `GroupLabel.LIST` containers (docling/models/stages/reading_order/readingorder_model.py351-359)
- Elements routed to `ContentLayer.FURNITURE` instead of `BODY` (docling/models/stages/reading_order/readingorder_model.py374-383)

Sources: docling/models/stages/reading_order/readingorder_model.py42-431
Layout postprocessing refines layout predictions by resolving overlapping clusters, mapping text cells to clusters, and merging hierarchical elements. The LayoutPostprocessor uses spatial indexing (R-tree and interval trees) for efficient overlap detection.
The postprocessor uses three complementary indexes for efficient spatial queries:
Key Classes:
| Class | Purpose | Code Reference |
|---|---|---|
| `SpatialClusterIndex` | Maintains R-tree + interval trees for clusters | docling/utils/layout_postprocessor.py52-108 |
| `IntervalTree` | 1D interval overlap queries using binary search | docling/utils/layout_postprocessor.py124-155 |
| `UnionFind` | Groups overlapping clusters efficiently | docling/utils/layout_postprocessor.py19-49 |
Sources: docling/utils/layout_postprocessor.py52-155
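For readers unfamiliar with the grouping structure, here is a minimal union-find of the kind the postprocessor's `UnionFind` class implements (this sketch uses path halving; the real class may differ in detail):

```python
# Minimal union-find: near-constant-time grouping of overlapping clusters.
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
```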
The overlap resolution algorithm uses type-specific thresholds to determine which clusters to merge or remove:
Type-Specific Parameters:
| Cluster Type | Area Threshold | Confidence Threshold | Code Reference |
|---|---|---|---|
| Regular (TEXT, SECTION_HEADER, etc.) | 1.3 | 0.05 | docling/utils/layout_postprocessor.py162 |
| Picture (PICTURE) | 2.0 | 0.3 | docling/utils/layout_postprocessor.py163 |
| Wrapper (FORM, TABLE, KEY_VALUE_REGION) | 2.0 | 0.2 | docling/utils/layout_postprocessor.py164 |
Sources: docling/utils/layout_postprocessor.py157-298
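The decision rule below is a hedged sketch of how such thresholds could be applied; the exact logic in `LayoutPostprocessor` may differ. The threshold values mirror the table above, but the `absorb_smaller` rule itself is an illustration, not docling's code.

```python
# Type-specific thresholds from the table above.
OVERLAP_PARAMS = {
    "regular": {"area_threshold": 1.3, "conf_threshold": 0.05},
    "picture": {"area_threshold": 2.0, "conf_threshold": 0.3},
    "wrapper": {"area_threshold": 2.0, "conf_threshold": 0.2},
}

def absorb_smaller(big_area, small_area, small_conf, kind="regular"):
    """Illustrative rule: drop the smaller overlapping cluster when the
    bigger one dominates it in area, or when the smaller cluster's
    confidence is below the type's floor."""
    p = OVERLAP_PARAMS[kind]
    return big_area >= p["area_threshold"] * small_area or small_conf < p["conf_threshold"]
```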
After overlap resolution, the postprocessor maps text cells (word/line-level) to their containing clusters:
Mapping Algorithm:
A cell is mapped to a cluster when `cluster.bbox.contains(cell.bbox)` holds; remaining special cases are handled in the code referenced below.
Sources: docling/utils/layout_postprocessor.py300-400
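A stand-in sketch of the containment test makes the mapping concrete. The `BBox` type here is illustrative (top-left origin, so `t <= b`), not docling's actual bounding-box class:

```python
from dataclasses import dataclass

@dataclass
class BBox:  # illustrative stand-in for docling's bounding box
    l: float
    t: float
    r: float
    b: float

    def contains(self, other: "BBox") -> bool:
        return (self.l <= other.l and self.t <= other.t
                and self.r >= other.r and self.b >= other.b)

def map_cells(cells, clusters):
    """Return {cell_index: cluster_index} for fully contained cells."""
    mapping = {}
    for ci, cell in enumerate(cells):
        for ki, cluster in enumerate(clusters):
            if cluster.contains(cell):
                mapping[ci] = ki
                break
    return mapping
```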
The final postprocessing step merges hierarchically related clusters (e.g., FORM contains child clusters):
Wrapper Types:
- `DocItemLabel.FORM`
- `DocItemLabel.KEY_VALUE_REGION`
- `DocItemLabel.TABLE`
- `DocItemLabel.DOCUMENT_INDEX`

These clusters act as containers that group related child clusters. During hierarchy merging, child clusters are moved into the children list of their parent wrapper, and the wrapper's bounding box is expanded to fully contain all children.
Sources: docling/utils/layout_postprocessor.py402-500
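The bounding-box expansion step amounts to a coordinate-wise union, sketched here with `(l, t, r, b)` tuples and a top-left origin:

```python
# Wrapper bbox expansion: the parent's box becomes the union of its own box
# and all child boxes.
def union_bbox(boxes):
    ls, ts, rs, bs = zip(*boxes)
    return (min(ls), min(ts), max(rs), max(bs))
```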
The enrichment pipeline processes the assembled DoclingDocument to add semantic annotations. Each enrichment model implements the GenericEnrichmentModel interface and processes elements in batches.
Core Methods:
| Method | Purpose | Returns |
|---|---|---|
| `is_processable()` | Filters elements that this model can process | `bool` |
| `prepare_element()` | Prepares an element for batch processing (may crop images) | `Optional[EnrichElementT]` |
| `__call__()` | Processes a batch of prepared elements | `Iterable[NodeItem]` |
Sources: docling/models/base_model.py150-231
The BasePipeline._enrich_document() method orchestrates enrichment execution:
The pipeline processes elements sequentially through each model, allowing later models to benefit from annotations added by earlier ones.
Sources: docling/pipeline/base_pipeline.py100-122
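The orchestration loop can be sketched as follows. This is a hedged illustration of an `_enrich_document()`-style driver built on the three-method contract above, with a trivial stand-in model; it is not docling's actual code.

```python
from itertools import islice

def enrich(doc, elements, model, batch_size=4):
    """Filter with is_processable(), prepare each element, then feed
    fixed-size batches to the model and yield enriched items."""
    prepared = (
        p
        for el in elements
        if model.is_processable(doc, el)
        if (p := model.prepare_element(doc, el)) is not None
    )
    it = iter(prepared)
    while batch := list(islice(it, batch_size)):
        yield from model(doc, batch)

class UppercaseModel:  # trivial stand-in enrichment model
    def is_processable(self, doc, el):
        return isinstance(el, str)

    def prepare_element(self, doc, el):
        return el

    def __call__(self, doc, batch):
        return (el.upper() for el in batch)
```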
The CodeFormulaModel is an optional enrichment model that identifies and annotates code blocks and mathematical formulas. Unlike other enrichment models, it requires access to the PDF backend for text extraction and is therefore only available in PDF processing pipelines.
Limitations:
- Active only when `do_code_enrichment` or `do_formula_enrichment` is enabled
- Cannot process `DoclingDocument` objects without the original PDF

Configuration:
Model Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | - | Computed from `do_code_enrichment` or `do_formula_enrichment` |
| `do_code_enrichment` | `bool` | `True` | Detect code blocks |
| `do_formula_enrichment` | `bool` | `True` | Detect mathematical formulas |
| `accelerator_options` | `AcceleratorOptions` | - | Device and threading config |
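Enabling these flags from Python looks like the following (standard docling configuration; option names as documented above):

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

opts = PdfPipelineOptions()
opts.do_code_enrichment = True
opts.do_formula_enrichment = True  # either flag forces keep_backend = True

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}
)
```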
Integration Note:
When either enrichment flag is enabled, the pipeline sets self.keep_backend = True to preserve PDF backends through the enrichment phase. However, this conflicts with the memory optimization strategy in StandardPdfPipeline, which clears backends immediately after page assembly.
Sources: docling/datamodel/pipeline_options.py635-648 docling/cli/main.py480-487
The DocumentPictureClassifier categorizes pictures into 26 classes (bar chart, line chart, geographical map, etc.) using an engine-based inference architecture that supports multiple backends (Transformers, ONNX Runtime).
Engine Creation:
The classifier uses a factory pattern to create the appropriate inference engine for the configured backend.
Inference Flow:
- Picture crops are batched into an `ImageClassificationEngineInput` list
- `engine.predict_batch(input_batch)` returns `List[ImageClassificationEngineOutput]`

Configuration:
| Option | Default | Description |
|---|---|---|
| `repo_id` | `"ds4sd/docling-models"` | HuggingFace model repository |
| `revision` | (pinned) | Specific model commit |
| `repo_cache_folder` | Model-specific | Local cache directory |
| `model_spec` | Preset-based | Model architecture specification |
| `engine_options` | `BaseImageClassificationEngineOptions` | Backend-specific config |
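Enabling the classifier typically looks like the following (standard `PdfPipelineOptions` flags; the classifier needs cropped picture images, which the `generate_picture_images` flag provides):

```python
from docling.datamodel.pipeline_options import PdfPipelineOptions

opts = PdfPipelineOptions()
opts.do_picture_classification = True
opts.generate_picture_images = True  # provide cropped picture images
opts.images_scale = 2.0              # higher-resolution crops
```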
Output Format: predictions are attached to each picture's classification metadata; the top result is accessible via `meta.classification.get_main_prediction()`.
Available Classes:
The model predicts 26 picture types including: abstract_painting, bar_chart, line_chart, pie_chart, scatter_plot, flowchart, diagram, organizational_chart, geographical_map, photograph_nature, photograph_people, etc.
Sources: docling/models/stages/picture_classifier/document_picture_classifier.py64-170 docling/models/inference_engines/image_classification/__init__.py
The ChartExtractionModelGraniteVision converts bar charts, pie charts, and line charts into tabular CSV format using the Granite Vision 3.3 2B model. This enrichment runs after picture classification to filter processable charts.
Supported Chart Types:

- `bar_chart`
- `pie_chart`
- `line_chart`

Only pictures with `meta.classification.get_main_prediction().class_name` in this list are processed.
Model Configuration:
| Parameter | Value |
|---|---|
| Model | ibm-granite/granite-vision-3.3-2b-chart2csv-preview |
| Revision | 6e1fbaae4604ecc85f4f371416d82154ca49ad67 (pinned) |
| Prompt | "Convert the information in this chart into a data table in CSV format." |
| Device | CPU or CUDA |
Integration:
Chart extraction is configured via PdfPipelineOptions:
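A sketch of what that configuration might look like. The exact option name is an assumption here (written as `do_chart_extraction`) and should be verified against your docling version's `PdfPipelineOptions`:

```python
from docling.datamodel.pipeline_options import PdfPipelineOptions

opts = PdfPipelineOptions()
opts.do_picture_classification = True  # chart extraction filters on predicted classes
opts.do_chart_extraction = True        # hypothetical flag name; verify in your version
```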
Sources: docling/models/stages/chart_extraction/granite_vision.py36-280 docling/pipeline/base_pipeline.py149-183
Picture description models generate natural language captions for images using Vision-Language Models (VLMs). Docling supports both inline VLMs (run locally) and API-based VLMs (remote services).
Picture description models support sophisticated filtering to control which pictures receive captions:
| Filter Option | Type | Default | Description |
|---|---|---|---|
| `picture_area_threshold` | `float` | `0.0` | Minimum page-area fraction (0.0 = no threshold) |
| `classification_allow` | `Optional[List[PictureClassificationLabel]]` | `None` | Whitelist of allowed classes |
| `classification_deny` | `Optional[List[PictureClassificationLabel]]` | `None` | Blacklist of denied classes |
| `classification_min_confidence` | `float` | `0.0` | Minimum classification confidence |
Filter Evaluation Logic:
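The evaluation can be sketched as a conjunction of the four checks in the table above. This is stand-in logic, not docling's exact code; in particular, the interaction between confidence and allow/deny lists is an assumption here.

```python
# Illustrative filter evaluation: a picture is described only if it passes
# every configured check.
def passes_filters(area_fraction, class_name, class_conf,
                   picture_area_threshold=0.0,
                   classification_allow=None,
                   classification_deny=None,
                   classification_min_confidence=0.0):
    if area_fraction < picture_area_threshold:
        return False
    if class_conf < classification_min_confidence:
        return False
    if classification_allow is not None and class_name not in classification_allow:
        return False
    if classification_deny is not None and class_name in classification_deny:
        return False
    return True
```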
Example Configuration:
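One possible configuration, using docling's bundled SmolVLM preset; the filter attributes follow the table above and may vary by docling version:

```python
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    smolvlm_picture_description,
)

opts = PdfPipelineOptions()
opts.do_picture_description = True
opts.picture_description_options = smolvlm_picture_description
opts.picture_description_options.picture_area_threshold = 0.05  # skip tiny images
```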
Sources: docling/models/picture_description_base_model.py49-104
Inline VLMs: run locally; see Inline VLM Models for details.
API-based VLMs: call a remote service; see API-Based VLM Models for details.
Output Format:
Picture descriptions are stored in two locations:
- In the picture's `annotations` list: `PictureDescriptionData(text=..., provenance=...)`
- In the picture's metadata (`meta.description`): `DescriptionMetaField(text=..., created_by=...)`

Sources: docs/examples/pictures_description_api.py40-183 docs/examples/pictures_description.ipynb
Developers can create custom enrichment models by extending BaseEnrichmentModel or BaseItemAndImageEnrichmentModel.
This example demonstrates the minimal structure for a custom enrichment model:
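The sketch below shows the minimal three-method structure. Stand-in types are used so the example is self-contained; in docling you would subclass `BaseEnrichmentModel` and operate on `DoclingDocument` / `NodeItem` instead.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

@dataclass
class Node:  # illustrative stand-in for docling's NodeItem
    kind: str
    annotations: List[str] = field(default_factory=list)

class DummyPictureAnnotator:
    """Follows the enrichment contract: is_processable / prepare_element / __call__."""

    def is_processable(self, doc, element: Node) -> bool:
        return element.kind == "picture"

    def prepare_element(self, doc, element: Node):
        return element  # a real model might crop an image here

    def __call__(self, doc, element_batch: Iterable[Node]) -> Iterable[Node]:
        for el in element_batch:
            el.annotations.append("dummy-annotation")  # real inference goes here
            yield el
```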
Integrate into Pipeline:
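One common integration pattern, following docs/examples/develop_picture_enrichment.py, is to subclass the pipeline and append the model to its enrichment stage. The `enrichment_pipe` attribute is a hedged sketch; verify it against your docling version.

```python
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

class CustomPdfPipeline(StandardPdfPipeline):
    def __init__(self, pipeline_options):
        super().__init__(pipeline_options)
        # MyEnrichmentModel is a placeholder for your own
        # BaseEnrichmentModel subclass.
        self.enrichment_pipe.append(MyEnrichmentModel())
```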
Sources: docs/examples/develop_picture_enrichment.py1-128
Enrichment models can also process previously converted documents.
Key Points:
- Image-based models require the `BaseItemAndImageEnrichmentModel` interface
- The `prepare_element()` helper crops images with `expansion_factor` and `images_scale`
- Annotations are written back into the `DoclingDocument`

Sources: docs/examples/enrich_doclingdocument.py1-154
For models that need to process cropped images from the document, extend BaseItemAndImageEnrichmentModel:
The base class automatically handles:
- Cropping the element's image, expanded by `expansion_factor` to include surrounding context
- Scaling by `images_scale` for higher resolution

Sources: docling/models/base_model.py179-231
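The expansion math amounts to growing the item's bounding box by a fraction of its size on each side before cropping. This is a stand-in illustration of the arithmetic, not docling's code:

```python
# Grow an (l, t, r, b) bbox by expansion_factor on each side.
def expand_bbox(l, t, r, b, expansion_factor=0.1):
    dx = (r - l) * expansion_factor
    dy = (b - t) * expansion_factor
    return (l - dx, t - dy, r + dx, b + dy)
```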