This page documents the hierarchical configuration system in Docling, which controls how documents are processed through pipelines and models. The configuration system has three main layers: FormatOption (which defines backend and pipeline selection), PipelineOptions (which configure pipeline-level behavior), and model-specific options (which configure individual AI models like OCR, layout detection, and VLMs).
For information about how formats are detected and routed to pipelines, see Format Detection and Routing. For details on the plugin system that extends model options, see Plugin System.
The Docling configuration system follows a three-tier hierarchy that flows from high-level format decisions down to individual model parameters.
Sources: docling/document_converter.py75-156 docling/datamodel/pipeline_options.py70-900
FormatOption associates an input format with a specific backend and pipeline implementation. Each format has a corresponding FormatOption subclass that defines the processing strategy.
| Class | Pipeline | Backend | Purpose |
|---|---|---|---|
| PdfFormatOption | StandardPdfPipeline | DoclingParseDocumentBackend | PDF processing with ML models |
| WordFormatOption | SimplePipeline | MsWordDocumentBackend | DOCX processing |
| ExcelFormatOption | SimplePipeline | MsExcelDocumentBackend | XLSX processing |
| PowerpointFormatOption | SimplePipeline | MsPowerpointDocumentBackend | PPTX processing |
| ImageFormatOption | StandardPdfPipeline | ImageDocumentBackend | Image processing |
| HTMLFormatOption | SimplePipeline | HTMLDocumentBackend | HTML processing |
| MarkdownFormatOption | SimplePipeline | MarkdownDocumentBackend | Markdown processing |
| AudioFormatOption | AsrPipeline | NoOpBackend | Audio/video transcription |
| LatexFormatOption | SimplePipeline | LatexDocumentBackend | LaTeX processing |
The DocumentConverter initializes format options in its constructor (docling/document_converter.py209-257). Each FormatOption is validated by a Pydantic model validator that sets default pipeline_options when none are provided (docling/document_converter.py79-84).
Sources: docling/document_converter.py75-186 docling/datamodel/base_models.py37-42
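The routing and defaulting described above can be sketched with the standard library alone: each format maps to a (pipeline, backend) pair taken from the table, and a format option fills in default pipeline options when none are supplied. This is a minimal sketch, not Docling's code: the names `FORMAT_TABLE`, `SketchFormatOption`, `SketchPipelineOptions` and `make_format_option` are hypothetical, pipelines and backends are plain strings, and `__post_init__` stands in for the Pydantic model validator that Docling actually uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: (pipeline, backend) names copied from the table above.
FORMAT_TABLE: Dict[str, Tuple[str, str]] = {
    "pdf": ("StandardPdfPipeline", "DoclingParseDocumentBackend"),
    "docx": ("SimplePipeline", "MsWordDocumentBackend"),
    "xlsx": ("SimplePipeline", "MsExcelDocumentBackend"),
    "pptx": ("SimplePipeline", "MsPowerpointDocumentBackend"),
    "image": ("StandardPdfPipeline", "ImageDocumentBackend"),
    "html": ("SimplePipeline", "HTMLDocumentBackend"),
    "md": ("SimplePipeline", "MarkdownDocumentBackend"),
    "audio": ("AsrPipeline", "NoOpBackend"),
    "latex": ("SimplePipeline", "LatexDocumentBackend"),
}


@dataclass
class SketchPipelineOptions:
    """Hypothetical stand-in for a PipelineOptions subclass."""
    pipeline: str


@dataclass
class SketchFormatOption:
    """Hypothetical stand-in for FormatOption: pipeline, backend and options."""
    pipeline_cls: str
    backend: str
    pipeline_options: Optional[SketchPipelineOptions] = None

    def __post_init__(self) -> None:
        # Plays the role of the validator: fill in defaults only when missing.
        if self.pipeline_options is None:
            self.pipeline_options = SketchPipelineOptions(pipeline=self.pipeline_cls)


def make_format_option(
    fmt: str, options: Optional[SketchPipelineOptions] = None
) -> SketchFormatOption:
    """Look up the pipeline/backend pair for a format and build its option."""
    pipeline_cls, backend = FORMAT_TABLE[fmt]
    return SketchFormatOption(pipeline_cls, backend, options)


opt = make_format_option("docx")
print(opt.pipeline_cls, opt.backend, opt.pipeline_options)
```

Caller-supplied options are kept untouched; only a missing value triggers the default, which matches the validator behaviour described in the text.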
PipelineOptions is the base class for all pipeline configuration. Different pipelines extend this base with their specific requirements.
Sources: docling/datamodel/pipeline_options.py70-1300
PdfPipelineOptions is the most complex pipeline configuration, used by StandardPdfPipeline for processing PDFs and images with ML models.
Key Fields:
| Field | Type | Default | Description |
|---|---|---|---|
| do_ocr | bool | True | Enable OCR processing |
| do_table_structure | bool | True | Enable table structure recognition |
| ocr_options | OcrOptions | OcrAutoOptions() | OCR engine configuration |
| table_structure_options | BaseTableStructureOptions | TableStructureOptions() | Table model configuration |
| layout_options | LayoutOptions | LayoutObjectDetectionOptions() | Layout detection configuration |
| images_scale | float | 2.0 | Scale factor for processing images |
| generate_page_images | bool | False | Whether to generate page images |
| generate_picture_images | bool | False | Whether to extract picture images |
The options also include enrichment model configurations:
- picture_description_options: For generating picture descriptions
- picture_classifier_options: For classifying picture types
- chart_extraction_options: For extracting chart data
- code_classifier_options: For detecting code blocks
- formula_classifier_options: For detecting formulas

Sources: docling/datamodel/pipeline_options.py961-1166
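To make the nesting of these fields concrete, the sketch below mirrors the key fields and defaults from the table with plain dataclasses so it runs without Docling installed. It is illustrative only: all `Sketch*` class names are hypothetical, `SketchOcrOptions(kind="auto")` stands in for `OcrAutoOptions()`, layout and enrichment options are omitted, and the default table mode of ACCURATE is an assumption because the table does not state it. In Docling you would instantiate PdfPipelineOptions instead.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SketchTableMode(str, Enum):
    """Hypothetical mirror of the FAST / ACCURATE table modes."""
    FAST = "fast"
    ACCURATE = "accurate"


@dataclass
class SketchTableStructureOptions:
    do_cell_matching: bool = True
    mode: SketchTableMode = SketchTableMode.ACCURATE  # assumed default


@dataclass
class SketchOcrOptions:
    kind: str = "auto"  # stands in for OcrAutoOptions()
    lang: List[str] = field(default_factory=list)
    force_full_page_ocr: bool = False


@dataclass
class SketchPdfPipelineOptions:
    # Defaults taken from the Key Fields table above.
    do_ocr: bool = True
    do_table_structure: bool = True
    # default_factory gives every instance its own nested options object.
    ocr_options: SketchOcrOptions = field(default_factory=SketchOcrOptions)
    table_structure_options: SketchTableStructureOptions = field(
        default_factory=SketchTableStructureOptions
    )
    images_scale: float = 2.0
    generate_page_images: bool = False
    generate_picture_images: bool = False


# Typical customisation: skip OCR and trade table accuracy for speed.
opts = SketchPdfPipelineOptions()
opts.do_ocr = False
opts.table_structure_options.mode = SketchTableMode.FAST
```

Using `default_factory` for the nested objects matters: with a shared default instance, changing the table mode on one options object would silently change it on every other.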
VlmPipelineOptions configures Vision-Language Model pipelines that process documents using multimodal AI models.
Key Fields:
| Field | Type | Description |
|---|---|---|
| vlm_options | Union[InlineVlmOptions, ApiVlmOptions] | VLM model configuration |
| generate_page_images | bool | Generate page images for VLM input |
| images_scale | float | Image scaling factor |
| max_image_size | Optional[int] | Maximum image dimension |
The vlm_options field uses a discriminated union to support both inline (local) and API-based (remote) VLM execution.
Sources: docling/datamodel/pipeline_options.py1228-1266
AsrPipelineOptions configures Automatic Speech Recognition for audio and video files.
Key Fields:
| Field | Type | Description |
|---|---|---|
| asr_options | InlineAsrOptions | ASR model configuration |
| generate_audio_waveform | bool | Generate waveform visualization |
Sources: docling/datamodel/pipeline_options.py1268-1294
Model-specific options configure individual AI models used within pipelines. They inherit from BaseOptions and declare a kind field that serves as the discriminator between option types.
Common OCR Fields:
All OCR options share these base fields from OcrOptions docling/datamodel/pipeline_options.py121-145:
- lang: List of language codes for OCR
- force_full_page_ocr: Force OCR on the entire page
- bitmap_area_threshold: Minimum bitmap area percentage to trigger OCR

OCR Engine-Specific Fields:
| Engine | Key Fields | Notes |
|---|---|---|
| OcrAutoOptions | None (uses defaults) | Automatically selects available engine |
| RapidOcrOptions | backend, text_score, use_det, use_cls, use_rec | Supports onnxruntime, openvino, paddle, torch backends |
| EasyOcrOptions | use_gpu, confidence_threshold, recog_network | GPU acceleration support |
| TesseractCliOcrOptions | tesseract_cmd, path, psm | CLI-based Tesseract |
| TesseractOcrOptions | path, psm | Python bindings (tesserocr) |
| OcrMacOptions | recognition, framework | Native macOS Vision framework |
Sources: docling/datamodel/pipeline_options.py121-462
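As a rough illustration of how the shared OCR fields interact, the sketch below decides which bitmap regions of a page would be sent to OCR. It rests on assumptions not confirmed by the text above: bitmap_area_threshold is treated as a fraction of the page area (0.05 meaning 5%), and force_full_page_ocr is taken to bypass the check and OCR the whole page. The names `SketchOcrSettings` and `regions_to_ocr` and the rectangle tuples are hypothetical; this is not Docling's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class SketchOcrSettings:
    """Hypothetical mirror of the shared OcrOptions fields."""
    lang: List[str] = field(default_factory=list)
    force_full_page_ocr: bool = False
    bitmap_area_threshold: float = 0.05  # assumed: fraction of the page area


def area(r: Rect) -> float:
    """Area of a rectangle, clamped at zero for degenerate boxes."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def regions_to_ocr(page: Rect, bitmaps: List[Rect], cfg: SketchOcrSettings) -> List[Rect]:
    """Return the regions that would be OCR'd under the assumed rules."""
    if cfg.force_full_page_ocr:
        return [page]  # OCR the entire page regardless of bitmap sizes
    page_area = area(page)
    if page_area == 0:
        return []
    # Keep only bitmaps covering at least the threshold share of the page.
    return [b for b in bitmaps if area(b) / page_area >= cfg.bitmap_area_threshold]


page = (0.0, 0.0, 100.0, 100.0)
bitmaps = [(0.0, 0.0, 30.0, 30.0), (0.0, 0.0, 10.0, 10.0)]  # 9% and 1% of the page
```

With the default threshold, the 9% bitmap is selected and the 1% one is skipped; setting force_full_page_ocr returns the whole page instead.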
TableStructureOptions configures the TableFormer model for extracting table structure.
Fields:
- do_cell_matching: Enable cell content matching (default: True)
- mode: Processing mode (FAST or ACCURATE)

The ACCURATE mode gives higher-quality results at the cost of slower processing, while FAST prioritizes speed over precision (docling/datamodel/pipeline_options.py76-119).
Sources: docling/datamodel/pipeline_options.py76-119
Layout detection options configure object detection models that identify document structure elements (text blocks, figures, tables, etc.).
LayoutObjectDetectionOptions Fields:
- layout_model_spec: Model specification (e.g., EGRET, HERON families)
- engine_options: Inference engine options (HuggingFace Transformers or ONNX)
- postprocessor: Configuration for layout postprocessing

The options support model presets that can be selected with the from_preset() method (docling/datamodel/pipeline_options.py689-723).
Sources: docling/datamodel/pipeline_options.py674-781
VLM (Vision-Language Model) options configure multimodal AI models. Docling supports both inline (local) and API-based (remote) VLM execution.
InlineVlmOptions configures local VLM execution:
- repo_id: HuggingFace model repository ID
- inference_framework: Runtime (MLX for Apple Silicon, TRANSFORMERS for cross-platform, VLLM for high-throughput)
- load_in_8bit: Enable 8-bit quantization
- response_format: Expected output format (DOCTAGS, MARKDOWN, HTML, OTSL)
- trust_remote_code: Allow custom model code execution

ApiVlmOptions configures remote VLM API calls:

- url: API endpoint (OpenAI-compatible)
- model: Model name at the endpoint
- concurrency: Maximum concurrent requests
- timeout: Request timeout in seconds
- headers: Optional HTTP headers

Sources: docling/datamodel/pipeline_options_vlm_model.py18-372
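The vlm_options field of VlmPipelineOptions is a discriminated union over these two option types. The sketch below shows the general idea with the standard library: a kind value in the input selects the concrete class. The kind values "inline" and "api", the class and function names, and all default values are hypothetical; Docling relies on Pydantic's discriminated unions and its own kind literals rather than this hand-written dispatch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class SketchInlineVlm:
    repo_id: str
    inference_framework: str = "transformers"  # illustrative: mlx, transformers, vllm
    load_in_8bit: bool = False
    response_format: str = "doctags"
    trust_remote_code: bool = False
    kind: str = "inline"  # hypothetical discriminator value


@dataclass
class SketchApiVlm:
    url: str
    model: str
    concurrency: int = 1   # illustrative default
    timeout: float = 60.0  # seconds; illustrative default
    headers: Dict[str, str] = field(default_factory=dict)
    kind: str = "api"  # hypothetical discriminator value


VlmOptions = Union[SketchInlineVlm, SketchApiVlm]
_BY_KIND = {"inline": SketchInlineVlm, "api": SketchApiVlm}


def parse_vlm_options(data: Dict[str, Any]) -> VlmOptions:
    """Choose the concrete options class from the 'kind' key, then build it."""
    payload = dict(data)  # copy so the caller's dict is not modified
    kind = payload.pop("kind", None)
    cls = _BY_KIND.get(kind)
    if cls is None:
        raise ValueError(f"unknown VLM options kind: {kind!r}")
    return cls(**payload)


remote = parse_vlm_options(
    {"kind": "api", "url": "http://localhost:8000/v1/chat/completions",
     "model": "my-model", "concurrency": 4}
)
local = parse_vlm_options({"kind": "inline", "repo_id": "org/model"})
```

The URL, model name and repository ID above are placeholders. The benefit of a discriminator is that validation fails fast with a clear error when the kind is unknown, instead of trying each candidate type in turn.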
Enrichment models enhance the document with additional metadata and transformations.
Picture Description Options:
Picture description options support filtering by classification label, so descriptions are generated only for specific picture types (docling/datamodel/pipeline_options.py472-486).
Other Enrichment Options:
- DocumentPictureClassifierOptions: Classify pictures (charts, figures, diagrams, etc.)
- ChartExtractorOptions: Extract data from bar/pie/line charts
- CodeClassifierOptions: Detect and label code blocks
- FormulaClassifierOptions: Detect and label mathematical formulas

Sources: docling/datamodel/pipeline_options.py464-673
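Label-based filtering for picture descriptions can be pictured as a gate in front of the description model: pictures whose classifier label is not in the allowed set are skipped. This is a minimal sketch under assumptions: `SketchPicture`, `describe_pictures`, the label strings and the reference strings are hypothetical, and the describe callback stands in for whatever model actually produces the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class SketchPicture:
    ref: str    # hypothetical reference to the picture in the document
    label: str  # class assigned by a picture classifier (illustrative strings)


def describe_pictures(
    pictures: Iterable[SketchPicture],
    describe: Callable[[SketchPicture], str],
    allowed_labels: Optional[Set[str]] = None,
) -> Dict[str, str]:
    """Run `describe` only on pictures whose label is allowed (None allows all)."""
    out: Dict[str, str] = {}
    for pic in pictures:
        if allowed_labels is not None and pic.label not in allowed_labels:
            continue  # filtered out: no description is generated
        out[pic.ref] = describe(pic)
    return out


pics = [SketchPicture("#/pictures/0", "bar_chart"), SketchPicture("#/pictures/1", "logo")]
described = describe_pictures(pics, lambda p: f"a {p.label}", allowed_labels={"bar_chart"})
```

Filtering before calling the model is the point of the option: expensive description calls are spent only on picture types that are worth describing.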
The preset system provides pre-configured option sets for common use cases, particularly for VLM models.
Using Presets:
The preset system uses a class variable registry (docling/datamodel/pipeline_options.py1297-1331) and provides factory methods for creating options from preset IDs (docling/datamodel/pipeline_options.py1333-1373).
Available VLM Presets:
Presets are defined in docling/datamodel/vlm_model_specs.py and include:
- granite_docling: Granite-Docling model with Transformers
- granite_docling_mlx: Granite-Docling with MLX (Apple Silicon)
- granite_docling_api: Granite-Docling via API (vLLM, LM Studio, Ollama)
- smoldocling_transformers: SmolDocling with Transformers
- granite_vision: Granite Vision for chart extraction
- deepseek_ocr_lmstudio: DeepSeek-OCR via LM Studio

Sources: docling/datamodel/pipeline_options.py1297-1373 docling/datamodel/vlm_model_specs.py1-300
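The registry-plus-factory pattern described above can be sketched with a class-level dictionary and a classmethod. The preset IDs below come from the list above, but the fields stored per preset and the names `SketchVlmPreset`, `register` and `from_preset` (which echoes, but is not, Docling's method) are hypothetical; the real preset contents live in docling/datamodel/vlm_model_specs.py.

```python
from dataclasses import dataclass, replace
from typing import ClassVar, Dict


@dataclass(frozen=True)
class SketchVlmPreset:
    preset_id: str
    framework: str     # illustrative: "transformers", "mlx" or "api"
    model_family: str

    # Class-level registry shared by the whole class (ClassVar is not a field).
    _registry: ClassVar[Dict[str, "SketchVlmPreset"]] = {}

    @classmethod
    def register(cls, preset: "SketchVlmPreset") -> None:
        cls._registry[preset.preset_id] = preset

    @classmethod
    def from_preset(cls, preset_id: str, **overrides) -> "SketchVlmPreset":
        if preset_id not in cls._registry:
            known = ", ".join(sorted(cls._registry))
            raise KeyError(f"unknown preset {preset_id!r}; known: {known}")
        # Return a modified copy so overrides never mutate the registered preset.
        return replace(cls._registry[preset_id], **overrides)


for pid, fw, family in [
    ("granite_docling", "transformers", "Granite-Docling"),
    ("granite_docling_mlx", "mlx", "Granite-Docling"),
    ("granite_docling_api", "api", "Granite-Docling"),
    ("smoldocling_transformers", "transformers", "SmolDocling"),
]:
    SketchVlmPreset.register(SketchVlmPreset(pid, fw, family))

mlx_preset = SketchVlmPreset.from_preset("granite_docling_mlx")
```

Making presets immutable and handing out copies keeps the registry a reliable source of defaults even when callers tweak individual fields.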
Sources: docling/document_converter.py189-257 docling/datamodel/pipeline_options.py274-345
Sources: docling/pipeline/vlm_pipeline.py1-500 docling/datamodel/pipeline_options.py1228-1266
Sources: docling/datamodel/pipeline_options.py97-119
Sources: docling/datamodel/pipeline_options.py545-631
The CLI options in docling/cli/main.py374-850 map to configuration classes:
| CLI Flag | Configuration Class | Field |
|---|---|---|
| --ocr / --no-ocr | PdfPipelineOptions | do_ocr |
| --force-ocr | OcrOptions | force_full_page_ocr |
| --tables / --no-tables | PdfPipelineOptions | do_table_structure |
| --ocr-engine | PdfPipelineOptions | ocr_options.kind |
| --ocr-lang | OcrOptions | lang |
| --table-mode | TableStructureOptions | mode |
| --pipeline | FormatOption | pipeline_cls |
| --vlm-model | VlmPipelineOptions | vlm_options (preset) |
Sources: docling/cli/main.py374-850
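To show how flags like these can translate into nested configuration, here is an argparse sketch covering the OCR and table rows of the table. It is not Docling's CLI: the parser, its defaults, the comma-separated format for --ocr-lang, and the output dictionary layout are assumptions chosen to match the field names in the table. argparse.BooleanOptionalAction requires Python 3.9 or later.

```python
import argparse
from typing import Any, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the flags in the table."""
    p = argparse.ArgumentParser(prog="docling-sketch")
    p.add_argument("--ocr", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--force-ocr", action="store_true")
    p.add_argument("--tables", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--ocr-engine", default="auto")
    p.add_argument("--ocr-lang", default=None, help="comma-separated language codes")
    p.add_argument("--table-mode", choices=["fast", "accurate"], default="accurate")
    return p


def args_to_options(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Map parsed flags onto a nested dict shaped like PdfPipelineOptions."""
    a = build_parser().parse_args(argv)
    ocr_options: Dict[str, Any] = {
        "kind": a.ocr_engine,                # --ocr-engine -> ocr_options.kind
        "force_full_page_ocr": a.force_ocr,  # --force-ocr
    }
    if a.ocr_lang:                           # --ocr-lang -> ocr_options.lang
        ocr_options["lang"] = [s.strip() for s in a.ocr_lang.split(",") if s.strip()]
    return {
        "do_ocr": a.ocr,                     # --ocr / --no-ocr
        "do_table_structure": a.tables,      # --tables / --no-tables
        "ocr_options": ocr_options,
        "table_structure_options": {"mode": a.table_mode},  # --table-mode
    }


cli_opts = args_to_options(
    ["--no-ocr", "--ocr-engine", "tesseract", "--ocr-lang", "eng,deu", "--table-mode", "fast"]
)
```

Keeping the parser and the mapping function separate makes the flag-to-field correspondence easy to test without invoking a full conversion.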