This page documents the docling command-line interface (CLI) for converting documents from various input formats (PDF, DOCX, HTML, images, etc.) into structured output formats. The CLI provides a convenient wrapper around the document conversion pipeline system.
For programmatic usage via Python API, see Python SDK. For details on configuring pipelines and models, see Configuration and Pipeline Options. For information about the underlying document conversion engine, see Core Architecture.
Sources: pyproject.toml86-88 docling/cli/main.py1-164
The Docling CLI consists of two command-line entry points defined in pyproject.toml86-88:
- docling - Main document conversion interface (maps to docling.cli.main:app)
- docling-tools - Model management utilities

The primary command is docling convert, which orchestrates document conversion through the DocumentConverter class and exports results in multiple output formats.
Sources: pyproject.toml86-88 docling/cli/main.py158-163
Sources: docling/cli/main.py158-163 docling/cli/main.py373-559 docling/document_converter.py189-261
Sources: docling/cli/main.py373-767 docling/document_converter.py294-400
The convert command is the primary interface for document conversion. It is defined by the convert() function, decorated with @app.command(no_args_is_help=True), at docling/cli/main.py373.
Sources: docling/cli/main.py373-559
| Parameter | Type | Default | Description |
|---|---|---|---|
| source (positional) | List[str] | Required | File paths, directory paths, or URLs to convert |
| --from | List[InputFormat] | All formats | Filter input formats to process |
| --headers | str | None | JSON string of HTTP headers for URL sources |
Source Handling: Input sources are processed through _DocumentConversionInput which handles path resolution, format detection via MIME types, and stream creation. See docling/datamodel/document.py441-486 for the input document construction logic.
Sources: docling/cli/main.py375-402 docling/datamodel/document.py441-486
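As a rough illustration of how the input options combine, the sketch below assembles an argument list for the CLI from Python. The helper name build_convert_command is hypothetical. The sketch assumes that the converter is invoked directly as docling, that list-typed options such as --from are passed by repeating the flag (the usual Typer convention), and that "pdf" and "docx" are valid InputFormat spellings. None of these details are confirmed by this page.

```python
import json
import shlex


def build_convert_command(sources, from_formats=None, headers=None):
    """Assemble an argv list for the docling CLI (hypothetical helper)."""
    argv = ["docling"]
    # Assumption: a list-typed option is given by repeating its flag.
    for fmt in from_formats or []:
        argv += ["--from", fmt]
    # --headers takes one JSON string, so the mapping is serialized first.
    if headers:
        argv += ["--headers", json.dumps(headers)]
    argv += list(sources)
    return argv


if __name__ == "__main__":
    cmd = build_convert_command(
        ["https://example.com/report.pdf"],
        from_formats=["pdf"],
        headers={"Authorization": "Bearer <token>"},
    )
    # Show a shell-quoted form; the list itself can be given to subprocess.run.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with the JSON value of --headers.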
| Parameter | Type | Default | Description |
|---|---|---|---|
| --pipeline | ProcessingPipeline | STANDARD | Pipeline selection: STANDARD, VLM, LEGACY |
| --vlm-model | str | "granite_docling" | VLM preset for VLM pipeline (see VLM Models) |
| --asr-model | AsrModelType | WHISPER_TINY | ASR model for audio/video files |
| --ocr / --no-ocr | bool | True | Enable/disable OCR processing |
| --force-ocr | bool | False | Force OCR on all content, replacing existing text |
| --ocr-engine | str | "auto" | OCR engine selection: auto, rapidocr, easyocr, tesseract, tesserocr, ocrmac |
| --ocr-lang | str | None | Comma-separated list of OCR languages |
| --psm | int | None | Tesseract Page Segmentation Mode (0-13) |
| --tables / --no-tables | bool | True | Enable/disable table structure extraction |
| --table-mode | TableFormerMode | ACCURATE | Table extraction mode: FAST, ACCURATE |
| --pdf-backend | PdfBackend | DOCLING_PARSE | PDF backend: DOCLING_PARSE, PYPDFIUM2 |
| --pdf-password | str | None | Password for protected PDF documents |
Pipeline Selection: The --pipeline parameter determines which processing pipeline class is used. The mapping is defined in docling/datamodel/pipeline_options.py868-887. Options include:

- STANDARD: Uses StandardPdfPipeline for PDFs, multi-stage processing with OCR, layout, and table models
- VLM: Uses VlmPipeline for vision-language model-based conversion
- LEGACY: Uses deprecated legacy pipeline

Sources: docling/cli/main.py410-479 docling/datamodel/pipeline_options.py868-887
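The pipeline selection described above amounts to a lookup from an enum member to a pipeline class. The following is a minimal sketch of that idea, not the code in pipeline_options.py. The enum here is a stand-in, its lowercase values are assumed, and class names are kept as strings so that the example runs without docling installed. The legacy class is not named on this page, so it is left as None.

```python
from enum import Enum
from typing import Optional


class ProcessingPipeline(str, Enum):
    # Stand-in for the CLI enum; the lowercase values are assumptions.
    STANDARD = "standard"
    VLM = "vlm"
    LEGACY = "legacy"


# Pipeline member -> pipeline class name (strings keep the sketch standalone).
PIPELINE_CLASS_NAMES = {
    ProcessingPipeline.STANDARD: "StandardPdfPipeline",
    ProcessingPipeline.VLM: "VlmPipeline",
    ProcessingPipeline.LEGACY: None,  # class name not given in this page
}


def resolve_pipeline(name: str) -> Optional[str]:
    """Return the pipeline class name for a --pipeline value."""
    # Enum lookup by value raises ValueError for unknown names.
    pipeline = ProcessingPipeline(name.lower())
    return PIPELINE_CLASS_NAMES[pipeline]
```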
| Parameter | Type | Default | Description |
|---|---|---|---|
| --enrich-code | bool | False | Enable code block detection and enrichment |
| --enrich-formula | bool | False | Enable mathematical formula enrichment |
| --enrich-picture-classes | bool | False | Enable picture classification (chart, diagram, etc.) |
| --enrich-picture-description | bool | False | Enable picture description generation via VLM |
| --enrich-chart-extraction | bool | False | Extract tabular data from bar/pie/line charts |
Enrichment models are applied during the post-processing phase after initial conversion. Configuration is handled in docling/cli/main.py658-706 where enrichment options are added to PdfPipelineOptions.
Sources: docling/cli/main.py480-505 docling/cli/main.py658-706
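One way to picture how the enrichment flags reach the pipeline options is a flag-to-attribute mapping applied to an options object. The sketch below uses a stand-in dataclass rather than the real PdfPipelineOptions. The attribute names and the helper apply_enrichment_flags are illustrative assumptions and may not match the library.

```python
from dataclasses import dataclass


@dataclass
class EnrichmentOptions:
    """Stand-in for the enrichment fields of PdfPipelineOptions (names assumed)."""

    do_code_enrichment: bool = False
    do_formula_enrichment: bool = False
    do_picture_classification: bool = False
    do_picture_description: bool = False
    do_chart_extraction: bool = False


# CLI flag -> attribute on the options object (illustrative mapping).
FLAG_TO_FIELD = {
    "--enrich-code": "do_code_enrichment",
    "--enrich-formula": "do_formula_enrichment",
    "--enrich-picture-classes": "do_picture_classification",
    "--enrich-picture-description": "do_picture_description",
    "--enrich-chart-extraction": "do_chart_extraction",
}


def apply_enrichment_flags(options, enabled_flags):
    """Switch on the option that corresponds to each enabled flag."""
    for flag in enabled_flags:
        # An unknown flag raises KeyError instead of being silently ignored.
        setattr(options, FLAG_TO_FIELD[flag], True)
    return options
```

All enrichments default to off, matching the False defaults in the table above.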
| Parameter | Type | Default | Description |
|---|---|---|---|
| --to | List[OutputFormat] | [MARKDOWN] | Output formats: md, json, yaml, html, html_split_page, text, doctags |
| --output | Path | "." | Output directory for converted files |
| --image-export-mode | ImageRefMode | EMBEDDED | Image handling: PLACEHOLDER, EMBEDDED, REFERENCED |
| --show-layout | bool | False | Show bounding boxes in HTML output |
| --print-timings | bool | False | Print profiling timings to console |
| --export-timings | bool | False | Export profiling timings to JSON file |
The export_documents() function at docling/cli/main.py211-365 handles all output format generation. For each successful conversion, it calls the appropriate export method on the DoclingDocument object:
- document.save_as_json() (line 236-240)
- document.save_as_yaml() (line 243-248)
- document.save_as_html() (line 251-282)
- document.save_as_markdown() (line 295-300)
- document.save_as_markdown(strict_text=True) (line 285-292)
- document.save_as_doctags() (line 303-306)

Sources: docling/cli/main.py383-409 docling/cli/main.py539-541 docling/cli/main.py211-365
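The per-format export calls listed above can be thought of as a dispatch table keyed by output format. The sketch below illustrates that pattern with a stand-in document class that records calls instead of writing files. The file suffixes, the RecordingDocument class, and the export_document helper are assumptions made for illustration, and html_split_page is omitted because its export call is not listed on this page.

```python
from pathlib import Path

# Output format -> (method name, extra keyword arguments, file suffix).
# The suffixes are illustrative assumptions.
EXPORTERS = {
    "json": ("save_as_json", {}, ".json"),
    "yaml": ("save_as_yaml", {}, ".yaml"),
    "html": ("save_as_html", {}, ".html"),
    "md": ("save_as_markdown", {}, ".md"),
    "text": ("save_as_markdown", {"strict_text": True}, ".txt"),
    "doctags": ("save_as_doctags", {}, ".doctags"),
}


class RecordingDocument:
    """Stand-in for DoclingDocument that records save calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that are not set normally.
        if not name.startswith("save_as_"):
            raise AttributeError(name)

        def record(filename, **kwargs):
            self.calls.append((name, Path(filename).name, kwargs))

        return record


def export_document(document, stem, output_dir, formats):
    """Call the matching save method once for each requested format."""
    for fmt in formats:
        method_name, kwargs, suffix = EXPORTERS[fmt]
        target = Path(output_dir) / f"{stem}{suffix}"
        getattr(document, method_name)(target, **kwargs)
```

Note that the text format reuses the Markdown exporter with strict_text=True, as in the list above.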
| Parameter | Type | Default | Description |
|---|---|---|---|
| --artifacts-path | Path | None | Custom path for model artifacts storage |
| --enable-remote-services | bool | False | Allow connections to remote model services |
| --allow-external-plugins | bool | False | Enable third-party OCR/layout/table plugins |
| --abort-on-error | bool | False | Stop processing on first error |
| --verbose / -v | int | 0 | Logging verbosity (0=WARNING, 1=INFO, 2=DEBUG) |
| --version | Flag | - | Display version information and exit |
| --logo | Flag | - | Display Docling ASCII art logo and exit |
The --artifacts-path parameter sets the settings.artifacts_path value (line 588-591), which is used by model factories to locate downloaded model files. The --verbose parameter configures the logging level (line 560-586).
Sources: docling/cli/main.py506-563 docling/cli/main.py588-591
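The verbosity mapping in the table (0=WARNING, 1=INFO, 2=DEBUG) can be sketched with the standard logging module. The function name level_for_verbosity is hypothetical, and treating values above 2 as DEBUG is an assumption rather than documented behavior.

```python
import logging


def level_for_verbosity(verbose: int) -> int:
    """Map a -v count to a logging level (0=WARNING, 1=INFO, 2=DEBUG)."""
    if verbose <= 0:
        return logging.WARNING
    if verbose == 1:
        return logging.INFO
    # Assumption: any count of 2 or more selects DEBUG.
    return logging.DEBUG


if __name__ == "__main__":
    # Equivalent of passing -v once on the command line.
    logging.basicConfig(level=level_for_verbosity(1))
    logging.getLogger(__name__).info("INFO messages are now visible")
```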
The CLI constructs format-specific options by mapping input format types to their corresponding FormatOption subclasses:
Sources: docling/cli/main.py628-735
The CLI uses the following FormatOption subclasses from docling/document_converter.py75-156:
| Input Format | FormatOption Class | Backend Class | Pipeline Class |
|---|---|---|---|
| PDF | PdfFormatOption | DoclingParseDocumentBackend | StandardPdfPipeline (default) or VlmPipeline |
| IMAGE | ImageFormatOption | ImageDocumentBackend | StandardPdfPipeline |
| DOCX | WordFormatOption | MsWordDocumentBackend | SimplePipeline |
| XLSX | ExcelFormatOption | MsExcelDocumentBackend | SimplePipeline |
| PPTX | PowerpointFormatOption | MsPowerpointDocumentBackend | SimplePipeline |
| HTML | HTMLFormatOption | HTMLDocumentBackend | SimplePipeline |
| MD | MarkdownFormatOption | MarkdownDocumentBackend | SimplePipeline |
| AUDIO | AudioFormatOption | NoOpBackend | AsrPipeline |
| LATEX | LatexFormatOption | LatexDocumentBackend | SimplePipeline |
Each FormatOption contains:
- backend: Backend class for parsing the format
- pipeline_cls: Pipeline class for processing
- pipeline_options: Format-specific pipeline configuration
- backend_options: Format-specific backend configuration (optional)

Sources: docling/document_converter.py75-183 docling/cli/main.py628-732
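To make the four FormatOption fields concrete, the sketch below models them with a small frozen dataclass and shows how the PDF entry could be switched to the VLM pipeline while other formats stay untouched. FormatOptionSketch and use_vlm_for_pdf are hypothetical names, and backend and pipeline classes are represented as strings so that the example runs without docling. The real FormatOption classes hold class objects and typed option models.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FormatOptionSketch:
    """Stand-in mirroring the four fields of a FormatOption."""

    backend: str
    pipeline_cls: str
    pipeline_options: Optional[dict] = None
    backend_options: Optional[dict] = None


# A few rows from the table above, keyed by input format.
FORMAT_OPTIONS = {
    "PDF": FormatOptionSketch("DoclingParseDocumentBackend", "StandardPdfPipeline"),
    "DOCX": FormatOptionSketch("MsWordDocumentBackend", "SimplePipeline"),
    "AUDIO": FormatOptionSketch("NoOpBackend", "AsrPipeline"),
}


def use_vlm_for_pdf(options):
    """Return a copy of the mapping whose PDF entry uses VlmPipeline."""
    updated = dict(options)
    # replace() builds a new frozen instance; the original is unchanged.
    updated["PDF"] = replace(options["PDF"], pipeline_cls="VlmPipeline")
    return updated
```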
The CLI provides access to the pluggy-based plugin system for extending OCR, layout, and table structure models:
Sources: docling/cli/main.py114-118 docling/cli/main.py184-208 docling/cli/main.py516-521
The show_external_plugins_callback() function at docling/cli/main.py184-208 queries the factory registries and displays external plugins that are not part of the core docling. module namespace.
Sources: docling/cli/main.py184-208 docling/cli/main.py516-530
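The namespace check described above can be sketched as a simple filter over registered plugins. The input shape used here (a mapping from plugin name to defining module path) and the function name external_plugins are assumptions for illustration. The real callback queries the factory registries instead.

```python
def external_plugins(registered):
    """Return names of plugins defined outside the core docling namespace.

    `registered` maps a plugin name to the module path that defines it
    (a hypothetical input shape chosen for this sketch).
    """
    return sorted(
        name
        for name, module in registered.items()
        # Core plugins live in "docling" itself or in a "docling." submodule.
        if not (module == "docling" or module.startswith("docling."))
    )


if __name__ == "__main__":
    print(
        external_plugins(
            {
                "easyocr": "docling.models.easyocr_model",
                "my_ocr": "acme_plugins.ocr",
            }
        )
    )
```

Matching on the "docling." prefix with the trailing dot keeps separately distributed packages whose names merely begin with "docling" classified as external.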
Sources: docling/cli/main.py628-767 docling/document_converter.py294-400
The CLI handles conversion errors based on the --abort-on-error flag:
- --abort-on-error: Processing stops on first error, exception is raised (line 531-538)
- --no-abort-on-error (default): Errors are logged, processing continues for remaining documents

Error information is captured in ConversionResult.errors as a list of ErrorItem objects containing:

- component_type: Type of component that failed (e.g., DOCUMENT_BACKEND, MODEL)
- module_name: Name of the module/class where error occurred
- error_message: Description of the error

See docling/datamodel/base_models.py182-186 for the ErrorItem model definition.
Sources: docling/cli/main.py531-538 docling/cli/main.py352-360 docling/datamodel/base_models.py182-186
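The two error-handling modes can be illustrated with a small loop that either re-raises the first exception or records it and moves on. This is a sketch under stated assumptions, not the CLI's implementation. The ErrorItem dataclass is a stand-in with the three fields named above, convert_all and convert_one are hypothetical, and the DOCUMENT_BACKEND component type is a fixed placeholder.

```python
from dataclasses import dataclass


@dataclass
class ErrorItem:
    """Stand-in with the three fields described above."""

    component_type: str
    module_name: str
    error_message: str


def convert_all(sources, convert_one, abort_on_error=False):
    """Run convert_one on each source and collect errors per source."""
    errors_by_source = {}
    for source in sources:
        try:
            convert_one(source)
            errors_by_source[source] = []
        except Exception as exc:
            if abort_on_error:
                # --abort-on-error: stop at the first failure.
                raise
            # Default mode: record the error and continue with the rest.
            errors_by_source[source] = [
                ErrorItem(
                    component_type="DOCUMENT_BACKEND",  # placeholder value
                    module_name=convert_one.__name__,
                    error_message=str(exc),
                )
            ]
    return errors_by_source
```

With the default setting, one unreadable file in a batch does not prevent the remaining documents from being converted.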
The --version flag displays version information via version_callback() at docling/cli/main.py172-181. The version data is collected by the DoclingVersion model defined at docling/datamodel/document.py232-240.
Sources: docling/cli/main.py172-181 docling/datamodel/document.py232-240
The --logo flag displays the Docling ASCII art logo defined at docling/cli/main.py120-155 via the logo_callback() function at docling/cli/main.py166-169.
Sources: docling/cli/main.py120-169
The CLI provides control over model artifact storage and retrieval:
The --artifacts-path parameter specifies where model files are stored. This is set via settings.artifacts_path at docling/cli/main.py588-591. If not specified, it defaults to the system-specific cache directory.
Model factories use this path to:
The --enable-remote-services flag (line 510-515) allows models to connect to remote inference services. This is required when using:
Without this flag, only local model execution is permitted.
Sources: docling/cli/main.py506-515 docling/cli/main.py588-591
Sources: docling/cli/main.py373-767
The CLI returns the following exit codes:
--abort-on-error is used)

The ConversionStatus enum at docling/datamodel/base_models.py46-53 defines possible conversion states:

- PENDING: Not yet started
- STARTED: In progress
- SUCCESS: Completed successfully
- PARTIAL_SUCCESS: Completed with some errors
- FAILURE: Failed completely
- SKIPPED: Document was skipped (e.g., exceeded size limit)

Conversion success/failure counts are logged at docling/cli/main.py362-364.
Sources: docling/cli/main.py362-364 docling/datamodel/base_models.py46-53 docling/document_converter.py378-400
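As an illustration of how per-document statuses can be tallied, the sketch below defines a stand-in enum with the six states listed above and counts occurrences of each. The lowercase enum values and the summarize helper are assumptions. This page does not say how the CLI buckets PARTIAL_SUCCESS or SKIPPED in its logged counts, so each state is reported separately.

```python
from collections import Counter
from enum import Enum


class ConversionStatus(str, Enum):
    # Stand-in for the library enum; the string values are assumed.
    PENDING = "pending"
    STARTED = "started"
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"
    SKIPPED = "skipped"


def summarize(statuses):
    """Count results per status, including states that never occurred."""
    counts = Counter(statuses)
    # Counter returns 0 for missing keys, so every state gets an entry.
    return {status.name: counts[status] for status in ConversionStatus}


if __name__ == "__main__":
    results = [
        ConversionStatus.SUCCESS,
        ConversionStatus.SUCCESS,
        ConversionStatus.FAILURE,
    ]
    print(summarize(results))
```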