Command Line Interface

Purpose and Scope

This page documents the docling command-line interface (CLI) for converting documents from various input formats (PDF, DOCX, HTML, images, etc.) into structured output formats. The CLI provides a convenient wrapper around the document conversion pipeline system.

For programmatic usage via Python API, see Python SDK. For details on configuring pipelines and models, see Configuration and Pipeline Options. For information about the underlying document conversion engine, see Core Architecture.

Sources: pyproject.toml:86-88, docling/cli/main.py:1-164

Overview

The Docling CLI consists of two command-line entry points (pyproject.toml:86-88):

docling - Main document conversion interface (maps to docling.cli.main:app)
docling-tools - Model management utilities

The primary command is docling convert, which orchestrates document conversion through the DocumentConverter class and exports results in multiple output formats.

Sources: pyproject.toml:86-88, docling/cli/main.py:158-163

CLI Architecture

Component Structure

Sources: docling/cli/main.py:158-163, docling/cli/main.py:373-559, docling/document_converter.py:189-261

Conversion Flow

Sources: docling/cli/main.py:373-767, docling/document_converter.py:294-400

docling convert Command

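The convert command is the primary interface for document conversion. It is defined by the convert() function, decorated with @app.command(no_args_is_help=True) at docling/cli/main.py:373. Since convert is the app's only command, Typer typically invokes it directly, so in practice sources are passed as docling SOURCE [OPTIONS] (for example, docling report.pdf).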

Command Syntax

Sources: docling/cli/main.py:373-559

Input Parameters

Parameter	Type	Default	Description
`source` (positional)	`List[str]`	Required	File paths, directory paths, or URLs to convert
`--from`	`List[InputFormat]`	All formats	Filter input formats to process
`--headers`	`str`	`None`	JSON string of HTTP headers for URL sources

Source Handling: Input sources are processed by _DocumentConversionInput, which resolves paths, detects formats from MIME types, and creates input streams. The input document construction logic is in docling/datamodel/document.py:441-486.
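A minimal sketch of how a caller might assemble these inputs, assuming the CLI is driven from Python through subprocess. build_convert_argv is a hypothetical helper, not part of docling, and the URL and Authorization header are placeholders. The point it illustrates: --from can be repeated once per allowed format, while --headers takes a single JSON object serialized to a string.

```python
import json
import shlex
from typing import Dict, List, Optional


def build_convert_argv(
    sources: List[str],
    input_formats: Optional[List[str]] = None,
    headers: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: build an argument list for the docling CLI."""
    # Assumption: convert is the app's only command, so sources follow "docling".
    argv = ["docling", *sources]
    # --from is repeatable; each occurrence allows one more input format.
    for fmt in input_formats or []:
        argv += ["--from", fmt]
    # --headers expects one JSON object passed as a single string argument.
    if headers:
        argv += ["--headers", json.dumps(headers)]
    return argv


argv = build_convert_argv(
    ["report.pdf", "https://example.com/paper.pdf"],
    input_formats=["pdf"],
    headers={"Authorization": "Bearer <token>"},
)
print(shlex.join(argv))  # shell-equivalent command; subprocess.run(argv) would execute it
```

Serializing with json.dumps rather than hand-writing the string avoids quoting mistakes when header values contain spaces or quotes.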

Sources: docling/cli/main.py:375-402, docling/datamodel/document.py:441-486

Processing Parameters

Parameter	Type	Default	Description
`--pipeline`	`ProcessingPipeline`	`STANDARD`	Pipeline selection: `STANDARD`, `VLM`, `LEGACY`
`--vlm-model`	`str`	`"granite_docling"`	VLM preset for VLM pipeline (see VLM Models)
`--asr-model`	`AsrModelType`	`WHISPER_TINY`	ASR model for audio/video files
`--ocr / --no-ocr`	`bool`	`True`	Enable/disable OCR processing
`--force-ocr`	`bool`	`False`	Force OCR on all content, replacing existing text
`--ocr-engine`	`str`	`"auto"`	OCR engine selection: `auto`, `rapidocr`, `easyocr`, `tesseract`, `tesserocr`, `ocrmac`
`--ocr-lang`	`str`	`None`	Comma-separated list of OCR languages
`--psm`	`int`	`None`	Tesseract Page Segmentation Mode (0-13)
`--tables / --no-tables`	`bool`	`True`	Enable/disable table structure extraction
`--table-mode`	`TableFormerMode`	`ACCURATE`	Table extraction mode: `FAST`, `ACCURATE`
`--pdf-backend`	`PdfBackend`	`DOCLING_PARSE`	PDF backend: `DOCLING_PARSE`, `PYPDFIUM2`
`--pdf-password`	`str`	`None`	Password for protected PDF documents
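Two of these options are free-form and need light parsing before they reach an OCR engine. The sketch below shows one plausible approach; split_ocr_langs and check_psm are hypothetical names, and splitting on commas only is an assumption based on the table's description of --ocr-lang.

```python
from typing import List, Optional


def split_ocr_langs(raw: Optional[str]) -> Optional[List[str]]:
    """Turn an --ocr-lang value such as "en,de" into ["en", "de"].

    None means the option was not given, leaving the choice to the engine.
    """
    if raw is None:
        return None
    # Strip surrounding spaces and drop empty entries from stray commas.
    return [lang.strip() for lang in raw.split(",") if lang.strip()]


def check_psm(psm: Optional[int]) -> Optional[int]:
    """Reject --psm values outside Tesseract's page segmentation modes 0-13."""
    if psm is not None and not 0 <= psm <= 13:
        raise ValueError(f"--psm must be between 0 and 13, got {psm}")
    return psm


print(split_ocr_langs("en, de,fr"))  # ['en', 'de', 'fr']
```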

Pipeline Selection: The --pipeline parameter determines which processing pipeline class is used. The mapping is defined in docling/datamodel/pipeline_options.py:868-887. Options include:

STANDARD: Uses StandardPdfPipeline for PDFs, with multi-stage processing by OCR, layout, and table structure models
VLM: Uses VlmPipeline for vision-language model-based conversion
LEGACY: Uses the deprecated legacy pipeline
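The selection can be modelled as a lookup from the --pipeline value to a pipeline class. This sketch is illustrative only: ProcessingPipeline here is a stand-in enum whose lowercase string values are assumed, pipeline classes are represented by their names, and LEGACY is deliberately left unmapped because this page does not name the class it uses.

```python
from enum import Enum


class ProcessingPipeline(str, Enum):
    """Stand-in for the --pipeline choices; lowercase values are an assumption."""

    STANDARD = "standard"
    VLM = "vlm"
    LEGACY = "legacy"


# PDF pipeline class per choice, as described above (names only, no imports).
# LEGACY is intentionally absent: this page does not name its class.
PDF_PIPELINE_CLASS = {
    ProcessingPipeline.STANDARD: "StandardPdfPipeline",
    ProcessingPipeline.VLM: "VlmPipeline",
}


def pdf_pipeline_for(choice: str) -> str:
    """Map a --pipeline value (any case) to the PDF pipeline class name."""
    pipeline = ProcessingPipeline(choice.lower())  # ValueError for unknown choices
    if pipeline not in PDF_PIPELINE_CLASS:
        raise NotImplementedError(f"{pipeline.name} is not covered by this sketch")
    return PDF_PIPELINE_CLASS[pipeline]
```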

Sources: docling/cli/main.py:410-479, docling/datamodel/pipeline_options.py:868-887

Enrichment Parameters

Parameter	Type	Default	Description
`--enrich-code`	`bool`	`False`	Enable code block detection and enrichment
`--enrich-formula`	`bool`	`False`	Enable mathematical formula enrichment
`--enrich-picture-classes`	`bool`	`False`	Enable picture classification (chart, diagram, etc.)
`--enrich-picture-description`	`bool`	`False`	Enable picture description generation via VLM
`--enrich-chart-extraction`	`bool`	`False`	Extract tabular data from bar/pie/line charts

Enrichment models are applied in a post-processing phase after the initial conversion. Configuration happens in docling/cli/main.py:658-706, where the enrichment options are set on PdfPipelineOptions.
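The flag-to-option wiring can be pictured as a plain mapping. To our knowledge, recent docling releases name the corresponding PdfPipelineOptions fields do_code_enrichment, do_formula_enrichment, do_picture_classification and do_picture_description; treat those names as an assumption to check against your installed version. --enrich-chart-extraction is omitted because this page does not name its target field. The sketch builds a dict instead of importing docling, so it runs anywhere.

```python
from typing import Dict, Iterable

# CLI flag -> PdfPipelineOptions field name (assumed; verify for your version).
ENRICHMENT_FIELDS: Dict[str, str] = {
    "--enrich-code": "do_code_enrichment",
    "--enrich-formula": "do_formula_enrichment",
    "--enrich-picture-classes": "do_picture_classification",
    "--enrich-picture-description": "do_picture_description",
}


def enrichment_options(enabled_flags: Iterable[str]) -> Dict[str, bool]:
    """Build option values: every enrichment is False unless its flag is given."""
    enabled = set(enabled_flags)
    unknown = enabled.difference(ENRICHMENT_FIELDS)
    if unknown:
        raise ValueError(f"not covered by this sketch: {sorted(unknown)}")
    return {field: flag in enabled for flag, field in ENRICHMENT_FIELDS.items()}


print(enrichment_options(["--enrich-formula"]))
```

If the assumed field names hold, the resulting dict could be unpacked as keyword arguments when constructing the pipeline options.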

Sources: docling/cli/main.py:480-505, docling/cli/main.py:658-706

Output Parameters

Parameter	Type	Default	Description
`--to`	`List[OutputFormat]`	`[MARKDOWN]`	Output formats: `md`, `json`, `yaml`, `html`, `html_split_page`, `text`, `doctags`
`--output`	`Path`	`"."`	Output directory for converted files
`--image-export-mode`	`ImageRefMode`	`EMBEDDED`	Image handling: `PLACEHOLDER`, `EMBEDDED`, `REFERENCED`
`--show-layout`	`bool`	`False`	Show bounding boxes in HTML output
`--print-timings`	`bool`	`False`	Print profiling timings to console
`--export-timings`	`bool`	`False`	Export profiling timings to JSON file

The export_documents() function (docling/cli/main.py:211-365) handles all output format generation. For each successful conversion, it calls the matching export method on the DoclingDocument object:

JSON: document.save_as_json() (lines 236-240)
YAML: document.save_as_yaml() (lines 243-248)
HTML: document.save_as_html() (lines 251-282)
Markdown: document.save_as_markdown() (lines 295-300)
Text: document.save_as_markdown(strict_text=True) (lines 285-292)
DocTags: document.save_as_doctags() (lines 303-306)
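The dispatch above amounts to a table lookup followed by a method call. This sketch assumes the file suffixes shown (they are not stated on this page) and covers only the six listed formats, so html_split_page is omitted. export_document is a hypothetical name; any object exposing the listed save_as_* methods can be passed in, which keeps the logic testable without docling installed.

```python
from pathlib import Path
from typing import Any, Dict, List, Tuple

# Output format -> (DoclingDocument method, extra keyword args, file suffix).
# Method names follow the list above; the suffixes are assumptions.
EXPORTERS: Dict[str, Tuple[str, Dict[str, Any], str]] = {
    "json": ("save_as_json", {}, ".json"),
    "yaml": ("save_as_yaml", {}, ".yaml"),
    "html": ("save_as_html", {}, ".html"),
    "md": ("save_as_markdown", {}, ".md"),
    "text": ("save_as_markdown", {"strict_text": True}, ".txt"),
    "doctags": ("save_as_doctags", {}, ".doctags"),
}


def export_document(document: Any, stem: str, formats: List[str], output_dir: Path) -> List[Path]:
    """Call one save_as_* method per requested format; return the target paths."""
    written = []
    for fmt in formats:
        method, kwargs, suffix = EXPORTERS[fmt]
        path = output_dir / f"{stem}{suffix}"
        # The first positional argument of each save_as_* method is the filename.
        getattr(document, method)(path, **kwargs)
        written.append(path)
    return written
```

Note that plain text reuses the Markdown exporter with strict_text=True, which is why two table rows share one method name.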

Sources: docling/cli/main.py:383-409, docling/cli/main.py:539-541, docling/cli/main.py:211-365

System Parameters

Parameter	Type	Default	Description
`--artifacts-path`	`Path`	`None`	Custom path for model artifacts storage
`--enable-remote-services`	`bool`	`False`	Allow connections to remote model services
`--allow-external-plugins`	`bool`	`False`	Enable third-party OCR/layout/table plugins
`--abort-on-error`	`bool`	`False`	Stop processing on first error
`--verbose` / `-v`	`int`	`0`	Logging verbosity (0=WARNING, 1=INFO, 2=DEBUG)
`--version`	Flag	-	Display version information and exit
`--logo`	Flag	-	Display Docling ASCII art logo and exit

The --artifacts-path parameter sets settings.artifacts_path (lines 588-591), which model factories use to locate downloaded model files. The --verbose parameter configures the logging level (lines 560-586).
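A sketch of the verbosity mapping from the table, assuming -v is a counting option (so -vv arrives as 2) and that values above 2 stay at DEBUG. level_for_verbosity is a hypothetical name; only the standard logging module is used.

```python
import logging


def level_for_verbosity(verbose: int) -> int:
    """0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG (the >2 case is assumed)."""
    if verbose <= 0:
        return logging.WARNING
    if verbose == 1:
        return logging.INFO
    return logging.DEBUG


# Example: "-vv" on the command line would arrive here as verbose=2.
logging.basicConfig(level=level_for_verbosity(2))
```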

Sources: docling/cli/main.py:506-563, docling/cli/main.py:588-591

Format Option Construction

The CLI constructs format-specific options by mapping input format types to their corresponding FormatOption subclasses:

Sources: docling/cli/main.py:628-735

Format Option Classes

The CLI uses the following FormatOption subclasses from docling/document_converter.py:75-156:

Input Format	FormatOption Class	Backend Class	Pipeline Class
`PDF`	`PdfFormatOption`	`DoclingParseDocumentBackend`	`StandardPdfPipeline` (default) or `VlmPipeline`
`IMAGE`	`ImageFormatOption`	`ImageDocumentBackend`	`StandardPdfPipeline`
`DOCX`	`WordFormatOption`	`MsWordDocumentBackend`	`SimplePipeline`
`XLSX`	`ExcelFormatOption`	`MsExcelDocumentBackend`	`SimplePipeline`
`PPTX`	`PowerpointFormatOption`	`MsPowerpointDocumentBackend`	`SimplePipeline`
`HTML`	`HTMLFormatOption`	`HTMLDocumentBackend`	`SimplePipeline`
`MD`	`MarkdownFormatOption`	`MarkdownDocumentBackend`	`SimplePipeline`
`AUDIO`	`AudioFormatOption`	`NoOpBackend`	`AsrPipeline`
`LATEX`	`LatexFormatOption`	`LatexDocumentBackend`	`SimplePipeline`

Each FormatOption contains:

backend: Backend class for parsing the format
pipeline_cls: Pipeline class for processing
pipeline_options: Format-specific pipeline configuration
backend_options: Format-specific backend configuration (optional)
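The four fields above can be mirrored with a small dataclass to show how a format-to-option mapping is assembled and narrowed. Everything here is a stand-in: FormatOptionSketch and select_format_options are hypothetical names, class names are stored as strings, only a subset of the table is included, and the lowercase format keys are assumed. The sketch models --from as a filter over the mapping; in docling itself the restriction may be applied elsewhere in the converter.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FormatOptionSketch:
    """Stand-in for a FormatOption; backend and pipeline kept as class names."""

    backend: str
    pipeline_cls: str
    pipeline_options: Optional[Any] = None
    backend_options: Optional[Any] = None  # optional, as in the list above


# A subset of the table above, keyed by lowercase input-format name (assumed).
DEFAULT_OPTIONS: Dict[str, FormatOptionSketch] = {
    "pdf": FormatOptionSketch("DoclingParseDocumentBackend", "StandardPdfPipeline"),
    "image": FormatOptionSketch("ImageDocumentBackend", "StandardPdfPipeline"),
    "docx": FormatOptionSketch("MsWordDocumentBackend", "SimplePipeline"),
    "html": FormatOptionSketch("HTMLDocumentBackend", "SimplePipeline"),
    "audio": FormatOptionSketch("NoOpBackend", "AsrPipeline"),
}


def select_format_options(allowed: Optional[List[str]] = None) -> Dict[str, FormatOptionSketch]:
    """Keep only formats named via --from; no --from means every format."""
    if not allowed:
        return dict(DEFAULT_OPTIONS)
    wanted = set(allowed)
    return {fmt: opt for fmt, opt in DEFAULT_OPTIONS.items() if fmt in wanted}
```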

Sources: docling/document_converter.py:75-183, docling/cli/main.py:628-732

Plugin System Integration

The CLI provides access to the pluggy-based plugin system for extending OCR, layout, and table structure models:

Sources: docling/cli/main.py:114-118, docling/cli/main.py:184-208, docling/cli/main.py:516-521

Viewing Available Plugins

The show_external_plugins_callback() function (docling/cli/main.py:184-208) queries the factory registries and lists external plugins, that is, plugins whose modules live outside the core docling package namespace.
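The namespace test can be sketched as a prefix check. The registry shape used here (plugin name mapped to its implementing module) is a hypothetical simplification of the real factory registries, and external_plugins is an illustrative name. Checking for the "docling." prefix, rather than "docling" alone, keeps a third-party package such as docling_extra from being mistaken for core code.

```python
from typing import Dict, List


def external_plugins(registry: Dict[str, str]) -> List[str]:
    """Return names of plugins implemented outside the core docling namespace.

    registry maps a plugin name to the module that provides it (a simplified,
    hypothetical shape).
    """
    return sorted(
        name
        for name, module in registry.items()
        if not (module == "docling" or module.startswith("docling."))
    )


# Sample data only; these module paths are illustrative.
sample = {
    "easyocr": "docling.models.easyocr_model",
    "acme_ocr": "acme_docling_plugin.ocr",
}
print(external_plugins(sample))  # ['acme_ocr']
```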

Sources: docling/cli/main.py:184-208, docling/cli/main.py:516-530

Conversion Execution Flow

Main Execution Path

Sources: docling/cli/main.py:628-767, docling/document_converter.py:294-400

Error Handling

The CLI handles conversion errors based on the --abort-on-error flag:

--abort-on-error: Processing stops at the first error and an exception is raised (lines 531-538)
--no-abort-on-error (default): Errors are logged, processing continues for remaining documents

Error information is captured in ConversionResult.errors as a list of ErrorItem objects containing:

component_type: Type of component that failed (e.g., DOCUMENT_BACKEND, MODEL)
module_name: Name of the module/class where error occurred
error_message: Description of the error

See docling/datamodel/base_models.py:182-186 for the ErrorItem model definition.
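To make the two modes concrete, here is a simplified model of the result loop. ResultSketch, tally, the reduced ConversionStatus enum and the lowercase component names are hypothetical stand-ins: the real ConversionResult carries more fields, and the real abort path raises docling's own exception rather than RuntimeError. Treating partial success as an error when aborting is also an assumption of this sketch.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List

log = logging.getLogger("docling-cli-sketch")


class ConversionStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"


@dataclass
class ErrorItem:
    component_type: str  # e.g. "document_backend" or "model"
    module_name: str
    error_message: str


@dataclass
class ResultSketch:
    """Stand-in for ConversionResult with only the fields used here."""

    name: str
    status: ConversionStatus
    errors: List[ErrorItem] = field(default_factory=list)


def tally(results: Iterable[ResultSketch], abort_on_error: bool = False) -> Dict[str, int]:
    """Count outcomes; log each ErrorItem and optionally stop at the first problem."""
    counts = {"success": 0, "partial_success": 0, "failure": 0}
    for res in results:
        if res.status is not ConversionStatus.SUCCESS:
            for err in res.errors:
                log.warning("%s: %s in %s: %s", res.name, err.component_type,
                            err.module_name, err.error_message)
            if abort_on_error:
                raise RuntimeError(f"{res.name}: {res.status.value}")
        counts[res.status.value] += 1
    return counts
```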

Sources: docling/cli/main.py:531-538, docling/cli/main.py:352-360, docling/datamodel/base_models.py:182-186

Callback Options

Version Information

The --version flag calls version_callback() (docling/cli/main.py:172-181), which displays:

Docling version
Docling Core version
Docling IBM Models version
Docling Parse version
Python implementation and version
Platform information

The version data is collected by the DoclingVersion model defined at docling/datamodel/document.py:232-240.
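Similar information can be gathered with the standard library alone. This sketch uses importlib.metadata with the PyPI distribution names of the four packages and reports a package as "not installed" when it is missing; collect_versions is a hypothetical name, and the output approximates, rather than reproduces, what DoclingVersion collects.

```python
import platform
from importlib import metadata
from typing import Dict

# PyPI distribution names of the packages listed above.
PACKAGES = ["docling", "docling-core", "docling-ibm-models", "docling-parse"]


def collect_versions() -> Dict[str, str]:
    """Return package, Python and platform versions as plain strings."""
    info: Dict[str, str] = {}
    for dist in PACKAGES:
        try:
            info[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            info[dist] = "not installed"
    info["python"] = f"{platform.python_implementation()} {platform.python_version()}"
    info["platform"] = platform.platform()
    return info


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```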

Sources: docling/cli/main.py:172-181, docling/datamodel/document.py:232-240

Logo Display

The --logo flag displays the Docling ASCII art logo (defined at docling/cli/main.py:120-155) through the logo_callback() function at docling/cli/main.py:166-169.

Sources: docling/cli/main.py:120-169

Model Artifacts Management

The CLI provides control over model artifact storage and retrieval:

Artifacts Path Configuration

The --artifacts-path parameter specifies where model files are stored; the CLI assigns it to settings.artifacts_path at docling/cli/main.py:588-591. If it is not specified, a system-specific cache directory is used.

Model factories use this path to:

Check for locally cached models
Download missing models from Hugging Face Hub
Store downloaded model files
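The first of these steps can be sketched as path resolution plus an existence check. resolve_model_dir, the model folder names and the default_cache parameter are hypothetical; default_cache stands in for the system-specific cache directory, whose real location this page does not give. The download and store steps are only indicated by a comment.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_model_dir(
    artifacts_path: Optional[Path], model_folder: str, default_cache: Path
) -> Tuple[Path, bool]:
    """Return (directory for one model, whether it already exists locally).

    artifacts_path mirrors --artifacts-path; default_cache is used when it is None.
    """
    base = artifacts_path if artifacts_path is not None else default_cache
    model_dir = base / model_folder
    # Per the description above, a missing model would next be fetched from the
    # Hugging Face Hub and stored in model_dir; that step is not implemented here.
    return model_dir, model_dir.is_dir()


# Demo: a folder created inside a temporary artifacts directory counts as cached.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "layout").mkdir()
    print(resolve_model_dir(Path(tmp), "layout", Path("unused"))[1])  # True
```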

Remote Services

The --enable-remote-services flag (lines 510-515) allows models to connect to remote inference services. This is required when using:

VLM API endpoints (e.g., LM Studio, Ollama, vLLM servers)
Remote OCR services
Cloud-based model inference

Without this flag, only local model execution is permitted.
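The rule reduces to a guard that runs before any remote call. This is a hedged sketch: RemoteServicesNotEnabled and ensure_remote_allowed are hypothetical names (docling's own exception type may differ), and needs_remote stands in for whatever in the configuration points at a remote endpoint.

```python
class RemoteServicesNotEnabled(RuntimeError):
    """Hypothetical error type; docling's own exception may differ."""


def ensure_remote_allowed(needs_remote: bool, enable_remote_services: bool) -> None:
    """Refuse remote inference unless --enable-remote-services was given."""
    if needs_remote and not enable_remote_services:
        raise RemoteServicesNotEnabled(
            "This configuration calls a remote service; rerun with --enable-remote-services."
        )
```

Failing early with an explicit message makes the opt-in visible, so documents are never sent to an external endpoint by accident.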

Sources: docling/cli/main.py:506-515, docling/cli/main.py:588-591

Usage Examples

Basic PDF Conversion
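A minimal sketch written as Python argument lists that could be passed to subprocess.run; shlex.join prints the equivalent shell command. The file name and URL are placeholders, and placing sources directly after docling assumes convert is the app's only Typer command. With no --to or --output, the documented defaults apply: Markdown written to the current directory.

```python
import shlex

# Local PDF -> Markdown in the current directory (defaults: --to md, --output .).
local_argv = ["docling", "report.pdf"]

# Remote PDF fetched by URL; the same defaults apply.
url_argv = ["docling", "https://example.com/paper.pdf"]

for argv in (local_argv, url_argv):
    print(shlex.join(argv))
```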

Format-Specific Conversion
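Restricting a directory run to particular input formats with the repeatable --from option, again as a subprocess-style argument list. The directory names are placeholders, and the lowercase format names are assumed to match the InputFormat values.

```python
import shlex

# Convert only the Word and HTML files found under ./docs; --from is repeatable.
argv = ["docling", "./docs", "--from", "docx", "--from", "html", "--output", "converted"]
print(shlex.join(argv))
```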

Output Format Selection
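Requesting several output formats at once by repeating --to, with a custom output directory and referenced images. The lowercase values are assumed to match the OutputFormat and ImageRefMode choices listed in the Output Parameters table; file and directory names are placeholders.

```python
import shlex

# Markdown and JSON side by side; images saved as separate files that the output references.
argv = [
    "docling", "report.pdf",
    "--to", "md", "--to", "json",
    "--output", "out",
    "--image-export-mode", "referenced",
]
print(shlex.join(argv))
```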

Pipeline and Model Selection
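Two hedged examples: the VLM pipeline with its documented default preset named explicitly, and the standard pipeline with Tesseract OCR and the faster table mode. Lowercase values such as vlm and fast are assumed to match the enum choices. eng and deu are Tesseract's language codes; other OCR engines may expect different codes.

```python
import shlex

# VLM pipeline with the documented default preset named explicitly.
vlm_argv = ["docling", "report.pdf", "--pipeline", "vlm", "--vlm-model", "granite_docling"]

# Standard pipeline: Tesseract OCR for English and German, faster table mode.
ocr_argv = [
    "docling", "scan.pdf",
    "--ocr-engine", "tesseract", "--ocr-lang", "eng,deu",
    "--table-mode", "fast",
]

for argv in (vlm_argv, ocr_argv):
    print(shlex.join(argv))
```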

Enrichment Options
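Enrichment flags are plain switches that default to off, so enabling them is a matter of listing them. A sketch with placeholder file names, in the same subprocess-style form:

```python
import shlex

# Standard conversion plus code, formula and picture-classification enrichment.
argv = [
    "docling", "paper.pdf",
    "--enrich-code", "--enrich-formula", "--enrich-picture-classes",
]
print(shlex.join(argv))
```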

Advanced Options
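Combining system options from the System Parameters table. Paths and the password are placeholders, and -vv assumes -v is a counting flag (two occurrences mapping to DEBUG, per the table).

```python
import shlex

# Batch run: local model directory, stop at the first failure, DEBUG logging.
batch_argv = [
    "docling", "./inbox",
    "--artifacts-path", "/opt/docling-models",
    "--abort-on-error",
    "-vv",
]

# Password-protected PDF.
locked_argv = ["docling", "locked.pdf", "--pdf-password", "s3cret"]

for argv in (batch_argv, locked_argv):
    print(shlex.join(argv))
```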

Performance Profiling
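The two profiling flags, shown as subprocess-style argument lists with a placeholder file name. This page does not specify the name or location of the JSON file written by --export-timings.

```python
import shlex

# Print profiling timings to the console after converting.
print_argv = ["docling", "report.pdf", "--print-timings"]

# Export the profiling timings to a JSON file.
export_argv = ["docling", "report.pdf", "--export-timings"]

for argv in (print_argv, export_argv):
    print(shlex.join(argv))
```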

Sources: docling/cli/main.py:373-767

Exit Codes and Status

The CLI returns the following exit codes:

0: All conversions successful
1: Conversion errors occurred (when --abort-on-error is used)
1: No recognizable documents found in input sources

The ConversionStatus enum (docling/datamodel/base_models.py:46-53) defines the possible conversion states:

PENDING: Not yet started
STARTED: In progress
SUCCESS: Completed successfully
PARTIAL_SUCCESS: Completed with some errors
FAILURE: Failed completely
SKIPPED: Document was skipped (e.g., exceeded size limit)

Conversion success and failure counts are logged at docling/cli/main.py:362-364.

Sources: docling/cli/main.py:362-364, docling/datamodel/base_models.py:46-53, docling/document_converter.py:378-400
