This document describes Docling's plugin architecture, which enables extensibility through custom implementations of core components. The plugin system allows developers to register and use custom models for OCR, layout detection, table structure recognition, vision-language processing, and enrichment tasks without modifying Docling's core codebase.
For information about configuring specific model types, see Configuration and Pipeline Options. For details on the built-in model implementations, see AI/ML Models.
Sources: pyproject.toml70 pyproject.toml85-86 CHANGELOG.md107
Docling's plugin system is built on the pluggy library and Python's entry point mechanism. It uses a factory pattern where plugins register implementations that factories can instantiate based on configuration.
Diagram 1: Plugin System Architecture
The plugin system operates through several layers, including the entry point declarations in pyproject.toml files.
Sources: pyproject.toml70 pyproject.toml85-86 CHANGELOG.md107
Plugins register themselves using Python's entry point mechanism. The entry point group name is docling.
Docling's built-in models are registered through a default plugin. Its entry point references the module docling.models.plugins.defaults, which registers all built-in model implementations.
Sources: pyproject.toml85-86
External packages can register plugins by declaring similar entry points in the docling group of their own pyproject.toml.
The plugin module must implement the required hook specifications that pluggy expects. When installed, the plugin is automatically discovered by Docling's plugin manager.
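The entry point mechanism described here can be explored with the standard library alone. The following is a minimal sketch, not Docling's own discovery code: it lists whatever installed packages declare in the docling group and builds one illustrative entry by hand. The names my_plugin and my_plugin.docling_plugin are hypothetical.

```python
from importlib.metadata import EntryPoint, entry_points

GROUP = "docling"  # entry point group name used by Docling plugins


def list_docling_entry_points():
    """Return the entry points that installed packages declare in the docling group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=GROUP))
    return list(eps.get(GROUP, []))  # older dict-style return value


# Hypothetical third-party declaration, equivalent to this pyproject.toml entry:
#   [project.entry-points."docling"]
#   my_plugin = "my_plugin.docling_plugin"
example = EntryPoint(name="my_plugin", value="my_plugin.docling_plugin", group=GROUP)

for ep in list_docling_entry_points() + [example]:
    print(f"{ep.name} -> {ep.value}")
```

If Docling is installed, the listing should include its default plugin entry; otherwise only the hand-built example is printed.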
Sources: pyproject.toml85-86 CHANGELOG.md832
Docling provides extension points for five primary model categories. Each category follows a factory pattern where plugins register implementations that factories can instantiate.
Diagram 2: Plugin Extension Points and Built-in Implementations
Sources: CHANGELOG.md107 CHANGELOG.md272 CHANGELOG.md543 CHANGELOG.md62
OCR plugins provide text extraction capabilities. The system supports multiple OCR engine implementations through the AutoOCR selector, which automatically chooses the best available engine.
Built-in implementations:
- RapidOCR (default): Fast GPU-accelerated OCR
- EasyOCR: Alternative OCR engine
- Tesseract: Open-source OCR engine with PSM options support
- OCRMac: macOS-native OCR (macOS only)
- AutoOCR: Automatic engine selection based on availability

Configuration is done through OcrOptions which specifies the engine type and parameters.
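Availability-based selection of the kind AutoOCR performs can be illustrated with a small sketch. This is not Docling's implementation: the preference order, the engine labels, and the module names checked are all assumptions chosen for illustration.

```python
from importlib.util import find_spec

# Hypothetical preference order: (engine label, top-level module it needs).
# Labels and module names are assumptions, not Docling's actual table.
CANDIDATES = (
    ("rapidocr", "rapidocr"),
    ("easyocr", "easyocr"),
    ("tesseract", "tesserocr"),
)


def pick_available_engine(candidates=CANDIDATES):
    """Return the label of the first engine whose module can be found, else None."""
    for label, module_name in candidates:
        if find_spec(module_name) is not None:
            return label
    return None


print(pick_available_engine())
```

Checking with find_spec avoids importing heavy OCR libraries just to learn whether they are present.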
Sources: CHANGELOG.md272 CHANGELOG.md273 CHANGELOG.md186
Layout model plugins detect document structure (paragraphs, headings, tables, figures, etc.). Multiple implementations can be registered and selected via LayoutOptions.
Built-in implementations:
- Heron (default): 258M parameter layout detection model
- RT-DETR: Alternative layout detection approach
- Egret: Additional layout model option

The factory pattern allows runtime selection: LayoutOptions(model="heron") or LayoutOptions(model="rt-detr").
Sources: CHANGELOG.md386 CHANGELOG.md543
Table structure plugins parse detected table regions into cell structures. Plugins must handle both cell detection and structure recognition (rows, columns, spans).
Built-in implementations:
- TableFormer: Default implementation using OTSL (Optimized Table-Structure Language) tokenization
- TableCropsLayoutModel: Experimental alternative approach

Configuration uses TableStructureOptions with parameters like do_cell_matching and mode.
Sources: CHANGELOG.md106 CHANGELOG.md53
VLM plugins provide vision-language model capabilities for document understanding. The system supports multiple backends (Transformers, MLX, vLLM, API-based).
Built-in implementations:
- GraniteDocling: IBM's granite-docling model
- SmolDocling: Smaller VLM option
- Phi-4, Pixtral, DeepSeek-OCR: Additional VLM options

VLM plugins can be inline (locally executed) or API-based (remote inference).
Sources: CHANGELOG.md343 CHANGELOG.md29 CHANGELOG.md62 CHANGELOG.md545
Enrichment plugins add metadata and classifications to document elements (code blocks, formulas, picture descriptions).
Built-in implementations:
Sources: CHANGELOG.md11 CHANGELOG.md446
The plugin discovery process occurs during Docling initialization. The following diagram shows the loading sequence:
Diagram 3: Plugin Discovery and Loading Sequence
Discovery scans the entry points registered in the docling group.
The CLI provides a command to inspect registered plugins:
This command displays:
Sources: CHANGELOG.md56
To create a custom plugin, follow this pattern:
Create a Python package with the following structure:
my_docling_plugin/
├── pyproject.toml
├── src/
│   └── my_plugin/
│       ├── __init__.py
│       └── docling_plugin.py
In pyproject.toml, declare the entry point:
The plugin module must implement hook functions that register model implementations. The exact hook specifications depend on the plugin type.
For example, an OCR plugin would register an OCR engine class, a layout plugin would register a layout model class, etc. The registration typically happens through function calls to the plugin manager.
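A hook module for an OCR plugin might look like the sketch below. It assumes the hook is a module-level function named ocr_engines that returns a dictionary of engine classes, which is the shape Docling's default plugin module appears to use; verify the hook specification against the installed Docling version. MyOcrModel is a hypothetical placeholder.

```python
class MyOcrModel:
    """Placeholder for a custom OCR engine (hypothetical).

    A real implementation would subclass Docling's OCR base class and
    implement its interface; that part is omitted here.
    """


def ocr_engines():
    """Hook function: report the OCR engine classes this plugin contributes.

    The function name and the dictionary key are assumptions about the
    hook specification, not a confirmed API.
    """
    return {"ocr_engines": [MyOcrModel]}
```

Returning classes rather than instances lets the factory decide when, and with which options, to construct the model.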
Install the plugin package:
Verify registration:
Your plugin should appear in the output.
Sources: pyproject.toml85-86 CHANGELOG.md56
Once plugins are registered, they can be selected through configuration objects. Each plugin type uses specific option classes:
| Plugin Type | Configuration Class | Selection Method |
|---|---|---|
| OCR | OcrOptions | Specify engine parameter |
| Layout | LayoutOptions | Specify model parameter |
| Table | TableStructureOptions | Specify mode parameter |
| VLM | VlmOptions | Specify model spec and backend |
| Enrichment | ConvertPipelineOptions | Enable via flags |
Example configuration selecting specific plugins:
The factory layer uses these options to instantiate the selected plugin implementations.
Sources: CHANGELOG.md543 CHANGELOG.md53
The plugin system integrates with Docling's factory pattern. Factories are responsible for:
Diagram 4: Factory Pattern with Plugin Registry
Factories abstract the instantiation logic, allowing pipelines to work with any registered implementation without modification.
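The relationship between options, registry, and factory can be sketched in plain Python. Every name below (ModelFactory, FastOcrOptions, FastOcrModel, the kind attribute) is hypothetical and chosen for illustration; Docling's real factory classes may differ.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Type


@dataclass
class FastOcrOptions:
    """Options for a hypothetical OCR engine; `kind` identifies the implementation."""
    kind: ClassVar[str] = "fast_ocr"
    lang: str = "en"


class FastOcrModel:
    """Hypothetical implementation that a plugin would register."""
    options_type = FastOcrOptions

    def __init__(self, options):
        self.options = options


class ModelFactory:
    """Maps an options `kind` to the class that implements it."""

    def __init__(self):
        self._registry: Dict[str, Type] = {}

    def register(self, model_cls):
        kind = model_cls.options_type.kind
        if kind in self._registry:
            raise ValueError(f"kind {kind!r} is already registered")
        self._registry[kind] = model_cls

    def create_instance(self, options):
        try:
            model_cls = self._registry[options.kind]
        except KeyError:
            raise RuntimeError(f"no implementation registered for {options.kind!r}") from None
        return model_cls(options)


factory = ModelFactory()
factory.register(FastOcrModel)
model = factory.create_instance(FastOcrOptions(lang="de"))
print(type(model).__name__, model.options.lang)
```

Because the pipeline only ever calls create_instance with an options object, adding a new implementation requires registering a class, not changing pipeline code.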
Sources: CHANGELOG.md107
The Docling ecosystem includes community-developed plugins:
A plugin providing SuryaOCR integration as an alternative OCR engine. Installation:
Once installed, it can be selected via configuration:
Sources: CHANGELOG.md137
Plugins can leverage hardware acceleration through AcceleratorOptions. The factory layer passes acceleration settings to plugin implementations:
Plugin implementations must respect these settings when initializing models. For CUDA-based models, this typically involves moving the model to the specified device. On Apple Silicon, the MPS device runs on Metal Performance Shaders. On Intel GPUs, the XPU device relies on Intel Extension for PyTorch.
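One way a plugin could honor a requested device is sketched below. The function resolve_device is hypothetical, as are the "auto" and "cpu" values, the priority order, and the CPU fallback; a real plugin would read the setting from AcceleratorOptions and query its ML framework for available hardware.

```python
def resolve_device(requested, available):
    """Pick the device a model should be placed on.

    requested: "auto", "cpu", "cuda", "mps", or "xpu" (assumed value set)
    available: set of accelerator names detected on this machine
    """
    if requested == "auto":
        # Priority order is an assumption made for this sketch.
        for candidate in ("cuda", "mps", "xpu"):
            if candidate in available:
                return candidate
        return "cpu"
    if requested == "cpu" or requested in available:
        return requested
    return "cpu"  # requested accelerator is missing: fall back to CPU


print(resolve_device("auto", {"mps"}))
```

Resolving the device once, at model construction time, keeps the rest of the plugin free of hardware checks.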
Sources: CHANGELOG.md47 CHANGELOG.md16
Plugins are installed as Python packages with their own dependencies. Docling uses pluggy as the core dependency for the plugin system:
Plugins declare their dependencies in their own pyproject.toml, keeping them isolated from Docling's core dependencies. This allows plugins to use different versions of ML frameworks or introduce new dependencies without affecting Docling's core.
Optional dependencies for built-in model implementations are declared as extras:
Sources: pyproject.toml70 pyproject.toml93-112
Plugin implementations must be thread-safe when used in concurrent contexts, particularly in the ThreadedStandardPdfPipeline. The plugin system itself is thread-safe, but individual plugin implementations must handle concurrent access appropriately.
For models with non-thread-safe backends (e.g., pypdfium2), Docling provides synchronization primitives. Plugins should document their thread-safety characteristics and use appropriate locking mechanisms if needed.
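A simple way for a plugin to protect a non-thread-safe backend is to serialize access with a lock. The sketch below uses only the standard library; UnsafeBackend and LockedPluginModel are hypothetical stand-ins, and Docling's own synchronization primitives are not shown.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class UnsafeBackend:
    """Stand-in for a library that must not be called from two threads at once."""

    def __init__(self):
        self.calls = 0

    def process(self, page):
        current = self.calls  # read-modify-write: racy without a lock
        self.calls = current + 1
        return f"text:{page}"


class LockedPluginModel:
    """Hypothetical plugin model that serializes access to its backend."""

    def __init__(self):
        self._backend = UnsafeBackend()
        self._lock = threading.Lock()

    def __call__(self, page):
        with self._lock:  # only one thread at a time enters the backend
            return self._backend.process(page)


model = LockedPluginModel()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(model, range(200)))

print(len(results), model._backend.calls)
```

A single coarse lock trades throughput for safety; plugins whose backends are thread-safe can skip it and should say so in their documentation.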
Sources: CHANGELOG.md202
The Docling plugin system provides a flexible, standardized way to extend functionality, including hardware acceleration through AcceleratorOptions.
The architecture enables both built-in model variety and community-developed extensions without modifying Docling's core codebase.
Sources: pyproject.toml70 pyproject.toml85-86 CHANGELOG.md107 CHANGELOG.md56