This page documents how Docling integrates with popular AI frameworks and libraries for downstream tasks such as Retrieval-Augmented Generation (RAG), document processing, and agentic applications. The focus is on the integration patterns, export mechanisms, and framework-specific adaptations that enable Docling to work seamlessly with LangChain, LlamaIndex, Haystack, and other AI development frameworks.
For document chunking strategies specific to RAG applications, see Document Chunking. For the Model Context Protocol (MCP) server integration with AI agents, see MCP Server Integration. For general export format details, see Export Formats.
Docling serves as a document parsing and conversion layer that produces structured DoclingDocument objects. These documents can be exported to various formats and consumed by downstream AI frameworks. The integration pattern is designed to be framework-agnostic, with Docling focusing on high-quality document understanding and frameworks focusing on their specific AI/ML workflows.
Sources: mkdocs.yml 109-148, README.md 29-43
The standard integration pattern involves three stages:
1. Use `DocumentConverter` to parse input documents into `DoclingDocument` objects
2. Export the `DoclingDocument` to a format compatible with the target framework
3. Ingest the exported content into the framework's document or node structures

The `DoclingDocument` class from the `docling_core` library serves as the central integration point. It provides a unified representation with hierarchical document structure, typed content items (text, tables, pictures), and provenance metadata.
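The three stages above can be sketched as a small pipeline. The `StubConverter` and `StubDocument` classes below are stand-ins for Docling's `DocumentConverter` and `DoclingDocument` (note that the real `convert()` returns a result object whose `.document` attribute holds the `DoclingDocument`); only the control flow is shown here:

```python
# Sketch of the three-stage pattern: parse -> export -> ingest.
# StubConverter/StubDocument stand in for Docling's DocumentConverter
# and DoclingDocument so the pipeline shape is visible on its own.

class StubDocument:
    """Stand-in for docling_core's DoclingDocument."""
    def __init__(self, text: str) -> None:
        self._text = text

    def export_to_markdown(self) -> str:
        return self._text


class StubConverter:
    """Stand-in; real code: DocumentConverter().convert(src).document"""
    def convert(self, source: str) -> StubDocument:
        return StubDocument(f"# Parsed {source}")


def run_pipeline(converter, source, ingest):
    doc = converter.convert(source)        # stage 1: parse
    markdown = doc.export_to_markdown()    # stage 2: export
    return ingest(markdown, source)        # stage 3: ingest


records = run_pipeline(
    StubConverter(),
    "report.pdf",
    ingest=lambda text, src: {"page_content": text,
                              "metadata": {"source": src}},
)
```

The `ingest` callable is the only framework-specific piece; swapping it is what the per-framework sections below describe.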
Sources: README.md 34-36, docs/index.md 40
LangChain integration typically uses Docling as a document loader to populate Document objects for downstream processing. The integration leverages:
- Export of the `DoclingDocument` to Markdown text for LangChain's `Document` class

Integration Pattern:
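A hedged sketch of this mapping, assuming the Markdown text has already been produced by `export_to_markdown()`. The `LCDocument` dataclass below only mirrors the `page_content`/`metadata` shape of `langchain_core.documents.Document`; swap in the real class when LangChain is installed:

```python
from dataclasses import dataclass, field


# LCDocument mirrors LangChain's Document shape (page_content + metadata)
# so this sketch runs without langchain installed.
@dataclass
class LCDocument:
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_langchain_document(markdown: str, source: str) -> LCDocument:
    # markdown would come from DoclingDocument.export_to_markdown()
    return LCDocument(
        page_content=markdown,
        metadata={"source": source, "format": "markdown"},
    )


doc = to_langchain_document("# Annual Report\n\nRevenue grew 12%.", "report.pdf")
```

In practice the `langchain-docling` package ships a ready-made loader that performs this mapping, so hand-rolled adapters like this are mainly useful for custom metadata handling.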
Sources: mkdocs.yml 110-111, README.md 38
LlamaIndex integration uses Docling for document parsing with integration into LlamaIndex's Document and node structure:
- Mapping of `DoclingDocument` content items into LlamaIndex nodes

Integration Pattern:
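A hedged sketch of the item-to-node mapping. The input dicts stand in for `DoclingDocument` content items, and plain dicts mirror LlamaIndex `TextNode` objects, so the mapping itself stays framework-free; the field names on the Docling side are illustrative, not the exact `docling_core` schema:

```python
# Map per-item text plus provenance into TextNode-like dicts.
def items_to_nodes(items):
    nodes = []
    for item in items:
        nodes.append({
            "text": item["text"],
            "metadata": {
                "label": item.get("label", "text"),  # e.g. paragraph, table
                "page": item.get("page"),            # provenance for grounding
            },
        })
    return nodes


items = [
    {"text": "Introduction ...", "label": "section_header", "page": 1},
    {"text": "Docling parses PDFs.", "label": "paragraph", "page": 1},
]
nodes = items_to_nodes(items)
```

Keeping the label and page provenance in node metadata is what enables downstream filtering and visual grounding in the retrieval step.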
Sources: mkdocs.yml 112, README.md 38
Haystack integration positions Docling as a document converter in Haystack's pipeline architecture:
Integration Pattern:
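A sketch of what a converter component in this position looks like. Real Haystack 2.x components are declared with Haystack's `@component` decorator and declared output types; the plain class below mirrors only the `run()` contract (documents with `content` and `meta`) so the converter's role in the pipeline is clear without Haystack installed:

```python
# Plain-class sketch of a Haystack-style Docling converter component.
class DoclingConverterComponent:
    def __init__(self, export: str = "markdown") -> None:
        self.export = export

    def run(self, sources: list) -> dict:
        documents = []
        for src in sources:
            # Real code would call:
            #   DocumentConverter().convert(src).document.export_to_markdown()
            text = f"(converted {src} to {self.export})"
            documents.append({"content": text, "meta": {"source": src}})
        return {"documents": documents}


out = DoclingConverterComponent().run(["a.pdf", "b.docx"])
```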
Sources: mkdocs.yml 110, README.md 38
CrewAI integration uses Docling as a tool within agent workflows.
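A minimal sketch of exposing Docling as an agent tool. CrewAI declares tools via its own decorator or base class; here a plain function with a docstring (which agent frameworks typically surface as the tool description) shows the shape, with the real Docling call noted in a comment:

```python
# Sketch: a document-reading tool an agent framework could register.
def read_document(path: str) -> str:
    """Parse a document with Docling and return its Markdown text."""
    # Real code would return:
    #   DocumentConverter().convert(path).document.export_to_markdown()
    return f"# Contents of {path}"


# A toy tool registry standing in for the framework's tool list.
TOOLS = {"read_document": read_document}

result = TOOLS["read_document"]("contract.pdf")
```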
Sources: mkdocs.yml 141, README.md 38
Additional frameworks supported through the Docling integration ecosystem:
| Framework | Integration Type | Primary Use Case |
|---|---|---|
| txtai | Document loader | Semantic search and indexing |
| Semantica | Document processor | Knowledge graph construction |
| Bee Agent Framework | Agent tool | Document processing in agent workflows |
| Langflow | Node component | Visual workflow integration |
| Hector | Document parser | Domain-specific document processing |
| Apify | Actor integration | Web scraping and document processing |
| Data Prep Kit | Data pipeline | Large-scale document preparation |
| InstructLab | Knowledge base | Synthetic data generation |
| spaCy | NLP pipeline | Entity extraction and NLP tasks |
| Prodigy | Annotation tool | Document annotation workflows |
Sources: mkdocs.yml 137-166, README.md 116-117
Sources: README.md 73-81, docs/usage/index.md 5-20
Sources: mkdocs.yml 109-113, README.md 38
This pattern demonstrates how a single DoclingDocument can feed multiple frameworks simultaneously:
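The fan-out can be sketched as a single function: parse once, then hand one export to each framework adapter. The adapters here are trivial dict builders standing in for real framework ingestion, and the field names (`page_content`/`metadata` for LangChain, `content`/`meta` for Haystack) follow each framework's document shape:

```python
# Fan out one parsed document to several framework-specific shapes.
def fan_out(markdown: str, as_dict: dict, source: str) -> dict:
    return {
        # LangChain consumes Markdown text as page_content
        "langchain": {"page_content": markdown,
                      "metadata": {"source": source}},
        # Haystack documents carry content + meta
        "haystack": {"content": markdown, "meta": {"source": source}},
        # Custom pipelines can take the structured dict export directly
        "custom": as_dict,
    }


# markdown would come from export_to_markdown(), as_dict from export_to_dict()
targets = fan_out("# Title", {"schema_name": "DoclingDocument"}, "a.pdf")
```

Because parsing happens once, adding another downstream framework costs only one more adapter entry, not another conversion pass.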
Sources: README.md 34-36
The DoclingDocument class provides multiple export methods optimized for different framework requirements:
| Export Method | Output Format | Primary Use Case | Framework Compatibility |
|---|---|---|---|
| `export_to_markdown()` | Markdown text | Text-based processing, RAG | LangChain, LlamaIndex, Haystack |
| `export_to_json()` | JSON string | Structured data exchange | All frameworks |
| `export_to_dict()` | Python dictionary | Programmatic access | Custom integrations |
| `export_to_html()` | HTML string | Web rendering | Web applications |
| `export_to_doctags()` | DocTags format | Semantic structure | Specialized NLP pipelines |
Each export method supports options to control output formatting:
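A small dispatch sketch tying the table above to code: pick the export method by target framework rather than hard-coding one format. `StubDoc` stands in for a `DoclingDocument` and exposes the same method names:

```python
# StubDoc mirrors the DoclingDocument export method names from the table.
class StubDoc:
    def export_to_markdown(self):
        return "# md"

    def export_to_html(self):
        return "<h1>html</h1>"

    def export_to_dict(self):
        return {"schema_name": "DoclingDocument"}


# Target framework -> export method name (per the table above).
EXPORTERS = {
    "langchain": "export_to_markdown",
    "haystack": "export_to_markdown",
    "web": "export_to_html",
    "custom": "export_to_dict",
}


def export_for(doc, target: str):
    return getattr(doc, EXPORTERS[target])()


payload = export_for(StubDoc(), "langchain")
```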
Framework integrations typically require mapping Docling metadata to framework-specific schemas:
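A hedged sketch of such a mapping: Docling-style metadata (source file, page provenance, item label) flattened into the flat metadata dict most frameworks expect. The Docling-side field names below are illustrative, not the exact `docling_core` schema:

```python
# Flatten Docling-style metadata into a framework-friendly dict.
def map_metadata(docling_meta: dict) -> dict:
    return {
        "source": docling_meta.get("origin", {}).get("filename", "unknown"),
        "page": docling_meta.get("prov", [{}])[0].get("page_no"),
        "category": docling_meta.get("label", "text"),
    }


meta = map_metadata({
    "origin": {"filename": "report.pdf"},
    "prov": [{"page_no": 3}],
    "label": "table",
})
```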
Sources: README.md 36, docs/index.md 41
For RAG applications, document chunking is a critical integration point. Docling provides chunking capabilities that work with framework-specific chunkers.
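Docling ships real chunkers (covered on the Document Chunking page); the stdlib sketch below only illustrates the integration point itself: each chunk carries text plus metadata that a framework chunk or node can hold, here using heading-aware splitting of exported Markdown:

```python
# Toy heading-aware chunker over Markdown exported from a DoclingDocument.
def chunk_markdown(markdown: str) -> list:
    chunks, heading, lines = [], None, []

    def flush():
        if lines:
            chunks.append({"text": "\n".join(lines).strip(),
                           "metadata": {"heading": heading}})

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()                                   # close previous section
            heading, lines = line.lstrip("# ").strip(), []
        else:
            lines.append(line)
    flush()                                           # close final section
    return [c for c in chunks if c["text"]]


chunks = chunk_markdown("# Intro\nHello.\n# Methods\nWe parse PDFs.")
```

Real Docling chunkers work on the structured `DoclingDocument` rather than its Markdown export, which preserves tables and provenance instead of re-deriving structure from text.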
See Document Chunking for detailed information on chunking strategies and implementation.
Sources: mkdocs.yml 103-106, docs/index.md 78
The Docling repository includes working examples demonstrating framework integrations:
| Example | Framework | File Location | Description |
|---|---|---|---|
| RAG with Haystack | Haystack | docs/examples/rag_haystack.ipynb | Complete RAG pipeline with Haystack |
| RAG with LangChain | LangChain | docs/examples/rag_langchain.ipynb | Document loading and RAG with LangChain |
| RAG with LlamaIndex | LlamaIndex | docs/examples/rag_llamaindex.ipynb | Query engine integration with LlamaIndex |
| RAG with Milvus | Vector Store | docs/examples/rag_milvus.ipynb | Vector store integration with Milvus |
| RAG with Qdrant | Vector Store | docs/examples/retrieval_qdrant.ipynb | Retrieval pipeline with Qdrant |
| RAG with Weaviate | Vector Store | docs/examples/rag_weaviate.ipynb | Document indexing with Weaviate |
| RAG with MongoDB | Vector Store | docs/examples/rag_mongodb.ipynb | Atlas Vector Search integration |
| RAG with Azure Search | Vector Store | docs/examples/rag_azuresearch.ipynb | Azure Cognitive Search integration |
| RAG with OpenSearch | Vector Store | docs/examples/rag_opensearch.ipynb | OpenSearch vector store integration |
| Visual Grounding | Multimodal | docs/examples/visual_grounding.ipynb | Spatial provenance for multimodal RAG |
Sources: mkdocs.yml 109-135
- Use `StandardPdfPipeline` for structured documents and `VlmPipeline` for complex layouts
- Reuse a single `DocumentConverter` to process multiple documents efficiently

Sources: README.md 84-85, docs/usage/index.md 22-24