Installation and Setup

Relevant source files

This document covers the installation procedures, dependency management, and environment configuration for MarkItDown. It explains how to install the core package, manage optional dependencies through feature groups, and configure external service integrations. For information about using the command-line interface after installation, see Command Line Interface. For Docker-based deployment, see Docker Deployment.

Prerequisites

MarkItDown requires Python 3.10 or higher. The use of a virtual environment is strongly recommended to avoid dependency conflicts with other Python projects.

Supported Python Versions

Python Version	Support Status
3.10	✓ Supported
3.11	✓ Supported
3.12	✓ Supported
3.13	✓ Supported
< 3.10	✗ Not Supported

Sources: packages/markitdown/pyproject.toml10-24

Virtual Environment Setup

Three standard methods are supported for creating isolated Python environments:

Standard venv (built-in):

Using uv (faster alternative):

Using Anaconda/Miniconda:

Sources: README.md42-65

Installation Methods

Basic Installation with All Dependencies

The simplest installation includes all optional dependencies via the [all] feature group:

This installs the core package plus all optional converters and external service integrations.

Sources: README.md69 packages/markitdown/pyproject.toml37-52

Installing from Source

For development or to use unreleased features:

The -e flag installs in editable mode, allowing code changes to take effect without reinstallation.

Sources: README.md71-75

Verification of Installation

After installation, verify the markitdown command is available:

To verify plugin system functionality:

Sources: README.md79-133

Dependency Management and Feature Groups

Feature Group Architecture

MarkItDown uses a feature group system to organize optional dependencies by file format and service integration. This allows users to install only the dependencies needed for their specific use case, reducing installation size and avoiding unnecessary requirements.

Sources: packages/markitdown/pyproject.toml26-61

Feature Group Reference Table

Feature Group	Dependencies Installed	File Formats Supported
`[all]`	All optional dependencies	All supported formats
`[pptx]`	`python-pptx`	PowerPoint presentations (.pptx)
`[docx]`	`mammoth`, `lxml`	Word documents (.docx)
`[xlsx]`	`pandas`, `openpyxl`	Excel workbooks (.xlsx)
`[xls]`	`pandas`, `xlrd`	Legacy Excel files (.xls)
`[pdf]`	`pdfminer.six`, `pdfplumber`	PDF documents (.pdf)
`[outlook]`	`olefile`	Outlook messages (.msg)
`[audio-transcription]`	`pydub`, `SpeechRecognition`	Audio files (.wav, .mp3)
`[youtube-transcription]`	`youtube-transcript-api`	YouTube video URLs
`[az-doc-intel]`	`azure-ai-documentintelligence`, `azure-identity`	Azure Document Intelligence service

Sources: packages/markitdown/pyproject.toml36-61 README.md97-117

Installing Specific Feature Groups

Install only needed converters by specifying feature groups:

Multiple feature groups can be combined in a single installation command.

Sources: README.md98-117

Missing Dependency Handling

When a converter is invoked but its required dependencies are not installed, the system raises a MissingDependencyException with actionable installation guidance. The exception handling pattern is implemented in each converter's convert() method.

Example from PptxConverter:

The converter checks for the pptx module at import time (packages/markitdown/src/markitdown/converters/_pptx_converter.py18-24) and stores any import exception. During conversion, if the dependency is missing, it raises MissingDependencyException with installation instructions (packages/markitdown/src/markitdown/converters/_pptx_converter.py68-79).

Sources: packages/markitdown/src/markitdown/converters/_pptx_converter.py17-79

Environment Configuration

Azure Document Intelligence Setup

To use Azure Document Intelligence for complex document conversion, configure the service endpoint and credentials:

Method 1: Constructor Parameters

Method 2: Environment Variable

The Azure Document Intelligence resource setup is documented at Microsoft's official documentation. The system uses azure-ai-documentintelligence and azure-identity packages from the [az-doc-intel] feature group.

Sources: README.md135-143 README.md157-165

ExifTool Configuration

For enhanced image metadata extraction, MarkItDown can optionally use the ExifTool command-line utility. If ExifTool is not in the system PATH, specify its location:

Method 1: Constructor Parameter

Method 2: Environment Variable

If ExifTool is not found, the ImageConverter gracefully degrades to extracting only EXIF metadata available through Python libraries.

Sources: README.md24

LLM Integration for Image Captioning

To enable AI-generated image descriptions (for image files and embedded images in PPTX), configure an LLM client:

The LLM parameters are passed through to converters that support image captioning (specifically ImageConverter and PptxConverter). The client must be compatible with OpenAI's API interface.

Sources: README.md167-177 packages/markitdown/src/markitdown/converters/_pptx_converter.py92-130

Environment Variables Summary

Environment Variable	Purpose	Used By
`AZURE_API_KEY`	Azure Document Intelligence API key	`DocumentIntelligenceConverter`
`EXIFTOOL_PATH`	Path to ExifTool executable	`ImageConverter`

Sources: README.md135-177

Post-Installation Plugin Management

After installation, third-party plugins can be discovered and managed through the CLI:

Plugins are disabled by default for security. They register via Python entry points under the markitdown.plugin group. For plugin development details, see Plugin Architecture and Registration.

Sources: README.md119-133

Breaking Changes and Version Compatibility

From version 0.0.1 to 0.1.0, the following breaking changes were introduced:

Feature groups became required: The plain pip install markitdown no longer includes optional dependencies. Use pip install 'markitdown[all]' for backward-compatible behavior.
Binary stream requirement: The convert_stream() method now requires binary file-like objects (opened with mode 'rb'), not text streams. io.BytesIO must be used instead of io.StringIO.
Converter interface changed: The DocumentConverter base class now reads from file-like streams rather than file paths. Plugin developers must update their implementations.

These changes eliminated temporary file creation and improved the modularity of the dependency system.

Sources: README.md10-14

Installation and Setup

Relevant source files

Prerequisites

MarkItDown requires Python 3.10 or higher. The use of a virtual environment is strongly recommended to avoid dependency conflicts with other Python projects.

Supported Python Versions

Python Version	Support Status
3.10	✓ Supported
3.11	✓ Supported
3.12	✓ Supported
3.13	✓ Supported
< 3.10	✗ Not Supported

Sources: packages/markitdown/pyproject.toml10-24

Virtual Environment Setup

Three standard methods are supported for creating isolated Python environments:

Standard venv (built-in):

Using uv (faster alternative):

Using Anaconda/Miniconda:

Sources: README.md42-65

Installation Methods

Basic Installation with All Dependencies

The simplest installation includes all optional dependencies via the [all] feature group:

This installs the core package plus all optional converters and external service integrations.

Sources: README.md69 packages/markitdown/pyproject.toml37-52

Installing from Source

For development or to use unreleased features:

The -e flag installs in editable mode, allowing code changes to take effect without reinstallation.

Sources: README.md71-75

Verification of Installation

After installation, verify the markitdown command is available:

To verify plugin system functionality:

Sources: README.md79-133

Dependency Management and Feature Groups

Feature Group Architecture

Sources: packages/markitdown/pyproject.toml26-61

Feature Group Reference Table

Feature Group	Dependencies Installed	File Formats Supported
`[all]`	All optional dependencies	All supported formats
`[pptx]`	`python-pptx`	PowerPoint presentations (.pptx)
`[docx]`	`mammoth`, `lxml`	Word documents (.docx)
`[xlsx]`	`pandas`, `openpyxl`	Excel workbooks (.xlsx)
`[xls]`	`pandas`, `xlrd`	Legacy Excel files (.xls)
`[pdf]`	`pdfminer.six`, `pdfplumber`	PDF documents (.pdf)
`[outlook]`	`olefile`	Outlook messages (.msg)
`[audio-transcription]`	`pydub`, `SpeechRecognition`	Audio files (.wav, .mp3)
`[youtube-transcription]`	`youtube-transcript-api`	YouTube video URLs
`[az-doc-intel]`	`azure-ai-documentintelligence`, `azure-identity`	Azure Document Intelligence service

Sources: packages/markitdown/pyproject.toml36-61 README.md97-117

Installing Specific Feature Groups

Install only needed converters by specifying feature groups:

Multiple feature groups can be combined in a single installation command.

Sources: README.md98-117

Missing Dependency Handling

Example from PptxConverter:

Sources: packages/markitdown/src/markitdown/converters/_pptx_converter.py17-79

Environment Configuration

Azure Document Intelligence Setup

To use Azure Document Intelligence for complex document conversion, configure the service endpoint and credentials:

Method 1: Constructor Parameters

Method 2: Environment Variable

Sources: README.md135-143 README.md157-165

ExifTool Configuration

For enhanced image metadata extraction, MarkItDown can optionally use the ExifTool command-line utility. If ExifTool is not in the system PATH, specify its location:

Method 1: Constructor Parameter

Method 2: Environment Variable

If ExifTool is not found, the ImageConverter gracefully degrades to extracting only EXIF metadata available through Python libraries.

Sources: README.md24

LLM Integration for Image Captioning

To enable AI-generated image descriptions (for image files and embedded images in PPTX), configure an LLM client:

The LLM parameters are passed through to converters that support image captioning (specifically ImageConverter and PptxConverter). The client must be compatible with OpenAI's API interface.

Sources: README.md167-177 packages/markitdown/src/markitdown/converters/_pptx_converter.py92-130

Environment Variables Summary

Environment Variable	Purpose	Used By
`AZURE_API_KEY`	Azure Document Intelligence API key	`DocumentIntelligenceConverter`
`EXIFTOOL_PATH`	Path to ExifTool executable	`ImageConverter`

Sources: README.md135-177

Post-Installation Plugin Management

After installation, third-party plugins can be discovered and managed through the CLI:

Plugins are disabled by default for security. They register via Python entry points under the markitdown.plugin group. For plugin development details, see Plugin Architecture and Registration.

Sources: README.md119-133

Breaking Changes and Version Compatibility

From version 0.0.1 to 0.1.0, the following breaking changes were introduced:

Feature groups became required: The plain pip install markitdown no longer includes optional dependencies. Use pip install 'markitdown[all]' for backward-compatible behavior.
Binary stream requirement: The convert_stream() method now requires binary file-like objects (opened with mode 'rb'), not text streams. io.BytesIO must be used instead of io.StringIO.
Converter interface changed: The DocumentConverter base class now reads from file-like streams rather than file paths. Plugin developers must update their implementations.

These changes eliminated temporary file creation and improved the modularity of the dependency system.

Sources: README.md10-14

Installation and Setup

Prerequisites

Supported Python Versions

Virtual Environment Setup

Installation Methods

Basic Installation with All Dependencies

Installing from Source

Verification of Installation

Dependency Management and Feature Groups

Feature Group Architecture

Feature Group Reference Table

Installing Specific Feature Groups

Missing Dependency Handling

Environment Configuration

Azure Document Intelligence Setup

ExifTool Configuration

LLM Integration for Image Captioning

Environment Variables Summary

Post-Installation Plugin Management

Breaking Changes and Version Compatibility

On this page

Installation and Setup

Prerequisites

Supported Python Versions

Virtual Environment Setup

Installation Methods

Basic Installation with All Dependencies

Installing from Source

Verification of Installation

Dependency Management and Feature Groups

Feature Group Architecture

Feature Group Reference Table

Installing Specific Feature Groups

Missing Dependency Handling

Environment Configuration

Azure Document Intelligence Setup

ExifTool Configuration

LLM Integration for Image Captioning

Environment Variables Summary

Post-Installation Plugin Management

Breaking Changes and Version Compatibility

On this page