This document describes the project configuration defined in packages/markitdown/pyproject.toml. It covers the build system, dependency management, optional feature groups, development environments using Hatch, testing configuration, and entry points. For information about CI/CD workflows and automated quality checks, see CI/CD and Quality Assurance. For information about running tests, see Testing Framework.
MarkItDown uses a PEP 517-compliant build system configured via [build-system] in pyproject.toml.
Build System Configuration
The build system is defined in packages/markitdown/pyproject.toml (lines 1-3):
| Configuration Key | Value | Purpose |
|---|---|---|
| requires | ["hatchling"] | Build backend dependency |
| build-backend | "hatchling.build" | Backend module path |
Version Management
The version is sourced dynamically from packages/markitdown/src/markitdown/__about__.py (line 4), which defines __version__ = "0.1.5b2".
The [tool.hatch.version] section at packages/markitdown/pyproject.toml (lines 68-69) tells hatchling to extract the version from the __version__ variable in __about__.py, enabling single-source versioning.
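The idea behind single-source versioning can be sketched with a small regular-expression reader. This is a simplified stand-in, not hatchling's actual implementation; the function name read_version and the pattern are assumptions made for illustration.

```python
import re

# Contents of __about__.py as described above.
ABOUT_PY = '__version__ = "0.1.5b2"\n'


def read_version(source: str) -> str:
    """Return the string assigned to __version__ in the given source text."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


print(read_version(ABOUT_PY))
```

Because the version lives in one place, the package metadata and the importable markitdown.__about__.__version__ attribute cannot drift apart.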
Source Distribution Configuration
The [tool.hatch.build.targets.sdist] section at packages/markitdown/pyproject.toml (lines 112-113) restricts source distributions to include only the core package.
Sources: packages/markitdown/pyproject.toml (lines 1-3, 68-69, 112-113), packages/markitdown/src/markitdown/__about__.py (line 4)
[project] Section Configuration
The [project] section is defined in packages/markitdown/pyproject.toml (lines 5-34):
| Configuration Key | Value | Line | Notes |
|---|---|---|---|
| name | "markitdown" | 6 | PyPI package name |
| dynamic | ["version"] | 7 | Version from __about__.py |
| description | "Utility tool for converting..." | 8 | One-line summary |
| readme | "README.md" | 9 | Long description source |
| requires-python | ">=3.10" | 10 | Minimum Python version |
| license | "MIT" | 11 | License identifier |
| authors | [{"name": "Adam Fourney", "email": "[email protected]"}] | 13-15 | Package maintainer |
Classifiers
Classifiers at packages/markitdown/pyproject.toml (lines 16-25) include:
"Development Status :: 4 - Beta"
Project URLs
The [project.urls] section at packages/markitdown/pyproject.toml (lines 63-66) defines the project's links.
Sources: packages/markitdown/pyproject.toml (lines 5-34, 63-66)
dependencies Array Configuration
The dependencies array at packages/markitdown/pyproject.toml (lines 26-34) lists required packages:
| Package Name | Version Constraint | Platform Marker | Purpose |
|---|---|---|---|
| beautifulsoup4 | (latest) | All | HTML/XML parsing in HtmlConverter, RssConverter |
| requests | (latest) | All | HTTP client for URI downloads |
| markdownify | (latest) | All | HTML-to-Markdown conversion |
| magika | ~=0.6.1 | All | ML-based file type detection in StreamInfo |
| charset-normalizer | (latest) | All | Character encoding detection |
| defusedxml | (latest) | All | Secure XML parsing (prevents XXE) |
| onnxruntime | <=1.20.1 | Windows only | ONNX runtime for Magika ML model |
Platform-Specific Dependencies
The onnxruntime entry at packages/markitdown/pyproject.toml (line 33) carries the environment marker sys_platform == 'win32'.
Because of the marker, this entry applies only on Windows, where it caps the ONNX runtime used by Magika's ML-based file detection at version 1.20.1. On other platforms the marker evaluates to false and the entry is ignored, so it places no constraint on onnxruntime there.
Sources: packages/markitdown/pyproject.toml (lines 26-34)
MarkItDown uses optional dependency groups to allow users to install only the features they need.
Optional Dependency Mapping
The optional dependencies are defined in packages/markitdown/pyproject.toml (lines 36-60):
| Feature Group | Installation Command | Dependencies | Converter Support |
|---|---|---|---|
| all | pip install markitdown[all] | All optional dependencies | All converters |
| pptx | pip install markitdown[pptx] | python-pptx | PowerPoint files (.pptx) |
| docx | pip install markitdown[docx] | mammoth~=1.11.0, lxml | Word documents (.docx) |
| xlsx | pip install markitdown[xlsx] | pandas, openpyxl | Excel files (.xlsx) |
| xls | pip install markitdown[xls] | pandas, xlrd | Legacy Excel files (.xls) |
| pdf | pip install markitdown[pdf] | pdfminer.six | PDF documents |
| outlook | pip install markitdown[outlook] | olefile | Outlook MSG files |
| audio-transcription | pip install markitdown[audio-transcription] | pydub, SpeechRecognition | Audio transcription |
| youtube-transcription | pip install markitdown[youtube-transcription] | youtube-transcript-api | YouTube video transcripts |
| az-doc-intel | pip install markitdown[az-doc-intel] | azure-ai-documentintelligence, azure-identity | Azure Document Intelligence |
Combining Feature Groups
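Multiple feature groups can be installed together by listing them, comma-separated, inside the brackets, for example pip install "markitdown[pdf,docx,pptx]".
The [all] group at packages/markitdown/pyproject.toml (lines 37-51) includes all optional dependencies for maximum functionality.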
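The effect of combining groups can be sketched as a set union over the table above. The mapping below is transcribed from that table for illustration and is not read from pyproject.toml; the helper names requirement_for and resolve are hypothetical.

```python
# Transcribed from the table above; illustrative, not read from pyproject.toml.
OPTIONAL_DEPENDENCIES = {
    "pptx": ["python-pptx"],
    "docx": ["mammoth~=1.11.0", "lxml"],
    "xlsx": ["pandas", "openpyxl"],
    "xls": ["pandas", "xlrd"],
    "pdf": ["pdfminer.six"],
    "outlook": ["olefile"],
    "audio-transcription": ["pydub", "SpeechRecognition"],
    "youtube-transcription": ["youtube-transcript-api"],
    "az-doc-intel": ["azure-ai-documentintelligence", "azure-identity"],
}


def requirement_for(extras: list[str], package: str = "markitdown") -> str:
    """Build the 'package[extra1,extra2]' string passed to pip install."""
    return f"{package}[{','.join(extras)}]" if extras else package


def resolve(extras: list[str]) -> list[str]:
    """Union of the dependencies pulled in by the selected groups."""
    selected = {dep for extra in extras for dep in OPTIONAL_DEPENDENCIES[extra]}
    return sorted(selected)


print(requirement_for(["xlsx", "xls"]))
print(resolve(["xlsx", "xls"]))
```

Shared dependencies such as pandas appear once in the result, which mirrors how an installer deduplicates requirements that several groups have in common.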
Sources: packages/markitdown/pyproject.toml (lines 36-60)
[project.scripts] Configuration
The [project.scripts] section at packages/markitdown/pyproject.toml (lines 71-72) defines console script entry points:
| Script Name | Module Path | Function | Installed Location |
|---|---|---|---|
| markitdown | markitdown.__main__ | main | System PATH / virtualenv bin/ |
When the package is installed via pip install markitdown, the installer creates a markitdown executable that imports the markitdown.__main__ module, calls its main() function, and makes the command-line arguments available through sys.argv.
This entry point is used for the command-line interface. For CLI documentation, see Command Line Interface.
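What the generated executable does can be approximated in a few lines. The entry-point string below is reconstructed from the table above, and the helpers split_entry_point and load_entry_point are hypothetical names. The demonstration resolves a standard-library function so that the sketch runs without markitdown installed.

```python
import importlib
from typing import Callable

# Reconstructed from the table above: "module:function" (assumption).
ENTRY_POINT = "markitdown.__main__:main"


def split_entry_point(spec: str) -> tuple[str, str]:
    """Split 'package.module:function' into module and function names."""
    module_name, _, func_name = spec.partition(":")
    return module_name, func_name


def load_entry_point(spec: str) -> Callable:
    """Import the module and return the named callable."""
    module_name, func_name = split_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


print(split_entry_point(ENTRY_POINT))

# Demonstrated on a stdlib target so no third-party package is required.
basename = load_entry_point("os.path:basename")
print(basename("/tmp/report.pdf"))
```

A real console script would finish with sys.exit(main()), so the function's return value becomes the process exit code.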
Sources: packages/markitdown/pyproject.toml (lines 71-72)
Hatch Environment Hierarchy
Three Hatch environments are configured under [tool.hatch.envs]: default, hatch-test, and types.
The default environment is defined at packages/markitdown/pyproject.toml (lines 74-75):
| Property | Value | Purpose |
|---|---|---|
| features | ["all"] | Installs all optional dependencies |
| Activation | hatch shell | Interactive development environment |
The hatch-test environment is defined at packages/markitdown/pyproject.toml (lines 77-81):
| Property | Value | Purpose |
|---|---|---|
| features | ["all"] | All format converters |
| extra-dependencies | ["openai"] | LLM integration testing |
| Execution | hatch test | Runs pytest test suite |
The types environment is defined at packages/markitdown/pyproject.toml (lines 83-91):
| Property | Value | Purpose |
|---|---|---|
| features | ["all"] | All converters for type checking |
| extra-dependencies | ["openai", "mypy>=1.0.0"] | Type checking tools |
| scripts.check | mypy command | Type check invocation |
| Execution | hatch run types:check | Run mypy on codebase |
MyPy Script Configuration
The check script at packages/markitdown/pyproject.toml (line 91) configures mypy with:
| Flag | Effect |
|---|---|
| --install-types | Auto-install missing type stubs |
| --non-interactive | No user prompts during execution |
| --ignore-missing-imports | Skip errors for untyped dependencies |
| {args:src/markitdown tests} | Default targets: package source and tests |
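The {args:...} placeholder in the last row means "use the arguments given on the command line, or fall back to this default". The sketch below imitates that substitution. The script string is reconstructed from the flag table, and expand_args is a hypothetical helper rather than Hatch's own code.

```python
import re

# Reconstructed from the flag table above (assumption).
CHECK_SCRIPT = (
    "mypy --install-types --non-interactive "
    "--ignore-missing-imports {args:src/markitdown tests}"
)


def expand_args(script: str, args: list[str]) -> str:
    """Replace {args:default} with the given args, or the default if none."""

    def substitute(match: re.Match) -> str:
        return " ".join(args) if args else (match.group(1) or "")

    return re.sub(r"\{args(?::([^}]*))?\}", substitute, script)


print(expand_args(CHECK_SCRIPT, []))         # falls back to the default targets
print(expand_args(CHECK_SCRIPT, ["tests"]))  # explicit target replaces the default
```

In practice this is what lets hatch run types:check tests narrow the type check to one directory while the bare command covers both.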
Sources: packages/markitdown/pyproject.toml (lines 74-91)
Coverage.py Configuration Sections
The [tool.coverage.run] section is defined at packages/markitdown/pyproject.toml (lines 93-99):
| Configuration Key | Value | Effect |
|---|---|---|
| source_pkgs | ["markitdown", "tests"] | Packages to measure |
| branch | true | Enable branch coverage (if/else paths) |
| parallel | true | Support parallel pytest execution |
| omit | ["src/markitdown/__about__.py"] | Exclude version file from coverage |
The [tool.coverage.paths] section is defined at packages/markitdown/pyproject.toml (lines 101-103):
| Key | Paths | Purpose |
|---|---|---|
| markitdown | ["src/markitdown", "*/markitdown/src/markitdown"] | Normalize package paths across environments |
| tests | ["tests", "*/markitdown/tests"] | Normalize test paths across environments |
This path mapping combines coverage from different installation locations (editable install vs. wheel install) into unified metrics.
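The remapping idea can be sketched as follows: the first entry of each list is the canonical location, and any path matching a later pattern is rewritten to it. This is a simplified approximation of coverage.py's behaviour, and remap and the example file names are hypothetical.

```python
import re

# Transcribed from the table above: the first entry is the canonical
# location; later entries are patterns that get rewritten to it.
PATHS = {
    "markitdown": ["src/markitdown", "*/markitdown/src/markitdown"],
    "tests": ["tests", "*/markitdown/tests"],
}


def remap(path: str) -> str:
    """Rewrite a recorded file path to its canonical prefix, if any matches."""
    for canonical, *aliases in PATHS.values():
        for alias in aliases:
            # Translate the glob: '*' matches any run of characters.
            prefix = ".*".join(re.escape(part) for part in alias.split("*"))
            match = re.fullmatch(rf"{prefix}/(.*)", path)
            if match:
                return f"{canonical}/{match.group(1)}"
    return path  # already canonical, or unrelated


print(remap("/home/ci/markitdown/src/markitdown/example.py"))
print(remap("/home/ci/markitdown/tests/test_example.py"))
```

Once every data file refers to the same canonical paths, combining them yields one entry per source file instead of one per installation location.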
The [tool.coverage.report] section, defined at packages/markitdown/pyproject.toml (lines 105-110), excludes lines matching these patterns:
| Pattern | Example | Reason for Exclusion |
|---|---|---|
| "no cov" | # pragma: no cov | Manual exclusion marker |
| "if __name__ == .__main__.:" | Main guard in __main__.py | Entry point not tested |
| "if TYPE_CHECKING:" | Import blocks for type hints | Only evaluated by type checkers |
These patterns keep code that is intentionally untested or unreachable at runtime from being reported as missing coverage.
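Each pattern is a regular expression searched for within a source line, which is why the main-guard pattern uses dots where the quotes would be: a dot matches either quote style. The following sketch shows only the line-matching step, and is_excluded is a hypothetical helper. Real coverage.py additionally excludes the whole block introduced by a matching line.

```python
import re

# Patterns from the table above; coverage.py treats each as a regular expression.
EXCLUDE_PATTERNS = [
    "no cov",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]


def is_excluded(line: str) -> bool:
    """True if any exclusion pattern is found anywhere in the line."""
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


for sample in (
    'if __name__ == "__main__":',
    "if __name__ == '__main__':",
    "value = compute()  # pragma: no cov",
    "return value",
):
    print(f"{sample!r}: {is_excluded(sample)}")
```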
Sources: packages/markitdown/pyproject.toml (lines 93-110)
Common Development Commands
| Command | Purpose | Configuration Reference (pyproject.toml lines) |
|---|---|---|
| hatch shell | Activate default environment | 74-75 |
| hatch test | Run test suite | 77-81 |
| hatch run types:check | Run type checker | 91 |
| hatch build | Build distributions | 1-3 |
| hatch version | Show current version | 68-69 |
Feature-Specific Installation
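For development of specific converters, install only the relevant feature group in editable mode, for example pip install -e ".[pdf]" run from the package directory.
The -e flag installs in editable mode, allowing code changes to take effect immediately without reinstallation.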