This page covers the document-skills plugin bundle: a suite of four skills for processing Word (.docx), Excel (.xlsx), PowerPoint (.pptx), and PDF (.pdf) files. It documents what is common across the suite — bundle registration, the shared Open XML processing workflow, and LibreOffice headless integration. For per-skill technical detail, see DOCX, XLSX, PPTX, and PDF. For information on how this bundle integrates with the plugin system, see Marketplace and Plugin System.
The four skills are registered as the `document-skills` plugin in `.claude-plugin/marketplace.json` (lines 11-23). The entry sets `"strict": false`. In the Claude Code plugin marketplace schema, this means the marketplace entry itself serves as the plugin manifest, so the bundle does not need a separate `plugin.json`.
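To make the registration concrete, here is a minimal sketch that reads a plugin entry and resolves its skill directories. The JSON excerpt is hypothetical: the key names (`plugins`, `name`, `source`, `strict`, `skills`) and the skill paths are assumptions modeled on the description above, not a copy of lines 11-23 of the real file.

```python
import json
from pathlib import PurePosixPath

# Hypothetical marketplace.json excerpt; key names and paths are assumptions.
MARKETPLACE_JSON = """
{
  "plugins": [
    {
      "name": "document-skills",
      "source": "./",
      "strict": false,
      "skills": ["./skills/docx", "./skills/xlsx", "./skills/pptx", "./skills/pdf"]
    }
  ]
}
"""


def skill_paths(marketplace_text: str, plugin_name: str) -> list[str]:
    """Return the normalized skill directories declared by one plugin entry."""
    data = json.loads(marketplace_text)
    for plugin in data.get("plugins", []):
        if plugin.get("name") == plugin_name:
            base = PurePosixPath(plugin.get("source", "./"))
            # Join each skill path onto the plugin source; pathlib drops "./" segments.
            return [str(base / PurePosixPath(p)) for p in plugin.get("skills", [])]
    raise KeyError(f"plugin {plugin_name!r} not found")


if __name__ == "__main__":
    for path in skill_paths(MARKETPLACE_JSON, "document-skills"):
        print(path)
```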
Diagram: document-skills Bundle — Marketplace Registration to Skill Paths
Sources: `.claude-plugin/marketplace.json` (lines 11-23)
| Skill | File Format | Internal Structure | Creation | Editing | Unique Scripts |
|---|---|---|---|---|---|
| docx | .docx | ZIP + XML | docx-js (npm) | Unpack → XML edit → Repack | accept_changes.py, comment.py |
| xlsx | .xlsx | ZIP + XML | XML edit | Unpack → XML edit → Repack | recalc.py |
| pptx | .pptx | ZIP + XML | XML edit | Unpack → XML edit → Repack | add_slide.py, thumbnail.py |
| pdf | .pdf | Binary | N/A | Read-only extraction | extract_form_structure.py |
| Skill | LibreOffice Usage | Python Libraries |
|---|---|---|
| docx | Accept tracked changes, format conversion | None |
| xlsx | Recalculate formulas via StarBasic macro | openpyxl |
| pptx | Convert slides to images for thumbnails | Pillow / PIL |
| pdf | None | pdfplumber |
Sources: `.claude-plugin/marketplace.json` `skills/docx/SKILL.md` `skills/docx/scripts/accept_changes.py`
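The xlsx rows above pair LibreOffice recalculation (recalc.py) with openpyxl. One plausible shape for the follow-up check is a scan of computed cell values for Excel error literals, summarized as JSON. This is a hypothetical sketch: `formula_error_report` and its report keys are illustrative, not recalc.py's actual output format. In practice the values would come from openpyxl's `load_workbook(path, data_only=True)`, which returns the values cached by the last recalculation; here a plain dict stands in so the example runs without dependencies.

```python
import json

# The classic Excel error values that can appear as computed cell results.
EXCEL_ERRORS = {"#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A"}


def formula_error_report(cells: dict[str, object]) -> str:
    """Build a JSON report of cells whose computed value is an Excel error.

    `cells` maps a cell reference such as "Sheet1!B2" to its computed value
    (hypothetical input shape standing in for openpyxl data_only values).
    """
    errors: dict[str, list[str]] = {}
    for ref, value in cells.items():
        if isinstance(value, str) and value in EXCEL_ERRORS:
            errors.setdefault(value, []).append(ref)
    report = {
        "status": "errors_found" if errors else "success",
        "total_errors": sum(len(refs) for refs in errors.values()),
        # Sort keys and cell lists so the report is deterministic.
        "errors": {kind: sorted(refs) for kind, refs in sorted(errors.items())},
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    print(formula_error_report({"Sheet1!A1": 10, "Sheet1!B2": "#DIV/0!", "Sheet1!C3": "#REF!"}))
```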
The .docx, .xlsx, and .pptx formats are ZIP archives. Each skill that edits these files follows the same three-step workflow using scripts in the scripts/office/ module:
1. Unpack: `office/unpack.py` extracts the ZIP into a directory tree of XML files, pretty-prints them, and merges adjacent text runs for easier editing.
2. Edit: the XML files in the unpacked tree are modified directly.
3. Pack: `office/pack.py` condenses the XML and assembles a valid output file. It applies auto-repair for common issues (e.g., invalid `durableId` values, missing `xml:space="preserve"`) and validates the result using `office/validate.py`.

The DOCX skill's SKILL.md documents this pattern explicitly (`skills/docx/SKILL.md`, lines 398-442).
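A minimal sketch of the three steps using only the standard-library `zipfile` module. The names `unpack`, `pack`, and `round_trip_demo` are hypothetical and do not reproduce `office/unpack.py` or `office/pack.py`: pretty-printing, run merging, condensing, auto-repair, and validation are all omitted. The one packaging detail kept is writing `[Content_Types].xml` as the first archive entry, a widely followed convention for Open XML packages. The toy package in the demo is not a valid .docx; it only exercises the round trip.

```python
import tempfile
import zipfile
from pathlib import Path

CONTENT_TYPES = "[Content_Types].xml"


def unpack(package: Path, out_dir: Path) -> None:
    """Step 1: extract the ZIP container into a tree of XML parts."""
    with zipfile.ZipFile(package) as zf:
        zf.extractall(out_dir)


def pack(src_dir: Path, package: Path) -> None:
    """Step 3: re-zip the tree, writing [Content_Types].xml first."""
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    # Stable sort: the content-types part (key False) moves ahead of everything else.
    files.sort(key=lambda p: p.relative_to(src_dir).as_posix() != CONTENT_TYPES)
    with zipfile.ZipFile(package, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, f.relative_to(src_dir).as_posix())


def round_trip_demo() -> tuple[list[str], str]:
    """Build a toy package, edit one XML part as text, and repack it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "in.docx"
        with zipfile.ZipFile(src, "w") as zf:
            zf.writestr(CONTENT_TYPES, "<Types/>")
            zf.writestr("word/document.xml", "<w:t>Hello</w:t>")
        work = root / "unpacked"
        unpack(src, work)
        # Step 2: edit the XML directly (a plain string replacement here).
        doc = work / "word" / "document.xml"
        doc.write_text(doc.read_text(encoding="utf-8").replace("Hello", "Goodbye"), encoding="utf-8")
        out = root / "out.docx"
        pack(work, out)
        with zipfile.ZipFile(out) as zf:
            return zf.namelist(), zf.read("word/document.xml").decode("utf-8")
```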
Diagram: Open XML Processing Pipeline — Scripts and Intermediate Artifacts
Sources: `skills/docx/SKILL.md` (lines 398-442)
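One of the auto-repairs mentioned for the pack step, adding a missing `xml:space="preserve"`, can be sketched with the standard-library ElementTree. This is an illustration under assumptions: the rule used here (mark any `<w:t>` whose text has leading or trailing whitespace) and the name `add_space_preserve` are hypothetical, not pack.py's code. Note that ElementTree drops the XML declaration and rewrites unregistered namespace prefixes, so this is not safe to run as-is over full Office parts.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def add_space_preserve(xml_text: str) -> tuple[str, int]:
    """Add xml:space="preserve" to <w:t> elements with leading/trailing whitespace.

    Returns the re-serialized XML and the number of elements fixed.
    """
    ET.register_namespace("w", W_NS)  # keep the conventional "w" prefix on output
    root = ET.fromstring(xml_text)
    fixed = 0
    for t in root.iter(f"{{{W_NS}}}t"):
        text = t.text or ""
        # Without the attribute, consumers may collapse the edge whitespace.
        if text != text.strip() and t.get(XML_SPACE) != "preserve":
            t.set(XML_SPACE, "preserve")
            fixed += 1
    return ET.tostring(root, encoding="unicode"), fixed
```

Running the function a second time on its own output fixes nothing, which is the property you want from an auto-repair pass.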
Three skills invoke LibreOffice (soffice) in --headless mode to perform operations that require a document rendering engine: accepting tracked changes (DOCX), recalculating formulas (XLSX), and converting slides to images (PPTX).
Each skill uses an isolated LibreOffice user profile stored under `/tmp/libreoffice_<skill>_profile` to prevent conflicts between concurrent skill invocations. For the DOCX skill, this is defined in `skills/docx/scripts/accept_changes.py` (lines 16-17):
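```python
# Isolated LibreOffice user profile for the DOCX skill
LIBREOFFICE_PROFILE = "/tmp/libreoffice_docx_profile"
# Standard Basic library directory inside that profile, where Module1.xba is written
MACRO_DIR = f"{LIBREOFFICE_PROFILE}/user/basic/Standard"
```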
StarBasic macros are written to `Module1.xba` in the profile's macro directory (`MACRO_DIR`) before LibreOffice is launched. The `soffice` subprocess is then invoked with the macro's fully qualified name as its script target.
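The profile-and-macro pattern can be sketched as follows. This is a hypothetical reconstruction, not the code in accept_changes.py: `profile_for`, `render_module`, `write_macro`, `soffice_command`, and the macro name `AcceptAllChanges` are illustrative. The `--headless` flag, the `-env:UserInstallation=file:///...` option, and the `vnd.sun.star.script:...?language=Basic&location=application` script URL are standard LibreOffice features; the .xba envelope is reproduced from memory and should be checked against a real profile. A real Standard library also carries index files (such as `script.xlb`) that this sketch does not create, and the process environment from `get_soffice_env()` is not modeled.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# XML envelope LibreOffice uses for Basic modules stored as .xba files
# (from memory; verify against a real profile before relying on it).
MODULE_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">\n'
    '<script:module xmlns:script="http://openoffice.org/2000/script" '
    'script:name="Module1" script:language="StarBasic">{body}</script:module>\n'
)


def profile_for(skill: str) -> Path:
    """Per-skill profile path following the /tmp/libreoffice_<skill>_profile pattern."""
    return Path(f"/tmp/libreoffice_{skill}_profile")


def render_module(basic_source: str) -> str:
    """Wrap StarBasic source in the .xba envelope, escaping &, < and >."""
    return MODULE_TEMPLATE.format(body=escape(basic_source))


def write_macro(profile: Path, basic_source: str) -> Path:
    """Write Module1.xba into <profile>/user/basic/Standard (the MACRO_DIR layout)."""
    macro_dir = profile / "user" / "basic" / "Standard"
    macro_dir.mkdir(parents=True, exist_ok=True)
    module = macro_dir / "Module1.xba"
    module.write_text(render_module(basic_source), encoding="utf-8")
    return module


def soffice_command(profile: Path, macro: str) -> list[str]:
    """Build a headless soffice call whose script target is Standard.Module1.<macro>."""
    return [
        "soffice",
        "--headless",
        # Point LibreOffice at the isolated profile instead of the user's default one.
        f"-env:UserInstallation={profile.as_uri()}",
        f"vnd.sun.star.script:Standard.Module1.{macro}?language=Basic&location=application",
    ]
```

Running it would then look something like `subprocess.run(soffice_command(profile, "AcceptAllChanges"), check=True)` after `write_macro(...)`; how the target document reaches the macro is deliberately left out because the source does not describe it.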
The `office/soffice.py` module provides `get_soffice_env()`, a shared helper that sets up the process environment for the `soffice` subprocess. It is used by `accept_changes.py` (`skills/docx/scripts/accept_changes.py`, line 13) and probably by the equivalent scripts in the other skills as well.
Diagram: LibreOffice Integration — File System Artifacts and Invocation Pattern
Sources: `skills/docx/scripts/accept_changes.py` (lines 16-88)
| Library | Skill | Purpose |
|---|---|---|
| openpyxl | xlsx | Read cell formulas, check formula validity |
| pdfplumber | pdf | Extract text, lines, and geometry from PDF pages |
| Pillow / PIL | pptx | Composite individual slide images into a thumbnail grid |
These libraries are used in read/analysis paths only; document modifications go through the XML-editing or LibreOffice paths described above.
Sources: [Diagram 4 — Document Skills Shared Architecture (repo overview)]
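Of the three libraries, Pillow's role (compositing slide images into a grid) rests on simple layout arithmetic. The sketch below computes the canvas size and each tile's top-left offset for a row-major grid. The names `Placement` and `grid_layout`, the `gap` parameter, and the row-major ordering are assumptions for illustration, not details taken from thumbnail.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    index: int  # slide index (0-based)
    x: int      # left edge of the tile on the canvas, in pixels
    y: int      # top edge of the tile on the canvas, in pixels


def grid_layout(
    n_slides: int, cols: int, tile_w: int, tile_h: int, gap: int = 0
) -> tuple[tuple[int, int], list[Placement]]:
    """Compute canvas size and tile offsets for a row-major thumbnail grid."""
    if n_slides < 1 or cols < 1:
        raise ValueError("need at least one slide and one column")
    cols = min(cols, n_slides)      # avoid empty columns for short decks
    rows = -(-n_slides // cols)     # ceiling division
    width = cols * tile_w + (cols - 1) * gap
    height = rows * tile_h + (rows - 1) * gap
    placements = [
        Placement(i, (i % cols) * (tile_w + gap), (i // cols) * (tile_h + gap))
        for i in range(n_slides)
    ]
    return (width, height), placements
```

For example, 5 slides at 320x180 in 3 columns with a 10-pixel gap give a 980x370 canvas, with the fifth slide at (330, 190). With Pillow, the canvas would come from `Image.new("RGB", size, "white")` and each rendered slide would be placed with `canvas.paste(slide.resize((tile_w, tile_h)), (p.x, p.y))`.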
The document-skills bundle is released under a proprietary, source-available license, which each SKILL.md in the bundle declares in its `license` field.
This contrasts with the example-skills bundle, which is licensed under Apache 2.0. See Skills Catalog for the licensing boundary between the two bundles.
Each skill has a dedicated reference page covering its SKILL.md instructions, script internals, and edge cases:
| Page | Skill | Key Topics |
|---|---|---|
| 3.1.1 | DOCX | docx-js creation, XML editing workflow, accept_changes.py, comment.py, pandoc/LibreOffice conversion |
| 3.1.2 | XLSX | recalc.py, StarBasic formula recalculation macro, openpyxl formula checking, JSON error report |
| 3.1.3 | PPTX | add_slide.py (Open XML slide insertion), thumbnail.py (LibreOffice + Pillow grid rendering) |
| 3.1.4 | PDF | extract_form_structure.py, pdfplumber heuristics for labels/rules/checkboxes, output JSON schema |