DOCX Skill

This page is the full technical reference for the docx skill located at skills/docx/. It covers: programmatic document creation via docx-js, the XML-based unpack/edit/repack editing workflow, how accept_changes.py accepts tracked changes via a LibreOffice StarBasic macro, how comment.py manages the four-file Open XML comment chain, and the pandoc/LibreOffice conversion paths.

For an overview of how all four document skills share a common architecture, see Document Skills. For the SKILL.md format specification, see SKILL.md Format Specification.

Skill Identity

Field	Value
`name`	`docx`
File	`skills/docx/SKILL.md`
License	Proprietary (see `LICENSE.txt`)
Trigger scope	Word documents: create, read, edit, convert, tracked changes, comments
Excluded	PDFs, spreadsheets, Google Docs, general coding tasks

The description field in the SKILL.md frontmatter drives invocation matching. It covers terms like "Word doc", ".docx", "tracked changes", "report", "memo", "letter", "template", and similar document deliverables.

Sources: skills/docx/SKILL.md:1-5

Capability Map

The following diagram maps each user-facing capability to the specific tool or script that implements it.

Diagram: DOCX Skill — Capabilities to Code Entities

Sources: skills/docx/SKILL.md:14-55

Creating New Documents with `docx-js`

New documents are generated by writing a JavaScript file that uses the docx npm package, then executing it with Node.js. The output is a .docx buffer written to disk.

Install: npm install -g docx

Core Entry Point

After generating, validate with scripts/office/validate.py.

Key Components and Rules

Diagram: docx-js Component Hierarchy

Sources: skills/docx/SKILL.md:56-395

Critical Rules Summary

The SKILL.md contains specific rules that prevent common rendering failures:

Rule	Reason or correct alternative
Always set `page.size` explicitly	docx-js defaults to A4, not US Letter
Landscape: pass portrait dimensions	docx-js swaps `width`/`height` internally
Never use `\n` in `TextRun`	Use separate `Paragraph` elements
Never use unicode bullets (`•`)	Use `LevelFormat.BULLET` with `numbering.config`
`PageBreak` must be inside a `Paragraph`	Standalone `PageBreak` creates invalid XML
`ImageRun` requires `type` field	`"png"`, `"jpg"`, etc.
Always use `WidthType.DXA` for tables	`WidthType.PERCENTAGE` breaks in Google Docs
Tables need dual widths: `columnWidths` array AND cell `width`	Incorrect rendering on some platforms
Table `width` must equal sum of `columnWidths`	Layout breaks
Use `ShadingType.CLEAR`, never `SOLID`	Black cell backgrounds
Heading styles must use `outlineLevel`	Required for `TableOfContents` to work

Sources: skills/docx/SKILL.md:378-395
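The two table-width rules come down to arithmetic: every column width is an integer DXA value, and the values must add up to the table width exactly. The sketch below is a minimal Python illustration of that arithmetic only. `split_columns` is a hypothetical helper, not part of the skill or of docx-js; its output would still have to be passed to `columnWidths` and to each cell `width` in the JavaScript file.

```python
def split_columns(total_width_dxa: int, ratios: list[float]) -> list[int]:
    """Split a table width (in DXA) into integer column widths that sum exactly to it."""
    if total_width_dxa <= 0 or not ratios or any(r <= 0 for r in ratios):
        raise ValueError("total width and all ratios must be positive")
    scale = total_width_dxa / sum(ratios)
    widths = [int(r * scale) for r in ratios]
    # Give any rounding remainder to the last column so the sum matches exactly.
    widths[-1] += total_width_dxa - sum(widths)
    return widths


# Three columns across the 9,360 DXA content width of US Letter with 1" margins.
column_widths = split_columns(9360, [2, 1, 1])
assert sum(column_widths) == 9360
print(column_widths)  # [4680, 2340, 2340]
```

Assigning the rounding remainder to the last column is one arbitrary choice; any distribution works as long as the sum equals the table width.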

Page Size Reference

Dimensions are in DXA units (1440 DXA = 1 inch).

Paper	Width	Height	Content Width (1" margins)
US Letter	12,240	15,840	9,360
A4 (default)	11,906	16,838	9,026

Sources: skills/docx/SKILL.md:82-114
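The figures in the table follow directly from the 1440 DXA per inch conversion. As a quick check, here is a small Python sketch; the helper names are illustrative and do not come from the skill.

```python
DXA_PER_INCH = 1440  # stated above: 1440 DXA = 1 inch


def inches_to_dxa(inches: float) -> int:
    """Convert inches to DXA, rounding to the nearest whole unit."""
    return round(inches * DXA_PER_INCH)


def content_width(page_width_dxa: int, margin_inches: float = 1.0) -> int:
    """Page width minus the left and right margins, in DXA."""
    return page_width_dxa - 2 * inches_to_dxa(margin_inches)


print(inches_to_dxa(8.5), inches_to_dxa(11))  # 12240 15840 (US Letter)
print(content_width(12240))                   # 9360
print(content_width(11906))                   # 9026 (A4)
```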

Editing Existing Documents: Unpack / Edit / Repack

A .docx file is a ZIP archive. The edit workflow unpacks it to a directory of XML files, edits them directly, then repacks.

Diagram: XML Editing Workflow

Commands:

Sources: skills/docx/SKILL.md:398-443
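To make the "a .docx file is a ZIP archive" point concrete, the following Python sketch lists and reads archive members with the standard `zipfile` module. It is not the skill's unpack or pack script. The archive is a tiny in-memory stand-in (not a valid Word document) so the example runs without any input file, and `list_parts` and `read_part` are hypothetical names.

```python
import io
import zipfile

# Build a stand-in archive in memory; a real .docx contains many more parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<w:document/>")


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the member names stored in a .docx (ZIP) archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def read_part(docx_bytes: bytes, name: str) -> str:
    """Return one archive member decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(name).decode("utf-8")


print(list_parts(buffer.getvalue()))
print(read_part(buffer.getvalue(), "word/document.xml"))
```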

Auto-Repair Behaviors

pack.py performs two auto-repairs before packing:

Issue	Fix Applied
`durableId` >= `0x7FFFFFFF`	Regenerates a valid ID
Missing `xml:space="preserve"` on `<w:t>` with whitespace	Adds the attribute

Auto-repair does not fix: malformed XML, invalid element nesting, missing relationships, or schema violations.

Sources: skills/docx/SKILL.md:441-448
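The two repairs can be pictured with a short Python sketch. This is an illustration of the idea, not the code in pack.py: the function names are hypothetical, and "with whitespace" is interpreted here as leading or trailing whitespace in the text node, which is an assumption.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
MAX_DURABLE_ID = 0x7FFFFFFF  # threshold named in the table above


def needs_new_durable_id(hex_id: str) -> bool:
    """True when a durableId is at or above the threshold and should be regenerated."""
    return int(hex_id, 16) >= MAX_DURABLE_ID


def add_space_preserve(root: ET.Element) -> int:
    """Add xml:space="preserve" to <w:t> elements with edge whitespace; return the count fixed."""
    fixed = 0
    for text_el in root.iter(f"{{{W_NS}}}t"):
        text = text_el.text or ""
        # Assumption: only leading/trailing whitespace is at risk of being dropped.
        if text != text.strip() and text_el.get(XML_SPACE) != "preserve":
            text_el.set(XML_SPACE, "preserve")
            fixed += 1
    return fixed


sample = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t> leading space</w:t></w:r>"
    "<w:r><w:t>plain</w:t></w:r>"
    "</w:p>"
)
root = ET.fromstring(sample)
print(add_space_preserve(root))          # 1
print(needs_new_durable_id("7FFFFFFF"))  # True
```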

Tracked Changes XML Patterns

Tracked changes use <w:ins> and <w:del> wrapper elements. The SKILL.md provides patterns for the common cases:

Operation	XML Element
Insert text	`<w:ins w:author="Claude" ...><w:r>...</w:r></w:ins>`
Delete text	`<w:del w:author="Claude" ...><w:r><w:delText>...</w:delText></w:r></w:del>`
Delete entire paragraph	Requires `<w:del/>` inside `<w:pPr><w:rPr>` to merge the paragraph mark
Reject another author's insertion	Nest `<w:del>` inside their `<w:ins>`
Restore another author's deletion	Add `<w:ins>` after their `<w:del>` (do not modify their element)

Inside <w:del>: use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.

Sources: skills/docx/SKILL.md:463-528
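The element renaming required inside `<w:del>` is mechanical, so it can be sketched as a small text transform. The Python helper below is hypothetical and not part of the skill; it only renames the two text elements and leaves the surrounding `<w:del>` wrapper and its attributes to the editor.

```python
import re


def to_deleted_markup(run_xml: str) -> str:
    """Rename text elements so a run is valid inside <w:del>."""
    # The lookahead matches the exact tag names only, so <w:tab/> or <w:tbl> stay untouched.
    run_xml = re.sub(r"<(/?)w:t(?=[\s>/])", r"<\1w:delText", run_xml)
    run_xml = re.sub(r"<(/?)w:instrText(?=[\s>/])", r"<\1w:delInstrText", run_xml)
    return run_xml


run = '<w:r><w:tab/><w:t xml:space="preserve">old text </w:t></w:r>'
print(to_deleted_markup(run))
# <w:r><w:tab/><w:delText xml:space="preserve">old text </w:delText></w:r>
```

Running the helper twice is harmless, because the renamed elements no longer match either pattern.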

Accepting Tracked Changes: `accept_changes.py`

scripts/accept_changes.py uses LibreOffice in headless mode to accept all tracked changes in a DOCX file, producing a clean output file.

How It Works

Diagram: accept_changes.py Execution Flow

Constants and Paths

Symbol	Value
`LIBREOFFICE_PROFILE`	`/tmp/libreoffice_docx_profile`
`MACRO_DIR`	`/tmp/libreoffice_docx_profile/user/basic/Standard`
Macro file	`Module1.xba`
Macro entry point	`AcceptAllTrackedChanges` (StarBasic `Sub`)

The StarBasic Macro

The macro embedded in ACCEPT_CHANGES_MACRO (skills/docx/scripts/accept_changes.py:19-33) does three things:

1. Gets the current document frame via ThisComponent.CurrentController.Frame
2. Dispatches .uno:AcceptAllTrackedChanges via com.sun.star.frame.DispatchHelper
3. Calls ThisComponent.store() then ThisComponent.close(True)

LibreOffice is invoked with the macro as a URL argument:

vnd.sun.star.script:Standard.Module1.AcceptAllTrackedChanges?language=Basic&location=application

The script uses a 30-second timeout. A TimeoutExpired exception is treated as success: it means the LibreOffice process was still running when the timeout elapsed, which is consistent with the macro having run and saved the document without LibreOffice exiting cleanly on its own.

Sources: skills/docx/scripts/accept_changes.py:1-136
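The timeout handling described above can be sketched in Python as follows. This is not the code in accept_changes.py: `build_command` and `run_macro` are hypothetical names, the exact `soffice` flags are assumptions, and the demonstration uses a sleeping Python process as a stand-in so that it runs without LibreOffice installed.

```python
import subprocess
import sys

MACRO_URL = (
    "vnd.sun.star.script:Standard.Module1.AcceptAllTrackedChanges"
    "?language=Basic&location=application"
)
LIBREOFFICE_PROFILE = "/tmp/libreoffice_docx_profile"


def build_command(docx_path: str) -> list[str]:
    """Assemble a soffice invocation; the flag set is an assumption for illustration."""
    return [
        "soffice",
        "--headless",
        f"-env:UserInstallation=file://{LIBREOFFICE_PROFILE}",
        MACRO_URL,
        docx_path,
    ]


def run_macro(cmd: list[str], timeout: float = 30.0) -> tuple[bool, str]:
    """Run a command; a timeout counts as success, a non-zero exit as failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Still running at the deadline: treated as success, as described above.
        return True, "timed out; assuming the macro already saved the document"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, "completed"


# Demonstrate the timeout branch with a stand-in process instead of LibreOffice.
stand_in = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_macro(stand_in, timeout=0.5))
```

Treating a timeout as success trades strictness for robustness: a genuinely hung conversion is indistinguishable from a finished one, so validating the output file afterwards remains worthwhile.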

Comment Management: `comment.py`

Adding a comment to a DOCX file requires coordinated updates across four XML files. scripts/comment.py writes the entries in all four through a single add_comment() function; the range markers in document.xml are placed separately.

The Four-File Comment Chain

Diagram: comment.py — Four-File XML Chain

Sources: skills/docx/scripts/comment.py:218-290

Identifiers Generated Per Comment

Each comment requires two randomly generated hex IDs:

Variable	Format	Used In
`para_id`	8-digit hex (`_generate_hex_id()`)	`w14:paraId` in `comments.xml`; `w15:paraId` in `commentsExtended.xml`; `w16cid:paraId` in `commentsIds.xml`
`durable_id`	8-digit hex (`_generate_hex_id()`)	`w16cid:durableId` in `commentsIds.xml`; `w16cex:durableId` in `commentsExtensible.xml`

Sources: skills/docx/scripts/comment.py:230-231, skills/docx/scripts/comment.py:68-69
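A minimal sketch of generating such IDs in Python is shown below. It is not the body of `_generate_hex_id()`; in particular, capping the value below `0x7FFFFFFF` is an assumption made here so that a generated `durableId` would never trip the pack.py repair threshold.

```python
import random


def generate_hex_id() -> str:
    """Return an 8-digit uppercase hex ID below 0x7FFFFFFF (upper bound is an assumption)."""
    return f"{random.randint(1, 0x7FFFFFFE):08X}"


para_id = generate_hex_id()
durable_id = generate_hex_id()
print(para_id, durable_id)
```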

Reply Threading

For a reply comment, parent_id is passed to add_comment(). The function:

1. Calls _find_para_id(comments, parent_id) to look up the w14:paraId of the parent comment's paragraph
2. Writes a <w15:commentEx> entry with w15:paraIdParent set to that parent para_id

If parent_id is None, the <w15:commentEx> entry is written without w15:paraIdParent.

Sources: skills/docx/scripts/comment.py:254-269
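The lookup and the conditional attribute can be sketched in Python as below. The function names are hypothetical and this is not the code in comment.py. As a simplification, the lookup returns the first paragraph that carries a `w14:paraId`, and the generated entry carries only the two attributes discussed here; real entries may include more.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def find_para_id(comments_root: ET.Element, comment_id: str):
    """Return the w14:paraId of the given comment's paragraph, or None if absent."""
    for comment in comments_root.iter(f"{{{W}}}comment"):
        if comment.get(f"{{{W}}}id") == comment_id:
            for para in comment.iter(f"{{{W}}}p"):
                para_id = para.get(f"{{{W14}}}paraId")
                if para_id:
                    return para_id
    return None


def comment_ex_entry(para_id: str, parent_para_id=None) -> str:
    """Build a <w15:commentEx> entry; paraIdParent appears only for replies."""
    parent = f' w15:paraIdParent="{parent_para_id}"' if parent_para_id else ""
    return f'<w15:commentEx w15:paraId="{para_id}"{parent}/>'


comments = ET.fromstring(
    f'<w:comments xmlns:w="{W}" xmlns:w14="{W14}">'
    '<w:comment w:id="0"><w:p w14:paraId="1A2B3C4D"/></w:comment>'
    "</w:comments>"
)
parent = find_para_id(comments, "0")
print(comment_ex_entry("5E6F7A8B", parent))  # reply: includes w15:paraIdParent
print(comment_ex_entry("5E6F7A8B"))          # top-level: no w15:paraIdParent
```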

First-Comment Setup

On the first call, if comments.xml does not yet exist, the function:

1. Copies all four template files from scripts/templates/ into unpacked/word/
2. Calls _ensure_comment_relationships(), which adds four Relationship entries to word/_rels/document.xml.rels (for comments.xml, commentsExtended.xml, commentsIds.xml, commentsExtensible.xml)
3. Calls _ensure_comment_content_types(), which adds four Override entries to [Content_Types].xml

Both setup functions are idempotent: they check for the presence of existing entries before writing.

Sources: skills/docx/scripts/comment.py:137-215
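The check-before-write pattern that makes the setup idempotent can be sketched in Python for a single relationship. `ensure_relationship` is a hypothetical helper rather than the skill's `_ensure_comment_relationships()`, and only the `comments.xml` relationship is shown; the ID allocation scheme is also an assumption.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_RELS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
COMMENTS_TYPE = f"{OFFICE_RELS}/comments"


def ensure_relationship(root: ET.Element, target: str, rel_type: str) -> bool:
    """Add a Relationship for target unless one already exists; return True if added."""
    existing = root.findall(f"{{{RELS_NS}}}Relationship")
    if any(rel.get("Target") == target for rel in existing):
        return False  # already present: nothing to write
    used_ids = {rel.get("Id") for rel in existing}
    n = 1
    while f"rId{n}" in used_ids:  # pick the first unused rId (illustrative scheme)
        n += 1
    ET.SubElement(
        root,
        f"{{{RELS_NS}}}Relationship",
        {"Id": f"rId{n}", "Type": rel_type, "Target": target},
    )
    return True


rels = ET.fromstring(
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rId1" Type="{OFFICE_RELS}/styles" Target="styles.xml"/>'
    "</Relationships>"
)
print(ensure_relationship(rels, "comments.xml", COMMENTS_TYPE))  # True: entry added
print(ensure_relationship(rels, "comments.xml", COMMENTS_TYPE))  # False: no duplicate
```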

Placing Comment Markers in `document.xml`

After comment.py writes the metadata files, comment range markers must be manually placed in document.xml. The script prints the required XML snippet as output.

Critical constraint: <w:commentRangeStart> and <w:commentRangeEnd> must be direct children of <w:p>, never inside <w:r>.

Sources: skills/docx/scripts/comment.py:52-65, skills/docx/SKILL.md:428-435, skills/docx/SKILL.md:532-554
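Because the markers are placed by hand, a quick check for the nesting mistake can be useful. The Python sketch below flags range markers found anywhere inside a `<w:r>`. It is a hypothetical helper, not part of the skill's validation, and it covers only the "never inside `<w:r>`" half of the constraint.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MARKERS = {f"{{{W}}}commentRangeStart", f"{{{W}}}commentRangeEnd"}


def misplaced_markers(root: ET.Element) -> list[str]:
    """Return the local names of comment range markers nested inside any <w:r>."""
    found = []
    for run in root.iter(f"{{{W}}}r"):
        for el in run.iter():
            if el.tag in MARKERS:
                found.append(el.tag.split("}")[1])
    return found


good = (
    f'<w:p xmlns:w="{W}"><w:commentRangeStart w:id="0"/>'
    '<w:r><w:t>text</w:t></w:r><w:commentRangeEnd w:id="0"/></w:p>'
)
bad = (
    f'<w:p xmlns:w="{W}"><w:r><w:commentRangeStart w:id="0"/>'
    "<w:t>text</w:t></w:r></w:p>"
)
print(misplaced_markers(ET.fromstring(good)))  # []
print(misplaced_markers(ET.fromstring(bad)))   # ['commentRangeStart']
```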

Command-Line Interface

Text must be pre-escaped XML: &amp; for &, &#x2019; for a smart apostrophe (’), and so on. The _encode_smart_quotes() function also handles Unicode smart quote characters by converting them to XML character references before writing.

Sources: skills/docx/scripts/comment.py:293-318, skills/docx/scripts/comment.py:72-83
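One way a caller might pre-escape comment text before handing it to the script is sketched below in Python. `prepare_comment_text` and the `SMART_QUOTES` table are hypothetical and are not the skill's `_encode_smart_quotes()`; the set of characters covered is an assumption.

```python
from xml.sax.saxutils import escape

# Smart quote characters mapped to XML character references (illustrative set).
SMART_QUOTES = {
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
    "\u201C": "&#x201C;",
    "\u201D": "&#x201D;",
}


def prepare_comment_text(raw: str) -> str:
    """Escape XML metacharacters first, then replace smart quotes with character references."""
    text = escape(raw)  # handles &, < and >
    for char, ref in SMART_QUOTES.items():
        text = text.replace(char, ref)
    return text


print(prepare_comment_text("Smith\u2019s & Co"))  # Smith&#x2019;s &amp; Co
```

The order matters: escaping after inserting the character references would turn each `&#x2019;` into `&amp;#x2019;`.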

Reading and Conversion Paths

Task	Command
Extract text (with tracked changes)	`pandoc --track-changes=all document.docx -o output.md`
Raw XML inspection	`python scripts/office/unpack.py document.docx unpacked/`
Convert `.doc` → `.docx`	`python scripts/office/soffice.py --headless --convert-to docx document.doc`
Convert `.docx` → PDF	`python scripts/office/soffice.py --headless --convert-to pdf document.docx`
Convert PDF pages → JPEG images	`pdftoppm -jpeg -r 150 document.pdf page`

All LibreOffice invocations go through scripts/office/soffice.py, which configures the headless environment for sandboxed execution. get_soffice_env() (imported in accept_changes.py) provides the same environment setup for subprocess calls.

Sources: skills/docx/SKILL.md:21-55, skills/docx/SKILL.md:583-591, skills/docx/scripts/accept_changes.py:12-13

Dependencies

Dependency	Role
`docx` (npm)	New document generation via `Document`, `Packer`, etc.
`pandoc`	Text extraction from `.docx`, tracked-change rendering
LibreOffice (`soffice`)	Accepting tracked changes; format conversion; headless PDF export
Poppler (`pdftoppm`)	Converting PDF pages to JPEG images
`defusedxml`	Safe XML parsing in `comment.py`

Sources: skills/docx/SKILL.md:585-591, skills/docx/scripts/comment.py:23
