office-custom
Install
npx skills add alisonaquinas/llm-doc-skills --skill 'office-custom' -g -y
/plugin install doc-skills@llm-skills
Use npx skills when you only want this skill, or install the bundle once to make every included skill available through the plugin marketplace flow. Browse the full skill bundle repository at github.com/alisonaquinas/llm-doc-skills.
Invoke
/doc-skills:office-custom
SKILL.md
name: office-custom
description: Use when unpacking, editing, repacking, validating, or converting Office Open XML files (.docx, .pptx, .xlsx)
Office Open XML Utilities
This skill provides the shared scripts used by the docx-custom, pptx-custom,
and xlsx-custom skills to work with Office Open XML (OOXML) files — the ZIP-based
XML format that underlies all modern Microsoft Office documents.
Intent Router
No separate reference files. All workflows are documented inline in this
SKILL.md:
- Unpack/repack an OOXML file → Scripts section (unpack.py, pack.py)
- Validate OOXML structure → Scripts section (validate.py)
- Convert to PDF via LibreOffice → Scripts section (soffice.py)
Quick Start
Use these commands from the repo root:
# Inspect OOXML package structure
python office-custom/scripts/validate.py document.docx
# Unpack for XML edits
python office-custom/scripts/unpack.py presentation.pptx unpacked/ --merge-runs false
# Repack after edits and preserve source ZIP metadata where possible
python office-custom/scripts/pack.py unpacked/ output.pptx --original presentation.pptx
# Convert to PDF through the LibreOffice wrapper
python office-custom/scripts/soffice.py --headless --convert-to pdf output.pptx
For fragile .pptx files, run pptx-custom/scripts/check_fragility.py before
any unpack/repack or python-pptx save. Files produced by some third-party
exporters need the byte-preserving patch workflow instead of normal OOXML repacking.
What Is OOXML?
A .docx, .pptx, or .xlsx file is a ZIP archive containing XML files:
document.docx (ZIP)
├── [Content_Types].xml         ← declares every part's MIME type
├── _rels/.rels                 ← top-level relationships
└── word/
    ├── document.xml            ← main body content
    ├── styles.xml              ← style definitions
    ├── settings.xml            ← document settings
    ├── _rels/document.xml.rels
    └── media/                  ← embedded images, etc.
Editing an OOXML file means: unpack → edit XML → repack.
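Because the container is an ordinary ZIP file, Python's standard zipfile module can show this structure without any of the scripts below. The following is a minimal sketch, not part of office-custom: list_parts is an illustrative name, and it reads only the Default and Override entries of [Content_Types].xml.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Namespace used by [Content_Types].xml (Open Packaging Conventions).
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def list_parts(path: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (member names, content-type declarations) for an OOXML file.

    The dict maps '.ext' -> type for <Default> entries and
    '/part/name' -> type for <Override> entries.
    """
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    types: Dict[str, str] = {}
    for default in root.findall(f"{{{CT_NS}}}Default"):
        types["." + default.get("Extension", "")] = default.get("ContentType", "")
    for override in root.findall(f"{{{CT_NS}}}Override"):
        types[override.get("PartName", "")] = override.get("ContentType", "")
    return names, types
```

Calling list_parts on a .docx returns its member names alongside the extension defaults and per-part overrides.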
Scripts
All scripts live in office-custom/scripts/ and are runnable from the repo root.
unpack.py — Extract an OOXML file for editing
python office-custom/scripts/unpack.py <source> <dest_dir>
python office-custom/scripts/unpack.py document.docx unpacked/
python office-custom/scripts/unpack.py presentation.pptx unpacked/ --merge-runs false
What it does:
- Extracts the ZIP to a working directory
- Pretty-prints every .xml/.rels file (2-space indent, Unix line endings)
- Escapes smart-quote characters (“ ” ‘ ’ – —) to XML entities so tools that rewrite encoding don't corrupt them
- For .docx files (optional, default on): merges adjacent <w:r> runs that share identical formatting, which makes find-and-replace reliable across run boundaries
Options:
| Option | Default | Description |
|---|---|---|
--merge-runs true\|false | true | Merge adjacent same-format runs in .docx |
pack.py — Repack an edited directory back into an OOXML file
python office-custom/scripts/pack.py <unpacked_dir> <output_file> [--original <original>]
python office-custom/scripts/pack.py unpacked/ output.docx --original document.docx
python office-custom/scripts/pack.py unpacked/ output.pptx --original presentation.pptx
What it does:
- Walks the unpacked directory and writes every file into a new ZIP
- Writes [Content_Types].xml first (OOXML spec requirement)
- Applies two auto-repair passes to every XML file:
  - durableId fix: regenerates w:durableId values ≥ 0x7FFFFFFF (Word rejects these; they appear when content is copy-pasted from other documents)
  - xml:space="preserve" fix: adds the attribute to <w:t> elements whose text has leading/trailing spaces (Word strips the spaces without it)
- Condenses pretty-printed XML back to compact form before writing
- Runs validate.py on the output (can be suppressed with --validate false)
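The member-ordering step can be sketched with zipfile alone. pack_directory is an illustrative helper, assuming a sorted order is acceptable for the remaining members; it deliberately leaves out the auto-repair, condensing, and validation passes that pack.py performs, and it does not copy metadata from an --original file.

```python
import os
import zipfile
from typing import List

CONTENT_TYPES = "[Content_Types].xml"


def pack_directory(src_dir: str, out_path: str) -> List[str]:
    """Zip src_dir into out_path, writing [Content_Types].xml first.

    Returns the archive member names in the order they were written.
    """
    members = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            # ZIP member names always use forward slashes.
            members.append(os.path.relpath(full, src_dir).replace(os.sep, "/"))
    if CONTENT_TYPES not in members:
        raise FileNotFoundError(f"{CONTENT_TYPES} missing from {src_dir}")
    # Content types first, everything else in a stable sorted order.
    ordered = [CONTENT_TYPES] + sorted(m for m in members if m != CONTENT_TYPES)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in ordered:
            zf.write(os.path.join(src_dir, *member.split("/")), arcname=member)
    return ordered
```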
Options:
| Option | Default | Description |
|---|---|---|
--original PATH | — | Copy ZIP metadata/comment from the original file |
--validate true\|false | true | Run validate.py after packing |
validate.py — Validate an OOXML file
python office-custom/scripts/validate.py document.docx
python office-custom/scripts/validate.py presentation.pptx spreadsheet.xlsx
python office-custom/scripts/validate.py *.docx --quiet
What it does — three checks in order:
- ZIP integrity: can the file be opened as a valid ZIP archive?
- Required parts: are [Content_Types].xml and _rels/.rels present?
- XML well-formedness: does every .xml/.rels member parse without error?
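The three checks map naturally onto the standard library. A minimal sketch, using validate_ooxml as a hypothetical function name (validate.py's internals may differ); it also runs ZipFile.testzip() so a corrupt member surfaces as a ZIP-integrity failure rather than a crash during parsing.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List

REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels")


def validate_ooxml(path: str) -> List[str]:
    """Return a list of error strings; an empty list means all checks passed."""
    # Check 1: ZIP integrity (archive opens, member CRCs match).
    try:
        zf = zipfile.ZipFile(path)
    except (zipfile.BadZipFile, OSError) as exc:
        return [f"not a valid ZIP archive: {exc}"]
    with zf:
        bad_member = zf.testzip()
        if bad_member is not None:
            return [f"corrupt ZIP member: {bad_member}"]
        names = set(zf.namelist())
        errors: List[str] = []
        # Check 2: required package parts.
        for part in REQUIRED_PARTS:
            if part not in names:
                errors.append(f"missing required part: {part}")
        # Check 3: every .xml / .rels member must be well-formed XML.
        for name in sorted(names):
            if name.endswith((".xml", ".rels")):
                try:
                    ET.fromstring(zf.read(name))
                except ET.ParseError as exc:
                    errors.append(f"XML parse error in {name}: {exc}")
    return errors
```

A command-line wrapper around this would print OK or FAIL per file and exit with 1 if any call returned errors, in line with the exit codes below.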
Output:
OK document.docx (14 XML members validated)
FAIL broken.docx
ERROR: XML parse error in word/document.xml: ...
Exit codes: 0 = all passed, 1 = any failure.
Options:
| Option | Description |
|---|---|
--quiet | Suppress per-file output; only print summary |
soffice.py — LibreOffice CLI wrapper
# Used programmatically by other scripts; can also be run directly:
python office-custom/scripts/soffice.py --headless --convert-to pdf document.docx
python office-custom/scripts/soffice.py --headless --convert-to docx document.doc
python office-custom/scripts/soffice.py --headless --convert-to pdf output.pptx
What it does:
- Locates the soffice binary across macOS, Linux, and sandboxed environments:
  - macOS app bundle: /Applications/LibreOffice.app/Contents/MacOS/soffice
  - Linux packages: /usr/bin/soffice, /usr/bin/libreoffice
  - Snap: /snap/bin/libreoffice
  - PATH fallback
- Creates an isolated temporary user-profile directory so LibreOffice doesn't need write access to ~/.config/libreoffice (critical in CI and sandboxes)
- Drop-in pass-through: any arguments after the script name are forwarded verbatim to the soffice binary
Dependency: LibreOffice must be installed separately.
- macOS: brew install --cask libreoffice
- Ubuntu: apt install libreoffice
- Windows: winget install --id TheDocumentFoundation.LibreOffice --source winget
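A rough sketch of the binary lookup and isolated profile, assuming a CANDIDATES list that mirrors the paths above. find_soffice, build_command, and run_soffice are hypothetical names, and whether soffice.py isolates the profile in exactly this way is an assumption; -env:UserInstallation is LibreOffice's own option for choosing the user-profile directory.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional

# Search order mirrors the list above; the PATH lookup is the fallback.
CANDIDATES = [
    "/Applications/LibreOffice.app/Contents/MacOS/soffice",
    "/usr/bin/soffice",
    "/usr/bin/libreoffice",
    "/snap/bin/libreoffice",
]


def find_soffice() -> Optional[str]:
    for path in CANDIDATES:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return shutil.which("soffice") or shutil.which("libreoffice")


def build_command(args: List[str], soffice: str, profile_dir: str) -> List[str]:
    # -env:UserInstallation points LibreOffice at a throwaway profile
    # instead of ~/.config/libreoffice.
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [soffice, f"-env:UserInstallation={profile_uri}", *args]


def run_soffice(args: List[str]) -> int:
    soffice = find_soffice()
    if soffice is None:
        raise FileNotFoundError("LibreOffice (soffice) not found")
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
        # Caller arguments are forwarded verbatim after the profile option.
        return subprocess.run(build_command(args, soffice, profile)).returncode
```

With this shape, run_soffice(["--headless", "--convert-to", "pdf", "document.docx"]) would correspond to the first direct invocation shown above.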
Standard Edit Workflow
┌─────────────┐   unpack.py   ┌───────────────┐   edit XML    ┌────────────────┐
│ input.docx  │ ────────────▶ │ unpacked/     │ ────────────▶ │ unpacked/      │
│ (ZIP)       │               │ word/         │               │ word/          │
└─────────────┘               │ document.xml  │               │ document.xml   │
                              │ styles.xml    │               │ (modified)     │
                              └───────────────┘               └────────┬───────┘
                                                                       │
                              ┌───────────────┐   pack.py              │
                              │ output.docx   │ ◀──────────────────────┘
                              │ (ZIP, clean)  │   (auto-repair + validate)
                              └───────────────┘
Step-by-step
# 1. Unpack
python office-custom/scripts/unpack.py document.docx unpacked/
# 2. Edit XML directly — use the Edit tool on files inside unpacked/
# e.g. unpacked/word/document.xml, unpacked/word/styles.xml
# 3. Repack
python office-custom/scripts/pack.py unpacked/ output.docx --original document.docx
# Optional: validate manually
python office-custom/scripts/validate.py output.docx
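If you script step 2 rather than editing by hand, a plain string replacement on the unpacked part is often enough, since unpack.py merges same-format runs in .docx files by default. replace_in_part is a hypothetical helper, not one of the office-custom scripts. It raises instead of silently leaving the file unchanged when the search text is absent, which usually means the text is still split across runs or contains characters stored as entities. old and new are raw XML, so any &, < or > in them must already be escaped.

```python
from pathlib import Path


def replace_in_part(unpacked_dir: str, part: str, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in one unpacked XML part.

    Returns the number of replacements. Raises ValueError when `old` does not
    occur, so a no-op edit is not repacked by mistake.
    """
    path = Path(unpacked_dir) / part
    # newline="" keeps the Unix line endings written by unpack.py intact.
    with open(path, encoding="utf-8", newline="") as fh:
        text = fh.read()
    count = text.count(old)
    if count == 0:
        raise ValueError(f"{old!r} not found in {part}")
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text.replace(old, new))
    return count
```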
OOXML Structure Reference
Required parts (all formats)
| Path | Purpose |
|---|---|
[Content_Types].xml | Maps ZIP entry paths to MIME content types |
_rels/.rels | Top-level package relationships (points to main document part) |
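To find the main part programmatically, read _rels/.rels and pick the relationship whose type is officeDocument. A minimal sketch with an illustrative main_part function; it assumes a Transitional-conformance file, since Strict files use a different relationship type URI.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)


def main_part(path: str) -> Optional[str]:
    """Return the ZIP path of the main document part named in _rels/.rels."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.findall(f"{{{PKG_RELS_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC_TYPE:
            # Package-level targets resolve from the package root; a leading
            # "/" (absolute form) is dropped to get the ZIP member name.
            return rel.get("Target", "").lstrip("/")
    return None
```

For a typical .docx this resolves to word/document.xml; for .pptx and .xlsx it points at ppt/presentation.xml and xl/workbook.xml respectively.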
Word document (.docx)
| Path | Purpose |
|---|---|
word/document.xml | Main body — paragraphs, tables, runs |
word/styles.xml | Named styles (Normal, Heading 1, etc.) |
word/settings.xml | Document-level settings (compatibility, rsid tracking) |
word/numbering.xml | List/bullet numbering definitions |
word/fontTable.xml | Font declarations |
word/comments.xml | Comment bodies (created by docx-custom/scripts/comment.py) |
word/theme/theme1.xml | Colour and font theme |
word/media/ | Embedded images and other binary assets |
word/_rels/document.xml.rels | Relationships for document.xml |
PowerPoint presentation (.pptx)
| Path | Purpose |
|---|---|
ppt/presentation.xml | Presentation-level metadata and slide list |
ppt/slides/slide1.xml | Individual slide content |
ppt/slideLayouts/slideLayout1.xml | Layout templates |
ppt/slideMasters/slideMaster1.xml | Master slide |
ppt/theme/theme1.xml | Colour and font theme |
ppt/media/ | Embedded images |
Excel workbook (.xlsx)
| Path | Purpose |
|---|---|
xl/workbook.xml | Sheet list and workbook metadata |
xl/worksheets/sheet1.xml | Individual sheet data and formulas |
xl/styles.xml | Cell formatting |
xl/sharedStrings.xml | String table (shared across all cells) |
xl/calcChain.xml | Formula calculation order |
xl/theme/theme1.xml | Colour and font theme |
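Cells whose text lives in the string table carry t="s" and store an index into sharedStrings.xml in their v element. A sketch of resolving that index with ElementTree; load_shared_strings and cell_text are illustrative names, phonetic rPh runs are skipped, and other cell types (numbers, inline strings, booleans) come back as the raw v text or an empty string.

```python
import xml.etree.ElementTree as ET
from typing import List

SML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def load_shared_strings(xml_bytes: bytes) -> List[str]:
    """Return the shared-string table from sharedStrings.xml in index order."""
    root = ET.fromstring(xml_bytes)
    ns = {"x": SML_NS}
    strings = []
    for si in root.findall("x:si", ns):
        # Plain string: <si><t>..</t></si>; rich text: <si><r><t>..</t></r>...</si>.
        # Only these two paths are read, so phonetic <rPh> text is ignored.
        parts = si.findall("x:t", ns) + si.findall("x:r/x:t", ns)
        strings.append("".join(t.text or "" for t in parts))
    return strings


def cell_text(cell: ET.Element, shared: List[str]) -> str:
    """Resolve a <c> element; t="s" cells hold an index into the shared table."""
    v = cell.find(f"{{{SML_NS}}}v")
    raw = v.text if v is not None and v.text is not None else ""
    if cell.get("t") == "s":
        return shared[int(raw)]
    return raw
```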
Common XML Namespaces
| Prefix | URI | Used in |
|---|---|---|
w: | http://schemas.openxmlformats.org/wordprocessingml/2006/main | .docx content |
a: | http://schemas.openxmlformats.org/drawingml/2006/main | Drawing (all formats) |
r: | http://schemas.openxmlformats.org/officeDocument/2006/relationships | Relationships |
p: | http://schemas.openxmlformats.org/presentationml/2006/main | .pptx content |
x: | http://schemas.openxmlformats.org/spreadsheetml/2006/main | .xlsx content |
mc: | http://schemas.openxmlformats.org/markup-compatibility/2006 | Markup compatibility |
w14: | http://schemas.microsoft.com/office/word/2010/wordml | Word 2010+ extensions |
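ElementTree matches elements by namespace URI, not by the prefix written in the file, so these prefixes only help once you pass them in as a mapping. A minimal sketch that collects paragraph text from word/document.xml; NS and paragraph_texts are illustrative names, and text inside nested paragraphs (for example in text boxes) is counted in both the inner and the outer paragraph.

```python
import xml.etree.ElementTree as ET
from typing import List

# Prefix -> URI, copied from the table above.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main",
}


def paragraph_texts(document_xml: bytes) -> List[str]:
    """Return the concatenated <w:t> text of each <w:p> in document.xml."""
    root = ET.fromstring(document_xml)
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", NS))
        for p in root.findall(".//w:p", NS)
    ]
```

Because matching is by URI, the function works the same if a file happens to bind the WordprocessingML namespace to a prefix other than w:.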
Auto-repair Details
pack.py repairs two common issues automatically:
1. Out-of-range w:durableId
Word assigns durableId values (persistent run identifiers) as 31-bit integers.
Values ≥ 0x7FFFFFFF (2,147,483,647) are invalid and cause Word 2016+ to refuse
to open the file. These appear when content is copy-pasted from malformed documents.
Fix: Replace any out-of-range value with a random valid integer in [1, 0x7FFFFFFE].
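The repair rule can be sketched as a regex substitution over the XML text. This assumes the attribute appears literally as w:durableId and holds a decimal integer; if your files store the value differently, adjust the pattern and the formatting. fix_durable_ids is an illustrative name, not pack.py's actual function.

```python
import random
import re
from typing import Optional

MAX_VALID = 0x7FFFFFFE  # largest value kept; 0x7FFFFFFF and above are replaced

# Assumption: the value is written as a decimal integer.
DURABLE_ID = re.compile(r'(w:durableId=")(\d+)(")')


def fix_durable_ids(xml_text: str, rng: Optional[random.Random] = None) -> str:
    """Swap out-of-range w:durableId values for random ones in [1, MAX_VALID]."""
    rng = rng or random.Random()

    def repl(match):
        if int(match.group(2)) <= MAX_VALID:
            return match.group(0)  # in range: leave untouched
        return f"{match.group(1)}{rng.randint(1, MAX_VALID)}{match.group(3)}"

    return DURABLE_ID.sub(repl, xml_text)
```

Passing a seeded random.Random makes the replacement values reproducible between runs, which can help when diffing repacked output.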
2. Missing xml:space="preserve" on <w:t>
The XML spec leaves whitespace handling to the application unless
xml:space="preserve" is present, and Word strips leading/trailing spaces from
<w:t> text without it. Word relies on the attribute for spaces between runs
(e.g. "Hello " + "world").
Fix: Add xml:space="preserve" to any <w:t> whose text starts or ends with a space.
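A regex-based sketch of the same rule, working on serialized XML so namespace prefixes are left exactly as written. add_space_preserve is an illustrative name; like the rule above, it only looks for literal spaces at either end of the text and skips elements that already carry xml:space.

```python
import re

# <w:t> or <w:t attrs...>, then its text up to </w:t>. <w:tab/>, <w:tc>, etc.
# do not match because "<w:t" must be followed by whitespace or ">".
W_T = re.compile(r"<w:t(\s[^>]*)?>([^<]*)</w:t>")


def add_space_preserve(xml_text: str) -> str:
    """Add xml:space="preserve" to <w:t> elements whose text starts or ends with a space."""

    def repl(match):
        attrs = match.group(1) or ""
        text = match.group(2)
        needs_preserve = text != text.strip(" ")
        if not needs_preserve or "xml:space=" in attrs:
            return match.group(0)
        return f'<w:t{attrs} xml:space="preserve">{text}</w:t>'

    return W_T.sub(repl, xml_text)
```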
Smart Quote Entities
When editing XML directly, use these XML entities instead of Unicode characters to prevent encoding corruption:
| Character | Entity | Description |
|---|---|---|
| “ | `&#x201C;` | Left double quotation mark |
| ” | `&#x201D;` | Right double quotation mark |
| ‘ | `&#x2018;` | Left single quotation mark |
| ’ | `&#x2019;` | Right single quotation mark / apostrophe |
| – | `&#x2013;` | En dash |
| — | `&#x2014;` | Em dash |
| (non-breaking space) | `&#xA0;` | Non-breaking space |
unpack.py converts these automatically on extraction; pack.py preserves them.
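The character-to-entity substitution amounts to a single str.translate call. The sketch is illustrative rather than unpack.py's code; it assumes the hexadecimal numeric-reference forms shown in the table and should run on serialized XML text, since parsing turns the references back into characters.

```python
# Character -> numeric character reference, matching the table above.
SMART_ENTITIES = {
    "\u201c": "&#x201C;",  # left double quotation mark
    "\u201d": "&#x201D;",  # right double quotation mark
    "\u2018": "&#x2018;",  # left single quotation mark
    "\u2019": "&#x2019;",  # right single quotation mark / apostrophe
    "\u2013": "&#x2013;",  # en dash
    "\u2014": "&#x2014;",  # em dash
    "\u00a0": "&#xA0;",  # non-breaking space
}
_TABLE = str.maketrans(SMART_ENTITIES)


def escape_smart_chars(xml_text: str) -> str:
    """Replace the characters above with references; everything else is unchanged."""
    return xml_text.translate(_TABLE)
```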
Dependencies
| Tool | Install | Used by |
|---|---|---|
| LibreOffice | brew install --cask libreoffice / apt install libreoffice | soffice.py |
| Python stdlib | (built-in) | unpack.py, pack.py, validate.py |
No third-party Python packages are required for the office utilities themselves.
See Also
$raw-document: specification-level reference for when the unpack/repack/validate cycle surfaces XML errors that require looking up OOXML or ODF schemas, namespace definitions, or element-level specification details.