Alison's LLM Plugins

office-custom


Install

Install the containing plugin
/plugin install doc-skills@llm-skills
Invoke this skill after installation
/doc-skills:office-custom
This skill is bundled inside doc-skills. Install the plugin once, then Claude Code can use any of its included skills. Browse the full plugin repository at github.com/alisonaquinas/llm-doc-skills.

SKILL.md


name: office-custom
description: Common utilities for unpacking, editing, repacking, and validating Office Open XML files (.docx, .pptx, .xlsx)

Office Open XML Utilities

This skill provides the shared scripts used by the docx-custom, pptx-custom, and xlsx-custom skills to work with Office Open XML (OOXML) files — the ZIP-based XML format that underlies all modern Microsoft Office documents.

Intent Router

No separate reference files. All workflows are documented inline below:

  • Unpack/repack an OOXML file → Scripts section (unpack.py, pack.py)
  • Validate OOXML structure → Scripts section (validate.py)
  • Convert to PDF via LibreOffice → Scripts section (soffice.py)

What Is OOXML?

A .docx, .pptx, or .xlsx file is a ZIP archive containing XML files:

document.docx (ZIP)
├── [Content_Types].xml       ← declares every part's MIME type
├── _rels/.rels               ← top-level relationships
└── word/
    ├── document.xml          ← main body content
    ├── styles.xml            ← style definitions
    ├── settings.xml          ← document settings
    ├── _rels/document.xml.rels
    └── media/                ← embedded images, etc.

Editing an OOXML file means: unpack → edit XML → repack.
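
To make the "ZIP archive containing XML files" point concrete, here is a minimal sketch that lists the parts of a package using only the standard library. The helper names `list_parts` and `make_demo_package` are hypothetical and are not part of the skill's scripts. The demo archive is a stand-in for illustration, not a complete document that Word would open.

```python
import io
import zipfile


def list_parts(ooxml_bytes: bytes) -> list[str]:
    """Return the member names of an OOXML package, in archive order."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as zf:
        return zf.namelist()


def make_demo_package() -> bytes:
    """Build a tiny stand-in archive (not a complete, openable .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


if __name__ == "__main__":
    for name in list_parts(make_demo_package()):
        print(name)
```

For a real file, passing `Path("document.docx").read_bytes()` to `list_parts` should print a tree like the one shown above.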


Scripts

All scripts live in office-custom/scripts/ and are runnable from the repo root.

unpack.py — Extract an OOXML file for editing

python office-custom/scripts/unpack.py <source> <dest_dir>
python office-custom/scripts/unpack.py document.docx unpacked/
python office-custom/scripts/unpack.py presentation.pptx unpacked/ --merge-runs false

What it does:

  • Extracts the ZIP to a working directory
  • Pretty-prints every .xml / .rels file (2-space indent, Unix line endings)
  • Escapes smart-quote characters (“ ” ‘ ’) to XML entities so tools that rewrite encoding don't corrupt them
  • For .docx files (optional, default on): merges adjacent <w:r> runs that share identical formatting — makes find-and-replace reliable across run boundaries
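
The run-merging step can be sketched roughly as follows. This is an illustration of the idea, not the code in unpack.py: it assumes that two runs "share identical formatting" when their serialised `<w:rPr>` elements are byte-for-byte equal, and it only merges runs that contain nothing but `<w:rPr>` and `<w:t>`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R, RPR, T = f"{{{W}}}r", f"{{{W}}}rPr", f"{{{W}}}t"


def _format_key(run):
    """Serialise the run's <w:rPr> (if any) so formats can be compared."""
    rpr = run.find(RPR)
    return ET.tostring(rpr).strip() if rpr is not None else b""


def _is_text_run(elem):
    """True for a <w:r> whose children are only <w:rPr> and <w:t>."""
    return elem.tag == R and all(child.tag in (RPR, T) for child in elem)


def merge_runs(paragraph):
    """Merge neighbouring same-format text runs of one paragraph, in place."""
    previous = None
    for child in list(paragraph):
        if not _is_text_run(child):
            previous = None  # anything else breaks the chain of adjacent runs
            continue
        if previous is not None and _format_key(previous) == _format_key(child):
            text = "".join(t.text or "" for t in child.findall(T))
            targets = previous.findall(T)
            target = targets[-1] if targets else ET.SubElement(previous, T)
            target.text = (target.text or "") + text
            paragraph.remove(child)
        else:
            previous = child
```

After merging, a word that Word had split as "Hel" + "lo" lives in a single `<w:t>`, so a plain text search for "Hello" finds it.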

Options:

Option                    Default   Description
--merge-runs true|false   true      Merge adjacent same-format runs in .docx

pack.py — Repack an edited directory back into an OOXML file

python office-custom/scripts/pack.py <unpacked_dir> <output_file> [--original <original>]
python office-custom/scripts/pack.py unpacked/ output.docx --original document.docx
python office-custom/scripts/pack.py unpacked/ output.pptx --original presentation.pptx

What it does:

  • Walks the unpacked directory and writes every file into a new ZIP
  • Writes [Content_Types].xml first (OOXML spec requirement)
  • Applies two auto-repair passes to every XML file:
    1. durableId fix — regenerates w:durableId values ≥ 0x7FFFFFFF (Word rejects these; they appear when content is copy-pasted from other documents)
    2. xml:space="preserve" fix — adds the attribute to <w:t> elements whose text has leading/trailing spaces (Word strips the spaces without it)
  • Condenses pretty-printed XML back to compact form before writing
  • Runs validate.py on the output (can be suppressed with --validate false)
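
The directory walk and the "[Content_Types].xml first" ordering might look something like the sketch below. It is an assumption-laden illustration (the names `pack_dir` and `demo` are hypothetical) and it leaves out the auto-repair, XML condensing, and validation passes that pack.py performs.

```python
import tempfile
import zipfile
from pathlib import Path

CONTENT_TYPES = "[Content_Types].xml"


def pack_dir(src_dir: Path, out_path: Path) -> list[str]:
    """Zip src_dir into out_path with [Content_Types].xml as the first entry."""
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    # Stable sort: the content-types part moves to the front, the rest keep order.
    files.sort(key=lambda p: p.relative_to(src_dir).as_posix() != CONTENT_TYPES)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # ZIP member names always use forward slashes.
            zf.write(path, path.relative_to(src_dir).as_posix())
    with zipfile.ZipFile(out_path) as zf:
        return zf.namelist()


def demo() -> list[str]:
    """Pack a throwaway directory and return the resulting member order."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "unpacked"
        (root / "word").mkdir(parents=True)
        (root / "_rels").mkdir()
        (root / "word" / "document.xml").write_text("<document/>", encoding="utf-8")
        (root / "_rels" / ".rels").write_text("<Relationships/>", encoding="utf-8")
        (root / CONTENT_TYPES).write_text("<Types/>", encoding="utf-8")
        return pack_dir(root, Path(tmp) / "out.docx")


if __name__ == "__main__":
    print(demo())
```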

Options:

Option                  Default   Description
--original PATH         (none)    Copy ZIP metadata/comment from the original file
--validate true|false   true      Run validate.py after packing

validate.py — Validate an OOXML file

python office-custom/scripts/validate.py document.docx
python office-custom/scripts/validate.py presentation.pptx spreadsheet.xlsx
python office-custom/scripts/validate.py *.docx --quiet

What it does — three checks in order:

  1. ZIP integrity — can the file be opened as a valid ZIP archive?
  2. Required parts — are [Content_Types].xml and _rels/.rels present?
  3. XML well-formedness — does every .xml / .rels member parse without error?

Output:

OK    document.docx  (14 XML members validated)
FAIL  broken.docx
      ERROR: XML parse error in word/document.xml: ...

Exit codes: 0 = all passed, 1 = any failure.
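
The three checks can be approximated with the standard library alone. The sketch below is one possible shape, assuming an in-memory package. `check_package` and `build` are hypothetical names, and the error strings only loosely follow the sample output above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels")


def check_package(data: bytes) -> list[str]:
    """Return a list of error strings; an empty list means all checks passed."""
    # 1. ZIP integrity
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile as exc:
        return [f"not a valid ZIP archive: {exc}"]
    errors = []
    with zf:
        names = zf.namelist()
        # 2. Required parts
        errors += [f"missing required part: {p}" for p in REQUIRED_PARTS if p not in names]
        # 3. XML well-formedness of every .xml / .rels member
        for name in names:
            if name.endswith((".xml", ".rels")):
                try:
                    ET.fromstring(zf.read(name))
                except ET.ParseError as exc:
                    errors.append(f"XML parse error in {name}: {exc}")
    return errors


def build(parts: dict) -> bytes:
    """Helper for experiments: build an archive from {member name: text}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()
```

A command-line wrapper could map an empty list to exit code 0 and anything else to 1, matching the exit codes described above.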

Options:

Option    Description
--quiet   Suppress per-file output; only print summary

soffice.py — LibreOffice CLI wrapper

# Used programmatically by other scripts; can also be run directly:
python office-custom/scripts/soffice.py --headless --convert-to pdf document.docx
python office-custom/scripts/soffice.py --headless --convert-to docx document.doc
python office-custom/scripts/soffice.py --headless --convert-to pdf output.pptx

What it does:

  • Locates the soffice binary across macOS, Linux, and sandboxed environments:
    • macOS app bundle: /Applications/LibreOffice.app/Contents/MacOS/soffice
    • Linux packages: /usr/bin/soffice, /usr/bin/libreoffice
    • Snap: /snap/bin/libreoffice
    • PATH fallback
  • Creates an isolated temporary user-profile directory so LibreOffice doesn't need write access to ~/.config/libreoffice (critical in CI / sandboxes)
  • Drop-in pass-through: any arguments after the script name are forwarded verbatim to the soffice binary
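
A rough sketch of the lookup and profile isolation follows. It assumes the isolation is done with LibreOffice's `-env:UserInstallation=<file URI>` bootstrap parameter. Whether soffice.py uses exactly this mechanism is an assumption, and `find_soffice` and `build_command` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path

CANDIDATES = (
    "/Applications/LibreOffice.app/Contents/MacOS/soffice",
    "/usr/bin/soffice",
    "/usr/bin/libreoffice",
    "/snap/bin/libreoffice",
)


def find_soffice(candidates=CANDIDATES):
    """Return the first existing candidate path, else a PATH lookup (or None)."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return shutil.which("soffice") or shutil.which("libreoffice")


def build_command(binary, profile_dir, args):
    """Prefix the forwarded arguments with an isolated user-profile setting."""
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [binary, f"-env:UserInstallation={profile_uri}", *args]


if __name__ == "__main__":
    binary = find_soffice()
    with tempfile.TemporaryDirectory() as profile:
        print(build_command(binary or "soffice", profile, ["--headless", "--version"]))
```

The resulting list can be handed to `subprocess.run` while the temporary directory is still open, so each invocation gets a throwaway profile.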

Dependency: LibreOffice must be installed separately.

  • macOS: brew install --cask libreoffice
  • Ubuntu: apt install libreoffice

Standard Edit Workflow

┌─────────────┐   unpack.py   ┌───────────────┐   edit XML   ┌────────────────┐
│  input.docx │ ──────────── ▶│  unpacked/    │ ──────────── ▶│  unpacked/     │
│  (ZIP)      │               │  word/        │               │  word/         │
└─────────────┘               │  document.xml │               │  document.xml  │
                              │  styles.xml   │               │  (modified)    │
                              └───────────────┘               └────────┬───────┘
                                                                        │
                              ┌───────────────┐   pack.py              │
                              │  output.docx  │ ◀──────────────────────┘
                              │  (ZIP, clean) │   (auto-repair + validate)
                              └───────────────┘

Step-by-step

# 1. Unpack
python office-custom/scripts/unpack.py document.docx unpacked/

# 2. Edit XML directly — use the Edit tool on files inside unpacked/
#    e.g. unpacked/word/document.xml, unpacked/word/styles.xml

# 3. Repack
python office-custom/scripts/pack.py unpacked/ output.docx --original document.docx

# Optional: validate manually
python office-custom/scripts/validate.py output.docx

OOXML Structure Reference

Required parts (all formats)

Path                  Purpose
[Content_Types].xml   Maps ZIP entry paths to MIME content types
_rels/.rels           Top-level package relationships (points to main document part)

Word document (.docx)

Path                           Purpose
word/document.xml              Main body: paragraphs, tables, runs
word/styles.xml                Named styles (Normal, Heading 1, etc.)
word/settings.xml              Document-level settings (compatibility, rsid tracking)
word/numbering.xml             List/bullet numbering definitions
word/fontTable.xml             Font declarations
word/comments.xml              Comment bodies (created by docx-custom/scripts/comment.py)
word/theme/theme1.xml          Colour and font theme
word/media/                    Embedded images and other binary assets
word/_rels/document.xml.rels   Relationships for document.xml

PowerPoint presentation (.pptx)

Path                                Purpose
ppt/presentation.xml                Presentation-level metadata and slide list
ppt/slides/slide1.xml               Individual slide content
ppt/slideLayouts/slideLayout1.xml   Layout templates
ppt/slideMasters/slideMaster1.xml   Master slide
ppt/theme/theme1.xml                Colour and font theme
ppt/media/                          Embedded images

Excel workbook (.xlsx)

Path                       Purpose
xl/workbook.xml            Sheet list and workbook metadata
xl/worksheets/sheet1.xml   Individual sheet data and formulas
xl/styles.xml              Cell formatting
xl/sharedStrings.xml       String table (shared across all cells)
xl/calcChain.xml           Formula calculation order
xl/theme/theme1.xml        Colour and font theme

Common XML Namespaces

Prefix   URI                                                                   Used in
w:       http://schemas.openxmlformats.org/wordprocessingml/2006/main          .docx content
a:       http://schemas.openxmlformats.org/drawingml/2006/main                 Drawing (all formats)
r:       http://schemas.openxmlformats.org/officeDocument/2006/relationships   Relationships
p:       http://schemas.openxmlformats.org/presentationml/2006/main            .pptx content
x:       http://schemas.openxmlformats.org/spreadsheetml/2006/main             .xlsx content
mc:      http://schemas.openxmlformats.org/markup-compatibility/2006           Markup compatibility
w14:     http://schemas.microsoft.com/office/word/2010/wordml                  Word 2010+ extensions
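
When reading unpacked XML from Python, these prefixes have to be mapped to their URIs explicitly. The following sketch shows one way to do that with `xml.etree.ElementTree`. The sample document and the `paragraph_texts` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map passed to ElementTree's find/iterfind methods.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p></w:body>"
    "</w:document>"
)


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the concatenated <w:t> text of each <w:p> in the document."""
    root = ET.fromstring(document_xml)
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        for p in root.iterfind(".//w:p", NS)
    ]


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))
```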

Auto-repair Details

pack.py repairs two common issues automatically:

1. Out-of-range w:durableId

Word assigns durableId values (persistent run identifiers) as 31-bit integers. Values ≥ 0x7FFFFFFF (2,147,483,647) are invalid and cause Word 2016+ to refuse to open the file. These appear when content is copy-pasted from malformed documents.

Fix: Replace any out-of-range value with a random valid integer in [1, 0x7FFFFFFE].
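
A minimal sketch of this repair, assuming the attribute is written as a decimal integer in the form `w:durableId="…"`. The real script may match other spellings, and `fix_durable_ids` is a hypothetical name.

```python
import random
import re
from typing import Optional

MAX_VALID = 0x7FFFFFFE
# Assumption: the value is a plain decimal integer inside double quotes.
_DURABLE_ID = re.compile(r'(w:durableId=")(\d+)(")')


def fix_durable_ids(xml_text: str, rng: Optional[random.Random] = None) -> str:
    """Replace every out-of-range durableId with a random value in [1, MAX_VALID]."""
    rng = rng or random.Random()

    def _repl(match):
        if int(match.group(2)) > MAX_VALID:
            return f"{match.group(1)}{rng.randint(1, MAX_VALID)}{match.group(3)}"
        return match.group(0)  # in range: leave untouched

    return _DURABLE_ID.sub(_repl, xml_text)
```

Passing a seeded `random.Random` makes the output reproducible, which can help when diffing repacked files.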

2. Missing xml:space="preserve" on <w:t>

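An XML parser passes whitespace through unchanged, but the consuming application may discard leading/trailing whitespace in text nodes unless xml:space="preserve" is present, and Word does so for <w:t>. The attribute is therefore needed to keep spaces between runs (e.g. "Hello " + "world").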

Fix: Add xml:space="preserve" to any <w:t> whose text starts or ends with a space.
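
One way to express this fix with `xml.etree.ElementTree` is sketched below. It is not necessarily how pack.py does it, and `add_space_preserve` is a hypothetical name. ElementTree serialises the XML namespace URI with the reserved `xml:` prefix, so no extra namespace declaration is needed.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)  # keep the w: prefix when serialising


def add_space_preserve(root: ET.Element) -> int:
    """Mark every <w:t> with leading/trailing spaces; return how many changed."""
    changed = 0
    for t in root.iter(f"{{{W}}}t"):
        text = t.text or ""
        if text != text.strip(" ") and t.get(XML_SPACE) != "preserve":
            t.set(XML_SPACE, "preserve")
            changed += 1
    return changed
```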


Smart Quote Entities

When editing XML directly, use these XML entities instead of Unicode characters to prevent encoding corruption:

Character    Entity     Description
“            &#x201C;   Left double quotation mark
”            &#x201D;   Right double quotation mark
‘            &#x2018;   Left single quotation mark
’            &#x2019;   Right single quotation mark / apostrophe
–            &#x2013;   En dash
—            &#x2014;   Em dash
(U+00A0)     &#xA0;     Non-breaking space

unpack.py converts these automatically on extraction; pack.py preserves them.
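
The conversion itself is a simple character-to-entity mapping. The sketch below shows the idea with `str.translate`. The character set follows the entities listed in this section, and `escape_smart_chars` is a hypothetical name rather than the function unpack.py actually uses.

```python
# Characters to escape, written as code points to keep this source ASCII-only.
SMART_CHARS = "\u201c\u201d\u2018\u2019\u2013\u2014\u00a0"

# Map each code point to its hexadecimal numeric character reference.
_TABLE = {ord(ch): f"&#x{ord(ch):X};" for ch in SMART_CHARS}


def escape_smart_chars(text: str) -> str:
    """Replace smart quotes, dashes, and NBSP with XML numeric entities."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(escape_smart_chars("\u201cHi\u201d"))
```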


Dependencies

Tool            Install                                                     Used by
LibreOffice     brew install --cask libreoffice / apt install libreoffice   soffice.py
Python stdlib   (built-in)                                                  unpack.py, pack.py, validate.py

No third-party Python packages are required for the office utilities themselves.


See Also

  • $raw-document — specification-level reference for when the unpack/repack/validate cycle surfaces XML errors that require looking up OOXML or ODF schemas, namespace definitions, or element-level specification details.