CodexPDF CodexPDF

Getting started


title: "CLI" description: "Command reference for extract, probe, render, serve, parity, schema, contract, and validate workflows." group: "Getting started" order: 3

CLI

codex-pdf exposes a contract-oriented CLI built with argparse. The same code path that the HTTP API uses runs in-process when you invoke the CLI, so output is byte-for-byte identical to the deployed surface.

Commands

CommandPurpose
extract <pdf>Emit the full CodexDocument JSON.
probe <pdf>Two-event metadata probe (page count, dimensions, info dict, pdf_sha256).
schema [name]Print a published JSON schema (default: codex-document).
contractPrint the machine-readable contract manifest (endpoint inventory + section schema versions).
validate <codex_json>Validate a codex JSON payload against the published schema.
parityCompare codex projections against a baseline command.
render pageRender one page to PNG.
render separationsRender every separation channel for one page.
render heatmapRender a TAC heatmap PNG plus a per-run header.
render layerRender one OCG-isolated layer to RGBA PNG.
serveStart the codex HTTP API (uvicorn, in-process).

Common usage

uv run codex-pdf extract input.pdf --pretty > out.json
uv run codex-pdf validate out.json
uv run codex-pdf probe input.pdf --json
uv run codex-pdf contract --pretty

Streaming probe / extract (HTTP only)

The CLI's probe and extract are synchronous. The deployed HTTP API also exposes streaming variants that emit Phase 1 results as soon as PyMuPDF is finished and Phase 2 once pikepdf adds the slower fields:

  • POST /v1/probe — server-sent events with two frames (probe-min immediately, probe-std after the secondary parse).
  • POST /v1/extract/stream — same shape for full extraction; pass ?granular=1 to get per-section progress events.

The TypeScript client's probeStream() and extractStream() wrap this directly; the Python codex_pdf.client.HttpClient also has streaming helpers when used against a remote API.

Render usage

uv run codex-pdf render page input.pdf --page 0 --dpi 144 -o page.png
uv run codex-pdf render separations input.pdf --page 0 -o seps/
uv run codex-pdf render heatmap input.pdf --page 0 -o tac.png
uv run codex-pdf render layer input.pdf --page 0 --ocg "Dieline" -o dieline.png

Parity usage

uv run codex-pdf parity \
  --fixtures-root tests/fixtures \
  --profile deep \
  --max-files 10

Baseline command mode:

uv run codex-pdf parity \
  --fixtures-root /path/to/pdfs \
  --profile summary \
  --baseline-command "<command with {pdf} placeholder>"

Local server

uv run codex-pdf serve --host 0.0.0.0 --port 8080
curl localhost:8080/v1/version

The same image, in production, runs under gunicorn + uvicorn workers via the Dockerfile's CMD — see docs/deploy.md.