Document intelligence

Turn any PDF or image URL into structured data — metadata, extracted text, sliced page ranges, OCR for scanned docs, decoded barcodes / QR codes — without falling back to a vision LLM guess. Built for the messy 30% of documents where pdf-to-markdown alone returns nothing useful.

When to use this pack

An agent gets a PDF link from a webhook (invoice, contract, receipt, regulatory filing) or an image URL (shipping label, scanned form, photographed ticket) and needs to extract structured fields deterministically. content-extraction handles the easy path; document-intel adds metadata inspection, page slicing, OCR fallback for scanned PDFs, embedded barcode / QR decoding, and PDF reassembly for downstream sharing.

Tools in this pack

Tool | Price per call | Endpoint | Description
PDF info | $0.002 | POST /api/pdf-info | Inspect a PDF without downloading the whole thing into your model: page count, title, author, subject, creator, producer, creation/modification dates, encryption flag, and byte size. Body: {"url":"https://…/file.pdf"}.
PDF to Markdown | $0.01 | POST /api/pdf-to-markdown | Convert a PDF to clean markdown: headings, paragraphs, and bullets reconstructed from the text layer, ready to drop into a model's context. Body: {"url":"https://…/file.pdf"}.
Extract / split PDF pages | $0.003 | POST /api/pdf-extract-pages | Pull a subset of pages into a new PDF (split). Body: {"url":"https://…/file.pdf","pages":"1-3,5"}. Returns the new PDF as base64.
Image OCR | $0.01 | POST /api/image-ocr | Extract text from an image (PNG/JPEG): returns the full text, overall confidence (0-100), and per-line bounding boxes. Body: {"image":"<base64>"} or {"url":"https://…"}. Pure-CPU Tesseract via tesseract.js: no upstream API, no keys. Default lang is "eng"; pass "lang" (ISO 639-2) for others.
Barcode / QR decode | $0.003 | POST /api/barcode-decode | Decode a barcode or QR code from an image. Send a base64 PNG or JPEG (or a data: URL); returns the decoded text and the symbology. Reads QR, DataMatrix, and 1D barcodes (EAN/UPC/Code39/Code128/ITF/Codabar). Deterministic, no network, no model.
Merge PDFs | $0.004 | POST /api/pdf-merge | Combine several PDFs into one, in order. Body: {"urls":["https://…/a.pdf","https://…/b.pdf"]}. Returns the merged PDF as base64 plus page count and size.
Images to PDF | $0.004 | POST /api/images-to-pdf | Combine PNG/JPEG images into a single PDF, one image per page. Body: {"urls":["https://…/1.png","https://…/2.jpg"]}. Returns the PDF as base64.
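
Because every tool is priced per call, it can help to estimate a document's cost before running it. The sketch below copies the prices from the tool list and sums a planned set of calls. The `estimate_cost` helper and the 3-page scanned invoice are hypothetical illustrations, not part of the pack.

```python
from decimal import Decimal

# Per-call prices in USD, copied from the tool list.
PRICES = {
    "pdf-info": Decimal("0.002"),
    "pdf-to-markdown": Decimal("0.01"),
    "pdf-extract-pages": Decimal("0.003"),
    "image-ocr": Decimal("0.01"),
    "barcode-decode": Decimal("0.003"),
    "pdf-merge": Decimal("0.004"),
    "images-to-pdf": Decimal("0.004"),
}


def estimate_cost(calls: dict) -> Decimal:
    """Sum price * count for every tool name in `calls` (hypothetical helper)."""
    return sum((PRICES[tool] * count for tool, count in calls.items()), Decimal("0"))


# Hypothetical worst case for a 3-page scanned invoice:
# inspect, try markdown, slice once, OCR each page, decode one barcode.
scanned_invoice = {
    "pdf-info": 1,
    "pdf-to-markdown": 1,
    "pdf-extract-pages": 1,
    "image-ocr": 3,
    "barcode-decode": 1,
}
total = estimate_cost(scanned_invoice)
print(f"${total}")
```

Decimal arithmetic is used instead of floats so that sums of small prices compare cleanly against a budget such as the ≤ $0.05 per document mentioned in the prompt on this page.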

Workflow

Start with pdf-info. It confirms that the URL actually serves a PDF (some webhooks lie about content-type), returns the page count for scoping, and surfaces flags like `encrypted` so you don't waste a pdf-to-markdown call that will fail. Skip it only if you already know the document's shape.
Run pdf-to-markdown for the happy path. Digital-native PDFs — invoices generated by accounting software, Word/Google-Docs exports, EDGAR filings — come back as clean markdown in one call. This handles ~70% of real-world PDF intake; the next steps are for the other 30%.
If the document is long (>20 pages) and you only need a slice — the signature page on a contract, the line-item table on an invoice, an appendix from a research report — call pdf-extract-pages with the page range first. Then run pdf-to-markdown on the extracted slice. Cheaper, faster, and the smaller payload reduces noise downstream.
If pdf-to-markdown returns fewer than 50 characters of text, the PDF is most likely image-only with no text layer (a scanned document, a photo-of-a-receipt PDF, or a contract that was printed and re-scanned). Fall back to image-ocr and feed it each page rendered as a PNG or JPEG image. Tesseract OCR is deterministic and surfaces the text that pdf-to-markdown couldn't.
For invoices, shipping labels, event tickets, and packaging, the high-value structured payload is often encoded in a barcode or QR code rather than visible text. Run barcode-decode on the page image — it returns the raw payload (shipping tracking numbers, EAN/UPC product codes, base64 / JWT ticket payloads). Feed JWT-shaped payloads to the decode-blob pack for further unwrapping.
Use pdf-merge when you've extracted slices from multiple PDFs and want to combine them into a single artifact — building a deal package (term sheet + signature page + appendix), or stitching a multi-vendor invoice export back together for accounting.
Use images-to-pdf when the source material was a set of phone photos (receipts, whiteboard captures, scanned pages handed to you out-of-order) and you need to wrap them into one shareable PDF — either as the final deliverable or as the input to a re-run of this same pipeline at higher quality.
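
The branching in the steps above can be sketched as two small decision functions. This is a minimal sketch, not the pack's implementation: the `PdfInfo` class, the `page_count` field name, and the helper names are assumptions made for illustration. Only the `encrypted` flag and the two thresholds (20 pages, 50 characters) come from the workflow itself.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_TEXT_CHARS = 50   # below this, treat the PDF as scanned (threshold from the workflow)
LONG_DOC_PAGES = 20   # above this, slice before converting (threshold from the workflow)


@dataclass
class PdfInfo:
    page_count: int   # hypothetical field name for the page count pdf-info returns
    encrypted: bool   # the `encrypted` flag mentioned in the workflow


def plan_first_calls(info: PdfInfo, wanted_pages: Optional[str] = None) -> List[str]:
    """Return the tool calls to make after pdf-info, in order."""
    if info.encrypted:
        return []  # pdf-to-markdown would fail, so don't pay for it
    if info.page_count > LONG_DOC_PAGES and wanted_pages:
        # Long document and only a slice is needed: extract first, then convert.
        return ["pdf-extract-pages", "pdf-to-markdown"]
    return ["pdf-to-markdown"]


def needs_ocr(markdown: str) -> bool:
    """True when pdf-to-markdown returned too little text to be a real text layer.

    Stripping surrounding whitespace before counting is a choice made here.
    """
    return len(markdown.strip()) < MIN_TEXT_CHARS


print(plan_first_calls(PdfInfo(page_count=3, encrypted=False)))
print(plan_first_calls(PdfInfo(page_count=48, encrypted=False), wanted_pages="12-14"))
print(needs_ocr("\n\n"))
```

Keeping the decisions as pure functions, separate from the paid HTTP calls, makes the thresholds easy to unit-test and tune without spending anything.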

Run it in Claude

claude mcp add agent402 -s user -- npx -y agent402-mcp@latest

Then paste this prompt into Claude:

Process this invoice with Agent402: https://example.com/invoice.pdf. (1) Run pdf-info to confirm it's a PDF, get the page count, and check the `encrypted` flag. (2) If not encrypted, call pdf-to-markdown with the URL. (3) Inspect the returned markdown: if it has <50 chars of text, the PDF is scanned, so use pdf-extract-pages to isolate each page, then run image-ocr on each page's rendered PNG/JPEG image. (4) If you still can't find a tracking number after parsing the OCR text, run barcode-decode on the page 1 image to surface an embedded QR / barcode payload. (5) Return a single JSON object: {invoiceNumber, totalAmount, vendor, lineItems, trackingNumber, source: "pdf-to-markdown" | "image-ocr" | "barcode-decode"}, and populate `source` based on which extraction path actually produced the data. Budget ≤ $0.05 per document; all of these tools are wallet-only (paid per call).
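
On the receiving side, the JSON object requested in step (5) of the prompt can be checked before it is trusted. The sketch below is one possible validator: the field names and the three allowed `source` values come from the prompt, while the Python types, the `InvoiceResult` class, the `parse_result` helper, and the sample reply are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# The three extraction paths named in the prompt.
ALLOWED_SOURCES = {"pdf-to-markdown", "image-ocr", "barcode-decode"}


@dataclass
class InvoiceResult:
    # Field names come from the prompt; the Python types are assumptions.
    invoiceNumber: Optional[str]
    totalAmount: Optional[str]
    vendor: Optional[str]
    lineItems: List[dict]
    trackingNumber: Optional[str]
    source: str

    def __post_init__(self) -> None:
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unexpected source: {self.source!r}")


def parse_result(raw: str) -> InvoiceResult:
    """Parse the model's JSON reply; missing or extra keys raise TypeError."""
    return InvoiceResult(**json.loads(raw))


# Made-up sample reply, for illustration only.
reply = json.dumps({
    "invoiceNumber": "INV-0001",
    "totalAmount": "120.00",
    "vendor": "Example Vendor",
    "lineItems": [{"description": "Widget", "quantity": 2}],
    "trackingNumber": None,
    "source": "image-ocr",
})
result = parse_result(reply)
print(result.source)
```

Rejecting an unknown `source` early keeps downstream code from silently accepting data whose extraction path is unclear.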
