Document intelligence

Turn any PDF or image URL into structured data — metadata, extracted text, sliced page ranges, OCR for scanned docs, decoded barcodes / QR codes — without falling back to a vision LLM guess. Built for the messy 30% of documents where pdf-to-markdown alone returns nothing useful.

When to use this pack

An agent gets a PDF link from a webhook (invoice, contract, receipt, regulatory filing) or an image URL (shipping label, scanned form, photographed ticket) and needs to extract structured fields deterministically. content-extraction handles the easy path; document-intel adds metadata inspection, page slicing, OCR fallback for scanned PDFs, embedded barcode / QR decoding, and PDF reassembly for downstream sharing.

Tools in this pack

Workflow

  1. Start with pdf-info — confirms the URL actually serves a PDF (some webhooks lie about content-type), returns the page count for scoping, and surfaces flags like `encrypted` so you don't waste a pdf-to-markdown call that will fail. Skip only if you already know the document's shape.
  2. Run pdf-to-markdown for the happy path. Digital-native PDFs — invoices generated by accounting software, Word/Google-Docs exports, EDGAR filings — come back as clean markdown in one call. This handles ~70% of real-world PDF intake; the next steps are for the other 30%.
  3. If the document is long (>20 pages) and you only need a slice (the signature page on a contract, the line-item table on an invoice, an appendix from a research report), call pdf-extract-pages with the page range first, then run pdf-to-markdown on the extracted slice. This is cheaper and faster, and the smaller payload reduces noise downstream.
  4. If pdf-to-markdown returns fewer than 50 characters of text, the PDF is most likely rasterized: a scanned document, a photo-of-a-receipt PDF, or a contract that was printed and re-scanned. Fall back to image-ocr and feed it the rendered page image. Tesseract-grade OCR is deterministic and surfaces the text that pdf-to-markdown couldn't.
  5. For invoices, shipping labels, event tickets, and packaging, the high-value structured payload is often encoded in a barcode or QR code rather than visible text. Run barcode-decode on the page image — it returns the raw payload (shipping tracking numbers, EAN/UPC product codes, base64 / JWT ticket payloads). Feed JWT-shaped payloads to the decode-blob pack for further unwrapping.
  6. Use pdf-merge when you've extracted slices from multiple PDFs and want to combine them into a single artifact — building a deal package (term sheet + signature page + appendix), or stitching a multi-vendor invoice export back together for accounting.
  7. Use images-to-pdf when the source material was a set of phone photos (receipts, whiteboard captures, scanned pages handed to you out-of-order) and you need to wrap them into one shareable PDF — either as the final deliverable or as the input to a re-run of this same pipeline at higher quality.
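
The routing decisions in steps 1 to 4 can be sketched as plain functions. This is a minimal illustration, not the pack's API: the `PdfInfo` shape is hypothetical (only the `encrypted` flag is named in the text above), and stripping whitespace before counting characters is an assumption about what "characters of text" means.

```python
from dataclasses import dataclass

# Thresholds taken from the workflow text above.
MIN_TEXT_CHARS = 50   # below this, treat the PDF as scanned
LONG_DOC_PAGES = 20   # above this, slice before converting


@dataclass
class PdfInfo:
    """Hypothetical shape of a pdf-info result; only `encrypted` is named in the text."""
    is_pdf: bool
    page_count: int
    encrypted: bool


def plan_before_conversion(info: PdfInfo, need_slice: bool) -> list[str]:
    """Return the tool calls to make before any text has been extracted."""
    if not info.is_pdf or info.encrypted:
        return []  # pdf-to-markdown would fail, so stop here
    steps = []
    if need_slice and info.page_count > LONG_DOC_PAGES:
        steps.append("pdf-extract-pages")  # shrink the payload first
    steps.append("pdf-to-markdown")
    return steps


def needs_ocr(markdown: str) -> bool:
    """True when pdf-to-markdown returned too little text (assumption: whitespace ignored)."""
    return len(markdown.strip()) < MIN_TEXT_CHARS
```

When `needs_ocr` returns true, the next call would be image-ocr on the rendered page image, with barcode-decode as a further option for labels and tickets.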

Run it in Claude

claude mcp add agent402 -s user -- npx -y agent402-mcp@latest

Then paste this prompt into Claude:

Process this invoice with Agent402: https://example.com/invoice.pdf. (1) Run pdf-info to confirm it's a PDF, get the page count, check the `encrypted` flag. (2) If not encrypted, call pdf-to-markdown with the URL. (3) Inspect the returned markdown — if it has <50 chars of text, the PDF is scanned: call pdf-extract-pages to get each page as an image, then run image-ocr on each. (4) If you still can't find a tracking number after parsing the OCR text, run barcode-decode on page 1 to surface an embedded QR / barcode payload. (5) Return a single JSON object: {invoiceNumber, totalAmount, vendor, lineItems, trackingNumber, source: "pdf-to-markdown" | "image-ocr" | "barcode-decode"} — populate `source` based on which extraction path actually produced the data. Budget ≤ $0.05 per document; all of these tools are wallet-only (paid per call).
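
If you post-process the agent's answer in your own code, the result shape and budget named in the prompt can be checked with something like the sketch below. The function names and the idea of passing per-call costs as a list are illustrative assumptions; actual per-call prices are not stated here and must come from your own records.

```python
import json

# Values taken from the prompt above.
ALLOWED_SOURCES = {"pdf-to-markdown", "image-ocr", "barcode-decode"}
RESULT_KEYS = ("invoiceNumber", "totalAmount", "vendor", "lineItems", "trackingNumber")
BUDGET_USD = 0.05  # per-document ceiling


def within_budget(call_costs_usd: list[float], budget: float = BUDGET_USD) -> bool:
    """Check the summed call costs against the budget (small tolerance for float error)."""
    return sum(call_costs_usd) <= budget + 1e-9


def build_result(fields: dict, source: str) -> str:
    """Serialize the extracted fields, filling missing keys with null."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    result = {key: fields.get(key) for key in RESULT_KEYS}
    result["source"] = source  # records which extraction path produced the data
    return json.dumps(result)
```

Rejecting an unknown `source` keeps the provenance field trustworthy: a value outside the three listed paths means the data did not come from this pipeline.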
