Line items stay line items
Descriptions, quantities, units, tax rates, and row totals can be returned in structured arrays instead of merged strings.
Many OCR tools can find the invoice total. Fewer return line items in a shape finance systems can actually trust. LeapOCR helps teams extract invoice rows, units, pricing, taxes, and totals into schema-fit JSON while keeping the page reviewable.
Extract item descriptions, SKUs, quantities, units, unit prices, taxes, and row totals.
The useful result is a row array, not a paragraph of OCR text that still has to be split apart downstream.
{
  "url": "https://example.com/invoice.pdf",
  "file_name": "invoice.pdf",
  "format": "structured",
  "instructions": "Extract vendor metadata, invoice totals, and every invoice line item with quantity, unit price, tax, and line total.",
  "schema": {
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string" },
      "vendor_name": { "type": "string" },
      "line_items": { "type": "array" }
    }
  }
}
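The request above leaves `line_items` as an untyped array. As a rough sketch, the schema could also spell out the row fields so each line item comes back in a predictable shape. The `items` and `required` keywords are standard JSON Schema; whether LeapOCR honors them in this form is an assumption, and the `sku` and `unit` field names and the `build_request` helper are hypothetical.

```python
import json

# Hypothetical row schema. Field names mirror the sample output on this page,
# plus assumed "sku" and "unit" fields for vendors that print them.
LINE_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "sku": {"type": "string"},
        "quantity": {"type": "number"},
        "unit": {"type": "string"},
        "unit_price": {"type": "number"},
        "tax_rate": {"type": "number"},
        "line_total": {"type": "number"},
    },
    "required": ["description", "quantity", "unit_price", "line_total"],
}


def build_request(url: str, file_name: str) -> dict:
    """Return a request body in the shape shown above, with typed rows."""
    return {
        "url": url,
        "file_name": file_name,
        "format": "structured",
        "instructions": (
            "Extract vendor metadata, invoice totals, and every invoice line "
            "item with quantity, unit price, tax, and line total."
        ),
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "vendor_name": {"type": "string"},
                "invoice_total": {"type": "number"},
                # Assumption: the service accepts a nested "items" schema.
                "line_items": {"type": "array", "items": LINE_ITEM_SCHEMA},
            },
        },
    }


body = build_request("https://example.com/invoice.pdf", "invoice.pdf")
print(json.dumps(body, indent=2))
```

Keeping the row schema in one constant makes it easier to align with the column names an AP or ERP import expects.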
Why it works
Finance workflows usually break at the table level, not the invoice-title level.
The schema can reflect how your AP or ERP system expects line items, not how the PDF happens to present them.
Markdown helps reviewers verify a row-level discrepancy without losing the structured extraction path.
What you control
The fields below are the ones most likely to decide whether the invoice can be posted without manual repair.
Where vendors expose SKUs or service codes, teams often want them captured with the line item for downstream mapping.
These fields are what let finance systems validate the row instead of treating the invoice as one total amount.
Capturing tax and row totals helps AP teams reconcile invoice lines before posting or approval.
Line items are only useful when the invoice identity and vendor context travel with them.
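Once quantity, unit price, tax, and row totals arrive as separate fields, a reconciliation pass becomes a few lines of arithmetic. The sketch below is illustrative only. It assumes the field names from the sample output on this page, that `line_total` is pre-tax unless `totals_include_tax` is set, and that `invoice_total` should equal the sum of the row totals. Invoices with invoice-level tax, shipping, or discounts would need a different rule. `check_invoice` and `TOLERANCE` are hypothetical names.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # assumed rounding tolerance of one cent


def to_decimal(value) -> Decimal:
    """Convert a JSON number via str() to avoid binary float artifacts."""
    return Decimal(str(value))


def check_invoice(invoice: dict, totals_include_tax: bool = False) -> list[str]:
    """Return readable problems; an empty list means the rows reconcile."""
    problems = []
    running_total = Decimal("0")
    for index, row in enumerate(invoice.get("line_items", []), start=1):
        expected = to_decimal(row["quantity"]) * to_decimal(row["unit_price"])
        if totals_include_tax:
            expected *= 1 + to_decimal(row.get("tax_rate", 0)) / 100
        line_total = to_decimal(row["line_total"])
        if abs(expected - line_total) > TOLERANCE:
            problems.append(
                f"row {index}: expected {expected:.2f}, got {line_total:.2f}"
            )
        running_total += line_total
    invoice_total = to_decimal(invoice["invoice_total"])
    if abs(running_total - invoice_total) > TOLERANCE:
        problems.append(
            f"line totals sum to {running_total:.2f}, "
            f"invoice total is {invoice_total:.2f}"
        )
    return problems
```

An invoice that returns a non-empty list can be routed to review instead of being posted automatically.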
Examples
Most teams either need row-ready JSON for posting logic or a readable invoice table for review and exception handling.
This is the common path for teams that want invoice rows to survive extraction without another parser or manual data-entry pass.
{
  "invoice_number": "INV-8813",
  "vendor_name": "Harbor Office Supply",
  "invoice_total": 610.0,
  "line_items": [
    {
      "description": "Consulting service",
      "quantity": 1,
      "unit_price": 100.0,
      "tax_rate": 10.0,
      "line_total": 100.0
    }
  ]
}
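Assuming the result arrives as JSON text in the shape above, one way to consume it is to load the rows into typed objects before any posting logic runs. This is a minimal sketch, not a client for any particular SDK. `LineItem` and `parse_line_items` are hypothetical names, and amounts are parsed as `Decimal` so currency values are not rounded through floats.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal
    line_total: Decimal
    tax_rate: Optional[Decimal] = None  # assumed optional: some rows omit it


def parse_line_items(raw_json: str) -> list[LineItem]:
    """Parse a structured result and return its rows as typed objects."""
    # parse_float/parse_int make json build Decimal values from the raw text.
    data = json.loads(raw_json, parse_float=Decimal, parse_int=Decimal)
    return [
        LineItem(
            description=row["description"],
            quantity=row["quantity"],
            unit_price=row["unit_price"],
            line_total=row["line_total"],
            tax_rate=row.get("tax_rate"),
        )
        for row in data.get("line_items", [])
    ]
```

A missing required field raises `KeyError` here, which surfaces an incomplete row early instead of letting it reach the ledger.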
When AP teams need to verify a questionable row, markdown provides a cleaner review surface than raw OCR text.
# Invoice INV-8813

- Vendor: Harbor Office Supply
- Invoice total: 610.00

## Line items

| Description | Qty | Unit price | Tax | Line total |
| --- | ---: | ---: | ---: | ---: |
| Consulting service | 1 | 100.00 | 10% | 100.00 |
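If a team only stores the JSON result, a review table like the one above can also be rebuilt from the rows on demand. The sketch below assumes the row field names from the JSON example on this page. `render_line_items` is a hypothetical helper and is not part of any LeapOCR API.

```python
def render_line_items(rows: list[dict]) -> str:
    """Render rows as a markdown table with the columns used above."""
    lines = [
        "| Description | Qty | Unit price | Tax | Line total |",
        "| --- | ---: | ---: | ---: | ---: |",
    ]
    for row in rows:
        # Escape pipes so a description cannot break the table layout.
        description = str(row["description"]).replace("|", "\\|")
        # Leave the tax cell empty when the row carries no tax rate.
        tax = f"{row['tax_rate']:g}%" if row.get("tax_rate") is not None else ""
        lines.append(
            f"| {description} | {row['quantity']:g} | {row['unit_price']:.2f} "
            f"| {tax} | {row['line_total']:.2f} |"
        )
    return "\n".join(lines)


sample_row = {
    "description": "Consulting service",
    "quantity": 1,
    "unit_price": 100.0,
    "tax_rate": 10.0,
    "line_total": 100.0,
}
print(render_line_items([sample_row]))
```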
FAQ
Straight answers for teams evaluating how this workflow fits into production.
Yes. The workflow is designed to return line items as arrays with row-level values like quantity, unit price, tax, and totals.
This page focuses on line items because many tools can find totals, but line-item tables are where real AP workflows usually break. It targets that deeper extraction problem directly.
Yes. Markdown stays useful for row-level QA and exception handling while JSON powers the downstream workflow.
Ready to test
The best evaluation is simple: run a real invoice with line items and see whether the result is ready for AP or still needs another parsing layer.