JSON that fits downstream systems
Return typed fields shaped for validators, automations, internal services, or any workflow that expects a fixed contract.
Use structured JSON when the next step is code, validation, or automation. LeapOCR reads the page, maps it to your schema, and lets you add instructions or bbox only when the workflow needs them.
Schema-first extraction without post-processing glue
Every step is priced per page. Start with base OCR, then add customization and bbox only when the page calls for it.
Extract the page into JSON shaped for a predictable downstream contract.
Apply instructions such as translating values, normalizing dates, or coercing optional fields to null.
Attach coordinates to selected fields when review tooling or overlays need page geometry.
What you can ask for
Return typed fields shaped for validators, automations, internal services, or any workflow that expects a fixed contract.
Translate values, coerce dates, flatten labels, or emit nulls for missing fields without building a custom cleanup layer around OCR.
Keep bbox out of the response by default, then add it when review tooling, auditing, or UI overlays need page coordinates.
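The example payloads on this page carry bbox values as four numbers between 0 and 1 per section. This page does not state the coordinate convention, so the sketch below assumes `[x, y, width, height]` as fractions of the page size. `PixelBox`, `to_pixels`, and the `flip_y` switch are hypothetical names for illustration, not part of LeapOCR; check the actual convention before drawing overlays.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def to_pixels(bbox: Sequence[float], page_width: int, page_height: int,
              flip_y: bool = False) -> PixelBox:
    """Convert an assumed [x, y, width, height] fractional bbox to pixels.

    flip_y=True treats y as measured upward from the bottom edge of the page
    and converts it to a top-left origin, which most image canvases expect.
    """
    x, y, w, h = bbox
    if flip_y:
        # Bottom-left origin: the box's top edge sits at y + h from the bottom.
        y = 1.0 - (y + h)
    return PixelBox(
        left=round(x * page_width),
        top=round(y * page_height),
        width=round(w * page_width),
        height=round(h * page_height),
    )


# Usage on a 1000 x 2000 pixel page image.
box = to_pixels([0.5, 0.25, 0.25, 0.5], page_width=1000, page_height=2000)
```

Keeping the conversion in one small function means a wrong guess about the origin is a one-line change rather than a fix scattered across overlay code.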
Examples
Each example includes a sample schema and a real extracted payload so you can see what schema-first OCR looks like on invoices, receipts, IDs, forms, and logistics paperwork.
This is a straight schema-first extraction of the invoice with semantic section bbox attached to the major document blocks.
Hard part: Header fields plus line items
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "invoice_date": { "type": "string", "format": "date" },
    "due_date": { "type": "string", "format": "date" },
    "customer_name": { "type": "string" },
    "customer_id": { "type": "string" },
    "purchase_order_number": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        }
      }
    },
    "totals": {
      "type": "object",
      "properties": {
        "subtotal": { "type": "number" },
        "sales_tax": { "type": "number" },
        "total": { "type": "number" },
        "previous_unpaid_balance": { "type": "number" },
        "total_due": { "type": "number" }
      }
    }
  }
}
{
  "invoice_number": "INV-100",
  "invoice_date": "2019-11-15",
  "due_date": "2019-12-15",
  "customer_name": "MICROSOFT CORPORATION",
  "customer_id": "CID-12345",
  "purchase_order_number": "PO-3333",
  "line_items": [
    { "description": "Consulting service", "quantity": 1, "unit_price": 100.0, "total": 100.0 }
  ],
  "totals": {
    "subtotal": 100.0,
    "sales_tax": 10.0,
    "total": 110.0,
    "previous_unpaid_balance": 500.0,
    "total_due": 610.0
  },
  "bbox_sections": {
    "header": [0.55, 0.78, 0.39, 0.14],
    "parties": [0.04, 0.54, 0.84, 0.28],
    "line_items": [0.03, 0.36, 0.91, 0.11],
    "totals": [0.56, 0.23, 0.39, 0.15]
  }
}
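A downstream service may want to confirm that a payload matches the declared types before acting on it. The sketch below is a deliberately small, standard-library-only type check covering just the keywords used in the schemas on this page (`type`, `properties`, `items`). It is not a full JSON Schema validator, and `check` and `_is_type` are illustrative names, not LeapOCR functions.

```python
def _is_type(value, name):
    """Check a value against one JSON Schema type name."""
    if name == "null":
        return value is None
    if name == "boolean":
        return isinstance(value, bool)
    if name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name == "string":
        return isinstance(value, str)
    if name == "array":
        return isinstance(value, list)
    if name == "object":
        return isinstance(value, dict)
    return True  # other type names are not checked in this sketch


def check(value, schema, path="$"):
    """Return a list of type-mismatch messages; an empty list means it fits."""
    declared = schema.get("type")
    if declared is not None:
        names = declared if isinstance(declared, list) else [declared]
        if not any(_is_type(value, n) for n in names):
            return [f"{path}: expected {declared}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        # Only declared properties are checked; extra keys are left alone.
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors


# Trimmed versions of the invoice schema and payload.
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {"type": "object",
                      "properties": {"quantity": {"type": "number"}}},
        },
    },
}
good = {"invoice_number": "INV-100", "line_items": [{"quantity": 1}],
        "bbox_sections": {"header": [0.55, 0.78, 0.39, 0.14]}}
bad = {"invoice_number": 100, "line_items": [{"quantity": "1"}]}

print(check(good, schema))
print(check(bad, schema))
```

The `good` payload passes even though `bbox_sections` is not in the schema, because JSON Schema allows additional properties unless a schema forbids them. The sketch mirrors that by checking only declared properties.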
This payload reflects the actual photographed receipt, including the handwritten tip and total, with bbox attached to semantic sections.
Hard part: Photographed receipt with handwriting
Translate only the merchant-facing labels for the review UI while preserving the raw merchant and line-item values.
{
  "type": "object",
  "properties": {
    "merchant": { "type": "string" },
    "transaction_datetime": { "type": "string" },
    "sales_associate": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "quantity": { "type": ["number", "null"] },
          "amount": { "type": ["number", "null"] }
        }
      }
    },
    "subtotal": { "type": "number" },
    "tax": { "type": "number" },
    "tip": { "type": ["number", "object"] },
    "total": { "type": ["number", "object"] }
  }
}
{
  "merchant": "Contoso",
  "transaction_datetime": "2019-06-10T13:59:00",
  "sales_associate": "Paul",
  "line_items": [
    { "name": "Cappuccino", "quantity": 1, "amount": 2.2 },
    { "name": "BACON & EGGS", "quantity": 1, "amount": 9.5 },
    { "name": "Sunny-side-up", "quantity": null, "amount": null }
  ],
  "subtotal": 11.7,
  "tax": 1.17,
  "tip": 1.63,
  "total": 14.5,
  "bbox_sections": {
    "merchant": [0.12, 0.64, 0.62, 0.28],
    "line_items": [0.11, 0.41, 0.66, 0.19],
    "totals": [0.22, 0.14, 0.58, 0.25]
  }
}
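Because the receipt payload carries a handwritten tip and total, a consumer may want to reconcile the amounts before trusting them. The sketch below assumes `tip` and `total` arrive as plain numbers, as they do in this payload (the schema also allows objects, which are not handled here). `money` and `reconcile` are hypothetical helper names.

```python
from decimal import Decimal


def money(value):
    # Convert via str() so 1.17 becomes Decimal("1.17") rather than the
    # long binary expansion of the float.
    return Decimal(str(value))


def reconcile(receipt):
    """Compare line items with the subtotal, and the parts with the total."""
    # Line items with a null amount (modifiers such as "Sunny-side-up") are skipped.
    amounts = [money(item["amount"]) for item in receipt["line_items"]
               if item["amount"] is not None]
    items_total = sum(amounts, Decimal("0"))
    parts_total = (money(receipt["subtotal"]) + money(receipt["tax"])
                   + money(receipt["tip"]))
    return {
        "items_match_subtotal": items_total == money(receipt["subtotal"]),
        "total_matches": parts_total == money(receipt["total"]),
    }


# Values copied from the receipt payload.
receipt = {
    "line_items": [
        {"name": "Cappuccino", "quantity": 1, "amount": 2.2},
        {"name": "BACON & EGGS", "quantity": 1, "amount": 9.5},
        {"name": "Sunny-side-up", "quantity": None, "amount": None},
    ],
    "subtotal": 11.7,
    "tax": 1.17,
    "tip": 1.63,
    "total": 14.5,
}
result = reconcile(receipt)
```

Using `Decimal` avoids float rounding noise when comparing currency sums. A failed check is a signal to route the receipt to review, not proof that the extraction is wrong.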
The license image is mapped directly into structured identity fields with no extra instruction layer.
Hard part: Dense labels in a compact card layout
{
  "type": "object",
  "properties": {
    "document_type": { "type": "string" },
    "issuing_state": { "type": "string" },
    "document_number": { "type": "string" },
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "date_of_birth": { "type": "string", "format": "date" },
    "issue_date": { "type": "string", "format": "date" },
    "expiry_date": { "type": "string", "format": "date" },
    "address": { "type": "string" }
  }
}
{
  "document_type": "driver_license",
  "issuing_state": "WA",
  "document_number": "WDLABCD456DG",
  "first_name": "LIAM R.",
  "last_name": "TALBOT",
  "date_of_birth": "1958-01-06",
  "issue_date": "2015-01-06",
  "expiry_date": "2020-08-12",
  "address": "123 STREET ADDRESS, YOUR CITY WA 99999-1234",
  "class": "B",
  "sex": "M",
  "height": "5'-08\"",
  "eyes": "BLU",
  "weight_lb": 165,
  "restrictions": "B",
  "endorsement": "L",
  "veteran": true
}
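Because the schema declares the date fields with `"format": "date"`, the payload returns them as `YYYY-MM-DD` strings that parse directly with the Python standard library. The sketch below shows one way a consumer might sanity-check an ID. `license_status` is a hypothetical helper, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date


def license_status(payload, today):
    """Parse the ISO date strings and run two simple checks."""
    dob = date.fromisoformat(payload["date_of_birth"])
    issued = date.fromisoformat(payload["issue_date"])
    expires = date.fromisoformat(payload["expiry_date"])
    return {
        # Birth must precede issue, and issue must precede expiry.
        "dates_in_order": dob < issued < expires,
        "expired": expires < today,
    }


# Date fields copied from the license payload.
sample = {
    "date_of_birth": "1958-01-06",
    "issue_date": "2015-01-06",
    "expiry_date": "2020-08-12",
}
status = license_status(sample, today=date(2024, 1, 1))
```

`date.fromisoformat` raises `ValueError` on a malformed string, so a bad extraction fails loudly at this step instead of leaking into later logic.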
This extraction turns the scanned proposal into a more useful commercial object while keeping the actual values from the form.
Hard part: Checkboxes, dense text blocks, and a noisy scan
Convert checkbox marks into booleans and keep the footer as a short note instead of a long OCR dump.
{
  "type": "object",
  "properties": {
    "proposal_number": { "type": "string" },
    "proposal_date": { "type": "string", "format": "date" },
    "customer": { "type": "string" },
    "contact_name": { "type": "string" },
    "item": { "type": "string" },
    "specs": {
      "type": "object",
      "properties": {
        "material": { "type": "string" },
        "size": { "type": "string" },
        "gauge": { "type": "string" },
        "colors": { "type": "array", "items": { "type": "string" } }
      }
    },
    "commercial_terms": {
      "type": "object",
      "properties": {
        "quantity": { "type": "number" },
        "unit_price": { "type": "number" },
        "tooling_charge": { "type": "number" }
      }
    }
  }
}
{
  "proposal_number": "10675",
  "proposal_date": "1987-10-16",
  "customer": "Lorillard Corporation",
  "contact_name": "Mr. Robert Kennedy",
  "item": "Harley Davidson Metal Plaque",
  "specs": {
    "material": "Aluminum",
    "size": "17 1/2 x 23 1/2",
    "gauge": ".025",
    "colors": ["Transparent gold", "opaque black", "white", "orange"],
    "single_face": true,
    "holes": 4,
    "corners": "square",
    "edges": "hemmed",
    "stamp_frame": true,
    "embossed": true
  },
  "commercial_terms": {
    "quantity": 500,
    "unit_price": 9.18,
    "tooling_charge": 3015.0,
    "steel_tips": 1045.0
  },
  "billing": "bill as manufacture",
  "warehousing": "ship immediately",
  "notes": "Footer contains standard price-adjustment, freight, and liability language."
}
Because the sample is an empty template, the structured output reflects a blank shipping form rather than a populated shipment.
Hard part: Logistics parties plus shipment detail
{
  "type": "object",
  "properties": {
    "shipment_number": { "type": "string" },
    "page": { "type": "string" },
    "ship_from": { "type": "object" },
    "ship_to": { "type": "object" },
    "third_party_bill_to": { "type": "object" },
    "carrier_information": { "type": "object" },
    "freight_charge_terms": { "type": "object" },
    "customer_order_rows": { "type": "array" }
  }
}
{
  "shipment_number": null,
  "page": "1 of 1",
  "ship_from": {
    "name": "[Name]",
    "street_address": "[Street Address]",
    "city_state_zip": "[City, ST ZIP Code]",
    "sid_number": null
  },
  "ship_to": {
    "name": "[Name]",
    "street_address": "[Street Address]",
    "city_state_zip": "[City, ST ZIP Code]",
    "cid_number": null
  },
  "third_party_bill_to": {
    "name": "[Name]",
    "street_address": "[Street Address]",
    "city_state_zip": "[City, ST ZIP Code]"
  },
  "carrier_information": {
    "carrier_name": null,
    "trailer_number": null,
    "serial_numbers": null,
    "pro_number": null
  },
  "freight_charge_terms": { "prepaid": false, "collect": false, "third_party": false },
  "customer_order_rows": [],
  "special_instructions": null
}
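A blank template like this one still produces a well-formed payload, so an automation may want to detect it before creating a shipment record. The sketch below treats a payload as blank when every value is null, false, an empty list, or a bracketed placeholder such as `[Name]`. The `looks_blank` helper, the placeholder pattern, and the decision to ignore the `page` field are all assumptions made for illustration.

```python
import re

# Matches a whole-string bracketed placeholder such as "[Street Address]".
PLACEHOLDER = re.compile(r"^\[[^\[\]]+\]$")


def leaf_values(node):
    """Yield every scalar value nested inside dicts and lists."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for value in node:
            yield from leaf_values(value)
    else:
        yield node


def looks_blank(payload, ignore=("page",)):
    """Return True when no field carries real shipment data."""
    for key, value in payload.items():
        if key in ignore:
            continue
        for leaf in leaf_values(value):
            if leaf is None or leaf is False:
                continue  # missing value or unchecked box
            if isinstance(leaf, str) and PLACEHOLDER.match(leaf.strip()):
                continue  # template placeholder text
            return False
    return True


# Trimmed version of the shipping-form payload.
blank = {
    "shipment_number": None,
    "page": "1 of 1",
    "ship_from": {"name": "[Name]", "city_state_zip": "[City, ST ZIP Code]",
                  "sid_number": None},
    "freight_charge_terms": {"prepaid": False, "collect": False},
    "customer_order_rows": [],
}
filled = dict(blank, ship_from={"name": "Contoso Freight", "sid_number": None})
```

With these inputs, `looks_blank(blank)` is true and `looks_blank(filled)` is false, so the check can gate record creation without any OCR-specific logic.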