PDF to JSON OCR API

Extract schema-fit JSON from PDFs and scans without adding a second parsing layer

Use structured JSON when the next step is code, validation, or automation. LeapOCR reads the page, maps it to your schema, and lets you add instructions or bbox only when the workflow needs them.

Standard-v1 credit ladder

Schema-first extraction without post-processing glue

Give LeapOCR the page and the schema. Base OCR handles extraction, instructions clean up edge cases, and bbox stays optional so the payload stays lean when geometry is not required.

Standard-v1 pricing

Every step is priced per page. Start with base OCR, then add customization and bbox only when the page calls for it.

Step 01: Base OCR
1 credit / page

Extract the page into JSON shaped for a predictable downstream contract.

Step 02: Customize
+1 credit / page

Apply instructions like translate values, normalize dates, or coerce optional fields to null.

Step 03: Bounding boxes
+1 credit / page

Attach coordinates to selected fields when review tooling or overlays need page geometry.
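The per-page arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the two add-ons are priced independently of each other (the page states each as +1 credit per page but does not say whether bbox requires Customize).

```python
def credits_per_page(customize: bool = False, bbox: bool = False) -> int:
    """Base OCR is 1 credit; each enabled add-on contributes 1 more credit."""
    return 1 + int(customize) + int(bbox)


def job_credits(pages: int, customize: bool = False, bbox: bool = False) -> int:
    """Total credits for a document, since every step is priced per page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * credits_per_page(customize, bbox)


if __name__ == "__main__":
    print(job_credits(10))                             # base OCR only
    print(job_credits(10, customize=True))             # base + instructions
    print(job_credits(10, customize=True, bbox=True))  # all three steps
```

Under these assumptions a 10-page document costs 10, 20, or 30 credits depending on which steps are enabled.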

What you can ask for

Keep the base flow simple, then add instructions where it matters

Translate values to French
Coerce all dates to ISO-8601
Emit null when a field is missing
Attach bbox to selected sections

JSON that fits downstream systems

Return typed fields shaped for validators, automations, internal services, or any workflow that expects a fixed contract.

Instructions for harder cases

Translate values, coerce dates, flatten labels, or emit nulls for missing fields without building a custom cleanup layer around OCR.
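For a sense of what such a cleanup layer looks like when written by hand, here is a minimal Python sketch of the date-coercion and null-emission part. It is not LeapOCR code: the function name and the list of accepted input formats are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats, picked for illustration only.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y")


def to_iso_date(raw: Optional[str]) -> Optional[str]:
    """Return an ISO-8601 date string, or None when the value is missing
    or does not match any known format."""
    if raw is None or not raw.strip():
        return None  # emit null for a missing field
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None  # unrecognized values also become null in this sketch


if __name__ == "__main__":
    for value in ("11/15/2019", "October 16, 1987", "", "n/a"):
        print(repr(value), "->", to_iso_date(value))
```

Month names are parsed with the process locale, so the sketch assumes an English or default C locale.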

Optional geometry beside the payload

Keep bbox out of the response by default, then add it when review tooling, auditing, or UI overlays need page coordinates.

Examples

Five schema-driven JSON examples from real documents

Each example includes a sample schema and a real extracted payload so you can see what schema-first OCR looks like on invoices, receipts, IDs, forms, and logistics paperwork.

Sample invoice document from Azure
Invoice

Invoice mapped straight into payables JSON

This is a straight schema-first extraction of the invoice, with a bbox attached to each major document block.

Hard part: Header fields plus line items

Sample JSON schema
json
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "invoice_date": { "type": "string", "format": "date" },
    "due_date": { "type": "string", "format": "date" },
    "customer_name": { "type": "string" },
    "customer_id": { "type": "string" },
    "purchase_order_number": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        }
      }
    },
    "totals": {
      "type": "object",
      "properties": {
        "subtotal": { "type": "number" },
        "sales_tax": { "type": "number" },
        "total": { "type": "number" },
        "previous_unpaid_balance": { "type": "number" },
        "total_due": { "type": "number" }
      }
    }
  }
}
Extracted JSON
json
{
  "invoice_number": "INV-100",
  "invoice_date": "2019-11-15",
  "due_date": "2019-12-15",
  "customer_name": "MICROSOFT CORPORATION",
  "customer_id": "CID-12345",
  "purchase_order_number": "PO-3333",
  "line_items": [
    {
      "description": "Consulting service",
      "quantity": 1,
      "unit_price": 100.0,
      "total": 100.0
    }
  ],
  "totals": {
    "subtotal": 100.0,
    "sales_tax": 10.0,
    "total": 110.0,
    "previous_unpaid_balance": 500.0,
    "total_due": 610.0
  },
  "bbox_sections": {
    "header": [0.55, 0.78, 0.39, 0.14],
    "parties": [0.04, 0.54, 0.84, 0.28],
    "line_items": [0.03, 0.36, 0.91, 0.11],
    "totals": [0.56, 0.23, 0.39, 0.15]
  }
}
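Because the payload is already typed, a downstream payables check can be plain arithmetic. The Python sketch below is one way to do it; the function name is hypothetical, and the rule that total_due equals total plus previous_unpaid_balance is an assumption inferred from the sample values, not something the page states.

```python
import math
from typing import List

# The numeric parts of the extracted invoice payload shown above.
SAMPLE_INVOICE = {
    "line_items": [
        {"description": "Consulting service", "quantity": 1,
         "unit_price": 100.0, "total": 100.0}
    ],
    "totals": {
        "subtotal": 100.0, "sales_tax": 10.0, "total": 110.0,
        "previous_unpaid_balance": 500.0, "total_due": 610.0,
    },
}


def _close(a: float, b: float, tol: float = 0.005) -> bool:
    """Compare currency amounts with half-a-cent absolute tolerance."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)


def check_invoice_totals(invoice: dict) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    items = invoice["line_items"]
    totals = invoice["totals"]
    for i, item in enumerate(items):
        if not _close(item["quantity"] * item["unit_price"], item["total"]):
            problems.append(f"line_items[{i}]: quantity * unit_price != total")
    if not _close(sum(item["total"] for item in items), totals["subtotal"]):
        problems.append("subtotal != sum of line item totals")
    if not _close(totals["subtotal"] + totals["sales_tax"], totals["total"]):
        problems.append("total != subtotal + sales_tax")
    # Assumed rule, inferred from the sample: 110.0 + 500.0 == 610.0.
    if not _close(totals["total"] + totals["previous_unpaid_balance"],
                  totals["total_due"]):
        problems.append("total_due != total + previous_unpaid_balance")
    return problems


if __name__ == "__main__":
    print(check_invoice_totals(SAMPLE_INVOICE))  # prints []
```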
Sample receipt document from Azure
Receipt

Receipt mapped into an expense object with section bbox

This payload reflects the actual photographed receipt, including the handwritten tip and total, with bbox attached to semantic sections.

Hard part: Photographed receipt with handwriting

Instruction

Translate only the merchant-facing labels for review UI while preserving the raw merchant and line item values.

Sample JSON schema
json
{
  "type": "object",
  "properties": {
    "merchant": { "type": "string" },
    "transaction_datetime": { "type": "string" },
    "sales_associate": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "quantity": { "type": ["number", "null"] },
          "amount": { "type": ["number", "null"] }
        }
      }
    },
    "subtotal": { "type": "number" },
    "tax": { "type": "number" },
    "tip": { "type": ["number", "object"] },
    "total": { "type": ["number", "object"] }
  }
}
Extracted JSON with bbox
json
{
  "merchant": "Contoso",
  "transaction_datetime": "2019-06-10T13:59:00",
  "sales_associate": "Paul",
  "line_items": [
    { "name": "Cappuccino", "quantity": 1, "amount": 2.2 },
    { "name": "BACON & EGGS", "quantity": 1, "amount": 9.5 },
    { "name": "Sunny-side-up", "quantity": null, "amount": null }
  ],
  "subtotal": 11.7,
  "tax": 1.17,
  "tip": 1.63,
  "total": 14.5,
  "bbox_sections": {
    "merchant": [0.12, 0.64, 0.62, 0.28],
    "line_items": [0.11, 0.41, 0.66, 0.19],
    "totals": [0.22, 0.14, 0.58, 0.25]
  }
}
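To draw these sections as overlays, the normalized values have to be scaled to the rendered page. The page does not document the coordinate convention, so the Python sketch below rests on an assumption: each bbox is [x, y, width, height] as fractions of the page, with the origin at the bottom-left corner (the invoice sample, where the header has the largest y, is consistent with that reading). The function name and the origin_bottom_left switch are hypothetical.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float],
    page_width: int,
    page_height: int,
    origin_bottom_left: bool = True,
) -> Tuple[int, int, int, int]:
    """Convert an assumed [x, y, w, h] normalized bbox into
    (left, top, right, bottom) pixels in top-left image coordinates."""
    x, y, w, h = bbox
    left = x * page_width
    right = (x + w) * page_width
    if origin_bottom_left:
        # Flip the vertical axis: image rows grow downward.
        top = (1 - (y + h)) * page_height
        bottom = (1 - y) * page_height
    else:
        top = y * page_height
        bottom = (y + h) * page_height
    return (round(left), round(top), round(right), round(bottom))


if __name__ == "__main__":
    merchant = [0.12, 0.64, 0.62, 0.28]  # from the receipt payload above
    print(bbox_to_pixels(merchant, 1000, 1000))
```

If the overlays land upside down against the actual image, the origin assumption is wrong and origin_bottom_left=False is the variant to try.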
Sample driver's license document from Azure
Identity document

Driver's license mapped into identity fields

The license image is mapped directly into structured identity fields with no extra instruction layer.

Hard part: Dense labels in a compact card layout

Sample JSON schema
json
{
  "type": "object",
  "properties": {
    "document_type": { "type": "string" },
    "issuing_state": { "type": "string" },
    "document_number": { "type": "string" },
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "date_of_birth": { "type": "string", "format": "date" },
    "issue_date": { "type": "string", "format": "date" },
    "expiry_date": { "type": "string", "format": "date" },
    "address": { "type": "string" }
  }
}
Extracted JSON
json
{
  "document_type": "driver_license",
  "issuing_state": "WA",
  "document_number": "WDLABCD456DG",
  "first_name": "LIAM R.",
  "last_name": "TALBOT",
  "date_of_birth": "1958-01-06",
  "issue_date": "2015-01-06",
  "expiry_date": "2020-08-12",
  "address": "123 STREET ADDRESS, YOUR CITY WA 99999-1234",
  "class": "B",
  "sex": "M",
  "height": "5'-08\"",
  "eyes": "BLU",
  "weight_lb": 165,
  "restrictions": "B",
  "endorsement": "L",
  "veteran": true
}
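The payload carries fields such as class and sex that the sample schema does not declare. Under standard JSON Schema rules that is still valid, because additional properties are allowed unless the schema forbids them. The Python sketch below is a deliberately tiny checker for the schema keywords used on this page (type, properties, items, and format: date); it is an illustration, not a full JSON Schema validator, and unlike the default JSON Schema behavior it treats format: date as a hard check.

```python
from datetime import date
from typing import Any, List

_PY_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def _matches(value: Any, type_name: str) -> bool:
    if type_name == "null":
        return value is None
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    expected = _PY_TYPES.get(type_name)
    return expected is not None and isinstance(value, expected)


def validate(value: Any, schema: dict, path: str = "$") -> List[str]:
    """Return a list of error strings; an empty list means the value fits."""
    declared = schema.get("type")
    if declared is not None:
        names = declared if isinstance(declared, list) else [declared]
        if not any(_matches(value, name) for name in names):
            return [f"{path}: expected {declared}, got {type(value).__name__}"]
    errors: List[str] = []
    if isinstance(value, dict):
        # Only declared properties are checked; undeclared keys pass through.
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += validate(value[key], sub, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    if schema.get("format") == "date" and isinstance(value, str):
        try:
            date.fromisoformat(value)
        except ValueError:
            errors.append(f"{path}: not an ISO-8601 date")
    return errors


if __name__ == "__main__":
    schema = {"type": "object", "properties": {
        "document_number": {"type": "string"},
        "expiry_date": {"type": "string", "format": "date"}}}
    payload = {"document_number": "WDLABCD456DG",
               "expiry_date": "2020-08-12", "class": "B"}
    print(validate(payload, schema))  # prints []
```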
Sample scanned proposal form from the FUNSD dataset
Scanned form

Proposal form coerced into a structured order object

This extraction turns the scanned proposal into a more useful commercial object while keeping the actual values from the form.

Hard part: Checkboxes, dense text blocks, and a noisy scan

Instruction

Convert checkbox marks into booleans and keep the footer as a short note instead of a long OCR dump.

Sample JSON schema
json
{
  "type": "object",
  "properties": {
    "proposal_number": { "type": "string" },
    "proposal_date": { "type": "string", "format": "date" },
    "customer": { "type": "string" },
    "contact_name": { "type": "string" },
    "item": { "type": "string" },
    "specs": {
      "type": "object",
      "properties": {
        "material": { "type": "string" },
        "size": { "type": "string" },
        "gauge": { "type": "string" },
        "colors": { "type": "array", "items": { "type": "string" } }
      }
    },
    "commercial_terms": {
      "type": "object",
      "properties": {
        "quantity": { "type": "number" },
        "unit_price": { "type": "number" },
        "tooling_charge": { "type": "number" }
      }
    }
  }
}
Extracted JSON
json
{
  "proposal_number": "10675",
  "proposal_date": "1987-10-16",
  "customer": "Lorillard Corporation",
  "contact_name": "Mr. Robert Kennedy",
  "item": "Harley Davidson Metal Plaque",
  "specs": {
    "material": "Aluminum",
    "size": "17 1/2 x 23 1/2",
    "gauge": ".025",
    "colors": ["Transparent gold", "opaque black", "white", "orange"],
    "single_face": true,
    "holes": 4,
    "corners": "square",
    "edges": "hemmed",
    "stamp_frame": true,
    "embossed": true
  },
  "commercial_terms": {
    "quantity": 500,
    "unit_price": 9.18,
    "tooling_charge": 3015.0,
    "steel_tips": 1045.0
  },
  "billing": "bill as manufacture",
  "warehousing": "ship immediately",
  "notes": "Footer contains standard price-adjustment, freight, and liability language."
}
Sample bill of lading document from ForwardersIns
Bill of lading

Bill of lading template turned into a structured blank form

Because the sample is an empty template, the structured output reflects a blank shipping form rather than a populated shipment.

Hard part: Logistics parties plus shipment detail

Sample JSON schema
json
{
  "type": "object",
  "properties": {
    "shipment_number": { "type": "string" },
    "page": { "type": "string" },
    "ship_from": { "type": "object" },
    "ship_to": { "type": "object" },
    "third_party_bill_to": { "type": "object" },
    "carrier_information": { "type": "object" },
    "freight_charge_terms": { "type": "object" },
    "customer_order_rows": { "type": "array" }
  }
}
Extracted JSON
json
{
  "shipment_number": null,
  "page": "1 of 1",
  "ship_from": {
    "name": "[Name]",
    "street_address": "[Street Address]",
    "city_state_zip": "[City, ST ZIP Code]",
    "sid_number": null
  },
  "ship_to": {
    "name": "[Name]",
    "street_address": "[Street Address]",
    "city_state_zip": "[City, ST ZIP Code]",
    "cid_number": null
  },
  "third_party_bill_to": {
    "name": "[Name]",
    "street_address": "[Street Address]",
    "city_state_zip": "[City, ST ZIP Code]"
  },
  "carrier_information": {
    "carrier_name": null,
    "trailer_number": null,
    "serial_numbers": null,
    "pro_number": null
  },
  "freight_charge_terms": {
    "prepaid": false,
    "collect": false,
    "third_party": false
  },
  "customer_order_rows": [],
  "special_instructions": null
}
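A downstream service usually wants to catch a blank template like this before it creates a shipment record. The Python sketch below flags top-level sections that hold no real data. The helper names are hypothetical, and two choices are assumptions made for illustration: bracketed strings such as "[Name]" count as template placeholders, and false checkbox values count as unfilled.

```python
import re
from typing import Any, List

# Matches template placeholders such as "[Name]" or "[City, ST ZIP Code]".
PLACEHOLDER = re.compile(r"^\[.*\]$")


def is_blank(value: Any) -> bool:
    """True when a value carries no filled-in data."""
    if value is None or value is False:
        return True
    if isinstance(value, str):
        text = value.strip()
        return text == "" or bool(PLACEHOLDER.match(text))
    if isinstance(value, dict):
        return all(is_blank(v) for v in value.values())
    if isinstance(value, list):
        return all(is_blank(v) for v in value)  # an empty list is blank
    return False  # numbers and True are treated as real data


def blank_sections(payload: dict) -> List[str]:
    """Names of top-level fields that contain only blanks or placeholders."""
    return [key for key, value in payload.items() if is_blank(value)]


if __name__ == "__main__":
    # A trimmed copy of the bill of lading payload above.
    sample = {
        "shipment_number": None,
        "page": "1 of 1",
        "ship_from": {"name": "[Name]", "sid_number": None},
        "freight_charge_terms": {"prepaid": False, "collect": False},
        "customer_order_rows": [],
    }
    print(blank_sections(sample))
```

On the trimmed sample every field except page is reported as blank, which matches the description of the document as an empty template.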

Ready to test

Start with base OCR. Spend the next credits only if the page deserves them.