PDF to markdown API

Convert PDFs and scanned documents into clean markdown with a developer-first OCR API

Use LeapOCR when you need PDF to markdown output that stays readable in real workflows. Keep headings, tables, and sections intact, then add instructions for translation, cleanup, or normalization only when a page needs extra work.

Standard-v1 credit ladder

Readable by default. Controllable when needed.

Base OCR gives you structured markdown with headings, lists, and tables preserved. Custom instructions and bbox stay optional, so you only pay more when the output has to do more.

Standard-v1 pricing

Every step is priced per page. Start with base OCR, then add customization and bbox only when the page calls for it.

Step 01: Base OCR
1 credit / page

Read the page into clean markdown with headings, tables, and lists kept intact.

Step 02: Customize
+1 credit / page

Apply instructions like translate to French, normalize labels, or collapse noisy sections.

Step 03: Bounding boxes
+1 credit / page

Return coordinates for fields, lines, tables, or signatures when layout matters downstream.
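The per-page ladder above is simple arithmetic, and a small estimator makes the cost of each option explicit. This is a minimal sketch: `credits_for_job` is a hypothetical helper, not part of any LeapOCR SDK, and it assumes every page in a job uses the same options.

```python
def credits_for_job(pages: int, customize: bool = False, bbox: bool = False) -> int:
    """Estimate credits for a job priced per page (hypothetical helper)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    per_page = 1                        # Step 01: base OCR, 1 credit / page
    per_page += 1 if customize else 0   # Step 02: custom instructions, +1 credit / page
    per_page += 1 if bbox else 0        # Step 03: bounding boxes, +1 credit / page
    return pages * per_page


print(credits_for_job(12))                             # 12
print(credits_for_job(12, customize=True, bbox=True))  # 36
```

A 12-page PDF costs 12 credits with base OCR alone and 36 credits with both add-ons enabled on every page.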

What you can ask for

Keep the base flow simple, then add instructions where it matters

Translate section labels to French
Normalize totals and dates
Condense noisy footer language
Attach bbox to key sections

Readable on the first pass

Use markdown when humans still need to scan the output quickly without losing structure like headings, tables, or section breaks.

Adapt the output with instructions

Translate, normalize labels, condense noisy sections, or enforce a cleaner structure without changing the ingest pipeline.

Add layout only when it helps

Attach bbox when reviewers, overlays, or internal tools need geometry. Leave it off when plain markdown is enough.

Examples

Five markdown outputs rebuilt from real documents

Each example starts from OCR on a real document image, then shows how markdown can stay useful for review, translation, cleanup, and layout-aware workflows.

Sample invoice document from Azure
Invoice

Invoice as markdown with section bbox

Direct OCR of the invoice sample, organized into markdown with semantic section bbox attached to the main document blocks. No instruction layer is applied.

Hard part: Tables plus finance totals

Markdown output with bbox
```md
# Invoice INV-100

## Seller <!-- bbox section.seller: [0.04, 0.80, 0.24, 0.12] -->
- Company: CONTOSO LTD.
- Address: Contoso Headquarters, 123 456th St, New York, NY, 10001

## Customer <!-- bbox section.customer: [0.57, 0.80, 0.37, 0.11] -->
- Customer name: MICROSOFT CORPORATION
- Customer ID: CID-12345
- Customer address: Microsoft Corp, 123 Other St, Redmond WA, 98052

## Billing and delivery <!-- bbox section.billing_and_delivery: [0.05, 0.54, 0.74, 0.13] -->
- Bill to: Microsoft Finance, 123 Bill St, Redmond WA, 98052
- Ship to: Microsoft Delivery, 123 Ship St, Redmond WA, 98052
- Service address: Microsoft Services, 123 Service St, Redmond WA, 98052

## Invoice metadata
- Invoice date: 11/15/2019
- Due date: 12/15/2019
- Service period: 10/14/2019 - 11/14/2019
- P.O. number: PO-3333

## Line items <!-- bbox section.line_items: [0.03, 0.36, 0.91, 0.11] -->
| Quantity | Description        | Unit price | Total   |
| -------- | ------------------ | ---------- | ------- |
| 1        | Consulting service | $100.00    | $100.00 |

## Totals <!-- bbox section.totals: [0.56, 0.23, 0.39, 0.15] -->
- Subtotal: $100.00
- Sales tax: $10.00
- Total: $110.00
- Previous unpaid balance: $500.00
- Total due: $610.00

## Remit to <!-- bbox section.remit_to: [0.04, 0.09, 0.23, 0.10] -->
- Contoso Billing
- 123 Remit St
- New York, NY, 10001
```
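Because the section bbox values arrive as HTML comments inside the markdown, a downstream tool can pull them out with a regular expression. The sketch below assumes the comment format shown in the invoice sample (`<!-- bbox name: [a, b, c, d] -->`); `extract_bboxes` is a hypothetical helper, and the page does not state what the four numbers mean (origin, order, or units), so the code returns them uninterpreted.

```python
import re

# Matches comments such as: <!-- bbox section.seller: [0.04, 0.80, 0.24, 0.12] -->
BBOX_COMMENT = re.compile(
    r"<!--\s*bbox\s+(?P<name>[\w.]+):\s*\[(?P<values>[^\]]*)\]\s*-->"
)


def extract_bboxes(markdown: str) -> dict[str, list[float]]:
    """Map each bbox name to its list of numbers, in document order."""
    boxes = {}
    for match in BBOX_COMMENT.finditer(markdown):
        values = [float(v) for v in match.group("values").split(",")]
        boxes[match.group("name")] = values
    return boxes


sample = """## Seller <!-- bbox section.seller: [0.04, 0.80, 0.24, 0.12] -->
- Company: CONTOSO LTD.
## Totals <!-- bbox section.totals: [0.56, 0.23, 0.39, 0.15] -->
- Total due: $610.00
"""
boxes = extract_bboxes(sample)
print(boxes)
```

Keeping the geometry in comments means the same string still renders as ordinary markdown for human reviewers.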
Sample receipt document from Azure
Receipt

Receipt translated to French with section bbox

The receipt is photographed at an angle and includes handwritten values, with semantic bbox returned for the main document sections.

Hard part: Skewed image plus handwritten value

Instruction

Translate section labels to French after OCR, but keep the merchant and item names unchanged.

Markdown output with bbox
```md
# Recu <!-- bbox section.header: [0.12, 0.64, 0.62, 0.28] -->
- Commercant: Contoso
- Adresse: 123 Main Street, Redmond, WA 98052
- Telephone: 987-654-3210
- Date: 6/10/2019 13:59
- Vendeur: Paul

## Articles <!-- bbox section.items: [0.11, 0.41, 0.66, 0.19] -->
| Qté | Article       | Montant |
| --- | ------------- | ------- |
| 1   | Cappuccino    | $2.20   |
| 1   | BACON & EGGS  | $9.5    |
|     | Sunny-side-up |         |

## Totaux <!-- bbox section.totals: [0.22, 0.14, 0.58, 0.25] -->
- Sous-total: $11.70
- Taxe: $1.17
- Pourboire: $1.63
- Total: $14.50
```
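One way to sanity-check a receipt like this is to parse the items table and compare the summed amounts with the stated subtotal. The sketch below is illustrative only: `parse_table` is a hypothetical helper that assumes a simple pipe table with a header row, a separator row, and no escaped pipes inside cells.

```python
from decimal import Decimal


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Turn a simple pipe table into a list of row dicts keyed by header."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    cells = [[c.strip() for c in ln.strip("|").split("|")] for ln in lines]
    header, rows = cells[0], cells[2:]  # cells[1] is the | --- | separator row
    return [dict(zip(header, row)) for row in rows]


table = """| Qté | Article       | Montant |
| --- | ------------- | ------- |
| 1   | Cappuccino    | $2.20   |
| 1   | BACON & EGGS  | $9.5    |
|     | Sunny-side-up |         |
"""
rows = parse_table(table)
# Skip continuation rows (such as "Sunny-side-up") that carry no amount.
subtotal = sum(Decimal(r["Montant"].lstrip("$")) for r in rows if r["Montant"])
print(subtotal)  # 11.70
```

Using `Decimal` instead of `float` keeps currency sums exact, so the result can be compared directly with the `Sous-total` line.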
Sample driver's license document from Azure
Identity document

Driver's license as direct OCR markdown

A compact card layout with dense labels, small text, and several adjacent identity fields.

Hard part: Compact layout and critical identity fields

Markdown output
```md
# Driver License
- State: WASHINGTON USA
- Document type: DRIVER LICENSE
- Notice: FEDERAL LIMITS APPLY
- License number: WDLABCD456DG
- Class: B
- Donor: yes
- Last name: TALBOT
- First name: LIAM R.
- DOB: 01/06/1958
- ISS: 01/06/2015
- EXP: 08/12/2020
- Address: 123 STREET ADDRESS, YOUR CITY WA 99999-1234
- Sex: M
- Height: 5'-08"
- Eyes: BLU
- Weight: 165 lb
- Restrictions: B
- Endorsement: L
- Veteran: yes
```
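Flat `- Label: value` bullets like these are easy to load into a dictionary for validation. The following sketch uses hypothetical helpers (`parse_fields`, `parse_us_date`) and assumes the dates are written as MM/DD/YYYY, which the sample suggests but the page does not state.

```python
from datetime import date, datetime


def parse_fields(markdown: str) -> dict[str, str]:
    """Collect '- Label: value' bullets, splitting on the first ': ' only."""
    fields = {}
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("- ") and ": " in line:
            label, value = line[2:].split(": ", 1)
            fields[label] = value
    return fields


def parse_us_date(text: str) -> date:
    """Parse a date under the assumed MM/DD/YYYY layout."""
    return datetime.strptime(text, "%m/%d/%Y").date()


license_md = """# Driver License
- License number: WDLABCD456DG
- DOB: 01/06/1958
- ISS: 01/06/2015
- EXP: 08/12/2020
"""
fields = parse_fields(license_md)
expires = parse_us_date(fields["EXP"])
print(expires.isoformat())         # 2020-08-12
print(expires < date(2021, 1, 1))  # True
```

Splitting on the first `": "` only keeps values that themselves contain colons, such as timestamps, in one piece.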
Sample scanned proposal form from the FUNSD dataset
Scanned form

Proposal form with dense footer condensed

This noisy form mixes typed fields, checkboxes, handwritten marks, and a long legal footer.

Hard part: Noisy scan, checkboxes, handwritten signature

Instruction

Preserve the commercial form sections, but compress the long legal footer into a short note.

Markdown output
```md
# Proposal 10675
- Company: STOUT INDUSTRIES, INC.
- Date: October 16, 1987
- Customer: Lorillard Corporation
- Address: 666 Fifth Avenue, New York, New York 10103
- Attention: Mr. Robert Kennedy
- Representative: Mr. A. D. Steinberg

## Item
- Item: Harley Davidson Metal Plaque
- Size: 17 1/2 x 23 1/2
- Gauge: .025
- Colors: Transparent gold, opaque black, white and orange

## Material and finish
- Material: Aluminum
- Base color: Aluminum
- Single face: checked
- Holes: yes
- Number of holes: 4
- Corners: square
- Edges: hemmed
- Stamp frame: yes
- Embossed: yes

## Commercial terms
- Quantity: 500 plaques
- Price: $9.18 each
- One-time tooling: $3,015.00
- Steel tips: $1,045.00
- Billing: bill as manufacture
- Warehousing: ship immediately
- Terms: NET 10 DAYS

## Notes
- Price is based on reproduction of customer supplied "Pack" box.
- Footer contains standard price-adjustment, freight, and liability language.
```
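Since the form sections are preserved as `##` headings, a reviewer tool can route each section separately, for example sending only the commercial terms to a pricing check. This is a rough sketch with a hypothetical `split_sections` helper; it assumes second-level headings mark section boundaries and ignores anything before the first one.

```python
def split_sections(markdown: str) -> dict[str, list[str]]:
    """Group non-empty lines under the most recent '## ' heading."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


proposal = """# Proposal 10675
- Company: STOUT INDUSTRIES, INC.

## Item
- Item: Harley Davidson Metal Plaque

## Commercial terms
- Quantity: 500 plaques
- Price: $9.18 each
"""
sections = split_sections(proposal)
print(list(sections))                 # ['Item', 'Commercial terms']
print(sections["Commercial terms"])
```

The title and the bullets directly under it are skipped here; a fuller version could store them under a separate key.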
Sample bill of lading document from ForwardersIns
Bill of lading

Blank bill of lading template as markdown

This sample is an unfilled bill of lading template, so the real OCR result is mostly section headers and placeholders.

Hard part: Logistics layout with party and shipment blocks

Markdown output
```md
# Bill of Lading - Short Form - Not Negotiable
- Page: 1 of 1
- Bill of lading number: [blank]

## Ship from
- Name: [Name]
- Street address: [Street Address]
- City, state, ZIP: [City, ST ZIP Code]
- SID No.: [blank]

## Ship to
- Name: [Name]
- Street address: [Street Address]
- City, state, ZIP: [City, ST ZIP Code]
- CID No.: [blank]

## Third party freight charges bill to
- Name: [Name]
- Street address: [Street Address]
- City, state, ZIP: [City, ST ZIP Code]

## Carrier details
- Carrier name: [blank]
- Trailer number: [blank]
- Serial number(s): [blank]
- SPAC: [blank]
- Pro number: [blank]

## Freight charge terms
- Prepaid: unchecked
- Collect: unchecked
- 3rd Party: unchecked

## Customer order information
- Customer order no.: [blank]
- Package rows: blank template
- Additional shipper information: blank template

## Signatures
- Shipper signature/date: blank
- Trailer loaded: by shipper / by driver
- Freight counted: by shipper / by driver
```
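When a template comes back mostly empty, it can help to list which fields still hold placeholders before the document moves on. The sketch below is an assumption-heavy illustration: `unfilled_fields` is a hypothetical helper, and it treats bracketed values and the words "blank" or "blank template" as placeholders only because that is how this one sample marks them.

```python
import re

BRACKETED = re.compile(r"^\[[^\]]*\]$")  # e.g. [blank], [Name], [City, ST ZIP Code]
BLANK_WORDS = {"blank", "blank template"}


def is_placeholder(value: str) -> bool:
    """True when a value looks like an unfilled template slot."""
    value = value.strip()
    return bool(BRACKETED.match(value)) or value.lower() in BLANK_WORDS


def unfilled_fields(markdown: str) -> list[str]:
    """Return labels of '- Label: value' bullets whose value is a placeholder."""
    labels = []
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("- ") and ": " in line:
            label, value = line[2:].split(": ", 1)
            if is_placeholder(value):
                labels.append(label)
    return labels


sample = """- Page: 1 of 1
- Bill of lading number: [blank]
- Name: [Name]
- City, state, ZIP: [City, ST ZIP Code]
- Prepaid: unchecked
- Package rows: blank template
"""
missing = unfilled_fields(sample)
print(missing)
```

Checkbox states such as `unchecked` are treated as real values here, since an unticked box is information rather than a gap.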

Ready to test

Start with base OCR. Spend the next credits only if the page deserves them.