Markdown that still feels like the source document
Markdown keeps headings, sections, tables, and line flow intact, which makes it useful for QA, handoff, and LLM context building.
LeapOCR starts from one upload surface, then lets you choose the result shape that fits the next step. Use markdown when a person or an LLM needs a clean document view. Use structured mode when a system needs fields it can trust immediately.
Start from file upload or remote URL ingestion in the same OCR workflow.
The parsing workflow stays simple on purpose: pick a source, choose the output shape, add optional guidance, and run.
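The four steps can be sketched as a small payload builder. This is a minimal illustration, not LeapOCR's client library: `build_job_request` is a hypothetical helper, and the field names are borrowed from the request examples in this document. Nothing is sent over the network; the sketch only assembles the request body.

```python
import json
from typing import Optional


def build_job_request(
    url: str,
    output_format: str,
    model: str = "standard-v1",
    instructions: Optional[str] = None,
    schema: Optional[dict] = None,
    extract_bounding_boxes: bool = False,
) -> dict:
    """Assemble a request body (hypothetical helper, assumed field names)."""
    # Step 2: the output shape is either markdown or structured.
    if output_format not in ("markdown", "structured"):
        raise ValueError(f"unsupported format: {output_format!r}")

    # Step 1: the source. Here a remote URL; the file name is its last path segment.
    request = {
        "url": url,
        "file_name": url.rsplit("/", 1)[-1],
        "format": output_format,
        "model": model,
        "extract_bounding_boxes": extract_bounding_boxes,
    }

    # Step 3: optional guidance is only included when provided.
    if instructions:
        request["instructions"] = instructions
    if schema is not None:
        request["schema"] = schema
    return request


# Step 4 would be submitting this body; here it is only printed.
request = build_job_request(
    "https://example.com/claim-form.pdf",
    "markdown",
    instructions="Keep section headings and normalize dates.",
)
print(json.dumps(request, indent=2))
```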
Upload
Send a file or URL, choose output settings, and run OCR
When you submit a URL, LeapOCR fetches the remote file and processes it the same way as a direct upload.
{ "invoice_number": "string", "invoice_date": "date", "total": "number" }
Request
URL upload with model, format, instructions, and schema ready
Why it works
The key decision is practical, not technical: does the next consumer need a readable document, a structured object, or either one with review context attached?
Markdown keeps headings, sections, tables, and line flow intact, which makes it useful for QA, handoff, and LLM context building.
Structured mode returns a stable object instead of page prose, which means less post-processing, fewer brittle parsers, and cleaner integrations.
Bounding boxes are exposed as a boolean enhancement. Keep them off for pure parsing and on for review tools, overlays, and human-in-the-loop queues.
What you control
These are the decisions teams actually make when they turn OCR into a production workflow instead of a raw text dump.
Use markdown when the consumer needs a coherent page representation with headings, tables, and sections that still read naturally after OCR.
Use structured mode when another system wants fields, arrays, and nested objects it can validate and write directly without another parsing pass.
Use inline instructions for light behavior changes. Move heavier rules into templates once the workflow needs more room or more reuse.
Inline structured jobs need a schema when you are not using a template. The schema is what keeps the result useful beyond a demo.
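A pre-flight check for that rule might look like the sketch below. `check_structured_request` and its `uses_template` flag are hypothetical names introduced for illustration; how LeapOCR itself reports a missing schema is not described here.

```python
from typing import List


def check_structured_request(request: dict, uses_template: bool = False) -> List[str]:
    """Return a list of problems found before the job is submitted."""
    problems = []
    # Only inline structured jobs (no template) must carry their own schema.
    if request.get("format") == "structured" and not uses_template:
        schema = request.get("schema")
        if not isinstance(schema, dict) or not schema.get("properties"):
            problems.append("inline structured job needs a schema with properties")
    return problems


missing = check_structured_request({"format": "structured"})
complete = check_structured_request(
    {
        "format": "structured",
        "schema": {
            "type": "object",
            "properties": {"po_number": {"type": "string"}},
        },
    }
)
print(missing)
print(complete)
```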
Templates are the cleanest path for production. They move prompt, schema, and model choices out of every request body and into one reusable config.
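The idea of keeping configuration in one reusable place can be illustrated locally. This sketch does not use LeapOCR's template feature, whose API is not shown in this document; `INVOICE_TEMPLATE` and `request_from_template` are hypothetical, and the instruction text and file names are made up for the example.

```python
import copy

# Hypothetical template: the stable choices live in one place.
INVOICE_TEMPLATE = {
    "format": "structured",
    "model": "standard-v1",
    "instructions": "Extract the invoice header fields.",  # illustrative wording
    "schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "invoice_date": {"type": "string"},  # dates carried as strings here
            "total": {"type": "number"},
        },
    },
    "extract_bounding_boxes": False,
}


def request_from_template(template: dict, source: dict) -> dict:
    """Combine the shared config with the one thing that changes: the source."""
    request = copy.deepcopy(template)  # never mutate the shared template
    request.update(source)
    return request


first = request_from_template(
    INVOICE_TEMPLATE,
    {"url": "https://example.com/invoice-001.pdf", "file_name": "invoice-001.pdf"},
)
second = request_from_template(
    INVOICE_TEMPLATE,
    {"url": "https://example.com/invoice-002.pdf", "file_name": "invoice-002.pdf"},
)
print(first["file_name"], second["file_name"])
```

Each call site now supplies only the source, so a change to the schema or instructions happens once rather than in every request body.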
Bbox is not its own parsing mode. It is a toggle that adds page coordinates when the consuming workflow needs visual grounding.
Examples
Most teams settle into one of two patterns: readable markdown for human or LLM handoff, and structured extraction for systems that need direct writes.
Markdown output is the cleanest route when the next consumer still needs to read the page, quote it, or pass it into another model with the layout preserved in text form.
{
  "url": "https://example.com/claim-form.pdf",
  "file_name": "claim-form.pdf",
  "format": "markdown",
  "model": "standard-v1",
  "instructions": "Keep section headings and normalize dates.",
  "extract_bounding_boxes": false
}

{
  "job_id": "job_01",
  "status": "completed",
  "result_format": "markdown",
  "pages": [
    {
      "page_number": 1,
      "result": "# Claim form\n\n## Policy holder\n- Name: ..."
    }
  ]
}
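A consumer of the markdown response typically wants one document rather than a list of pages. The sketch below assumes the response shape shown in the example above; `join_markdown_pages` is a hypothetical helper, and the page text is the truncated sample from that example.

```python
def join_markdown_pages(response: dict) -> str:
    """Concatenate per-page markdown into a single document, in page order."""
    if response.get("status") != "completed":
        raise ValueError(f"job not completed: {response.get('status')!r}")
    if response.get("result_format") != "markdown":
        raise ValueError("expected a markdown result")
    pages = sorted(response["pages"], key=lambda page: page["page_number"])
    # A blank line between pages keeps markdown blocks from running together.
    return "\n\n".join(page["result"] for page in pages)


response = {
    "job_id": "job_01",
    "status": "completed",
    "result_format": "markdown",
    "pages": [
        {"page_number": 1, "result": "# Claim form\n\n## Policy holder\n- Name: ..."}
    ],
}
document = join_markdown_pages(response)
print(document)
```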
Structured extraction is the route for teams that need the API to return a stable object immediately instead of another layer of page text to parse.
{
  "file_name": "purchase-order.pdf",
  "content_type": "application/pdf",
  "file_size": 918443,
  "format": "structured",
  "model": "standard-v1",
  "instructions": "Extract the order header and SKU table.",
  "schema": {
    "type": "object",
    "properties": {
      "po_number": { "type": "string" },
      "order_date": { "type": "string" },
      "items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "sku": { "type": "string" },
            "quantity": { "type": "number" }
          }
        }
      }
    }
  },
  "extract_bounding_boxes": true
}

{
  "job_id": "job_02",
  "status": "completed",
  "result_format": "structured",
  "pages": [
    {
      "page_number": 1,
      "result": {
        "po_number": "PO-10441",
        "order_date": "2026-03-10",
        "items": [
          { "sku": "AX-44", "quantity": 12 }
        ]
      }
    }
  ]
}
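Before writing a structured result into another system, a light type check against the request schema can catch surprises. This is a minimal sketch covering only the schema keywords used in the example above (`object`, `array`, `string`, `number`); `matches_schema` is a hypothetical helper, not a full JSON Schema validator and not part of LeapOCR.

```python
def matches_schema(value, schema: dict) -> bool:
    """Check a value against the small schema subset used in this example."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        properties = schema.get("properties", {})
        # Only fields present in the value are checked; absent fields are allowed.
        return all(
            matches_schema(value[name], sub)
            for name, sub in properties.items()
            if name in value
        )
    if kind == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(
            matches_schema(item, item_schema) for item in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # unknown or missing type: nothing to check


schema = {
    "type": "object",
    "properties": {
        "po_number": {"type": "string"},
        "order_date": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "number"},
                },
            },
        },
    },
}
result = {
    "po_number": "PO-10441",
    "order_date": "2026-03-10",
    "items": [{"sku": "AX-44", "quantity": 12}],
}
is_valid = matches_schema(result, schema)
print(is_valid)
```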
FAQ
Straight answers for teams evaluating how this workflow fits into production.
Choose markdown when the result needs to stay readable. Choose structured mode when the next consumer is code, a queue, or a business system expecting named fields.
Use templates once a workflow repeats and the setup is stable. They let you keep model, instructions, schema, and review behavior in one reusable place.
No. Bounding-box extraction is not a separate parsing mode. It is an optional enhancement you can layer onto markdown or structured workflows when page geometry matters downstream.
Ready to test
Start with the output shape that matches the next step. When the workflow repeats, store the winning configuration as a template and stop rebuilding it.