
Building a Custom JSON Schema for Supply Chain Documents: A Practical Tutorial

A schema-first approach for invoices, packing lists, BOLs, and more. Learn how to structure strict contracts for messy data.

Published
January 25, 2026

[Figure: JSON Schema Architecture]


Supply chain documents are notoriously messy. Layouts shift, terminology varies (“Vendor” vs “Supplier”), and handwriting invades printed forms.

Yet, your downstream systems (ERP, TMS, WMS) demand rigid, structured data.

How do you bridge the gap? JSON Schema.

A well-designed JSON Schema is the contract that makes automation reliable. It defines the strict structure that the chaotic output of OCR engines must satisfy before your code trusts it.

The “Composable” Approach

Don’t write one giant schema for every document. And don’t write entirely separate schemas for every variation. Use composition.

Most logistics documents share about 60% of their fields: parties, addresses, and monetary amounts recur across invoices, packing lists, and BOLs.

[Figure: Schema Inheritance Hierarchy]

1. Define Shared Definitions

Create a library of core components. Reuse these everywhere.

// definitions.json
{
  "$id": "https://example.com/schemas/definitions.json",
  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "country_code": { "type": "string", "pattern": "^[A-Z]{2}$" }
      },
      "required": ["country_code"]
    },
    "monetary_amount": {
      "type": "object",
      "properties": {
        "value": { "type": "number" },
        "currency": { "type": "string", "enum": ["USD", "EUR", "CNY", "GBP"] }
      },
      "required": ["value", "currency"]
    }
  }
}

2. Document-Specific Schemas

Now, assemble your specific document schemas using these building blocks.

The Commercial Invoice

Focuses on Financials.

{
  "$id": "https://example.com/schemas/invoice.json",
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "vendor": { "$ref": "definitions.json#/definitions/address" },
    "total_amount": { "$ref": "definitions.json#/definitions/monetary_amount" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "unit_price": { "type": "number" },
          "quantity": { "type": "integer" }
        }
      }
    }
  },
  "required": ["invoice_number", "total_amount"]
}

The Bill of Lading

Focuses on Movement.

{
  "$id": "https://example.com/schemas/bol.json",
  "type": "object",
  "properties": {
    "bol_number": { "type": "string" },
    "shipper": { "$ref": "definitions.json#/definitions/address" },
    "consignee": { "$ref": "definitions.json#/definitions/address" },
    "vessel_name": { "type": "string" },
    "containers": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": { "type": "string", "pattern": "^[A-Z]{4}[0-9]{7}$" },
          "seal_number": { "type": "string" }
        }
      }
    }
  }
}

Notice how vendor, shipper, and consignee all use the same address definition. This ensures that no matter the source document, an “Address” always looks the same to your database.
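
To make the reuse concrete, here is a minimal Python sketch of how a reference of the form definitions.json#/definitions/address could be resolved: the part before the # names the file, and the part after it is a JSON Pointer walked key by key. The REGISTRY dictionary and the resolve_ref helper are hypothetical names for illustration, and the definitions are trimmed to one field. A real validator library does this work for you.

```python
import json

# Stand-in for definitions.json (trimmed). In a real project this would be
# loaded from disk or from a schema registry.
DEFINITIONS = json.loads("""
{
  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "country_code": { "type": "string", "pattern": "^[A-Z]{2}$" }
      },
      "required": ["country_code"]
    }
  }
}
""")

# Hypothetical in-memory registry keyed by file name.
REGISTRY = {"definitions.json": DEFINITIONS}


def resolve_ref(ref: str, registry: dict) -> dict:
    """Resolve a '<file>#/<json/pointer>' reference against the registry."""
    file_name, _, pointer = ref.partition("#")
    node = registry[file_name]
    # The pointer starts with '/', so the first split element is empty.
    for token in pointer.split("/")[1:]:
        # Undo JSON Pointer escaping: ~1 stands for '/', ~0 stands for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


vendor_schema = resolve_ref("definitions.json#/definitions/address", REGISTRY)
shipper_schema = resolve_ref("definitions.json#/definitions/address", REGISTRY)
```

Both lookups return the very same object, which is the point of composition: change the address definition once and every document schema picks it up. A pointer that skips the definitions segment raises a KeyError in this sketch, because no such key exists at the top level of the file.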

Validation Strategy: The Gatekeeper

Validation is not just about structure; it also encodes your business rules.

[Figure: Validation Gate Workflow]

Implement a “Validation Gate” before data persistence:

  1. Structure: Is it valid JSON?
  2. Types: Is total_amount a number, not a string?
  3. Constraints: Is the country_code exactly 2 letters?
  4. Logic: Does subtotal + tax = total? JSON Schema cannot express arithmetic across fields, so this usually requires a custom validation layer on top of it.
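
The four stages above can be sketched in plain Python. This is an illustration, not a replacement for a JSON Schema validator: in practice a validator library would typically cover stages 2 and 3, and only stage 4 would be hand-written. The flat document shape (subtotal, tax, total_amount, and country_code at the top level) and the validation_gate function are assumptions made to keep the example short.

```python
import json
import re
from decimal import Decimal

COUNTRY_CODE = re.compile(r"[A-Z]{2}")


def validation_gate(raw: str) -> list[str]:
    """Return a list of error messages; an empty list means the document passes."""
    # 1. Structure: is it valid JSON, and is the top level an object?
    try:
        # parse_float=Decimal keeps monetary values exact instead of binary floats.
        doc = json.loads(raw, parse_float=Decimal)
    except json.JSONDecodeError as exc:
        return [f"structure: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["structure: top level must be an object"]

    errors = []

    # 2. Types: amounts must be numbers, not strings.
    amounts = {}
    for field in ("subtotal", "tax", "total_amount"):
        value = doc.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, Decimal)) and not isinstance(value, bool):
            amounts[field] = Decimal(value)
        else:
            errors.append(f"types: {field} must be a number")

    # 3. Constraints: country_code must be exactly two uppercase letters.
    country = doc.get("country_code")
    if not (isinstance(country, str) and COUNTRY_CODE.fullmatch(country)):
        errors.append("constraints: country_code must be two uppercase letters")

    # 4. Logic: only checked when all three amounts passed the type stage.
    if len(amounts) == 3:
        if amounts["subtotal"] + amounts["tax"] != amounts["total_amount"]:
            errors.append("logic: subtotal + tax does not equal total_amount")

    return errors
```

The function collects every failure it can find after the structure stage, so a reviewer sees all problems with a document at once rather than one per resubmission.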

If a document fails the schema, fail fast. Do not try to “fix” it silently. Route it to a human queue or reject the request. Partial data is often worse than no data.
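
One way to express the fail-fast rule in code is a small router that either persists a document or parks it, untouched, for a human. The sketch below is hypothetical throughout: DocumentRouter, the in-memory lists, and the status strings stand in for whatever database and review queue your system actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRouter:
    """Hypothetical router: in-memory lists stand in for a database and a queue."""
    persisted: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def submit(self, document: dict, errors: list[str]) -> str:
        if errors:
            # Do not repair or trim the document; store it as received,
            # together with the reasons it failed, for a human to review.
            self.review_queue.append({"document": document, "errors": list(errors)})
            return "needs_review"
        self.persisted.append(document)
        return "persisted"


router = DocumentRouter()
ok_status = router.submit({"invoice_number": "INV-001"}, [])
bad_status = router.submit({"invoice_number": None}, ["types: invoice_number must be a string"])
```

Nothing from the failed document reaches the persisted list, which keeps partial data out of downstream systems.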

Schema Governance & Versioning

Your business changes. Your documents will too.

If you add a new field HS_Code to your line items, you are changing the contract.

[Figure: Schema Versioning Timeline]

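  • Semantic Versioning: Number your schemas so the version signals the kind of change: v1.0, v1.1 for compatible additions, v2.0 for breaking changes.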
  • Backward Compatibility: Adding an optional field is safe (minor version). Renaming a required field breaks integrations (major version).
  • Registry: Keep all your schemas in a central registry (even a private GitHub repo works). Do not hardcode schemas inside application code.
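
The compatibility rule above can be approximated with a small check that compares two versions of a schema. This is a rough sketch under simplifying assumptions: it looks only at top-level properties and required, and it ignores type changes, nested objects, and relaxed requirements. The classify_change and next_version helpers are hypothetical names.

```python
def classify_change(old: dict, new: dict) -> str:
    """Classify a schema change as 'major', 'minor', or 'none'."""
    old_props = set(old.get("properties", {}))
    new_props = set(new.get("properties", {}))
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    # A removed (or renamed) field, or a newly required one, breaks
    # producers and consumers that were valid under the old contract.
    if (old_props - new_props) or (new_required - old_required):
        return "major"
    # A field that was added without being required is backward compatible.
    if new_props - old_props:
        return "minor"
    return "none"


def next_version(current: str, change: str) -> str:
    """Bump a 'vMAJOR.MINOR' version string according to the change type."""
    major, minor = (int(part) for part in current.lstrip("v").split("."))
    if change == "major":
        return f"v{major + 1}.0"
    if change == "minor":
        return f"v{major}.{minor + 1}"
    return current


line_item_v1 = {
    "properties": {"description": {}, "unit_price": {}, "quantity": {}},
    "required": ["description"],
}
# Adding an optional HS_Code field.
line_item_v2 = {
    "properties": {"description": {}, "unit_price": {}, "quantity": {}, "HS_Code": {}},
    "required": ["description"],
}
# Renaming the required 'description' field to 'item_description'.
line_item_v3 = {
    "properties": {"item_description": {}, "unit_price": {}, "quantity": {}},
    "required": ["item_description"],
}

added = classify_change(line_item_v1, line_item_v2)
renamed = classify_change(line_item_v1, line_item_v3)
```

Running a check like this in the schema registry's CI is one way to stop a breaking change from shipping under a minor version number.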

Bottom Line

Schema design is the foundation of document automation.

  • Reuse common components to keep maintenance low.
  • Validate strictly to keep data quality high.
  • Version clearly to keep your sanity as the system grows.

Get the schema right early, and the rest of your automation pipeline becomes far easier to build and maintain.
