Building a Custom ESG Template: A Step-by-Step Guide to Template Builder

Practical tutorial on using LeapOCR's template builder to define and extract fields from a specific ESG document type.

Published: January 18, 2025 | Read time: 10 min | Word count: 2,132

You have a new ESG document type to process. Let’s say it’s Waste Management Reports from your facilities. Each report contains:

  • Facility identification
  • Reporting period
  • Waste generated by category (recycling, landfill, hazardous)
  • Disposal methods and costs
  • Regulatory compliance status

You could manually extract data from 50 monthly reports. Or you could build a custom template once and automate the entire process.

This guide shows you how to build ESG extraction templates using LeapOCR’s Template Builder. No ML expertise required.

Schema-guided extraction typically achieves 80-90% schema compliance and 65-90% semantic accuracy on complex ESG documents, compared with 3-10% for traditional extraction methods. Accuracy climbs further once a template is refined for a single document type: a validated template can process thousands of consistent documents with 99%+ accuracy.

What is a Template?

A template is a reusable configuration that tells LeapOCR three things:

  1. What data to extract (JSON Schema)
  2. How to extract it (natural language instructions)
  3. Where to find it (document understanding via a vision-language model, or VLM)

You can use a single template to process thousands of documents consistently.

Template Components

Let’s break down what makes up a template.

FIG 1.0: Anatomy of a LeapOCR template (schema, instructions, and model), the three pillars of a robust extraction template

Basic Information

{
  "name": "esg-waste-management-report",
  "description": "Extract waste data from monthly facility reports",
  "format": "structured"
}

JSON Schema

The JSON Schema defines your output structure.

{
  "type": "object",
  "properties": {
    "facility_id": { "type": "string" },
    "reporting_month": { "type": "string", "pattern": "^[0-9]{4}-(0[1-9]|1[0-2])$" },
    "waste_categories": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "category": {
            "type": "string",
            "enum": ["recycling", "landfill", "hazardous", "organic"]
          },
          "weight_kg": { "type": "number", "minimum": 0 },
          "disposal_method": { "type": "string" },
          "cost_eur": { "type": "number", "minimum": 0 }
        }
      }
    },
    "compliance_status": {
      "type": "string",
      "enum": ["compliant", "non-compliant", "pending"]
    }
  },
  "required": ["facility_id", "reporting_month", "waste_categories"]
}

Instructions

Provide natural language guidance for the AI.

Extract waste management data from facility reports:

Key fields to extract:
- Facility ID (often in header)
- Reporting month (look for "Report for:" or "Month:")
- Waste categories (look for tables with: category, weight, disposal method, cost)
  - Categories: recycling, landfill, hazardous, organic
  - Weight in kilograms (kg)
  - Disposal method (incineration, landfill, composting, etc.)
  - Cost in EUR
- Compliance status (look for "compliant," "violation," "pending review")

Table structure:
- Usually a table with rows for each waste category
- Columns might include: Category, Weight (kg), Disposal Method, Cost (€)
- Handle variations: "Weight" vs "Mass," "Cost" vs "Fee"

If multiple tables, look for the summary table with totals.

Model Selection

Choose the right model for your document:

Model | Best for | Cost
--- | --- | ---
standard-v1 | Clean, standard layouts | 1 credit/page
english-pro-v1 | Complex layouts, English-language documents | 2 credits/page
pro-v1 | Handwriting, poor scans, multilingual | 3 credits/page

Waste reports often contain handwritten notes and varying formats, so use pro-v1.
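Because cost scales linearly with page count, budgeting a batch is simple arithmetic. The sketch below uses the per-page rates from the table above and an illustrative page count. `estimate_credits` is a hypothetical helper, not part of the LeapOCR SDK.

```python
# Per-page credit rates, taken from the model table above.
CREDITS_PER_PAGE = {
    "standard-v1": 1,
    "english-pro-v1": 2,
    "pro-v1": 3,
}


def estimate_credits(model: str, documents: int, pages_per_document: int) -> int:
    """Hypothetical helper: documents x pages x per-page rate for the chosen model."""
    if model not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown model: {model}")
    return documents * pages_per_document * CREDITS_PER_PAGE[model]


# Illustrative batch: 50 monthly reports, assumed to be 4 pages each.
for name in CREDITS_PER_PAGE:
    print(f"{name}: {estimate_credits(name, documents=50, pages_per_document=4)} credits")
```

For that assumed batch, pro-v1 costs 600 credits against 200 for standard-v1. Weigh that difference against how often your reports contain handwriting or poor scans.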

Step-by-Step Tutorial

Now let’s build a template together.

Step 1: Access Template Builder

  1. Log in to the LeapOCR dashboard
  2. Navigate to Templates → Create New Template
  3. Select Structured format

Step 2: Define Basic Information

Template name: esg-waste-management-report

Description: Extract waste data from monthly facility reports including waste categories, weights, disposal methods, and compliance status

Step 3: Build JSON Schema

You can use the visual builder or write JSON directly.

Visual Schema Builder (No-Code)

FIG 3.0: Using the visual builder to define schema fields in the Template Builder UI

  1. Click Add Field
  2. Enter field name: facility_id
  3. Select type: String
  4. Check Required
  5. Repeat for all fields

JSON Schema Editor (Code)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "facility_id": {
      "type": "string",
      "description": "Unique facility identifier"
    },
    "reporting_month": {
      "type": "string",
      "pattern": "^[0-9]{4}-(0[1-9]|1[0-2])$",
      "description": "Reporting month in YYYY-MM format"
    },
    "total_waste_kg": {
      "type": "number",
      "minimum": 0,
      "description": "Total waste generated in kilograms"
    },
    "waste_breakdown": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "category": {
            "type": "string",
            "enum": ["recycling", "landfill", "hazardous", "organic", "other"]
          },
          "weight_kg": {
            "type": "number",
            "minimum": 0
          },
          "percentage": {
            "type": "number",
            "minimum": 0,
            "maximum": 100
          },
          "disposal_method": {
            "type": "string"
          },
          "cost_eur": {
            "type": "number",
            "minimum": 0
          },
          "supplier": {
            "type": "string"
          }
        }
      }
    },
    "compliance": {
      "type": "object",
      "properties": {
        "status": {
          "type": "string",
          "enum": ["compliant", "non-compliant", "pending_review"]
        },
        "violations": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": { "type": "string" },
              "description": { "type": "string" },
              "severity": {
                "type": "string",
                "enum": ["low", "medium", "high"]
              }
            }
          }
        }
      }
    }
  },
  "required": ["facility_id", "reporting_month", "total_waste_kg", "waste_breakdown"]
}

Step 4: Write Natural Language Instructions

Give the AI clear guidance about what to extract.

Extract waste management data from monthly facility reports.

DOCUMENT STRUCTURE
Most reports follow this structure:
- Header: Facility name/ID, reporting month
- Table 1: Waste breakdown by category (recycling, landfill, hazardous, organic)
- Table 2: Disposal methods and suppliers
- Footer: Compliance status, violations if any

KEY FIELDS TO EXTRACT

1. Facility Identification:
   - Look for "Facility," "Site," "Location," or ID codes like "FAC-001"
   - Extract exactly as shown (preserve format)

2. Reporting Month:
   - Look for "Report for:" "Month:" "Period:"
   - Extract in YYYY-MM format (convert if necessary)
   - Examples: "January 2024" → "2024-01", "01/2024" → "2024-01"

3. Total Waste:
   - Look for "Total Waste," "Total Generated," "Grand Total"
   - Extract weight in kilograms (kg)
   - Convert if given in tonnes (1 tonne = 1,000 kg)

4. Waste Breakdown (Array):
   Look for a table with columns like:
   - Category / Type / Waste Type
   - Weight / Mass (kg)
   - Percentage / % of Total
   - Disposal Method
   - Cost / Fee (€)
   - Supplier / Waste Management Company

   Extract each row as an object in the waste_breakdown array.

5. Compliance Status:
   - Look for "Compliance Status," "Regulatory Status," "Inspection Result"
   - Values: "compliant," "non-compliant," "pending_review"
   - If violations exist, extract:
     * Violation type (e.g., "improper disposal," "missing documentation")
     * Description (what went wrong)
     * Severity (low, medium, high)

HANDLING VARIATIONS

- Different column names: "Weight" = "Mass" = "Amount"
- Different units: tonnes → kg (multiply by 1,000)
- Different formats: "1.234,56 kg" → 1234.56 (European decimal)
- Missing data: If category not listed, don't include in array
- Multiple tables: Use the table with the most complete data

MULTI-LANGUAGE SUPPORT

If the document is not in English:
- German: "Abfall" (waste), "Recycling" (recycling), "Deponie" (landfill)
- French: "déchets" (waste), "recyclage" (recycling), "décharge" (landfill)
- Spanish: "residuos" (waste), "reciclaje" (recycling), "vertedero" (landfill)

Extract data regardless of language, and output values in English.
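The model applies these conversions when instructed. A deterministic post-processing step is still a useful safety net for values that slip through.

Here is a minimal sketch in plain Python that covers only the formats named in the instructions above. `normalize_month` and `parse_weight_kg` are hypothetical helpers. The separator rule is an assumption: whichever of "." or "," appears last is the decimal separator, and a lone comma is a decimal comma. As a result, an ambiguous value like "1.234" is read as 1.234 rather than 1,234.

```python
import re

# English month names mapped to month numbers (assumption: labels arrive in English).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}


def normalize_month(raw: str) -> str:
    """Hypothetical helper: convert '2024-01', '01/2024' or 'January 2024' to 'YYYY-MM'."""
    text = raw.strip()
    if re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", text):
        return text  # already normalized
    numeric = re.fullmatch(r"(\d{1,2})/(\d{4})", text)
    named = re.fullmatch(r"([A-Za-z]+)\s+(\d{4})", text)
    if numeric:
        month, year = int(numeric.group(1)), numeric.group(2)
    elif named and named.group(1).lower() in MONTHS:
        month, year = MONTHS[named.group(1).lower()], named.group(2)
    else:
        raise ValueError(f"unrecognized month: {raw!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {raw!r}")
    return f"{year}-{month:02d}"


def parse_weight_kg(raw: str) -> float:
    """Hypothetical helper: parse '1.234,56 kg', '1,234.56 kg' or '12,5 t' into kilograms."""
    match = re.fullmatch(r"([\d.,]+)\s*(kg|t|tonnes?)?", raw.strip().lower())
    if not match:
        raise ValueError(f"unrecognized weight: {raw!r}")
    number, unit = match.group(1), match.group(2) or "kg"
    if "," in number and "." in number:
        # Whichever separator appears last is treated as the decimal separator.
        if number.rfind(",") > number.rfind("."):
            number = number.replace(".", "").replace(",", ".")  # European: 1.234,56
        else:
            number = number.replace(",", "")  # English: 1,234.56
    else:
        number = number.replace(",", ".")  # assumption: a lone comma is a decimal comma
    value = float(number)
    return value * 1000 if unit != "kg" else value  # tonnes -> kg (x 1,000)
```

Raising `ValueError` on anything unrecognized, rather than guessing, lets you route those documents to manual review.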

Step 5: Configure Model & Settings

Model: pro-v1

Confidence Threshold: 0.95

Tags: ["esg", "waste", "facility"]

Color: green

Step 6: Test Your Template

Before deploying, test with sample documents.

  1. Click Test Template
  2. Upload 3-5 sample waste reports
  3. Review extraction results

FIG 2.0: Logic flow from unstructured waste report to validated JSON

Here’s what a good output looks like:

{
  "facility_id": "MUC-01",
  "reporting_month": "2024-01",
  "total_waste_kg": 12500,
  "waste_breakdown": [
    {
      "category": "recycling",
      "weight_kg": 5000,
      "percentage": 40,
      "disposal_method": "Material recovery facility",
      "cost_eur": 250,
      "supplier": "GreenRecycling GmbH"
    },
    {
      "category": "landfill",
      "weight_kg": 4500,
      "percentage": 36,
      "disposal_method": "Municipal landfill",
      "cost_eur": 675,
      "supplier": "City Waste Management"
    },
    {
      "category": "organic",
      "weight_kg": 2000,
      "percentage": 16,
      "disposal_method": "Composting",
      "cost_eur": 100,
      "supplier": "BioCompost AG"
    },
    {
      "category": "hazardous",
      "weight_kg": 1000,
      "percentage": 8,
      "disposal_method": "Specialized hazardous waste facility",
      "cost_eur": 800,
      "supplier": "HazardousWaste Solutions"
    }
  ],
  "compliance": {
    "status": "compliant",
    "violations": []
  },
  "confidence_score": 0.97
}

Check that all required fields are present, data types are correct, values are within expected ranges, and categories match your enum values.
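If you want to run these checks in code rather than by eye, here's a minimal stdlib-only sketch. It mirrors a few constraints from the Step 3 schema:

- required fields
- category and status enums
- non-negative numbers
- the YYYY-MM month described there

It also adds the Step 7 rule that a report listing violations can't be "compliant". It is not a full JSON Schema validator; a library such as `jsonschema` would cover the schema itself. `check_extraction` is a hypothetical helper.

```python
import re

# Constraints hand-copied from the Step 3 schema (keep them in sync if the schema changes).
REQUIRED = ("facility_id", "reporting_month", "total_waste_kg", "waste_breakdown")
CATEGORIES = {"recycling", "landfill", "hazardous", "organic", "other"}
STATUSES = {"compliant", "non-compliant", "pending_review"}


def check_extraction(data: dict) -> list[str]:
    """Hypothetical helper: return readable problems; an empty list means all checks passed."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in data]

    month = data.get("reporting_month")
    if month is not None and not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", str(month)):
        problems.append(f"reporting_month is not YYYY-MM: {month!r}")

    for i, row in enumerate(data.get("waste_breakdown") or []):
        if row.get("category") not in CATEGORIES:
            problems.append(f"row {i}: unknown category {row.get('category')!r}")
        # Numeric fields must be non-negative; percentage is also capped at 100.
        for field, upper in (("weight_kg", None), ("cost_eur", None), ("percentage", 100)):
            value = row.get(field)
            if value is None:
                continue
            if not isinstance(value, (int, float)) or value < 0 or (upper is not None and value > upper):
                problems.append(f"row {i}: {field} out of range: {value!r}")

    compliance = data.get("compliance") or {}
    status = compliance.get("status")
    if status is not None and status not in STATUSES:
        problems.append(f"unknown compliance status: {status!r}")
    # Step 7 rule: listed violations rule out "compliant".
    if status == "compliant" and compliance.get("violations"):
        problems.append("status is 'compliant' but violations are listed")
    return problems
```

An empty list means the checks passed. Anything else can go to the review queue along with the problem messages.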

Step 7: Refine Template

If extraction isn’t perfect, adjust your instructions based on what you’re seeing.

Missing waste categories

Add more specific instructions:

If waste breakdown table is split across multiple pages, combine all rows.
If category names vary (e.g., "Plastic recycling" vs "Recycling"), map to standard categories.

Wrong compliance status

Clarify what to look for:

Compliance status is NOT "compliant" if:
- Violations are listed
- "Pending review" is mentioned
- "Under investigation" is stated

Look for explicit statements like "fully compliant" or "no violations."

Handwritten notes not extracted

Make sure you’re using pro-v1 and add:

Extract handwritten notes in margins or on separate pages.
Look for handwritten annotations on compliance status or special handling instructions.

Step 8: Save & Deploy

Once you’re satisfied with the test results:

  1. Click Save Template
  2. Note your template_slug: esg-waste-management-report
  3. Test with 10-20 more documents
  4. Monitor accuracy metrics
  5. Deploy to production when ready

Using Your Template

You can use templates through the API or the dashboard.

Via API

import os

from leapocr import LeapOCR

client = LeapOCR(api_key=os.getenv("LEAPOCR_API_KEY"))

# Process a waste report using the custom template
job = client.ocr.process_file(
    file_path="waste_report_jan2024.pdf",
    format="structured",
    template_slug="esg-waste-management-report",
)

# Block until the job finishes
result = client.ocr.wait_until_done(job["job_id"])

if result["status"] == "completed":
    # This example reads only the first page's result
    page = result["pages"][0]
    waste_data = page["result"]
    confidence = page.get("confidence_score", 0)

    # save_to_database and flag_for_review stand in for your own code
    if confidence >= 0.95:
        # Auto-approve
        save_to_database(waste_data)
    else:
        # Flag for review
        flag_for_review(waste_data, confidence)

Via Dashboard

  1. Navigate to Documents → Upload
  2. Select one or more waste reports
  3. Choose template: esg-waste-management-report
  4. Click Process
  5. Review results in the Extractions table

Advanced Template Features

Once you’ve mastered the basics, you can add more sophisticated features.

Conditional Extraction

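Require additional fields only when certain conditions are met. List the condition field under "required" inside the "if" block. Otherwise a document that omits compliance_status satisfies the "if" vacuously and triggers the "then" requirements:

{
  "if": {
    "properties": {
      "compliance_status": { "const": "non-compliant" }
    },
    "required": ["compliance_status"]
  },
  "then": {
    "required": ["violations", "corrective_action_plan"]
  }
}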

Array Validation

Make sure array items meet your criteria:

{
  "waste_breakdown": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "category": { "type": "string" },
        "weight_kg": { "type": "number", "minimum": 0 }
      },
      "required": ["category", "weight_kg"]
    },
    "minItems": 1,
    "uniqueItems": true
  }
}
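One subtlety: `uniqueItems` only rejects rows that are identical in every property. Two `recycling` rows with different weights still pass, for example after "Paper recycling" and "Plastic recycling" are both mapped to `recycling`. If you want one row per category, check it in code.

Here is a minimal sketch. `duplicate_categories` is a hypothetical helper, and `passes_unique_items` only approximates the schema keyword so the contrast is visible.

```python
import json
from collections import Counter


def passes_unique_items(rows: list[dict]) -> bool:
    """Approximates JSON Schema uniqueItems: True if no two rows are identical.

    Approximation: json.dumps treats 1 and 1.0 as different; JSON Schema does not.
    """
    return len({json.dumps(row, sort_keys=True) for row in rows}) == len(rows)


def duplicate_categories(rows: list[dict]) -> list[str]:
    """Hypothetical helper: categories that appear in more than one row."""
    counts = Counter(row["category"] for row in rows if row.get("category"))
    return sorted(category for category, n in counts.items() if n > 1)


rows = [
    {"category": "recycling", "weight_kg": 300},  # e.g. "Paper recycling"
    {"category": "recycling", "weight_kg": 200},  # e.g. "Plastic recycling"
    {"category": "landfill", "weight_kg": 100},
]
print(passes_unique_items(rows))   # True: uniqueItems is satisfied
print(duplicate_categories(rows))  # ['recycling']
```

Whether to merge duplicates (summing weights and costs) or send them back for review is a policy choice for your pipeline.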

Cross-Field Validation

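Validate relationships between fields. Standard JSON Schema cannot sum array values, and "$data" is a non-standard extension (supported by validators such as Ajv), so read this as pseudo-schema that documents the rule: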

{
  "allOf": [
    {
      "if": {
        "properties": {
          "total_waste_kg": { "type": "number" }
        },
        "required": ["total_waste_kg"]
      },
      "then": {
        "properties": {
          "total_waste_kg": {
            "const": { "$data": "1/waste_breakdown/sum(weight_kg)" }
          }
        }
      }
    }
  ]
}

Note: Cross-field validation typically requires application-level validation.
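As a concrete example of that application-level step, here's a sketch that compares `total_waste_kg` with the sum of the breakdown rows and recomputes each row's percentage. It is a minimal illustration, not LeapOCR functionality. `check_totals` is a hypothetical name, and both tolerances are assumptions meant to absorb rounding in source reports: 1% relative on totals and 1 point on percentages.

```python
import math


def check_totals(data: dict, rel_tol: float = 0.01, pct_tol: float = 1.0) -> list[str]:
    """Hypothetical helper: report mismatches between the total and the breakdown rows."""
    rows = data.get("waste_breakdown") or []
    total = data.get("total_waste_kg")
    problems = []

    row_sum = sum(row.get("weight_kg") or 0 for row in rows)
    if total is not None and not math.isclose(row_sum, total, rel_tol=rel_tol):
        problems.append(f"breakdown sums to {row_sum} kg but total_waste_kg is {total}")

    if total:  # percentages can only be recomputed against a non-zero total
        for row in rows:
            stated = row.get("percentage")
            if stated is None:
                continue
            computed = 100 * (row.get("weight_kg") or 0) / total
            if abs(stated - computed) > pct_tol:
                problems.append(
                    f"{row.get('category')}: {stated}% stated, {computed:.1f}% computed"
                )
    return problems
```

The example output from Step 6 passes cleanly: 5,000 + 4,500 + 2,000 + 1,000 = 12,500 kg, and 40 + 36 + 16 + 8 = 100%.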

Template Best Practices

The following practices tend to produce reliable templates.

Start Simple

Build a basic template first, then add complexity gradually.

  • Step 1: Extract facility_id and reporting_month
  • Step 2: Add total_waste_kg
  • Step 3: Add waste_breakdown array
  • Step 4: Add compliance status
  • Step 5: Refine with conditional logic

Be Specific in Instructions

Vague instructions lead to inconsistent results.

Instead of:

Extract waste data.

Try:

Extract waste data from the table titled "Waste Summary."
Look for columns: Category, Weight (kg), Disposal Method, Cost (€).
Extract each row as an object in the waste_breakdown array.

Provide Examples

Example waste categories:
- "Paper recycling" → category: "recycling"
- "General waste" → category: "landfill"
- "Food waste" → category: "organic"
- "Chemical waste" → category: "hazardous"

Map variations to standard categories.
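The same mapping can also run as a deterministic post-processing step, for labels the model passes through unchanged. Here is a minimal keyword-based sketch. The keyword lists are assumptions seeded from the examples above, and `map_category` is a hypothetical helper.

```python
# Keyword rules checked in order; the first match wins (assumed lists, extend for real reports).
CATEGORY_KEYWORDS = [
    ("hazardous", ("hazardous", "chemical", "toxic")),
    ("organic", ("food", "organic", "compost", "garden")),
    ("recycling", ("recycling", "recyclable", "paper", "plastic", "glass", "metal")),
    ("landfill", ("general", "residual", "landfill")),
]


def map_category(label: str) -> str:
    """Hypothetical helper: map a free-text waste label to one of the schema's enum values."""
    text = label.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(word in text for word in keywords):
            return category
    return "other"  # fallback value allowed by the Step 3 schema
```

Hazardous keywords are checked first, so a label such as "Hazardous plastic packaging" is not filed under recycling.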

Handle Edge Cases

If weight is given in tonnes, convert to kg (multiply by 1,000).
If cost is missing, set to null or 0. Don't omit the field.
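If a category is listed but weight is 0, still include it.

If you use null for missing costs, allow it in the schema too (for example, "cost_eur": { "type": ["number", "null"] }). Otherwise null values fail validation against the Step 3 schema.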

Test Continuously

  • Test on 10+ documents before deploying
  • Test on edge cases (handwritten, poor quality, multilingual)
  • Monitor confidence scores and error rates
  • Refine template based on feedback

Measuring Template Performance

Track these metrics to understand how well your template is working.

Key Metrics

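Metric | Target | How to measure
--- | --- | ---
Field-level accuracy | >95% | Sample 20 fields and verify each against the source
Document-level accuracy | >90% | Share of documents with every field correct
Confidence calibration | ±5% | Stated confidence should match observed accuracy (95% confidence ≈ 95% accuracy)
Processing speed | <20 sec | Average processing time per document
Auto-approval rate | >85% | Share of documents with confidence ≥ 0.95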
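Most of these metrics are simple ratios over a reviewed sample. The sketch below assumes you record two things for each sampled document: its confidence score, and whether each checked field matched the source. `ReviewedDocument` and `template_metrics` are hypothetical names. The calibration gap (mean confidence minus document-level accuracy) is one simple way to read the ±5% target, not a LeapOCR-defined metric.

```python
from dataclasses import dataclass


@dataclass
class ReviewedDocument:
    confidence: float          # confidence score returned with the extraction
    field_correct: list[bool]  # one entry per checked field: True if it matched the source


def template_metrics(docs: list[ReviewedDocument], threshold: float = 0.95) -> dict:
    """Hypothetical helper: accuracy, auto-approval rate and calibration for a sample."""
    fields = [ok for doc in docs for ok in doc.field_correct]
    document_accuracy = sum(all(doc.field_correct) for doc in docs) / len(docs)
    mean_confidence = sum(doc.confidence for doc in docs) / len(docs)
    return {
        "field_accuracy": sum(fields) / len(fields),
        "document_accuracy": document_accuracy,
        "auto_approval_rate": sum(doc.confidence >= threshold for doc in docs) / len(docs),
        # Positive gap means overconfident; the table's target is within +/-0.05.
        "calibration_gap": mean_confidence - document_accuracy,
    }
```

Run it over the documents you sample in Step 8, then compare the results against the targets in the table.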

Monitoring Dashboard

Keep an eye on these metrics for each template:

  • Documents processed
  • Average confidence score
  • Low-confidence extractions (flagged for review)
  • Error rate by field
  • Processing time trends

Conclusion

Custom templates automate repetitive ESG document processing. A well-designed template can achieve 99%+ accuracy on consistent documents, process each document in under 20 seconds, and scale to thousands of documents.

Consider the time savings: manually processing 50 documents at 30 minutes each takes 25 hours. The same 50 documents take about 25 minutes with automation.

Your ESG documents have unique requirements, and your templates should reflect that.


Next Steps:

Try LeapOCR on your own documents

Start with 100 free credits and see how your workflow holds up on real files.

Eligible paid plans include a 3-day trial with 100 credits after you add a credit card, so you can test actual PDFs, scans, and forms before committing to a rollout.
