The Ultimate Guide to Automated Supplier Due Diligence for ESG
Focus on the 'S' in ESG: social compliance. This guide covers automating the review of supplier codes of conduct and certifications.
Your procurement team just received 200 supplier questionnaires. Each one contains a code of conduct compliance checklist, various certifications (ISO 14001, SA8000, EcoVadis), health and safety policies running 50+ pages, and human rights disclosures in different formats.
Reviewing a single supplier takes about 45 minutes. For 200 suppliers, that adds up to 150 hours, or nearly 19 person-days at eight hours per day. Few teams have that kind of time to spare.
This guide shows you how to automate social compliance due diligence using AI-powered document processing.
The Regulatory Landscape
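The EU Corporate Sustainability Due Diligence Directive (CS3D), adopted in May 2024, requires companies with more than 1,000 employees and more than €450 million in net worldwide turnover to conduct human rights and environmental due diligence. As adopted, obligations phase in by company size starting in 2027, and later EU simplification measures may shift those dates, so confirm the current timeline. Similar regulations are appearing worldwide, which makes automation important for scaling supplier assessments.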
By 2026-2027, over 50,000 companies will need to perform supplier due diligence. The KPMG Global ESG Due Diligence Study 2024 found that ESG considerations are becoming a priority in transactions, with leading investors integrating ESG factors into their investment decisions.
The “S” in ESG: Why Social Compliance Matters
Social compliance covers three main areas.
Labor practices include child labor prohibitions, forced labor prevention, working hours compliance, and fair wages and benefits.
Human rights encompass non-discrimination policies, freedom of association, collective bargaining rights, and workplace safety.
Ethics cover anti-bribery and corruption measures, whistleblower protections, data privacy, and supply chain transparency.
Several regulations drive these requirements:
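- EU CS3D (Corporate Sustainability Due Diligence Directive): Affects companies with more than 1,000 employees and more than €450M in turnover. As adopted, obligations phase in from 2027.
- German Supply Chain Due Diligence Act (LkSG): Mandatory for companies in Germany with at least 1,000 employees (3,000 when the law took effect in 2023)
- UK Modern Slavery Act: Requires annual statements from organizations with turnover of £36 million or more
- California Transparency in Supply Chains Act: Mandates specific disclosures from large retailers and manufacturers doing business in California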
What Documents Need Review?
Supplier Codes of Conduct
You’ll need to extract the policy existence and date, coverage scope (employees, contractors), specific provisions (child labor, forced labor, discrimination), enforcement mechanisms, and signature details. The challenge here is that these documents come in varying formats—PDFs, Word documents, web pages.
Certifications and Audits
Common certifications include ISO 14001 (environmental management), ISO 45001 (occupational health and safety), SA8000 (social accountability), EcoVadis (sustainability rating), and SEDEX (supplier ethical data exchange).
For each certification, extract the certificate number, issue and expiry dates, certification body, scope (which facilities are covered), and any scores or ratings.
Research from 2025 shows that 79% of companies struggle with supplier data availability. AI-powered extraction can improve response rates from 40% (manual processes) to over 65%, while cutting review time by 90%.
Policy Documents
Key policies to review include health and safety, human rights, anti-bribery and corruption, whistleblower protection, and data privacy. Extract whether each policy exists, when it was last updated, the key commitments it makes, and who approved it.
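One way to turn these policy checks into a list of gaps is sketched below. The identifiers in `REQUIRED_POLICIES` and the `policy_type` / `policy_exists` fields are illustrative assumptions, not fields defined by any particular extraction template; adapt them to whatever your policy extractor returns.

```python
# Hypothetical policy identifiers, mirroring the five policies named above.
REQUIRED_POLICIES = (
    "health_and_safety",
    "human_rights",
    "anti_bribery_corruption",
    "whistleblower_protection",
    "data_privacy",
)


def find_missing_policies(policy_docs: list[dict]) -> list[str]:
    """Return required policies that no extracted document confirms.

    Each item in `policy_docs` is assumed to look like
    {"policy_type": "human_rights", "policy_exists": True}.
    """
    present = {
        doc.get("policy_type")
        for doc in policy_docs
        if doc.get("policy_exists")
    }
    # Keep the order of REQUIRED_POLICIES so reports are stable.
    return [name for name in REQUIRED_POLICIES if name not in present]
```

The returned list has the same shape as the `missing_policies` input that a risk score can consume, one entry per gap.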
Questionnaire Responses
Suppliers may complete questionnaires through frameworks like EcoVadis, CDP supply chain, or custom assessments. Extract each response, whether supporting evidence was provided, self-assessed scores, and any improvement targets they’ve set.
How to Automate the Process
Step 1: Classify Each Document
First, automatically identify what type of document you’re working with.
    # Schema for supplier compliance documents
    classification_schema = {
        "type": "object",
        "properties": {
            "document_type": {
                "enum": [
                    "code_of_conduct",
                    "certification",
                    "policy_document",
                    "questionnaire_response",
                    "audit_report"
                ]
            },
            "supplier_id": {"type": "string"},
            "supplier_name": {"type": "string"},
            "document_date": {"type": "string", "format": "date"}
        }
    }

    # Template that pairs the schema with classification instructions
    classification_template = {
        "name": "supplier-doc-classifier",
        "schema": classification_schema,
        "instructions": """
    Classify the supplier compliance document:
    - Code of conduct: Look for "code of conduct," "supplier code," "ethical standards"
    - Certification: Look for "certificate," "ISO," "certified by"
    - Policy: Look for "policy," "procedure," "standard"
    - Questionnaire: Look for questions, responses, ratings
    - Audit report: Look for "audit," "assessment," "findings"
    """
    }
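The template instructions amount to keyword hints, so a plain keyword count along the same lines can serve as a cheap cross-check on the label the model returns. This is a minimal sketch, not a replacement for the classifier: `KEYWORDS` and `guess_document_type` are hypothetical names, and the matching rule (whole words, optional plural "s") is an assumption.

```python
import re
from typing import Optional

# Keyword hints taken from the classification instructions above.
KEYWORDS = {
    "code_of_conduct": ["code of conduct", "supplier code", "ethical standards"],
    "certification": ["certificate", "iso", "certified by"],
    "policy_document": ["policy", "procedure", "standard"],
    "questionnaire_response": ["question", "response", "rating"],
    "audit_report": ["audit", "assessment", "findings"],
}


def guess_document_type(text: str) -> Optional[str]:
    """Return the type whose keywords occur most often, or None if none match."""
    lowered = text.lower()
    counts = {}
    for doc_type, words in KEYWORDS.items():
        total = 0
        for word in words:
            # Match whole words only, allowing a trailing plural "s".
            pattern = rf"\b{re.escape(word)}s?\b"
            total += len(re.findall(pattern, lowered))
        counts[doc_type] = total
    best_type = max(counts, key=counts.get)
    return best_type if counts[best_type] > 0 else None
```

A disagreement between this guess and the model's `document_type` is a reasonable trigger for manual review of that document.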
Step 2: Extract Data Based on Document Type
Once you know the document type, extract the relevant fields using schemas specific to each document category.
For codes of conduct, you’ll capture whether a policy exists, when it was dated, who it covers (employees and contractors), what it prohibits (child labor, forced labor, discrimination), how it’s enforced, and whether it’s signed.
    coc_schema = {
        "type": "object",
        "properties": {
            "policy_exists": {"type": "boolean"},
            "policy_date": {"type": "string", "format": "date"},
            "covers_employees": {"type": "boolean"},
            "covers_contractors": {"type": "boolean"},
            # Nested groups are declared as objects with their own properties
            "prohibitions": {
                "type": "object",
                "properties": {
                    "child_labor": {"type": "boolean"},
                    "forced_labor": {"type": "boolean"},
                    "discrimination": {"type": "boolean"}
                }
            },
            "enforcement_mechanism": {
                "type": "object",
                "properties": {
                    "exists": {"type": "boolean"},
                    "description": {"type": "string"}
                }
            },
            "signature": {
                "type": "object",
                "properties": {
                    "signed": {"type": "boolean"},
                    "signatory": {"type": "string"},
                    "date": {"type": "string", "format": "date"}
                }
            }
        }
    }
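Once a code of conduct has been extracted into this shape, the gaps can be listed in plain language for the reviewer. The sketch below assumes the field names from `coc_schema`; the function name and the wording of each gap are illustrative choices.

```python
def code_of_conduct_gaps(coc: dict) -> list[str]:
    """List human-readable gaps in an extracted code-of-conduct record."""
    if not coc.get("policy_exists"):
        # Nothing else is meaningful without a policy on file.
        return ["no code of conduct on file"]

    gaps = []
    if not coc.get("covers_employees"):
        gaps.append("employees not covered")
    if not coc.get("covers_contractors"):
        gaps.append("contractors not covered")

    prohibitions = coc.get("prohibitions", {})
    for topic in ("child_labor", "forced_labor", "discrimination"):
        if not prohibitions.get(topic):
            gaps.append(f"no {topic.replace('_', ' ')} prohibition")

    if not coc.get("enforcement_mechanism", {}).get("exists"):
        gaps.append("no enforcement mechanism")
    if not coc.get("signature", {}).get("signed"):
        gaps.append("unsigned")
    return gaps
```

Missing fields are treated the same as `False`, which errs on the side of flagging a document for follow-up.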
For certifications, extract the certificate type, number, dates, certification body, scope, facilities covered, and scores.
    certification_schema = {
        "type": "object",
        "properties": {
            "certificate_type": {
                "enum": ["ISO_14001", "ISO_45001", "SA8000", "EcoVadis", "SEDEX", "Other"]
            },
            "certificate_number": {"type": "string"},
            "issued_date": {"type": "string", "format": "date"},
            "expiry_date": {"type": "string", "format": "date"},
            "certification_body": {"type": "string"},
            "scope": {"type": "string"},
            "facilities_covered": {"type": "array", "items": {"type": "string"}},
            "score": {
                "type": "object",
                "properties": {
                    "overall": {"type": "number"},
                    "environmental": {"type": "number"},
                    "social": {"type": "number"}
                }
            }
        }
    }
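An extracted `expiry_date` is only useful if something checks it. The following sketch classifies a certificate by its expiry date, assuming ISO-formatted dates (`YYYY-MM-DD`) as the schema's `"format": "date"` implies. The 90-day warning window and the status labels are assumptions to adjust to your own renewal process.

```python
from datetime import date, timedelta


def certificate_status(cert: dict, today: date, warn_days: int = 90) -> str:
    """Classify a certificate as 'valid', 'expiring_soon', 'expired', or 'unknown'."""
    raw_expiry = cert.get("expiry_date")
    if not raw_expiry:
        return "unknown"
    try:
        expiry = date.fromisoformat(raw_expiry)
    except (TypeError, ValueError):
        # The extractor returned something that is not a YYYY-MM-DD string.
        return "unknown"
    if expiry < today:
        return "expired"
    if expiry - today <= timedelta(days=warn_days):
        return "expiring_soon"
    return "valid"
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible when an assessment is re-run for an audit.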
For questionnaires, capture the questionnaire type, each question and response, whether evidence was provided, self-assigned scores, and overall completeness.
    questionnaire_schema = {
        "type": "object",
        "properties": {
            "questionnaire_type": {
                "enum": ["EcoVadis", "CDP", "Custom"]
            },
            "responses": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "question_id": {"type": "string"},
                        "question": {"type": "string"},
                        "response": {"type": "string"},
                        "evidence_provided": {"type": "boolean"},
                        "self_score": {"type": "number"}
                    }
                }
            },
            "overall_score": {"type": "number"},
            "completeness_percentage": {"type": "number"}
        }
    }
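Because `completeness_percentage` is itself an extracted value, it can be worth recomputing it from the `responses` array as a cross-check. This sketch assumes that an empty or whitespace-only `response` counts as unanswered; that rule is an illustrative choice, not part of any questionnaire framework.

```python
def completeness_percentage(responses: list[dict]) -> float:
    """Share of questions with a non-empty response, as a percentage (0-100)."""
    if not responses:
        return 0.0
    answered = sum(
        1
        for item in responses
        if str(item.get("response") or "").strip()
    )
    return round(100 * answered / len(responses), 1)
```

A large gap between the recomputed figure and the extracted one suggests the extraction missed questions or misread the form layout.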
Step 3: Calculate Risk Scores
With extracted data in hand, calculate a social compliance risk score for each supplier. This example uses a 0-100 scale where higher scores indicate greater risk.
FIG 2.0 — Algorithm for calculating supplier risk scores
    def calculate_social_risk_score(extracted_data: dict) -> dict:
        """Calculate social compliance risk score (0-100, higher = riskier)."""
        # Base of 25 plus up to 75 component points spans 25-100,
        # so every risk band below is reachable.
        risk_score = 25

        # Code of conduct (20 points)
        if not extracted_data.get("code_of_conduct", {}).get("policy_exists"):
            risk_score += 20

        # Certifications (15 points)
        if not extracted_data.get("certifications", []):
            risk_score += 15

        # Questionnaire responses (15 points)
        completeness = extracted_data.get("questionnaire", {}).get("completeness_percentage", 100)
        if completeness < 80:
            risk_score += 15

        # Policy gaps (2 points each, capped at 10 points)
        missing_policies = extracted_data.get("missing_policies", [])
        risk_score += min(10, len(missing_policies) * 2)

        # Audit findings (5 points per critical finding, capped at 15 points)
        critical_findings = extracted_data.get("audit_findings", {}).get("critical", 0)
        risk_score += min(15, critical_findings * 5)

        risk_score = min(100, risk_score)

        # Determine risk level
        if risk_score >= 80:
            risk_level = "critical"
            recommendation = "Immediate action required, consider replacement"
        elif risk_score >= 60:
            risk_level = "high"
            recommendation = "Require improvement plan, monitor closely"
        elif risk_score >= 40:
            risk_level = "medium"
            recommendation = "Monitor and encourage improvement"
        else:
            risk_level = "low"
            recommendation = "Standard monitoring"

        return {
            "risk_score": risk_score,
            "risk_level": risk_level,
            "recommendation": recommendation
        }
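For an audit trail, it helps to record which components contributed to a supplier's score rather than only the total. The sketch below reads the same input keys as `calculate_social_risk_score` and returns the points per component, excluding any base score. The caps of 10 and 15 points follow the maxima named in the scoring comments, which is an assumption about the intended behavior.

```python
def risk_score_breakdown(extracted_data: dict) -> dict:
    """Return the points each component adds, before any base score."""
    has_coc = extracted_data.get("code_of_conduct", {}).get("policy_exists")
    has_certs = bool(extracted_data.get("certifications"))
    completeness = extracted_data.get("questionnaire", {}).get(
        "completeness_percentage", 100
    )
    missing = extracted_data.get("missing_policies", [])
    critical = extracted_data.get("audit_findings", {}).get("critical", 0)

    return {
        "code_of_conduct": 0 if has_coc else 20,
        "certifications": 0 if has_certs else 15,
        "questionnaire": 15 if completeness < 80 else 0,
        # Caps follow the maxima stated in the scoring comments (assumption).
        "policy_gaps": min(10, len(missing) * 2),
        "audit_findings": min(15, critical * 5),
    }
```

Storing this dictionary next to the final score lets a reviewer see at a glance why a supplier landed in a given risk band.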
Building the Pipeline
Here’s how the complete workflow fits together:
FIG 1.0 — End-to-end automated due diligence pipeline
Complete Implementation
This Python function shows the complete workflow for processing supplier documents.
    import os

    from leapocr import LeapOCR

    client = LeapOCR(api_key=os.getenv("LEAPOCR_API_KEY"))


    def process_supplier_documents(supplier_id: str, documents: list[str]):
        """Process all compliance documents for a supplier."""
        extracted_data = {
            "supplier_id": supplier_id,
            "documents": []
        }

        for doc_path in documents:
            # Classify document
            classification = client.ocr.process_file(
                file_path=doc_path,
                format="structured",
                template_slug="supplier-doc-classifier"
            )
            class_result = client.ocr.wait_until_done(classification["job_id"])
            doc_type = class_result["pages"][0]["result"]["document_type"]

            # Extract based on type
            extraction_template = f"supplier-{doc_type}-extractor"
            extraction = client.ocr.process_file(
                file_path=doc_path,
                format="structured",
                template_slug=extraction_template
            )
            extract_result = client.ocr.wait_until_done(extraction["job_id"])
            data = extract_result["pages"][0]["result"]

            extracted_data["documents"].append({
                "type": doc_type,
                "data": data,
                "confidence": extract_result["pages"][0].get("confidence_score", 0)
            })

            # Also file the data under the keys the risk scorer reads.
            # "missing_policies" and "audit_findings" depend on your own
            # policy and audit templates and must be derived separately.
            if doc_type == "code_of_conduct":
                extracted_data["code_of_conduct"] = data
            elif doc_type == "certification":
                extracted_data.setdefault("certifications", []).append(data)
            elif doc_type == "questionnaire_response":
                extracted_data["questionnaire"] = data

        # Calculate risk score
        risk_score = calculate_social_risk_score(extracted_data)

        # Save to database (save_supplier_assessment is your own persistence helper)
        save_supplier_assessment(extracted_data, risk_score)

        return {
            "supplier_id": supplier_id,
            "risk_score": risk_score["risk_score"],
            "risk_level": risk_score["risk_level"]
        }
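The pipeline stores a `confidence` value with every document, which gives a natural hook for human review. A minimal sketch of that routing follows. It assumes confidence is reported on a 0-1 scale and uses an arbitrary 0.85 threshold; both are assumptions to check against the values your extraction service actually returns.

```python
def documents_needing_review(assessment: dict, threshold: float = 0.85) -> list[dict]:
    """Return documents whose extraction confidence falls below `threshold`."""
    return [
        doc
        for doc in assessment.get("documents", [])
        if doc.get("confidence", 0) < threshold
    ]
```

Documents without a confidence value default to 0 and are therefore always queued, which keeps silent extraction failures from slipping through.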
Real-World Example
An automotive company needed to assess 150 suppliers for German LkSG compliance. Each supplier submitted 5-8 documents including codes of conduct, certifications, policies, and questionnaires.
Manual processing was estimated at 0.75 person-days per supplier across all of its documents, totaling 112.5 person-days. At €1,000 per day, that is €112,500 and a four-month timeline.
Using automated processing, the team spent 12.5 person-days on upload and review, roughly 40 minutes per supplier at eight working hours per day. The cost came to €12,500 for labor plus €750 for API usage, a €13,250 total. The entire project took three weeks.
The results: an 88% cost reduction before tooling costs (€13,250 versus €112,500), 75% time savings, 12 high-risk suppliers identified for engagement, and 37 suppliers flagged as missing critical certifications.
Costs and Benefits
For a project involving 150 suppliers, here’s how the numbers compare:
FIG 3.0 — Cost comparison: Manual vs. Automated Due Diligence
| Cost Component | Manual | Automated | Savings |
|---|---|---|---|
| Labor | €112,500 | €12,500 | €100,000 |
| API costs | €0 | €750 | -€750 |
| Tools | €0 | €2,500 | -€2,500 |
| Total | €112,500 | €15,750 | €96,750 (86% savings) |
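The totals in the table can be reproduced with a few lines of arithmetic, which also makes it easy to substitute your own day rate or supplier count. The function name `project_cost` is illustrative; the figures are the ones from the table.

```python
def project_cost(person_days: float, day_rate: float,
                 api: float = 0.0, tools: float = 0.0) -> float:
    """Total project cost: labor plus API usage plus tooling."""
    return person_days * day_rate + api + tools


manual = project_cost(112.5, 1_000)
automated = project_cost(12.5, 1_000, api=750, tools=2_500)
savings = manual - automated
savings_pct = 100 * savings / manual

print(f"Manual: EUR {manual:,.0f}  Automated: EUR {automated:,.0f}")
print(f"Savings: EUR {savings:,.0f} ({savings_pct:.0f}%)")
```

Run as written, this prints a manual cost of EUR 112,500, an automated cost of EUR 15,750, and savings of EUR 96,750 (86%), matching the Total row.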
Beyond cost savings, automation provides several strategic advantages:
Risk mitigation: Identify high-risk suppliers before problems occur, address compliance gaps proactively, and protect your brand reputation.
Regulatory readiness: Generate LkSG compliance documentation, maintain an audit trail for all assessments, and apply a standardized scoring methodology.
Supplier development: Engage with suppliers based on their risk scores, track improvements over time, and recognize leaders.
Scalability: Assess new suppliers in hours rather than days, re-assess existing suppliers annually, and expand the program to Tier 2 and Tier 3 suppliers.
Getting Started
30-Day Implementation Plan
Week 1: Template Development
- Days 1-3: Build classification template
- Days 4-5: Build extraction templates (code of conduct, certifications, questionnaires)
- Days 6-7: Test on 20 sample documents
Week 2: Pipeline Integration
- Days 8-10: Build risk scoring logic
- Days 11-12: Set up database schema and API
- Days 13-14: Develop dashboard
Week 3: Pilot Testing
- Days 15-17: Process 50 suppliers as a pilot
- Days 18-19: Validate results against manual review
- Days 20-21: Refine templates and scoring based on findings
Week 4: Rollout
- Days 22-24: Process remaining suppliers
- Days 25-26: Generate reports for procurement team
- Days 27-28: Plan engagement with high-risk suppliers
- Days 29-30: Present to stakeholders and plan Phase 2
Wrapping Up
Social compliance due diligence is both a regulatory requirement and a business necessity. Manual review doesn’t scale, but automation makes it feasible.
For 150 suppliers, automation delivers an 86% cost reduction (from €112K to €16K) and 75% faster assessments (from four months to three weeks). You also get consistent scoring from a single documented rule set and audit-ready documentation with full traceability.
Companies that automate supplier due diligence now gain a competitive advantage: lower risk, faster supplier onboarding, and better supplier relationships.