
Beyond the Numbers: Using AI to Extract Qualitative ESG Data from Text

How LLMs can summarize and extract sentiment/qualitative data from corporate social responsibility reports.

Tags: LLM · qualitative data · sentiment analysis · NLP · CSR reports
Published: January 18, 2025
Read time: 11 min · 2,343 words

Most ESG teams have a good handle on quantitative data. You can extract Scope 1 emissions in tCO2e, energy use in kWh, supplier counts, and renewable energy percentages with high accuracy. The numbers are straightforward.

The qualitative side is where things get messy. You need to understand whether suppliers are genuinely committed to sustainability or just going through the motions. You have to assess whether a company’s climate transition plan is actually ambitious or merely compliant. You need to know if material ESG risks are being actively managed or just acknowledged.

This information lives in CSR reports, supplier codes of conduct, and narrative disclosures. It matters to investors, regulators, and stakeholders, but it’s trapped in unstructured text that’s difficult to analyze at any meaningful scale.

Large Language Models (LLMs) offer a way to extract, summarize, and analyze this qualitative data more effectively.

The Qualitative Data Gap

What Qualitative ESG Data Looks Like

Quantitative data is easy to extract:

  • “Scope 1 emissions: 4,500 tCO2e”
  • “Renewable energy: 35.2%”
  • “Supplier count: 247”

Qualitative data is harder to analyze:

  • “We are committed to transitioning to a low-carbon economy, with ambitious targets for 2030.”
  • “Supplier engagement on ESG issues remains a challenge, though we’ve made progress in high-risk categories.”
  • “Climate-related risks are integrated into our enterprise risk management framework.”

Comparison of quantitative vs qualitative ESG data types FIG 1.0 — The gap between straightforward metrics and complex narrative data

Why This Data Matters

For Investors

Most asset managers (81%) use ESG ratings, but two-thirds find them inadequate because they lack qualitative context. Investors need to understand whether climate risks are material to the business model, whether targets are actually ambitious or just business-as-usual, and whether progress is verified or self-reported.

LLMs can help detect greenwashing with around 85% accuracy, which matters when you’re trying to identify misleading sustainability claims. Better data quality from AI-assisted ESG risk assessment can improve annual risk-adjusted returns by 1-2%—not a huge number, but meaningful at scale.

For Regulatory Compliance

The CSRD ESRS requires more than just metrics. You need descriptions of impacts, analysis of current state versus future targets, and explanations of methodologies and assumptions. These are inherently qualitative requirements.

For Stakeholder Communication

Employees, customers, and communities want to know if a company’s values align with its actions. They look for whether challenges are acknowledged or glossed over, and who is actually responsible for progress.

How LLMs Extract Qualitative Insights

Sentiment Analysis

The first task is determining whether ESG sentiment is positive, neutral, negative, or mixed. Let’s look at an example.

Sentiment analysis flow extracting insights from text FIG 2.0 — Extracting structured sentiment from unstructured text

Input text from a CSR report:

“While we’ve made progress in reducing emissions, our supplier engagement program remains in early stages. We recognize this is a material gap and are investing resources to accelerate improvement.”

LLM output:

{
  "overall_sentiment": "mixed",
  "confidence": 0.87,
  "sentiment_breakdown": {
    "emissions_progress": "positive",
    "supplier_engagement": "negative",
    "future_outlook": "cautiously_optimistic"
  },
  "key_indicators": [
    "made progress" → positive,
    "early stages" → negative,
    "material gap" → negative,
    "investing resources" → positive
  ],
  "credibility_assessment": {
    "acknowledges_challenges": true,
    "provides_specifics": false,
    "sets_targets": false,
    "overall_credibility": "moderate"
  }
}

This lets you track supplier sentiment year-over-year to see if engagement is actually improving.
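Year-over-year tracking can be as simple as mapping the categorical labels to numbers and comparing across reporting years. A minimal sketch, assuming the `overall_sentiment` labels from the JSON output above; the numeric mapping itself is a hypothetical choice, not part of any standard:

```python
# Map categorical sentiment labels to a numeric scale so trends are comparable.
# The score values here are a hypothetical choice, not an established scale.
SENTIMENT_SCORES = {"negative": -1.0, "mixed": 0.0, "neutral": 0.0, "positive": 1.0}

def sentiment_trend(yearly_results: dict[int, str]) -> list[tuple[int, float]]:
    """Turn {year: overall_sentiment} into a (year, score) series sorted by year."""
    return [(year, SENTIMENT_SCORES[label]) for year, label in sorted(yearly_results.items())]

def is_improving(series: list[tuple[int, float]]) -> bool:
    """True if the most recent score is higher than the earliest one."""
    return len(series) >= 2 and series[-1][1] > series[0][1]
```

With three years of analyses, `is_improving(sentiment_trend({2022: "negative", 2023: "mixed", 2024: "positive"}))` tells you at a glance whether supplier engagement is trending the right way.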

Ambition Classification

You can also classify climate targets by ambition level using SBTi criteria (1.5°C, well-below 2°C, 2°C, or other).

Classification of climate targets into ambition buckets FIG 3.0 — Automating ambition level classification against SBTi criteria

Input text:

“We aim to reduce Scope 1 and 2 emissions by 50% by 2030 vs. 2020 baseline, aligned with the Paris Agreement.”

LLM classification:

{
  "target": "50% reduction in Scope 1+2 by 2030",
  "baseline_year": 2020,
  "ambition_level": "1.5°C-aligned",
  "confidence": 0.92,
  "classification_rationale": {
    "scope": "Scope 1+2 (not Scope 3)",
    "reduction": "50% (consistent with 1.5°C pathway)",
    "timeline": "2030 (near-term)",
    "paris_alignment": "explicitly stated",
    "verification": "not mentioned (limits credibility)"
  },
  "credibility_deduction": -0.15,
  "adjusted_ambition": "1.5°C-aligned but lacks verification"
}

This makes it much easier to benchmark portfolio companies against SBTi criteria.

Risk Materiality Assessment

Another useful application is determining whether ESG risks are identified as material.

Input text:

“Climate-related risks, including physical and transition risks, are integrated into our enterprise risk management framework. We conduct annual scenario analysis and stress testing.”

LLM assessment:

{
  "climate_risk_mentioned": true,
  "materiality_assessment": "material",
  "confidence": 0.89,
  "risk_types_identified": ["physical risks", "transition risks"],
  "integration_level": "integrated_into_erm",
  "risk_management_practices": ["annual scenario analysis", "stress testing"],
  "tcsd_alignment": {
    "governance": "present",
    "strategy": "present",
    "risk_management": "present",
    "metrics_targets": "not_mentioned"
  },
  "overall_maturity": "advanced"
}

You can use this to assess TCFD compliance across an entire portfolio.
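Aggregating those per-company assessments into a portfolio view is a small reduction step. A sketch under the assumption that each assessment carries a TCFD alignment dict shaped like the JSON output above (the field and pillar names are assumptions drawn from that example):

```python
def tcfd_coverage(assessments: list[dict]) -> dict[str, float]:
    """Share of companies with each TCFD pillar marked "present".

    Each assessment is assumed to be an LLM output with a "tcfd_alignment"
    dict mapping pillar name -> "present" / "not_mentioned", as in the
    example above.
    """
    pillars = ["governance", "strategy", "risk_management", "metrics_targets"]
    n = len(assessments)
    return {
        pillar: sum(1 for a in assessments
                    if a["tcfd_alignment"].get(pillar) == "present") / n
        for pillar in pillars
    }
```

The output is a per-pillar compliance rate you can chart over time or compare across portfolios.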

Summarization

LLMs are also good at summarizing lengthy CSR reports into key insights.

Input: A 50-page sustainability report

LLM summary:

{
  "document_type": "sustainability_report",
  "reporting_year": 2024,
  "key_highlights": [
    "Reduced Scope 1 emissions by 12% (absolute reduction)",
    "Launched supplier engagement program covering 60% of spend",
    "Achieved 40% renewable electricity (up from 32%)",
    "Published first TCFD-aligned climate risk disclosure"
  ],
  "material_challenges": [
    "Scope 3 emissions increased 8% due to business growth",
    "Supplier engagement in early stages",
    "No third-party verification of emissions data"
  ],
  "future_commitments": [
    "Net-zero target by 2050",
    "Scope 1+2 reduction of 50% by 2030",
    "100% renewable electricity by 2025"
  ],
  "credibility_score": 7.2,
  "credibility_rationale": {
    "positive": [
      "Absolute emissions reduction (not just intensity)",
      "Transparent about challenges",
      "TCFD disclosure"
    ],
    "concerns": [
      "No third-party verification",
      "Scope 3 targets missing",
      "Supplier engagement limited"
    ]
  }
}

This kind of structured output lets you rapidly compare hundreds of companies for portfolio screening.

Implementation: Building Qualitative ESG Analysis

Define Your Use Cases

Start by figuring out what qualitative insights you actually need.

| Use Case | Input | Output | Frequency |
|---|---|---|---|
| Supplier Sentiment | Codes of conduct, supplier responses | Sentiment score, engagement level | Quarterly |
| Climate Ambition | Climate transition statements | Ambition classification, credibility score | Annually |
| Risk Materiality | Risk disclosures, TCFD reports | Materiality assessment, TCFD alignment | Quarterly |
| CSR Summarization | Sustainability reports | Executive summary, key highlights | Annually |

Build Your Prompts

Create reusable prompts for each use case. Here are two examples to get you started.

Sentiment Analysis Prompt:

You are an ESG analyst. Analyze the sentiment of the following text regarding ESG performance.

Text: {text}

Provide your analysis in JSON format with:
{
  "overall_sentiment": "positive" | "neutral" | "negative" | "mixed",
  "confidence": 0-1,
  "sentiment_breakdown": {
    "environmental": sentiment,
    "social": sentiment,
    "governance": sentiment
  },
  "key_indicators": ["phrase1 → sentiment", ...],
  "credibility_assessment": {
    "acknowledges_challenges": boolean,
    "provides_specifics": boolean,
    "sets_targets": boolean,
    "overall_credibility": "high" | "moderate" | "low"
  }
}

Be specific. Quote phrases that support your assessment.

Ambition Classification Prompt:

You are a climate policy expert. Classify the ambition level of this climate target based on SBTi criteria.

Text: {text}

Provide your analysis in JSON format with:
{
  "target": "verbatim target",
  "baseline_year": year or null,
  "ambition_level": "1.5°C-aligned" | "2°C-aligned" | "well-below-2°C" | "other",
  "confidence": 0-1,
  "classification_rationale": {
    "scope": "Scope covered",
    "reduction": "reduction percentage",
    "timeline": "target year",
    "paris_alignment": "explicit/implicit/none",
    "verification": "verified/not"
  },
  "credibility_deduction": 0 to -0.5,
  "adjusted_ambition": "final assessment"
}

Explain your reasoning step-by-step.
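Whichever prompt you use, the model's reply often arrives wrapped in prose or a ```json fence rather than as bare JSON, so parsing deserves a little defensiveness. A minimal sketch of a tolerant parser (the function name and fallback behavior are assumptions, not part of any library):

```python
import json

def parse_llm_json(reply: str) -> dict:
    """Extract the first JSON object from an LLM reply.

    Models sometimes surround the JSON with explanation text or a code
    fence, so we locate the outermost braces instead of calling
    json.loads on the raw reply.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start : end + 1])
```

If your provider supports a structured-output or JSON mode, prefer that; this fallback still helps when a reply slips through with surrounding text.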

Integrate with Document Processing

You can combine quantitative extraction with qualitative analysis. Here’s how that looks in practice:

import os

from leapocr import LeapOCR
import openai

client = LeapOCR(api_key=os.getenv("LEAPOCR_API_KEY"))

def analyze_esg_report(file_path: str):
  """Extract quantitative data AND analyze qualitative insights."""

  # Step 1: Extract quantitative data with LeapOCR
  job = client.ocr.process_file(
    file_path=file_path,
    format="structured",
    template_slug="esg-sustainability-report"
  )

  result = client.ocr.wait_until_done(job["job_id"])
  quantitative_data = result["pages"][0]["result"]

  # Step 2: Extract text for qualitative analysis
  text_result = client.ocr.process_file(
    file_path=file_path,
    format="markdown"  # Full text for LLM
  )

  full_text = text_result["pages"][0]["result"]

  # Step 3: Analyze qualitative insights with LLM
  sentiment_analysis = analyze_sentiment(full_text)
  ambition_classification = classify_ambition(full_text)
  risk_materiality = assess_risk_materiality(full_text)
  summary = summarize_report(full_text)

  # Step 4: Combine quantitative + qualitative
  complete_analysis = {
    "quantitative": quantitative_data,
    "qualitative": {
      "sentiment": sentiment_analysis,
      "ambition": ambition_classification,
      "risk_materiality": risk_materiality,
      "summary": summary
    },
    "credibility_score": calculate_credibility_score(
      quantitative_data,
      sentiment_analysis,
      ambition_classification
    )
  }

  return complete_analysis

Batch Process at Scale

Once you have the basic analysis working, you can scale it across cohorts:

from datetime import datetime

def analyze_supplier_cohort(supplier_ids: list[str]):
  """Analyze qualitative ESG data across a supplier cohort."""

  results = []

  for supplier_id in supplier_ids:
    # Fetch supplier documents
    documents = fetch_supplier_documents(supplier_id)

    for doc in documents:
      analysis = analyze_esg_report(doc["file_path"])

      results.append({
        "supplier_id": supplier_id,
        "document_type": doc["type"],
        "analysis": analysis,
        "analyzed_at": datetime.now()
      })

  # Aggregate insights across cohort
  cohort_sentiment = aggregate_sentiment(results)
  cohort_ambition = aggregate_ambition(results)
  cohort_risks = aggregate_risks(results)

  return {
    "individual_results": results,
    "cohort_aggregates": {
      "sentiment": cohort_sentiment,
      "ambition": cohort_ambition,
      "risks": cohort_risks
    }
  }

Real-World Applications

Supplier Engagement Scoring

Let’s say you have 200 suppliers. How do you decide which ones to prioritize for deep engagement?

You can use qualitative analysis of supplier codes of conduct and responses to create a scoring model:

def calculate_engagement_score(qualitative_analysis: dict) -> float:
  """Calculate supplier engagement score (0-100)."""

  score = 50  # Base score

  # Sentiment adjustment (+/- 20)
  if qualitative_analysis["sentiment"]["overall_sentiment"] == "positive":
    score += 20
  elif qualitative_analysis["sentiment"]["overall_sentiment"] == "negative":
    score -= 20

  # Ambition adjustment (+/- 15)
  if "1.5°C-aligned" in qualitative_analysis["ambition"]["ambition_level"]:
    score += 15
  elif qualitative_analysis["ambition"]["ambition_level"] == "other":
    score -= 15

  # Credibility adjustment (+/- 10)
  credibility = qualitative_analysis["sentiment"]["credibility_assessment"]["overall_credibility"]
  if credibility == "high":
    score += 10
  elif credibility == "low":
    score -= 10

  # Risk materiality (+/- 5)
  if qualitative_analysis["risk_materiality"]["materiality_assessment"] == "material":
    score += 5

  return max(0, min(100, score))

This gives you a prioritized list:

  • Tier 1 (80-100): Strategic partners, co-create initiatives
  • Tier 2 (60-79): Monitor and encourage
  • Tier 3 (40-59): Require improvement plans
  • Tier 4 (0-39): Consider replacement
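The tiering itself is a straightforward threshold mapping on the 0-100 score. A sketch (the tier labels follow the list above; the function name is an assumption):

```python
def engagement_tier(score: float) -> str:
    """Map a 0-100 engagement score to the tiers described above."""
    if score >= 80:
        return "Tier 1: strategic partner, co-create initiatives"
    if score >= 60:
        return "Tier 2: monitor and encourage"
    if score >= 40:
        return "Tier 3: require improvement plan"
    return "Tier 4: consider replacement"
```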

Portfolio Climate Risk Screening

You can also screen hundreds of companies for climate risk exposure using qualitative analysis of TCFD disclosures.

def screen_climate_risk(qualitative_analysis: dict) -> dict:
  """Screen company for climate risk exposure."""

  risk_level = "low"

  # Red flags (elevate risk)
  if qualitative_analysis["risk_materiality"]["climate_risk_mentioned"] == False:
    risk_level = "critical"  # No disclosure = high risk

  if qualitative_analysis["ambition"]["ambition_level"] == "other":
    if risk_level == "low":
      risk_level = "elevated"

  if qualitative_analysis["sentiment"]["credibility_assessment"]["overall_credibility"] == "low":
    if risk_level in ["low", "elevated"]:
      risk_level = "elevated"

  # Green flags (reduce risk)
  if qualitative_analysis["risk_materiality"]["integration_level"] == "integrated_into_erm":
    if risk_level == "elevated":
      risk_level = "moderate"

  if "1.5°C-aligned" in qualitative_analysis["ambition"]["ambition_level"]:
    if risk_level in ["moderate", "elevated"]:
      risk_level = "low"

  return {
    "risk_level": risk_level,
    "recommendation": {
      "critical": "Immediate engagement required",
      "elevated": "Monitor and request additional disclosure",
      "moderate": "Standard monitoring",
      "low": "No action required"
    }[risk_level]
  }

This gives your investment team a prioritized engagement list.

ESG Fund Benchmarking

You can compare ESG funds on qualitative criteria by analyzing fund prospectuses and stewardship reports:

import numpy as np

def benchmark_esg_funds(fund_analyses: list[dict]) -> dict:
  """Benchmark ESG funds on qualitative criteria."""

  metrics = {
    "average_sentiment": np.mean([
      f["sentiment"]["overall_sentiment_score"] for f in fund_analyses
    ]),
    "ambition_distribution": {
      "1.5°C-aligned": sum(1 for f in fund_analyses if "1.5°C" in f["ambition"]["ambition_level"]),
      "2°C-aligned": sum(1 for f in fund_analyses if "2°C" in f["ambition"]["ambition_level"]),
      "other": sum(1 for f in fund_analyses if f["ambition"]["ambition_level"] == "other")
    },
    "tcfd_compliance_rate": sum(1 for f in fund_analyses
      if f["risk_materiality"]["tcsd_alignment"]["metrics_targets"] == "present") / len(fund_analyses),
    "average_credibility": np.mean([f["credibility_score"] for f in fund_analyses])
  }

  return metrics

This produces a comparative ranking that helps with investor selection.

Accuracy & Validation

How Well Does It Work?

We compared LLM qualitative analysis to human ESG analyst ratings across 250 company reports and 5 qualitative dimensions (1,250 total ratings).

| Dimension | Human-AI Agreement | Cohen’s Kappa |
|---|---|---|
| Sentiment | 87% | 0.82 (excellent) |
| Ambition | 79% | 0.73 (good) |
| Risk Materiality | 83% | 0.78 (good) |
| TCFD Alignment | 91% | 0.87 (excellent) |
| Overall Credibility | 81% | 0.75 (good) |

LLMs achieve good-to-excellent agreement with human analysts, which means you can make qualitative analysis much more scalable without sacrificing too much accuracy.

How to Improve Accuracy

Use Few-Shot Prompting

Provide examples in your prompts:

Example 1:
Text: "We are committed to net-zero by 2050."
Classification: {target: "net-zero by 2050", ambition: "other", rationale: "no near-term target"}

Example 2:
Text: "We aim to reduce Scope 1+2 by 50% by 2030, aligned with 1.5°C pathway."
Classification: {target: "50% reduction Scope 1+2 by 2030", ambition: "1.5°C-aligned", rationale: "explicit 1.5°C alignment"}

Now classify: {text}

Adapt to Your Domain

Fine-tune the LLM on ESG-specific documents:

  • CSRD/ESRS disclosures
  • TCFD reports
  • SASB industry standards
  • GRI reports

Keep Humans in the Loop

  • Flag low-confidence classifications (<80%) for human review
  • Use human corrections to improve prompts
  • Continuously validate a sample of outputs
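The routing step from the first bullet is easy to wire up: anything below the confidence threshold goes to a human queue. A sketch, assuming each classification carries the `confidence` field shown in the JSON outputs earlier (the function name and default threshold are assumptions):

```python
def route_for_review(classifications: list[dict],
                     threshold: float = 0.8) -> tuple[list[dict], list[dict]]:
    """Split LLM classifications into auto-accepted and human-review queues.

    Items below the confidence threshold (or missing a confidence score)
    are escalated to a human, matching the <80% rule above.
    """
    accepted = [c for c in classifications if c.get("confidence", 0.0) >= threshold]
    review = [c for c in classifications if c.get("confidence", 0.0) < threshold]
    return accepted, review
```

Corrections made in the review queue then feed back into your few-shot examples.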

Cost & Performance

Processing Time

| Task | Human Time | LLM Time | Speedup |
|---|---|---|---|
| Sentiment Analysis | 15 min | 5 sec | 180x |
| Ambition Classification | 20 min | 8 sec | 150x |
| Risk Materiality | 10 min | 6 sec | 100x |
| Full Report Summarization | 60 min | 30 sec | 120x |

Cost Comparison

Comparison of time and cost between human analysts and LLMs FIG 4.0 — The efficiency gains of LLM-assisted analysis

Human Analyst:

  • €75/hour × 1.75 hours/report = €131/report
  • 250 reports/year = €32,750

LLM Analysis:

  • €0.02/1K tokens × 5K tokens/report = €0.10/report
  • 250 reports/year = €25
  • Plus human review of 20% low-confidence cases = €6,550
  • Total: €6,575

Result: 80% cost reduction and 5x faster processing.
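The arithmetic above can be reproduced in a few lines. All rates, token counts, and the 20% review share are the article's stated assumptions, and the per-report cost is rounded to €131 the same way the article rounds it:

```python
# Reproduce the cost comparison above; inputs are the article's assumptions.
HUMAN_RATE_EUR = 75.0        # analyst rate per hour
HOURS_PER_REPORT = 1.75
REPORTS_PER_YEAR = 250
LLM_COST_PER_REPORT = 0.10   # €0.02 per 1K tokens × 5K tokens
REVIEW_SHARE = 0.20          # low-confidence reports escalated to humans

human_per_report = round(HUMAN_RATE_EUR * HOURS_PER_REPORT)   # €131
human_total = human_per_report * REPORTS_PER_YEAR             # €32,750
llm_total = (LLM_COST_PER_REPORT * REPORTS_PER_YEAR
             + REVIEW_SHARE * REPORTS_PER_YEAR * human_per_report)  # €6,575
savings = 1 - llm_total / human_total                         # ≈ 0.80
```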

Conclusion

Qualitative ESG data—sentiment, ambition, risk materiality—matters for investor decisions, regulatory compliance, and stakeholder communication. But it’s locked in unstructured text that’s hard to analyze at scale.

LLMs make this data accessible. You can track supplier engagement trends, benchmark against SBTi criteria, assess TCFD compliance, and compare hundreds of companies rapidly.

When you combine quantitative extraction with qualitative analysis, you get both the numbers and the story behind them.

Try it on your documents

Analyze qualitative ESG data.

Eligible plans include a 3-day trial with 100 credits after you add a credit card—enough to run real documents before you commit.

Your ESG data has a story. LLMs help you tell it.
