POC Case Study

From AI Decisions to Measurable Risk

Insights from the assessment of LLM-assisted insurance underwriting:
how systematic testing revealed hidden robustness failures before they reached production.

The Challenge

A major insurer deployed an LLM-based system to accelerate underwriting: extracting information from medical documents, summarizing patient histories, and supporting risk decisions. The system performed well under normal conditions, but there was no systematic way to verify that it remained reliable under real-world document variations.

The Assessment

confora labs ran a robustness assessment with Confora Insight. We generated perturbed test cases using 10 adversarial alteration types drawn from real document conditions, including rotation, blur, noise, and contrast changes, and systematically stress-tested the extraction and summarization pipeline.
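The alteration types above can be sketched as simple image transforms. This is an illustrative example only; the function and parameter names are ours, not Confora Insight's API.

```python
import random
from PIL import Image, ImageEnhance, ImageFilter


def perturb(img: Image.Image, kind: str) -> Image.Image:
    """Apply one illustrative document alteration (hypothetical, for sketching only)."""
    if kind == "rotation":
        # Small skew as seen in scanned documents; pad with white background.
        return img.rotate(random.uniform(-5, 5), expand=True, fillcolor=255)
    if kind == "blur":
        return img.filter(ImageFilter.GaussianBlur(radius=1.5))
    if kind == "contrast":
        return ImageEnhance.Contrast(img).enhance(0.6)
    if kind == "noise":
        # Flip ~1% of pixels to black or white (salt-and-pepper noise).
        noisy = img.copy()
        px = noisy.load()
        for _ in range(int(noisy.width * noisy.height * 0.01)):
            x = random.randrange(noisy.width)
            y = random.randrange(noisy.height)
            px[x, y] = random.choice((0, 255))
        return noisy
    raise ValueError(f"unknown alteration: {kind}")
```

Each perturbed page is then fed through the same extraction and summarization pipeline as the original, and the outputs are compared.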

[Screenshot placeholder: assessment results visualization]

Key Finding

Rotated documents led to a sharp increase in extraction and summarization errors: medical conditions were incorrectly omitted from, or added to, underwriting summaries. This failure mode was invisible in normal operation because the system produced plausible-looking output whose content was wrong.

Outcome

Document quality flagging added before processing
Extraction pipeline retrained for rotation robustness
Robustness checks integrated into CI/CD pipeline
Findings documented in EU AI Act-aligned audit trail
Continuous monitoring set up to catch future regressions
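A CI/CD robustness check like the one above can be sketched as a simple regression gate. The threshold values and helper names here are illustrative assumptions, not the insurer's actual configuration.

```python
# Illustrative CI robustness gate (hypothetical names and thresholds):
# fail the build if extraction accuracy on perturbed documents drops
# below an absolute floor, or regresses too far from the clean baseline.

ACCURACY_FLOOR = 0.95  # assumed minimum acceptable accuracy


def extraction_accuracy(results: list[tuple[str, str]]) -> float:
    """Fraction of (expected, actual) extraction pairs that match exactly."""
    if not results:
        return 0.0
    return sum(expected == actual for expected, actual in results) / len(results)


def check_robustness(baseline: float, perturbed: float, max_drop: float = 0.05) -> None:
    """Raise AssertionError if perturbed accuracy fails the gate."""
    if perturbed < ACCURACY_FLOOR:
        raise AssertionError(
            f"perturbed accuracy {perturbed:.2f} below floor {ACCURACY_FLOOR}"
        )
    if baseline - perturbed > max_drop:
        raise AssertionError(
            f"regression: drop of {baseline - perturbed:.2f} exceeds {max_drop}"
        )
```

Running such a gate on every pipeline change turns the rotation finding into a permanent, automated regression test rather than a one-off discovery.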

"AI systems that work in normal conditions can fail silently under realistic variations.
Systematic testing turns invisible risks into documented, auditable findings."

confora labs assessment team

Ready to assess your AI risks?

Book a Demo

confora labs by spotixx GmbH