POC Case Study
From AI Decisions to Measurable Risk
Insights from an assessment of LLM-assisted insurance underwriting:
how systematic testing revealed hidden robustness failures before they reached production.
The Challenge
A major insurer deployed an LLM-based system to accelerate underwriting: extracting information from medical documents, summarizing patient histories, and supporting risk decisions. The system performed well under normal conditions, but there was no systematic way to verify whether it remained reliable under real-world document variations.
The Assessment
confora labs ran a robustness assessment using Confora Insight. Using 10 adversarial alteration types (rotation, blur, noise, contrast changes, and more), we generated perturbed test cases that reproduce real-world document conditions and systematically stress-tested the extraction and summarization pipeline.
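The perturbation workflow described above can be sketched as a small test harness. This is a minimal illustration only: the alteration functions and the `extract_fields` pipeline call are hypothetical stand-ins, not Confora Insight's actual API, and real alterations would operate on document image pixels rather than metadata.

```python
# Hypothetical perturbation functions: each takes a document (here a dict)
# and returns an altered copy. Real alterations would modify image pixels
# (rotation, blur, noise, contrast, ...).
def rotate(doc, degrees=3):
    altered = dict(doc)
    altered["rotation_deg"] = doc.get("rotation_deg", 0) + degrees
    return altered

def add_noise(doc, level=0.1):
    altered = dict(doc)
    altered["noise_level"] = doc.get("noise_level", 0.0) + level
    return altered

# Subset of alteration types for illustration; the assessment used 10.
ALTERATIONS = {"rotation": rotate, "noise": add_noise}

def generate_test_cases(doc):
    """Yield (alteration_name, perturbed_doc) pairs for one clean document."""
    for name, fn in ALTERATIONS.items():
        yield name, fn(doc)

def robustness_report(doc, extract_fields):
    """Compare pipeline output on clean vs. perturbed input, field by field.

    Returns a mapping: alteration name -> list of fields whose extracted
    value changed under that perturbation.
    """
    baseline = extract_fields(doc)
    report = {}
    for name, perturbed in generate_test_cases(doc):
        result = extract_fields(perturbed)
        report[name] = [k for k in baseline if result.get(k) != baseline[k]]
    return report
```

A document passes when every list in the report is empty; any non-empty list pinpoints both the alteration type and the fields it corrupts.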
Assessment results visualization - screenshot placeholder
Key Finding
Rotated documents caused a sharp increase in extraction and summarization errors: medical conditions were incorrectly omitted from, or added to, underwriting summaries. This failure mode was invisible in normal operation: the system produced plausible output, but the content was wrong.
Outcome
→ Document quality flagging added before processing
→ Extraction pipeline retrained for rotation robustness
→ Robustness checks integrated into CI/CD pipeline
→ Findings documented in EU AI Act-aligned audit trail
→ Continuous monitoring set up to catch future regressions
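The CI/CD robustness check listed above can be as simple as a gate that fails the build when field-level agreement under perturbation drops below a threshold. A hedged sketch follows; the agreement metric, the 0.95 threshold, and the result tuple shape are illustrative assumptions, not the insurer's actual configuration.

```python
ROBUSTNESS_THRESHOLD = 0.95  # illustrative minimum acceptable field agreement

def field_agreement(baseline: dict, perturbed: dict) -> float:
    """Fraction of baseline fields reproduced exactly under perturbation."""
    if not baseline:
        return 1.0
    matches = sum(1 for k, v in baseline.items() if perturbed.get(k) == v)
    return matches / len(baseline)

def ci_gate(results) -> int:
    """results: iterable of (alteration_name, baseline_fields, perturbed_fields).

    Prints each failing alteration and returns a nonzero exit code so the
    CI job fails when any alteration breaks the threshold.
    """
    failures = [(name, field_agreement(b, p))
                for name, b, p in results
                if field_agreement(b, p) < ROBUSTNESS_THRESHOLD]
    for name, score in failures:
        print(f"FAIL {name}: agreement {score:.2f} < {ROBUSTNESS_THRESHOLD}")
    return 1 if failures else 0
```

Wired into a pipeline step (e.g. `sys.exit(ci_gate(results))`), this turns each regression into a failed build rather than a silent production error.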
"AI systems that work in normal conditions can fail silently under realistic variations.
Systematic testing turns invisible risks into documented, auditable findings."
confora labs assessment team
Ready to assess your AI risks?
Book a Demo
confora labs by spotixx GmbH