Open

Model Uncertainty Quantification for Safety-Critical Decisions

alignment, evals, monitoring, benchmarks
Difficulty: Advanced

Verification: Automatic Verification

Compute: Inference Only

Description

Build reliable confidence estimates for when AI should defer to humans

Develop methods for accurately quantifying model uncertainty in safety-critical contexts, enabling systems to know when to defer to human judgment.

**Background:** Current LLMs are often overconfident in their responses, even when wrong. For safety-critical applications (medical advice, legal guidance, financial decisions), models need reliable uncertainty estimates.

**Expected Output:**

- Uncertainty quantification method that works with black-box API models
- Calibration metrics showing that confidence correlates with accuracy
- A deferral policy that triggers human review when uncertainty exceeds a threshold
- Evaluation on domains with ground truth (e.g., medical QA, legal facts)

**Success Criteria:**

- Expected Calibration Error (ECE) < 0.05
- Deferral captures >90% of model errors
- Minimal unnecessary deferrals (<10% of correct answers)
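The following is a minimal sketch of how the success criteria above could be scored, not a prescribed implementation. It assumes you already have, for each evaluation question, a correctness label against ground truth and a confidence score in [0, 1] (e.g. from verbalized confidence or self-consistency sampling). The function names (`expected_calibration_error`, `deferral_report`) and the 0.7 deferral threshold are illustrative choices, not part of the project specification.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: bin-weighted mean of |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_acc = correct[in_bin].mean()      # empirical accuracy in this bin
        bin_conf = confidences[in_bin].mean() # mean stated confidence in this bin
        ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece


def deferral_report(confidences, correct, threshold=0.7):
    """Defer to a human whenever confidence falls below `threshold`.

    Returns the two quantities the success criteria refer to:
    - error_capture: fraction of wrong answers that were deferred (target >0.90)
    - unnecessary_deferrals: fraction of correct answers deferred (target <0.10)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    deferred = confidences < threshold
    errors = ~correct
    error_capture = deferred[errors].mean() if errors.any() else 1.0
    unnecessary = deferred[correct].mean() if correct.any() else 0.0
    return {"error_capture": error_capture, "unnecessary_deferrals": unnecessary}


if __name__ == "__main__":
    # Toy data standing in for (confidence, correctness) pairs from a QA eval.
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.3, 1.0, size=500)
    # Simulate a roughly calibrated model: P(correct) tracks stated confidence.
    corr = rng.random(500) < conf
    print("ECE:", round(expected_calibration_error(conf, corr), 3))
    print(deferral_report(conf, corr, threshold=0.7))
```

In practice the deferral threshold would be tuned on a held-out split to trade off error capture against unnecessary deferrals, and the confidence scores would come from whichever black-box elicitation method the project develops.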

Created: 1/20/2026

Last updated: 1/20/2026