Model Uncertainty Quantification for Safety-Critical Decisions
Description
Build reliable confidence estimates for when AI should defer to humans
Develop methods for accurately quantifying model uncertainty in safety-critical contexts, enabling systems to know when to defer to human judgment.

**Background:** Current LLMs are often overconfident in their responses, even when wrong. For safety-critical applications (medical advice, legal guidance, financial decisions), models need reliable uncertainty estimates.

**Expected Output:**
- Uncertainty quantification method that works with black-box API models
- Calibration metrics showing that confidence correlates with accuracy (see the calibration sketch below)
- Deferral policy that triggers human review when uncertainty exceeds a threshold (see the deferral sketch below)
- Evaluation on domains with ground truth (e.g., medical QA, legal facts)

**Success Criteria:**
- Expected Calibration Error (ECE) < 0.05
- Deferral captures >90% of model errors
- Minimal unnecessary deferrals (<10% of correct answers)
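A minimal sketch of how the black-box requirement and the calibration metric could fit together. Confidence is estimated here as agreement among repeated samples (self-consistency), and calibration is measured with a 10-bin Expected Calibration Error. The `query_model` callable, the sample count, and the bin count are illustrative assumptions, not choices prescribed by this project.

```python
from collections import Counter
from typing import Callable, Tuple

import numpy as np


def sample_confidence(query_model: Callable[[str], str], prompt: str,
                      n_samples: int = 10) -> Tuple[str, float]:
    """Estimate confidence as the agreement rate among repeated samples.

    `query_model` is a hypothetical wrapper around a black-box API call
    at nonzero temperature, returning one answer string per call.
    """
    answers = [query_model(prompt) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples


def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """ECE: per-bin |accuracy - mean confidence|, weighted by bin occupancy."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

Against the success criteria, `expected_calibration_error` applied to the sampled confidences and ground-truth correctness labels would need to come out below 0.05.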
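A sketch of a threshold-based deferral policy evaluated against the two deferral criteria above. The 0.7 threshold is an arbitrary placeholder; in practice it would be tuned on a validation set so that error coverage exceeds 90% while unnecessary deferrals stay under 10%.

```python
import numpy as np


def deferral_metrics(confidences: np.ndarray, correct: np.ndarray,
                     threshold: float = 0.7) -> dict:
    """Defer to a human whenever confidence falls below `threshold`.

    Returns the fraction of model errors caught by deferral (target > 0.90),
    the fraction of correct answers deferred unnecessarily (target < 0.10),
    and the overall deferral rate.
    """
    defer = confidences < threshold
    errors = ~correct.astype(bool)
    return {
        "error_coverage": defer[errors].mean() if errors.any() else 1.0,
        "unnecessary_deferrals": defer[~errors].mean() if (~errors).any() else 0.0,
        "deferral_rate": defer.mean(),
    }
```

Sweeping `threshold` over the observed confidence values and picking the smallest value that satisfies both targets is one simple way to set the deferral policy before deployment.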
Created: 1/20/2026
Last updated: 1/20/2026