
TRUST-LAPSE: Quantifying Model Mistrust

Advised by: Daniel Rubin, MD, MS

Professor of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics)

and (by courtesy) Computer Science and Ophthalmology,

Stanford University

Advised by: Christopher Lee-Messer, MD, PhD

Clinical Associate Professor

Neurology & Neurological Sciences and Pediatrics

Stanford School of Medicine


Deep learning models offer great promise for improving the speed and quality of diagnosis and treatment in medicine. However, a major flaw of these methods is that they tend to remain overconfident in cases where a human would quickly recognize that the model is out of its depth. This stems from an underlying assumption that the data a model encounters after deployment is drawn from the same distribution as its training data.

In practice, it is difficult to ensure that all real-world samples are drawn from the same distribution as the training data. The consequences are typically minor in consumer applications, but in medicine this overconfidence can lead to misdiagnosis, injury, or even death. It is therefore critical for any medical deployment to detect when a model's prediction can be trusted and when to defer to a human expert.

Such a system needs to be actionable, post-hoc, explainable, and high-performing to be useful in practice. We developed TRUST-LAPSE, a mistrust-scoring framework that combines complementary metrics over induced hierarchical latent spaces with sequential tracking to determine when a model's predictions can be trusted. Check out our publications for more details.
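
To make the general idea concrete, below is a minimal sketch, assuming access to the deployed model's latent embeddings: a per-sample mistrust score based on cosine distance to training-class centroids in latent space, fed into a sliding-window tracker that flags sustained drift. The function names, the specific metric, and the thresholds are illustrative assumptions, not the actual TRUST-LAPSE scoring functions; see our publications for the real formulation.

```python
# Illustrative sketch only: the cosine-centroid metric and the sliding-window
# tracker are assumptions chosen to convey latent-space + sequential mistrust
# scoring in spirit; they are not the TRUST-LAPSE implementation.
import numpy as np


def fit_class_centroids(train_embeddings: np.ndarray, train_labels: np.ndarray) -> dict:
    """Compute a mean latent vector (centroid) for each training class."""
    return {
        label: train_embeddings[train_labels == label].mean(axis=0)
        for label in np.unique(train_labels)
    }


def latent_mistrust(embedding: np.ndarray, predicted_label: int, centroids: dict) -> float:
    """Mistrust = 1 - cosine similarity between a sample's embedding and the
    centroid of its predicted class; higher values mean the sample looks
    unlike the training data that the prediction is based on."""
    centroid = centroids[predicted_label]
    cos_sim = embedding @ centroid / (
        np.linalg.norm(embedding) * np.linalg.norm(centroid) + 1e-12
    )
    return 1.0 - float(cos_sim)


class SequentialMistrustTracker:
    """Track mistrust scores over a sliding window and flag sustained drift,
    so that a single noisy sample does not trigger a deferral by itself."""

    def __init__(self, window: int = 20, threshold: float = 0.3):
        self.window = window
        self.threshold = threshold
        self.scores = []

    def update(self, score: float) -> bool:
        """Add a new score; return True if the recent average exceeds the threshold,
        signaling that predictions should be deferred to a human expert."""
        self.scores.append(score)
        recent = self.scores[-self.window:]
        return float(np.mean(recent)) > self.threshold
```

In this sketch, the latent-space score catches individual samples that drift away from the training distribution, while the sequential tracker aggregates scores over time so that deferral decisions reflect sustained mistrust rather than isolated outliers.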