Uncertainty estimation methods for a deep learning model to aid in clinical decision-making — a clinician’s perspective

Kavli Affiliate: Jing Wang

| First 5 Authors: Michael Dohopolski, Kai Wang, Biling Wang, Ti Bai, Dan Nguyen

| Summary:

Prediction uncertainty estimation has clinical significance as it can
potentially quantify prediction reliability. Clinicians may trust ‘blackbox’
models more if robust reliability information is available, which may lead to
more models being adopted into clinical practice. Several deep
learning-inspired uncertainty estimation techniques exist, but few have been applied to
medical datasets — and fewer still to single-institution datasets/models. We sought to
compare dropout variational inference (DO), test-time augmentation (TTA),
conformal predictions, and single deterministic methods for estimating
uncertainty using our model trained to predict feeding tube placement for 271
head and neck cancer patients treated with radiation. We compared the area
under the curve (AUC), sensitivity, specificity, positive predictive value
(PPV), and negative predictive value (NPV) trends for each method at various
cutoffs that sought to stratify patients into ‘certain’ and ‘uncertain’
cohorts. These cutoffs were obtained by calculating percentile
‘uncertainty’ values within the validation cohort and were then applied to the testing cohort.
Broadly, the AUC, sensitivity, and NPV increased as the predictions were more
‘certain’ — i.e., had lower uncertainty estimates. However, when a majority vote
(implementing 2/3 criteria: DO, TTA, conformal predictions) or a stricter
approach (3/3 criteria) was used, AUC, sensitivity, and NPV improved without a
notable loss in specificity or PPV. Especially for smaller, single-
institution datasets, it may be important to evaluate multiple estimation
techniques before incorporating a model into clinical practice.
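The stratification scheme described above — per-method percentile cutoffs derived on the validation cohort, then a 2/3 or 3/3 vote across DO, TTA, and conformal prediction — can be sketched as follows. This is an illustrative sketch only, not the authors’ code: the uncertainty scores are randomly generated stand-ins, and the function and variable names (`certainty_mask`, `unc`, the 75th-percentile cutoff) are hypothetical.

```python
import numpy as np

def certainty_mask(val_unc, test_unc, percentile=75):
    """Cutoff = given percentile of the validation uncertainties,
    applied to the test cohort (lower uncertainty -> 'certain')."""
    cutoff = np.percentile(val_unc, percentile)
    return test_unc <= cutoff

# Hypothetical per-patient uncertainty scores from three methods:
# dropout variational inference (DO), test-time augmentation (TTA),
# and conformal prediction (reduced here to a scalar score per patient).
rng = np.random.default_rng(0)
n_patients = 50
unc = {m: {"val": rng.random(n_patients), "test": rng.random(n_patients)}
       for m in ("DO", "TTA", "conformal")}

# Each method votes 'certain' when the test-cohort uncertainty falls
# below the cutoff taken from its own validation distribution.
votes = np.stack([certainty_mask(u["val"], u["test"])
                  for u in unc.values()])

majority_certain = votes.sum(axis=0) >= 2   # 2/3 criteria
strict_certain = votes.all(axis=0)          # 3/3 criteria

# AUC, sensitivity, specificity, PPV, and NPV would then be computed
# separately on the 'certain' and 'uncertain' subsets.
```

The 3/3 mask is by construction a subset of the 2/3 mask, which is why the stricter criterion trades cohort size for (potentially) higher per-prediction reliability.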

| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=10