On Thresholds
Thresholds are about decisions: taking action based on where we ‘cut’ the scores/rankings to perform an intervention. Let’s start with ye olde Mister Bayes.
Bayes-Optimal Threshold
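Let’s say you are not punished for being right: the cost of a TP and a TN is $0$. Let’s say you start with an equal cost for FN and FP of $1$. We’re trying to figure out the Expected Value of the cost of a prediction, given that (a) you predicted correctly and (b) you predicted incorrectly.
Let $p = P(y = 1 \mid x)$ be the probability that the case is truly positive. If you predict positive, the probability of getting it right is $p$ and wrong is $1 - p$.
$$E[\text{cost} \mid \text{predict positive}] = 0 \cdot p + 1 \cdot (1 - p) = 1 - p$$
Similarly,
$$E[\text{cost} \mid \text{predict negative}] = 1 \cdot p + 0 \cdot (1 - p) = p$$
Now you want to minimize the punishment/cost so you predict positive only if the Expected Cost of doing so is less than the Expected Cost of not doing so (TODO: why?)
$$1 - p < p \iff p > 0.5$$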
Making this More General
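What if FN and FP have different costs? Happens a lot in healthcare. “It is 10 times worse to miss a true cancer diagnosis” (FN). Let’s say $C_{FP}$ and $C_{FN}$ are the costs for FP and FN, and $p$ is the probability that the case is truly positive. Then,
$$E[\text{cost} \mid \text{predict positive}] = C_{FP} \, (1 - p)$$
and
$$E[\text{cost} \mid \text{predict negative}] = C_{FN} \, p$$
Again, since you’re scared of high cost/punishment, you predict positive only when:
$$C_{FP} \, (1 - p) < C_{FN} \, p \iff p > \frac{C_{FP}}{C_{FP} + C_{FN}}$$
Say $t = \frac{C_{FP}}{C_{FP} + C_{FN}}$. This is the Bayes-Optimal Threshold that minimizes the Expected Cost.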
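Here is the same rule as a small Python sketch. It assumes TP and TN cost nothing, as above. The function names `bayes_threshold` and `predict_positive` are made up for illustration and don't come from any library.

```python
def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability cut-off that minimizes expected cost (TP and TN assumed free)."""
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_positive(p: float, cost_fp: float, cost_fn: float) -> bool:
    """Predict positive only if that is the cheaper action in expectation."""
    cost_if_positive = cost_fp * (1 - p)  # we pay only when the case is truly negative
    cost_if_negative = cost_fn * p        # we pay only when the case is truly positive
    return cost_if_positive < cost_if_negative


# Equal costs recover the familiar 0.5 cut-off.
print(bayes_threshold(1, 1))
# "Missing a cancer is 10 times worse" pushes the cut-off down to 1/11.
print(bayes_threshold(1, 10))
# A case with p = 0.2 is now worth acting on.
print(predict_positive(0.2, cost_fp=1, cost_fn=10))
```

Comparing the two expected costs directly and comparing `p` against `bayes_threshold(...)` are the same decision, just written two ways.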
MORE GENERAL
In many cases, FNs are more ‘expensive’ than FPs. So if $C_{FN} > C_{FP}$, then the threshold $t = \frac{C_{FP}}{C_{FP} + C_{FN}} < 0.5$.
What are we saying with this?
Let’s say we set $t = 0.2$. What does that do to the costs of FN and FP? Simple math with $t = \frac{C_{FP}}{C_{FP} + C_{FN}}$ gives $\frac{C_{FN}}{C_{FP}} = \frac{1 - t}{t} = \frac{0.8}{0.2} = 4$.
We’re saying “A FN is 4 times worse than a FP.” Think about cancer diagnoses or using this threshold to rush someone to emergency surgery. Try a lower one, for example $t = 0.1$. Now you’re saying a FN is $9$ times worse than a FP. And so on.
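To read a threshold "backwards" like this, you can compute the cost ratio it implies. A minimal sketch, assuming correct predictions cost nothing; `implied_cost_ratio` is a hypothetical helper name.

```python
def implied_cost_ratio(threshold: float) -> float:
    """How many times worse a FN is than a FP, implied by a probability threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    return (1 - threshold) / threshold


for t in (0.5, 0.2, 0.1, 0.05):
    print(f"threshold {t:.2f} -> a FN is {implied_cost_ratio(t):.1f}x as bad as a FP")
```

The lower the threshold, the more you are saying you fear a miss compared to a false alarm.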
Net Benefit (Vickers et al)
Here’s the paper. This is the closest thing to “is the model clinically useful?” It simply asks:
How many True Positives did you gain after penalizing False Positives based on how ‘bad’ they are?
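Behold the formula. The key to it is in the name, a rarity.
$$\text{Net Benefit} = \frac{TP}{N} - \frac{FP}{N} \times \frac{t}{1 - t}$$
Here $N$ is the total number of patients and $t$ is the threshold probability. The weighting $\frac{t}{1 - t}$ reflects how much you’re willing to tolerate false positives to catch one true positive.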
How nice! Just chip away the negatives from the positives to have a good idea of the true interventions.
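A minimal sketch of the calculation in plain Python, so you can see the chipping away happen. It assumes labels are 0/1 and that a patient is flagged when the predicted probability is at or above the threshold. The name `net_benefit` and the toy data are made up for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net Benefit = TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # Count only the patients the model flags at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # how much one FP "costs" in units of TP
    return tp / n - (fp / n) * weight


# Toy data: 5 patients, 2 of them truly positive.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.3, 0.6, 0.1, 0.2]
print(f"{net_benefit(y_true, y_prob, threshold=0.2):.3f}")
```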
What are we saying with this?
Let’s say $\text{Net Benefit} = 0.10$. We’re saying “Using this model is equivalent to getting 10 true-positive interventions per 100 patients with no unnecessary interventions (FPs).”
But really, you should plot Net Benefit across a range of thresholds and not just pick one willy-nilly. Always ask “For which clinical workflow/preference/culture/resourcing is this model useful?”
If I tell you that the Net Benefit is, say, $0.10$, your immediate question should be “Cool but at what threshold?” At a high threshold, the model will almost never recommend a treatment, and every false positive is penalized heavily. So when you plot the curve, it will reveal where the model is clinically valuable. A single number won’t give you this.
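Here is a rough sketch of what "plot it across thresholds" means, printed as a table instead of a chart to stay dependency-free. It compares the model against the two usual reference strategies in decision curve analysis: treat everyone and treat no one (whose Net Benefit is always 0). The function names and the toy data are invented for illustration, and labels are assumed to be 0/1.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net Benefit of flagging patients with predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Net Benefit of intervening on everyone: all positives are TPs, all negatives FPs."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """One (threshold, model NB, treat-all NB) row per threshold."""
    return [
        (t, net_benefit(y_true, y_prob, t), treat_all_net_benefit(y_true, t))
        for t in thresholds
    ]


# Toy data: 5 patients, 2 of them truly positive.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.3, 0.6, 0.1, 0.2]

for t, nb_model, nb_all in decision_curve(y_true, y_prob, [0.1, 0.2, 0.3, 0.5]):
    print(f"threshold={t:.2f}  model={nb_model:+.3f}  treat_all={nb_all:+.3f}  treat_none=+0.000")
```

The model earns its keep only in the threshold range where its Net Benefit beats both treat-all and treat-none.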