On Thresholds

Thresholds are about decisions: about taking action based on where we ‘cut’ the scores/rankings to perform an intervention. Let’s start with ye olde Mister Bayes.

Bayes-Optimal Threshold

Let’s say you are not punished for being right: the cost of TP and TN is $0$. Let’s say you start with an equal cost for FN and FP of $1$. We’re trying to figure out the Expected Value of the cost when you (a) predict positive and (b) predict negative.

Let $p = P(Y=1|X)$. If you predict positive, the probability of getting it right is $p$ and of getting it wrong is $(1-p)$.

\begin{align*}
E[\text{Cost}| \hat{y} = 1] &= \text{Cost}_{\text{Right}} \cdot P(Y=1|X) + \text{Cost}_{\text{False +ve}} \cdot P(Y=0|X) \\
&= 0 \cdot P(Y=1|X) + 1 \cdot P(Y=0|X) \\
&= 1 \cdot (1 - p) = (1 - p)
\end{align*}

Similarly, $E[\text{Cost}| \hat{y} = 0] = p$.

Now you want to minimize the punishment/cost. You have to pick one of the two predictions, so you pick whichever has the lower Expected Cost: predict positive only if the Expected Cost of doing so is less than the Expected Cost of not doing so.

\begin{align*}
E[\text{Cost}| \hat{y} = 1] &< E[\text{Cost}| \hat{y} = 0] \\
\implies (1 - p) &< p \\
\implies p &> 0.5
\end{align*}

Making this More General

What if FN and FP have different costs? Happens a lot in healthcare. “It is 10 times worse to miss a true cancer diagnosis” (FN). Let’s say $C_{FP}$ and $C_{FN}$ are the costs for FP and FN. Then,

\begin{align*}
E[\text{Cost}| \hat{y} = 1] &= \text{Cost}_{\text{Right}} \cdot P(Y=1|X) + \text{Cost}_{\text{FP}} \cdot P(Y=0|X) \\
&= C_{FP} \cdot (1 - p)
\end{align*}

and $E[\text{Cost}| \hat{y} = 0] = C_{FN} \cdot p$. Again, since you’re scared of high cost/punishment, you predict positive only when:

\begin{align*}
E[\text{Cost}| \hat{y} = 1] &< E[\text{Cost}| \hat{y} = 0] \\
\implies C_{FP} \cdot (1 - p) &< C_{FN} \cdot p \\
\implies p &> \frac{C_{FP}}{C_{FP} + C_{FN}}
\end{align*}

Say $\tau^* = \frac{C_{FP}}{C_{FP} + C_{FN}}$. This is the Bayes-Optimal Threshold that minimizes the Expected Cost.
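To make the rule concrete, here is a minimal Python sketch of the two expected costs and the threshold they imply. The function names (`expected_costs`, `bayes_threshold`, `predict_positive`) are made up for illustration, and the sketch assumes, as above, that TP and TN cost nothing.

```python
def expected_costs(p, c_fp=1.0, c_fn=1.0):
    """Expected cost of each prediction, given p = P(Y=1|X).

    Returns (cost if we predict positive, cost if we predict negative).
    Assumes correct predictions (TP, TN) cost 0.
    """
    cost_if_positive = c_fp * (1.0 - p)  # wrong only when Y=0
    cost_if_negative = c_fn * p          # wrong only when Y=1
    return cost_if_positive, cost_if_negative


def bayes_threshold(c_fp, c_fn):
    """Threshold on p above which predicting positive is cheaper."""
    return c_fp / (c_fp + c_fn)


def predict_positive(p, c_fp=1.0, c_fn=1.0):
    """Predict positive only when doing so has the lower expected cost."""
    cost_if_positive, cost_if_negative = expected_costs(p, c_fp, c_fn)
    return cost_if_positive < cost_if_negative


if __name__ == "__main__":
    # Hypothetical costs: a missed case is 10 times worse than a false alarm.
    tau = bayes_threshold(c_fp=1, c_fn=10)
    print(f"threshold = {tau:.3f}")
    print(predict_positive(0.2, c_fp=1, c_fn=10))
```

With `c_fp=1` and `c_fn=10` the threshold is 1/11, roughly 0.09, so a case with $p = 0.2$ gets flagged even though a positive is unlikely.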

MORE GENERAL

In many cases, FNs are more ‘expensive’ than FPs. So if $r = \frac{C_{FN}}{C_{FP}}$, then

$$\tau^* = \frac{1}{1+r}$$

What are we saying with this?

Let’s say we set $\tau = 0.2$. What does that do to the costs of FN and FP? Simple math with $\frac{1}{1+r} = 0.2 \implies r = 4 \implies C_{FN} = 4 \times C_{FP}$.

We’re saying “A FN is 4 times worse than a FP.” Think about cancer diagnoses or using this threshold to rush someone to emergency surgery. Try a lower one: $\tau = 0.05$. Now you’re saying a FN is $19$ times worse than a FP (since $\frac{1}{1+r} = 0.05 \implies r = 19$). And so on.
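The back-and-forth between a threshold and the cost ratio it implies fits in two tiny helpers. This is only a sketch; the names `threshold_from_ratio` and `ratio_from_threshold` are invented here, and the second is just $\tau = \frac{1}{1+r}$ solved for $r$.

```python
def threshold_from_ratio(r):
    """tau = 1 / (1 + r), where r = C_FN / C_FP."""
    if r < 0:
        raise ValueError("cost ratio r must be non-negative")
    return 1.0 / (1.0 + r)


def ratio_from_threshold(tau):
    """Invert tau = 1 / (1 + r) to get r = (1 - tau) / tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    return (1.0 - tau) / tau


if __name__ == "__main__":
    # For each threshold, print how many times worse a FN is than a FP.
    for tau in (0.5, 0.2, 0.1):
        print(f"tau={tau:.2f} -> FN is {ratio_from_threshold(tau):.1f}x a FP")
```

Note that the ratio is $(1-\tau)/\tau$, not $1/\tau$, so it is always one less than the "obvious" guess.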

Net Benefit (Vickers et al.)

Here’s the paper. This is the closest thing to “is the model clinically useful?” It simply asks:

How many True Positives did you gain after penalizing False Positives based on how ‘bad’ they are?

Behold the formula. The key to it is in the name, a rarity. The weighting reflects how much you’re willing to tolerate false positives to catch one true positive.

\begin{align*}
\text{Net Benefit, NB} &= (\text{Benefit} - \text{Harm or Cost}) \\
&= \frac{TP}{N} - \frac{FP}{N} \cdot \frac{\tau^*}{1 - \tau^*} \\
&= \frac{TP}{N} - \frac{FP}{N} \cdot \frac{C_{FP}}{C_{FN}}
\end{align*}

How nice! Just chip away the negatives from the positives to have a good idea of the true interventions.
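Here is a small sketch of that formula in plain Python. The function name `net_benefit` and the toy labels and scores are assumptions for illustration. A case counts as predicted positive when its score is strictly above the threshold, matching the $p > \tau^*$ rule derived earlier.

```python
def net_benefit(y_true, y_prob, tau):
    """Net Benefit = TP/N - (FP/N) * tau / (1 - tau).

    y_true: 0/1 labels, y_prob: predicted P(Y=1|X), tau: threshold in (0, 1).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among cases flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p > tau and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p > tau and y == 0)
    # Each FP is discounted by the odds of the threshold, i.e. C_FP / C_FN.
    return tp / n - (fp / n) * (tau / (1.0 - tau))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    y_prob = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1, 0.2, 0.05, 0.15, 0.4]
    print(net_benefit(y_true, y_prob, tau=0.5))
```

At $\tau = 0.5$ the toy data has 3 TPs and 1 FP out of 10, and the weight on FPs is $0.5/0.5 = 1$, so one false positive cancels exactly one true positive.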

What are we saying with this?

Let’s say $NB = 0.10$. We’re saying “Using this model is equivalent to getting 10 true-positive interventions per 100 patients with no unnecessary interventions (FPs).”
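A worked example with made-up numbers (not taken from the paper) shows how a value like that can arise. Suppose $N = 100$ patients, a threshold of $\tau = 0.2$, and a model that flags 35 of them, of whom $TP = 15$ and $FP = 20$:

\begin{align*}
\text{NB} &= \frac{TP}{N} - \frac{FP}{N} \cdot \frac{\tau}{1 - \tau} \\
&= \frac{15}{100} - \frac{20}{100} \cdot \frac{0.2}{0.8} \\
&= 0.15 - 0.20 \cdot 0.25 \\
&= 0.10
\end{align*}

At this threshold 4 false positives ‘cost’ as much as 1 true positive is worth, so the 20 FPs wipe out 5 of the 15 TPs and leave the equivalent of 10.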

But really, you should plot Net Benefit across a range of thresholds and not just pick one willy-nilly. Always ask “For which clinical workflow/preference/culture/resourcing is this model useful?”

If I tell you that the Net Benefit is $0.4$, your immediate question should be “Cool, but at what threshold?” At a high threshold, the model will almost never recommend a treatment! So when you plot the curve, it will reveal where the model is clinically valuable. A single number won’t give you this.
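One way to see that curve without any plotting library is to tabulate Net Benefit over a grid of thresholds. The sketch below uses toy labels and scores invented for illustration, and hypothetical function names. It also prints the two reference strategies usually drawn alongside the model in decision curve analysis: treat everyone, and treat no one (whose Net Benefit is always 0).

```python
def net_benefit(y_true, y_prob, tau):
    """Net Benefit of acting on scores strictly above tau (0 < tau < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p > tau and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p > tau and y == 0)
    return tp / n - (fp / n) * (tau / (1.0 - tau))


def net_benefit_treat_all(y_true, tau):
    """Net Benefit of intervening on everyone, whatever the model says."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (tau / (1.0 - tau))


def net_benefit_curve(y_true, y_prob, thresholds):
    """List of (tau, NB of model, NB of treat-all) for each threshold."""
    return [
        (tau, net_benefit(y_true, y_prob, tau), net_benefit_treat_all(y_true, tau))
        for tau in thresholds
    ]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    y_prob = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1, 0.2, 0.05, 0.15, 0.4]
    thresholds = [0.05, 0.1, 0.2, 0.3, 0.5, 0.7]

    for tau, nb_model, nb_all in net_benefit_curve(y_true, y_prob, thresholds):
        print(f"tau={tau:.2f}  model={nb_model:+.3f}  "
              f"treat_all={nb_all:+.3f}  treat_none=+0.000")
```

The range of thresholds where the model's column beats both reference strategies is the range of preferences for which it is worth using.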