Analysis of Cohort Studies II
Logistic Regression
This is specifically for binary outcomes. It uses the logit function, which is the natural log of the odds: $\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$.
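Probability and odds are similar when the probability is low, but odds grow very fast as the probability approaches 1. There’s another reason too: a probability is bounded, $p \in (0, 1)$. The odds $\frac{p}{1-p}$ range over $(0, \infty)$. Apply a natural log and the logit ranges over $(-\infty, \infty)$. So now you’re essentially back to linear regression: the logit can be modelled as a linear function of the predictors. It’s a neat mathematical trick for when you want to model probabilities.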
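The probability → odds → logit chain can be sketched in plain Python. The function names `odds`, `logit` and `inv_logit` are illustrative, not from any particular library; `inv_logit` is the logistic (sigmoid) function that maps a logit back to a probability, written here in a numerically stable form.

```python
import math


def odds(p: float) -> float:
    """Convert a probability p in (0, 1) to odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Log-odds: maps (0, 1) onto the whole real line."""
    return math.log(odds(p))


def inv_logit(z: float) -> float:
    """Logistic function: maps any real z back into [0, 1] without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


if __name__ == "__main__":
    # Odds track p closely when p is small, then blow up as p approaches 1.
    for p in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(f"p={p:<5} odds={odds(p):9.4f} logit={logit(p):8.4f}")
```

Running the loop shows odds of about 0.0101 at $p = 0.01$ but 99 at $p = 0.99$, while the logit is symmetric around 0.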
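Now $p = P(Y = 1 \mid X)$ is the probability of seeing the outcome given $X$ in the model $\operatorname{logit}(p) = \beta_0 + \beta_1 X$.
So here $\beta_1$ is the change in log-odds for a one-unit increase in $X$, i.e. the log odds ratio (and $e^{\beta_1}$ is the odds ratio for $X$).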
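To see what $\beta_1$ means in practice, here is a minimal sketch that fits $\operatorname{logit}(p) = \beta_0 + \beta_1 X$ by maximum likelihood using Newton-Raphson, for a single binary exposure $X$. The helper names (`fit_logistic_1d`, `table_to_data`) and the 2×2 counts are made up for illustration; a real analysis would use a statistics package. With one binary predictor, $e^{\beta_1}$ should equal the cross-product odds ratio $ad/(bc)$ of the 2×2 table.

```python
import math


def table_to_data(a, b, c, d):
    """Expand a 2x2 table into rows (hypothetical helper).

    a, b: exposed (x=1) with / without the outcome
    c, d: unexposed (x=0) with / without the outcome
    """
    xs = [1] * (a + b) + [0] * (c + d)
    ys = [1] * a + [0] * b + [1] * c + [0] * d
    return xs, ys


def fit_logistic_1d(xs, ys, iters=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # Fisher information (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = y - p
            g0 += r
            g1 += r * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


if __name__ == "__main__":
    xs, ys = table_to_data(a=20, b=80, c=10, d=90)  # illustrative counts
    b0, b1 = fit_logistic_1d(xs, ys)
    print(f"beta1 = {b1:.4f}, exp(beta1) = {math.exp(b1):.4f}")
    print(f"cross-product odds ratio = {(20 * 90) / (80 * 10):.4f}")
```

Newton-Raphson is just one standard way to compute the maximum-likelihood estimate; software may use a different but equivalent algorithm.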
Odds ratios are centred on 1: an odds ratio of 1.5 means 50% higher odds, and 0.7 means 30% lower odds. So a negative $\beta_1$ (odds ratio $e^{\beta_1} < 1$) marks $X$ as a ‘protective’ factor, which is how you would read it when $X$ is an intervention.
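The odds-ratio arithmetic can be written out directly. This is a short sketch with hypothetical helper names; the ‘risk factor’ label for a positive coefficient is simply the mirror image of the ‘protective’ case.

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in the predictor: exp(beta)."""
    return math.exp(beta)


def percent_change_in_odds(or_value: float) -> float:
    """Percent change in odds implied by an odds ratio (1.5 -> +50, 0.7 -> -30)."""
    return (or_value - 1.0) * 100.0


def describe(beta: float) -> str:
    """Summarise a coefficient as an odds ratio and a direction of effect."""
    or_value = odds_ratio(beta)
    change = percent_change_in_odds(or_value)
    if beta < 0:
        label = "protective (lower odds)"
    elif beta > 0:
        label = "risk factor (higher odds)"
    else:
        label = "no association"
    return f"beta={beta:+.3f}  OR={or_value:.3f}  odds change={change:+.1f}%  -> {label}"


if __name__ == "__main__":
    for beta in (math.log(1.5), math.log(0.7), 0.0):
        print(describe(beta))
```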
Assumptions
- Outcome is binary.
- The relationship between logit(p) and each continuous predictor is linear.
- Observations are independent of each other (IID).
- In a multivariable model, there is no severe multicollinearity among the predictors.
- There should be a ‘sufficient’ sample size.
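A common rule of thumb is the ‘rule of 10’ (events per variable): you need at least 10 events in the less frequent outcome class for each covariate, so 5 covariates call for at least 50 events.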
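Two of the assumptions above lend themselves to quick checks. This sketch assumes outcomes coded 0/1, applies the rule of 10 to the less frequent outcome class, and computes a variance inflation factor (VIF) for the special case of exactly two predictors, where VIF $= 1/(1 - r^2)$ with $r$ their Pearson correlation. The function names are hypothetical, and the VIF is one common multicollinearity diagnostic rather than the only one.

```python
import math


def required_events(n_covariates, events_per_variable=10):
    """Rule of 10: minimum number of events in the less frequent outcome class."""
    return events_per_variable * n_covariates


def has_enough_events(ys, n_covariates, events_per_variable=10):
    """Apply the rule of thumb to a list of 0/1 outcomes."""
    n_events = sum(ys)
    n_minority = min(n_events, len(ys) - n_events)
    return n_minority >= required_events(n_covariates, events_per_variable)


def vif_two_predictors(x1, x2):
    """VIF with exactly two predictors: 1 / (1 - r^2), r = Pearson correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r = sxy / math.sqrt(sxx * syy)
    if r * r >= 1.0:
        return math.inf  # perfectly collinear: coefficients are not identifiable
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    print(required_events(5))
    ys = [1] * 50 + [0] * 150  # 50 events, 150 non-events
    print(has_enough_events(ys, n_covariates=5))
    print(vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```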
Class Imbalance
What if you have a severe class imbalance?
- Downsample the majority class
- Upsample the minority class
- Both (downsample the majority and upsample the minority)?
- Assign class weights
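The four options can be sketched in a few lines of Python. `resample_to` is a hypothetical helper that resamples every class to a chosen target size: the smallest class size gives pure downsampling, the largest gives pure upsampling, and anything in between does both. `balanced_class_weights` uses the weight $n / (k \cdot n_c)$ for class $c$ ($n$ rows, $k$ classes), the same heuristic scikit-learn uses for `class_weight="balanced"`; it is one common choice, not the only one.

```python
import random
from collections import Counter, defaultdict


def resample_to(rows, labels, target, seed=0):
    """Resample every class to exactly `target` rows (hypothetical helper).

    Larger classes are downsampled without replacement; smaller classes keep
    all their rows plus random duplicates (upsampling with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    out = []
    for y, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)  # downsample
        else:
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra  # upsample
        out.extend((row, y) for row in chosen)
    rng.shuffle(out)
    return out


def balanced_class_weights(labels):
    """Weight for class c: n / (k * n_c), so each class carries equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


if __name__ == "__main__":
    rows = list(range(100))       # stand-in for feature rows
    labels = [1] * 10 + [0] * 90  # 10% minority class
    for target in (10, 40, 90):   # downsample, both, upsample
        counts = Counter(y for _, y in resample_to(rows, labels, target))
        print(target, dict(counts))
    print(balanced_class_weights(labels))
```

Whichever option you use, apply it to the training data only, so that evaluation still reflects the real class balance. Resampling or reweighting also shifts the intercept, so predicted probabilities no longer match the original prevalence without adjustment.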