
Hypothesis Testing

TODO: Finish this.


You're a lawyer in a courtroom prosecuting someone for insurance fraud.

The Law Says
"Innocent until proven guilty."

You Say
"Guilty."

Judge
"I have a really high bar to rule 'Beyond Reasonable Doubt' that someone's guilty."

Evidence
Emails, texts, tax records, spouse's statements.

What you need to prove
How unusual/unexpected/surprising the evidence would be if the defendant were innocent.


You never accept the Alternative/Research Hypothesis $H_a$! You either reject or fail to reject the Null Hypothesis $H_0$. All you're doing with your test is measuring how surprised you are under $H_0$ (see the court analogy above).
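To make "measuring how surprised you are" concrete, here is a minimal sketch using a coin-flip example that is not from these notes: the null hypothesis is "the coin is fair", and the data (60 heads in 100 flips) are made up for illustration. The function name `binomial_two_sided_p_value` is hypothetical.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5)."""

    def prob(k: int) -> float:
        # Probability of exactly k heads in `flips` flips if H0 is true.
        return comb(flips, k) * 0.5 ** flips

    observed = prob(heads)
    # Add up every outcome that is at least as unlikely as the one we saw.
    # The small tolerance guards against floating-point ties.
    total = sum(
        prob(k) for k in range(flips + 1) if prob(k) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Illustrative (made-up) data: 60 heads out of 100 flips.
p_value = binomial_two_sided_p_value(heads=60, flips=100)
print(f"p-value = {p_value:.4f}")
```

The smaller the p-value, the more surprising the evidence would be if the null hypothesis were true. Note that nothing here computes how likely the alternative is; the whole calculation lives under the null.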


Why the negation? Why shouldn't you be asking "How expected is the evidence that the defendant is guilty?"

The reason is that the null hypothesis $H_0$ is well-defined: it makes one specific claim, so you can work out exactly what evidence to expect under it.
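One way to see what "well-defined" buys you, sketched for the textbook case of a sample mean $\bar{x}$ from $n$ observations with known standard deviation $\sigma$ (these symbols, and the assumption of normal data or a large sample, are added here for illustration):

$$
H_0: \mu = \mu_0 \quad\Longrightarrow\quad Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim \mathcal{N}(0, 1)
$$

Under $H_0$ the reference distribution is fully pinned down, so "how surprising is this $Z$?" has a computable answer. Under $H_a: \mu \ne \mu_0$ the true $\mu$ could be any other value, so there is no single distribution to measure surprise against.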


Do you ever set $H_0: \mu \ne \mu_0$ ... ?


"Which Test?" TLDR (finish this!)

Generally speaking, to pick a test you'll be asking:

  • What is the nature of my Data¹? Continuous? Categorical?
  • How many groups am I dealing with? One, two, or more than two?

Here's a nice little table from this excellent video (by a Columbia alum!)


| | 1 Group | 2 Groups | 2+ Groups |
|---|---|---|---|
| Categorical Data | Proportion Test ($Z$-test approx.); $\chi^2$ Test | Proportion Test ($Z$-test approx.); $\chi^2$ Test | $\chi^2$ Test |
| Continuous Data | $Z$-test & Variants; $t$-test & Variants | $Z$-test & Variants; $t$-test & Variants | ANOVA ($F$-test, 1-way, 2-way) |
| Classic Assumptions Violated² | Sign Test; Signed Rank Test | Wilcoxon–Mann–Whitney Test; Paired $t$-test; McNemar’s Test | Kruskal–Wallis Test |
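The two questions above amount to a table lookup, which can be sketched in a few lines. This is only an illustration: the function name `candidate_tests` and the row keys are hypothetical, and the placement of the tests in the "assumptions violated" row follows one reading of the table, so treat it as an assumption.

```python
# Candidate tests transcribed from the table above.
# Keys are (kind of data, column); the "assumptions_violated" entries
# reflect one reading of the table and are an assumption.
CANDIDATE_TESTS = {
    ("categorical", "1"): ["Proportion Test (Z-test approx.)", "Chi-squared Test"],
    ("categorical", "2"): ["Proportion Test (Z-test approx.)", "Chi-squared Test"],
    ("categorical", "2+"): ["Chi-squared Test"],
    ("continuous", "1"): ["Z-test & Variants", "t-test & Variants"],
    ("continuous", "2"): ["Z-test & Variants", "t-test & Variants"],
    ("continuous", "2+"): ["ANOVA (F-test, 1-way, 2-way)"],
    ("assumptions_violated", "1"): ["Sign Test", "Signed Rank Test"],
    ("assumptions_violated", "2"): [
        "Wilcoxon–Mann–Whitney Test",
        "Paired t-test",
        "McNemar's Test",
    ],
    ("assumptions_violated", "2+"): ["Kruskal–Wallis Test"],
}


def candidate_tests(data_kind: str, n_groups: int) -> list[str]:
    """Return the table's candidate tests for a kind of data and a group count."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # "2+" is the table's column for more than two groups.
    column = "1" if n_groups == 1 else "2" if n_groups == 2 else "2+"
    try:
        return CANDIDATE_TESTS[(data_kind, column)]
    except KeyError:
        raise ValueError(f"unknown data kind: {data_kind!r}") from None


print(candidate_tests("continuous", 3))
```

The lookup only narrows the field. Choosing among the candidates in a cell still depends on the data, for example whether the population variance is known or whether the two groups are paired.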

Footnotes

  1. Always seek to understand your Data 🙏

  2. For example: too many outliers, a small sample size, or correlated observations.