
Hypothesis Testing

We want to know whether our interventions actually caused things in the world. Associations and correlations are nice, but establishing causality is nicer.

With a hypothesis, you are examining this association (and/or causality). You're also making statements about the population, not just your sample (i.e. your study is not really about the 26 people enrolled in it).

FINER Criteria
  1. Feasible
  2. Interesting
  3. Novel
  4. Ethical
  5. Relevant

Now, if you suspect some association or causality but there's no evidence out there (or the evidence is all over the place), you can perform a hypothesis-generating study.

A "main outcome" is emphasized in RCTs (the 'gold standard'). Keep it simple: you focus on one primary outcome, typically with one predictor and one outcome. You certainly can have multiple predictors and outcomes, but the primary analysis stays focused.

In defining a good hypothesis, you are engaged in operationalization: turning theoretical constructs into the specific, measurable variables you'll actually be studying.

A good hypothesis precedes the experiment.

note

Aside on the crisis in statistics. TODO: Read Andrew Gelman’s blog post.

Falsifiability

A good hypothesis is falsifiable. "Science advances by rejecting inadequate theories" - Kuhn. 👉 This is why you start with the null hypothesis, which is what you want to disprove: it states there is no relationship. The alternative hypothesis is the actual hypothesized association. A two-sided alternative says there is an association but leaves the direction unspecified (it's bi-directional: your intervention made something better or worse); it's more conservative and generally preferred. A one-sided alternative specifies the direction of the association. In statistical hypothesis testing, you are trying to disprove the null hypothesis, the fantastic scientist that you are.

This is where the p-value comes in. It is:

$$p = P(\text{seeing results at least as extreme as yours} \mid H_0 \text{ is true})$$

If there is a very low probability of seeing your results under the null (conventionally p < 0.05), your data would be surprising if the null were true, so you reject the null.
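The p-value idea above can be sketched by simulation. This is an illustrative toy (the coin-flip numbers are made up, not from the notes): under the null of a fair coin, how often would we see a result at least as extreme as 60 heads in 100 flips?

```python
# Toy sketch: estimating a two-sided p-value by simulating the null.
# Numbers here are illustrative, not from the notes.
import random

random.seed(0)

def simulated_p_value(heads_observed, n_flips=100, n_sims=10_000):
    """P(result at least as extreme as observed | H0: the coin is fair)."""
    observed_dev = abs(heads_observed - n_flips / 2)
    extreme = 0
    for _ in range(n_sims):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if abs(heads - n_flips / 2) >= observed_dev:
            extreme += 1
    return extreme / n_sims

p = simulated_p_value(60)
print(f"p ≈ {p:.3f}")  # a small p means the data are unlikely under H0
```

For 60 heads this lands a little above 0.05, so under the usual cutoff you would (just barely) fail to reject the null.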

Validity

  • Internal Validity refers to how sound the study is on its own terms: whether the design and statistical treatment actually support the conclusions drawn about the subjects studied.
  • External Validity refers to generalizability.

Effect Size

TODO: What do you do here? Sure there’s an association but how ‘big’ is it?
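One common way to answer "how big is it?" is a standardized effect size such as Cohen's d (difference in group means divided by a pooled standard deviation). A minimal sketch, with made-up illustrative data:

```python
# Hedged sketch: Cohen's d, one common standardized effect-size measure.
# The two groups below are made-up illustrative data.
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    na, nb = len(group_a), len(group_b)
    # Pooled variance (assumes the groups have roughly equal variances).
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

treatment = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
control   = [4.4, 4.7, 4.1, 4.6, 4.3, 4.5]
d = cohens_d(treatment, control)
print(f"Cohen's d ≈ {d:.2f}")  # rough convention: 0.2 small, 0.5 medium, 0.8 large
```

The point: a tiny p-value tells you an association is probably real, but d (or a similar measure) tells you whether it's big enough to care about.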

Type I and II Errors

  • Type I error, α: the probability of rejecting the Null when it is true (concluding there IS an association when there is NOT)
  • Type II error, β: the probability of failing to reject the Null when it is false (missing a real association)
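The Type I rate can be checked by simulation: if the null really is true and you use a fixed rejection rule, you should falsely reject in roughly α of your experiments. A toy sketch (the coin setup and cutoff are illustrative):

```python
# Hedged sketch: under a true null (fair coin), a fixed rejection rule
# should produce Type I errors at roughly its nominal alpha level.
import random

random.seed(1)
CUTOFF = 60  # reject H0 if heads >= 60 or <= 40 out of 100 (alpha ≈ 0.057)

def type_i_rate(n_experiments=5_000, n_flips=100):
    false_rejections = 0
    for _ in range(n_experiments):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if heads >= CUTOFF or heads <= n_flips - CUTOFF:
            false_rejections += 1  # H0 is true here, so every rejection is a Type I error
    return false_rejections / n_experiments

rate = type_i_rate()
print(f"Observed Type I error rate ≈ {rate:.3f}")
```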

Sampling

There’s a lot of work here (given the importance/preponderance of the CLT). Think of this:

Subject
-> Sample
-> Accessible Population
-> Target Population
-> Population

So how do you pick? You establish Eligibility Criteria (inclusion and exclusion). Doing this limits the effect of extraneous variables (“clean signal”). RCTs specify stringent inclusion criteria btw.

Now you need a sample that is representative of your population, or you end up with Systematic Sampling Error (bias): the sample mean differs from the population mean in a consistent direction, no matter how large the sample gets. Random Sampling Error, by contrast, is chance disagreement between the sample mean and the population mean; it shrinks as the sample size grows.

You can perform Random Sampling by (a) sampling randomly lol (b) stratifying first and then sampling randomly within each stratum lol (c) cluster sampling, where you randomly pick naturally occurring groups (clinics, schools, neighborhoods) and sample within them.
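Methods (a) and (b) can be sketched in a few lines. The population and its "stratum" field below are made up for illustration:

```python
# Hedged sketch: simple random sampling vs stratified random sampling.
# The population and stratum labels are made-up illustrative data.
import random

random.seed(2)

# Toy population: 100 records, 70 in stratum A and 30 in stratum B.
population = [{"id": i, "stratum": "A" if i < 70 else "B"} for i in range(100)]

# (a) Simple random sampling: every subject has equal selection probability.
simple_sample = random.sample(population, k=10)

# (b) Stratified sampling: sample within each stratum so the sample
# preserves the population's 70/30 split.
def stratified_sample(pop, k):
    strata = {}
    for person in pop:
        strata.setdefault(person["stratum"], []).append(person)
    sample = []
    for members in strata.values():
        n = round(k * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, n))
    return sample

strat = stratified_sample(population, 10)
print(len(simple_sample), len(strat))
```

With proportional allocation, the stratified sample of 10 always contains exactly 7 from stratum A and 3 from stratum B, whereas the simple random sample only matches that split on average.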

If you have an ordered list of the population (like a census), you can do Systematic Sampling: pick a random starting point, then take every k-th entry.
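A minimal sketch of that random-start, every-k-th scheme (the census list here is a made-up stand-in):

```python
# Hedged sketch: systematic sampling from an ordered list —
# random start, then every k-th entry. The census list is illustrative.
import random

random.seed(3)

def systematic_sample(ordered_population, sample_size):
    k = len(ordered_population) // sample_size   # sampling interval
    start = random.randrange(k)                  # random start within the first interval
    return ordered_population[start::k][:sample_size]

census = list(range(1000))           # stand-in for an ordered census list
sample = systematic_sample(census, 50)
print(sample[:5])                    # every 20th person after a random start
```

One caveat worth knowing: if the ordering of the list has a periodic pattern that lines up with the interval k, systematic sampling can introduce bias.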