
Causal Inference

[…] Process of drawing a conclusion between Cause and Effect

— Vogt (2011)

Think pharmacovigilance, comparative effectiveness, health policy, causes of diseases and disease progression. Informatics is uniquely suited to examine causal inference.

You need to pick a single theoretical philosophy; this is known as an identification strategy. There are several! INUS Conditions is one of them. Counterfactual Reasoning is another (very important in biomedical reasoning).

If you have a confounder, it distorts the effect. A study may be statistically significant and still get the causal effect wrong.

INUS Conditions

  • Insufficient: a single factor doesn’t cause the outcome on its own; other factors are needed. A lit match is Insufficient for a fire.
  • Necessary: required within a specific causal combination! Match + Gasoline fails without the match.
  • Unnecessary: the combo is not the only one that works. Match + Curtains would also do it.
  • Sufficient: the combo is enough by itself. Match + Paper + Gasoline.

A lit match is an INUS condition for fire.

Think of a pie chart whose slices are Paper and Match: remove any one slice and the pie is incomplete and no longer a sufficient condition for fire.
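The match/gasoline/curtains scenario above can be sketched as a tiny boolean model (a hypothetical illustration; the factor names are just the examples from the bullets):

```python
# INUS sketch: fire occurs if (Match AND Gasoline) OR (Match AND Curtains).
# The match alone is Insufficient; within the first combo it is Necessary;
# each combo is Unnecessary (the other also works) but Sufficient.
def fire(match=False, gasoline=False, curtains=False):
    return (match and gasoline) or (match and curtains)

assert not fire(match=True)                  # Insufficient: match alone fails
assert not fire(gasoline=True)               # Necessary: combo fails without match
assert fire(match=True, gasoline=True)       # Sufficient: the combo is enough
assert fire(match=True, curtains=True)       # Unnecessary: another combo works too
```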

Criteria Based

Hill’s (prominent, 1965) and Susser’s. Both give structured guidance to assess whether an observed relationship is likely to be causal. Hill’s criteria are empirical; Susser’s are more theoretical (they can be applied a priori or post hoc). Again, guidelines.

Hill

  • Strength of association: a strong association is less likely to be due to bias.
  • Consistency: the association is observed across populations and settings.
  • Specificity: single cause → single effect (HIV → AIDS; rarely this simple).
  • Temporality: the cause must precede the effect.
  • Biological Gradient/Dose-Response: increased exposure → increased risk.
  • Plausibility: specifically biologic plausibility. Lit search.
  • Coherence: a causal claim should not contradict existing knowledge.
  • Experiment: conduct a prospective study and the effect should follow (remove the exposure and the effect should recede).
  • Analogy: similar agents may be expected to produce similar effects.

Guidelines and not rigid rules. “None can be regarded as a sine qua non.”

Susser (1973)

Core Criteria

  1. Association: statistical and quantifiable. Without association, there is no causation.
  2. Temporality
  3. Direction: the relationship must run from cause to effect. This is more related to multifactorial causality.

Additional Supporting Criteria

  1. Mechanism: essentially plausibility
  2. Consistency with existing knowledge
  3. Predictive Performance

Multifactorial causality and contextual thinking

  1. Sufficiency and Necessity: kinda like INUS Conditions
  2. Interaction and Synergism: causes can interact, producing combined effects that differ from the sum of their parts
  3. Causal Chains and Webs: causes can be linked in complex ways

Probabilistic Causality

Probabilistic Causation

A causes B if A happening increases the probability of B happening. Exposure increases risk but not certainty.

$P(B \mid A) > P(B \mid \neg A)$

A makes B more likely! Booze and car crashes. Some people crash without drinking.
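The inequality can be checked directly on data. A sketch with made-up counts (all numbers hypothetical):

```python
# Hypothetical counts for drinking (A) and car crashes (B).
crashes_among_drinkers = 30   # out of 100 drinkers
crashes_among_sober = 5       # out of 100 sober drivers

p_b_given_a = crashes_among_drinkers / 100      # P(B | A)
p_b_given_not_a = crashes_among_sober / 100     # P(B | not A)

# Probabilistic causation: A raises the probability of B, without certainty.
assert p_b_given_a > p_b_given_not_a
assert p_b_given_not_a > 0    # some people crash without drinking
```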

Singular Causation

Individual effects: did a specific exposure cause a specific outcome in an individual case?

Counterfactual Reasoning

“What if X had happened instead under different conditions?” This is what models claim to do if you think about it.

Individual → Pill → Outcome (function of treatment and covariates)

Now what if the Individual had not taken the drug? If you could observe both, you would compute the individual treatment effect (ITE): outcome with drug minus outcome without drug. The ATE is this averaged across people.

$E[Y(T=1)] - E[Y(T=0)]$

You cannot do this! The Fundamental Problem of Causal Inference. You cannot observe the counterfactual.

You approximate it using the Rubin Causal Model.

$E[Y \mid T=1] = \text{Average}(Y_i(T=1, X=X_i))$
$E[Y \mid T=0] = \text{Average}(Y_i(T=0, X=X_i))$

Now the Counterfactual Treatment Effect = Observational Treatment Effect when two assumptions are met:

  1. SUTVA — Stable Unit Treatment Value Assumption (violation: one patient’s treatment/exposure may affect another’s outcome).
  2. Strong Ignorability: Treatment assignment is unconfounded and completely ignorable. Think RCTs! Other features about you do not dictate your exposure.

So Randomized Experimentation is the key here. This is why it’s a “Gold Standard” for very practical reasons. RCTs satisfy the Rubin Causal Model’s assumptions (#1 is assumed and #2 holds in theory).
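Under randomization, the difference in observed group means estimates the ATE. A minimal simulation sketch (effect size and sample size are hypothetical):

```python
import random

random.seed(0)

# Simulate an RCT: treatment is randomized, so assignment is ignorable
# and the difference in group means estimates the ATE.
true_effect = 2.0                              # hypothetical effect size
treated, control = [], []
for _ in range(20000):
    t = random.random() < 0.5                  # coin-flip assignment
    y = true_effect * t + random.gauss(0, 1)   # outcome = effect + noise
    (treated if t else control).append(y)

ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
# ate_hat comes out close to the true effect of 2.0
```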

Problems with RCTs:

  • Unethical to deny treatment!
  • Expensive
  • Blind to “Subpopulation Heterogeneity of Treatment Effect” (exactly what it sounds like), leading to bias and poor external validity.

Think of a ‘true’ distribution of ATE that has two humps and you only randomize across just one ‘hump’. You get Local ATE (LATE). It is internally valid but not externally valid!
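The two-hump picture can be simulated: randomize across the full population versus only one subpopulation, and compare the estimates (subpopulation labels and effect sizes below are hypothetical):

```python
import random

random.seed(1)

# Two hidden subpopulations with different true effects ("two humps").
effects = {"A": 1.0, "B": 5.0}                       # hypothetical effect sizes
pop = [random.choice(["A", "B"]) for _ in range(20000)]

def rct(sample):
    """Randomize treatment within the sample; return difference in means."""
    treated, control = [], []
    for subpop in sample:
        t = random.random() < 0.5
        y = effects[subpop] * t + random.gauss(0, 1)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

ate = rct(pop)                                  # full population: near 3.0
late = rct([s for s in pop if s == "A"])        # one hump only: near 1.0 (LATE)
# The LATE is internally valid for subpopulation A but misses the overall ATE.
```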

So to prevent this you would have to claim that (a) the treatment effect is homogeneous, or (b) you know all the subpopulations, or that you have full coverage (all subpopulations are in the sample).

Think about expenses: RCTs skew toward richer populations. And you might be desperate or sicker and therefore more likely to sign up for an RCT. The box keeps getting smaller…

Observational Data

Passively collected. Lots of advantages: large $N$, cheaper (the data are already collected), broader coverage (the LATE is more externally valid).

Internal Validity → we are measuring what we think we are measuring.

Now it is the lack of randomization that renders observational studies less valid. What if the doctor’s treatment assignment, based on who you are, is the confounder? The LATE would be less internally valid but more externally valid.

The Heart

The real heart is: How can you make Observational studies behave like RCTs?

Matching

See other notes

Weighting

A generalization of matching. Roots in survey sampling. Larger weights go to underrepresented units and smaller weights to overrepresented ones.

|       | Type 1 | Type 2 |
| ----- | ------ | ------ |
| T = 1 | 80%    | 20%    |
| T = 0 | 20%    | 80%    |

The lower-left cell needs to be upweighted. Weight the populations such that $Y \perp T \mid X$; this is Ignorability, which is important for counterfactual reasoning!
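A worked sketch of rebalancing the table above with inverse-propensity weights, assuming (hypothetically) equal-sized treatment and control arms so that $P(T=1 \mid \text{Type})$ follows directly from the cell shares:

```python
# Cell shares within each arm (from the table; arms assumed equal-sized).
# Then P(T=1 | Type 1) = 0.8/(0.8+0.2) = 0.8, and similarly for Type 2.
p_t1 = {"type1": 0.8 / (0.8 + 0.2), "type2": 0.2 / (0.2 + 0.8)}

# Inverse-propensity weights per cell:
w_treated = {k: 1 / p for k, p in p_t1.items()}          # 1 / P(T=1|X)
w_control = {k: 1 / (1 - p) for k, p in p_t1.items()}    # 1 / P(T=0|X)

# Weighted shares: each covariate type now contributes equally to both arms.
balanced_treated = {"type1": 0.8 * w_treated["type1"],   # 0.8 * 1.25 = 1.0
                    "type2": 0.2 * w_treated["type2"]}   # 0.2 * 5.00 = 1.0
balanced_control = {"type1": 0.2 * w_control["type1"],   # 0.2 * 5.00 = 1.0
                    "type2": 0.8 * w_control["type2"]}   # 0.8 * 1.25 = 1.0
```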

It’s not as simple as you think and there are several methods. E.g. Weighting by Odds, Kernel Weighting, etc.

Inverse Prob of Treatment (IPTW) / Inverse Propensity Weighting

The probability of being assigned to the treatment is the propensity score, $P(T_i=1 \mid X_i)$. The weight applies the first term to treated units and the second to controls:

$w_i = \dfrac{T_i}{P(T_i=1 \mid X_i)} + \dfrac{1 - T_i}{1 - P(T_i=1 \mid X_i)}$

Stabilized weights

$w_i = \dfrac{T_i \, P(T_i=1)}{P(T_i=1 \mid X_i)} + \dfrac{(1 - T_i)\,(1 - P(T_i=1))}{1 - P(T_i=1 \mid X_i)}$
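A small simulation sketch of IPTW removing confounding (the data-generating numbers are hypothetical, and the true propensities are used directly; in practice you would estimate them, e.g. with logistic regression):

```python
import random

random.seed(2)

# Confounded data: X drives both treatment and outcome.
true_effect = 1.0
rows = []
for _ in range(50000):
    x = random.random() < 0.5                 # binary confounder
    p = 0.8 if x else 0.2                     # true propensity P(T=1|X)
    t = random.random() < p
    y = true_effect * t + 2.0 * x + random.gauss(0, 1)
    rows.append((x, t, y, p))

# Naive difference in means is biased by X:
naive = (sum(y for _, t, y, _ in rows if t) / sum(1 for _, t, _, _ in rows if t)
         - sum(y for _, t, y, _ in rows if not t) / sum(1 for _, t, _, _ in rows if not t))

# IPTW (Hajek form): weight treated by 1/p and controls by 1/(1-p),
# then compare the weighted means.
wt = [1 / p if t else 1 / (1 - p) for _, t, _, p in rows]
num1 = sum(w * y for (_, t, y, _), w in zip(rows, wt) if t)
den1 = sum(w for (_, t, _, _), w in zip(rows, wt) if t)
num0 = sum(w * y for (_, t, y, _), w in zip(rows, wt) if not t)
den0 = sum(w for (_, t, _, _), w in zip(rows, wt) if not t)
iptw = num1 / den1 - num0 / den0
# naive is inflated (around 2.2 here); iptw recovers the true effect of 1.0
```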

There’s another called AIPW (Augmented Inverse Propensity Weighting)…

NOTE: The formulas above are written for two groups, but there may be more than two!

Adjustment

It can mean a lot of things, but we’re talking about statistical adjustment. Use it with matching methods!

Support → overlapping covariate distributions in the treatment and control arms. TODO:

You can misspecify… what if the relationship between exposure, outcome, and covariates is non-linear and you’re using linear models?

Sample size is another problem: it limits the degrees of freedom available for covariate adjustment.

Stratification

The problem here is that you’ll get high variance when the within-stratum sample sizes are small.
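A sketch of the stratified estimator on simulated confounded data (all numbers hypothetical): compute the difference in means within each stratum of the confounder, then average across strata weighted by stratum size.

```python
import random

random.seed(3)

# Confounded data: X is a binary stratum that drives both T and Y.
data = []
for _ in range(40000):
    x = random.random() < 0.5
    t = random.random() < (0.8 if x else 0.2)
    y = 1.0 * t + 2.0 * x + random.gauss(0, 1)
    data.append((x, t, y))

def stratum_diff(stratum):
    """Size and treated-minus-control mean difference within one stratum."""
    rows = [(t, y) for x, t, y in data if x == stratum]
    t1 = [y for t, y in rows if t]
    t0 = [y for t, y in rows if not t]
    return len(rows), sum(t1) / len(t1) - sum(t0) / len(t0)

n1, d1 = stratum_diff(True)
n0, d0 = stratum_diff(False)
ate_strat = (n1 * d1 + n0 * d0) / (n1 + n0)   # recovers the true effect of 1.0
```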

Multivariate Modeling Methods

AKA Response Surface Modeling. You explicitly model the relationship between treatment, covariates, and outcome, e.g., with logistic/linear regression. You can also use more complex ML methods.
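A sketch of the response-surface idea on simulated data (numbers hypothetical): model $E[Y \mid T, X]$ explicitly (here a saturated cell-mean model stands in for a regression, since $X$ is binary), then predict each unit's outcome under $T=1$ and $T=0$ and average the difference.

```python
import random

random.seed(4)

# Confounded data: Y depends on treatment T and covariate X.
data = []
for _ in range(40000):
    x = random.random() < 0.5
    t = random.random() < (0.8 if x else 0.2)
    y = 1.0 * t + 2.0 * x + random.gauss(0, 1)
    data.append((x, t, y))

# Response-surface step: fit E[Y | T, X]. With one binary X, within-cell
# means are a saturated model of treatment, covariate, and outcome.
def cell_mean(t, x):
    ys = [y for xi, ti, y in data if ti == t and xi == x]
    return sum(ys) / len(ys)

mu = {(t, x): cell_mean(t, x) for t in (True, False) for x in (True, False)}

# Predict both potential outcomes for every unit, then average the difference.
ate_model = sum(mu[(True, x)] - mu[(False, x)] for x, _, _ in data) / len(data)
# ate_model recovers the true effect of 1.0
```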

Think about this and how you add CC.