From Casual to Causal

Lucy D’Agostino McGowan

Causal questions

  • The heart of causal analysis is the causal question.
  • It dictates data analysis, methods, and target populations.

Goals of data analysis

Causal questions are part of a broader set of questions we can ask with statistical techniques related to the primary tasks of data science:

description

prediction

causal inference

Goals of data analysis

  • The goal is often muddled by both the techniques we use (regression, for instance, is helpful for all three tasks) and the way we talk about them.
  • When researchers are interested in causal inference from non-randomized data, we often use euphemistic language like “association” instead of declaring our intent to estimate a causal effect.

Schrödinger’s Causality

  • “Associate” most common root word for effects.
  • Only 1% used “cause.”
  • Action recommendations in 33% of studies.
  • Stronger action recommendations than implied by effect description.
  • Only 4% used formal causal models.

Schrödinger’s Causality

“Our results suggest that Schrödinger’s causal inference - where studies avoid stating (or even explicitly deny) an interest in estimating causal effects yet are otherwise embedded with causal intent, inference, implications, and recommendations - is common.”

Goals of data analysis

description

What phenomena occur / occurred in the past?

  • What is the prevalence of diabetes in the United States?
  • What are the demographics of our customers?

prediction

causal inference

Goals of data analysis

description

What phenomena occur / occurred in the past?

Validity concerns: Measurement error, sampling error

Connection to causal inference: Understanding population characteristics, examining outcome distributions, checking if data structure matches research question

prediction

Whether a certain phenomenon will occur given a set of circumstances

Validity concerns: Predictive accuracy, measurement error

Connection to causal inference: Some techniques use model predictions to answer causal questions

causal inference

Why does a phenomenon occur?

Validity concerns: Lots of assumptions (many that cannot be checked, coming soon!)

Why Prediction ≠ Causation

Predictive power doesn’t guarantee causal accuracy, especially when:

  • The outcome has many causes but the model focuses on one exposure
  • The true causal effect is small
  • The model excludes non-causal predictors for methodological reasons
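The first two points can be illustrated with a small simulation. This is a minimal sketch under assumed numbers: the randomized binary exposure, the true effect of 0.5, and the noise scale are all hypothetical and chosen only for illustration. It shows that an exposure can explain almost none of the variation in an outcome while its causal effect is still estimated accurately.

```python
import random
from statistics import mean

rng = random.Random(779)
n = 50_000

# Hypothetical setup: a randomized binary exposure with a small true effect,
# and an outcome that is mostly driven by many other causes (the noise term).
TRUE_EFFECT = 0.5
exposure = [rng.randint(0, 1) for _ in range(n)]
other_causes = [rng.gauss(0, 5) for _ in range(n)]
outcome = [TRUE_EFFECT * a + u for a, u in zip(exposure, other_causes)]

# Because exposure is randomized, the difference in means estimates the effect.
treated = [y for a, y in zip(exposure, outcome) if a == 1]
control = [y for a, y in zip(exposure, outcome) if a == 0]
estimated_effect = mean(treated) - mean(control)

# R-squared of a model that predicts the outcome from exposure alone
# (the best such prediction is the mean of each exposure group).
group_mean = {1: mean(treated), 0: mean(control)}
overall_mean = mean(outcome)
ss_res = sum((y - group_mean[a]) ** 2 for a, y in zip(exposure, outcome))
ss_tot = sum((y - overall_mean) ** 2 for y in outcome)
r_squared = 1 - ss_res / ss_tot

print(f"estimated effect: {estimated_effect:.2f} (true effect: {TRUE_EFFECT})")
print(f"R-squared from exposure alone: {r_squared:.4f}")
```

In this setup the estimate should land near the true effect even though the exposure is a poor predictor on its own, so low predictive power is not evidence of a wrong causal estimate.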

The Ice Cream & Crime Example

  • You have ice cream sales data but no weather data
  • Ice cream sales correlate with crime rates and can predict them moderately well – suppose this correlation exists because both variables are caused by weather!

Variables that are invalid from a causal perspective (like ice cream sales) can still provide predictive power by acting as proxies for true causal factors (like weather)
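A minimal simulation sketch of this example, assuming a made-up data-generating process in which temperature drives both ice cream sales and crime, and ice cream sales have no effect on crime at all. The coefficients and helper functions (`slope`, `residuals`) are hypothetical and only for illustration. Adjusting for temperature is done here by residualizing both variables on it, which is possible in the simulation only because we pretend the weather data is available.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (model includes an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

rng = random.Random(779)
n = 5000

# Hypothetical data-generating process: weather causes both variables.
temperature = [rng.gauss(20, 8) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [1.5 * t + rng.gauss(0, 10) for t in temperature]  # no ice cream term

# Without weather data, ice cream sales look useful for predicting crime.
unadjusted = slope(ice_cream, crime)

# With weather data, remove the part of each variable explained by temperature
# and relate what is left over.
adjusted = slope(residuals(temperature, ice_cream), residuals(temperature, crime))

print(f"unadjusted slope (ice cream -> crime): {unadjusted:.2f}")
print(f"slope after adjusting for temperature: {adjusted:.2f}")
```

The unadjusted slope should be clearly positive, which is why ice cream sales work as a predictor, while the adjusted slope should be close to zero, matching the simulated truth that ice cream has no causal effect on crime.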

The first step is asking a good causal question

Diagramming causal claims

Smoking causes lung cancer

Let’s try to get more specific

Target Trial

Asking good causal questions

The claim

The evidence

Asking good causal questions

The claim

Smoking causes lung cancer

The evidence

For people who smoke 15+ cigarettes a day, reducing smoking by 50% reduces the risk of lung cancer over 5-10 years

Asking good causal questions

The question

Does smoking cause lung cancer?

The evidence

For people who smoke 15+ cigarettes a day, reducing smoking by 50% reduces the risk of lung cancer over 5-10 years

Asking good causal questions

The question

For people who smoke 15+ cigarettes a day, does reducing smoking by 50% reduce the lung cancer risk over 5-10 years?

The evidence

For people who smoke 15+ cigarettes a day, reducing smoking by 50% reduces the risk of lung cancer over 5-10 years
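One way to check whether a question is specific enough to match the evidence is to write its parts down explicitly. The sketch below is only an illustration: the `CausalQuestion` class and its field names are hypothetical, not a standard API, and the comparison group is an assumption because the slides do not state one.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class CausalQuestion:
    """Hypothetical checklist of the parts of a well-specified causal question."""
    population: Optional[str] = None
    exposure: Optional[str] = None
    comparison: Optional[str] = None
    outcome: Optional[str] = None
    time_frame: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the parts that have not been specified yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

# "Does smoking cause lung cancer?"
vague = CausalQuestion(exposure="smoking", outcome="lung cancer")

# The more specific version of the question
specific = CausalQuestion(
    population="people who smoke 15+ cigarettes a day",
    exposure="reducing smoking by 50%",
    comparison="continuing to smoke at the same level",  # assumption, not in the slides
    outcome="lung cancer risk",
    time_frame="5-10 years",
)

print("vague question is missing:", vague.missing())
print("specific question is missing:", specific.missing())
```

The vague question leaves the population, comparison, and time frame open, which is exactly the gap between the original claim and the evidence.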

Application Exercise



bit.ly/sta-779-f23-ae3