Estimating counterfactuals

Lucy D’Agostino McGowan

Image generated with Gemini

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth
— Robert Frost

Potential outcomes

  • Before some “cause” occurs, the potential outcomes are all of the outcomes that could occur, depending on which exposure you end up receiving

Potential outcomes

  • Let’s assume an exposure has two levels:
    • \(X=1\) if you are exposed
    • \(X=0\) if you are not exposed

Potential outcomes

  • Under this simple scenario, there are two potential outcomes:
    • \(Y(1)\) the potential outcome if you are exposed
    • \(Y(0)\) the potential outcome if you are not exposed

Potential outcomes

  • Only one of these potential outcomes will actually be realized
  • These exposures are defined at a particular instant in time, so only one of them can happen to any individual
  • With a binary exposure, this leaves one potential outcome observed and the other missing
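A small base-R sketch of the point above, using made-up potential outcomes for five hypothetical people (the vectors `y1`, `y0`, and `x` are invented for illustration). Each person's observed outcome is the potential outcome for the exposure they actually received, \(Y = X\,Y(1) + (1 - X)\,Y(0)\); the other potential outcome becomes missing data.

```r
# Made-up potential outcomes for five hypothetical people
y1 <- c(7, 5, 6, 4, 8)  # Y(1): outcome if exposed
y0 <- c(5, 5, 3, 4, 6)  # Y(0): outcome if not exposed
x  <- c(1, 0, 1, 1, 0)  # exposure each person actually received

# Only the potential outcome matching the received exposure is realized
y_observed <- ifelse(x == 1, y1, y0)

# What an analyst actually gets to see: one of the two is always missing
y1_seen <- ifelse(x == 1, y1, NA)
y0_seen <- ifelse(x == 0, y0, NA)

data.frame(x, y1_seen, y0_seen, y_observed)
```

Every row of the printed data frame has exactly one `NA` among `y1_seen` and `y0_seen`: that missing value is the counterfactual.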

Potential outcomes

  • Our causal effect of interest is often some difference in potential outcomes \(Y(1) - Y(0)\), averaged over a particular population
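Written out for a study population of \(n\) individuals indexed by \(i\) (notation introduced here for illustration), the average causal effect \(\tau\) is the mean of the individual differences, which is the same as the difference between the two mean potential outcomes; the population version is \(E[Y(1) - Y(0)]\):

```latex
\tau \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i(1) - Y_i(0)\bigr)
     \;=\; \frac{1}{n}\sum_{i=1}^{n} Y_i(1) \;-\; \frac{1}{n}\sum_{i=1}^{n} Y_i(0)
```

The second equality is what makes estimation possible at all: we never see both \(Y_i(1)\) and \(Y_i(0)\) for the same person, but we can hope to estimate each average separately.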

Counterfactuals

  • Early causal inference methods were often framed as missing data problems
  • We need to make certain assumptions about the missing counterfactuals: the values of the potential outcomes corresponding to the exposure(s) that did not occur
  • We wish we could observe the counterfactual outcome that would have occurred in an alternate universe

Counterfactuals

  • Because we cannot, we try to account for all factors related to both the exposure and the outcome so that we can construct (or estimate) that counterfactual outcome
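A minimal simulation sketch of what "accounting for" such a factor can look like, with every parameter invented for illustration. A hypothetical binary factor `z` raises both the chance of exposure and the outcome, and the true effect of exposure is set to 1. The naive exposed-vs-unexposed comparison absorbs the effect of `z`; comparing within each level of `z` and averaging over how common each level is (stratification, one of several possible adjustment approaches) recovers the built-in effect.

```r
set.seed(2024)
n <- 100000

# Hypothetical factor related to both exposure and outcome
z <- rbinom(n, 1, 0.5)
# People with z = 1 are much more likely to be exposed
x <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))
# True effect of exposure on the outcome is 1; z adds 2 on its own
y <- 1 * x + 2 * z + rnorm(n)

# Naive comparison mixes the effect of x with the effect of z
naive <- mean(y[x == 1]) - mean(y[x == 0])

# Stratify: compare exposed vs unexposed within each level of z,
# then weight the stratum-specific differences by how common each level is
strata <- sapply(c(0, 1), function(level) {
  in_stratum <- z == level
  mean(y[in_stratum & x == 1]) - mean(y[in_stratum & x == 0])
})
weights <- c(mean(z == 0), mean(z == 1))
adjusted <- sum(strata * weights)

c(naive = naive, adjusted = adjusted)
```

With these made-up settings, the naive difference should land near 2.2 and the adjusted one near 1. The adjustment only works because `z` is the sole factor linking exposure and outcome here, which is exactly the kind of assumption real analyses cannot check directly.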

Ice-T and Spike

Split Decision: Life Stories

Award-winning actor, rapper, and producer Ice-T unveils a compelling memoir of his early life robbing jewelry stores until he found fame and fortune—while a handful of bad choices sent his former crime partner down an incredibly different path.

Vicky, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

Ice-T and Spike

flowchart LR
A{Ice-T} --> |observed| B(Abandons criminal life)
A -.-> |missing counterfactual| C(Does one more heist)
C -.-> D[35 years in prison]
B --> E[Fame & Fortune]

classDef grey fill:#fff
class D,C grey

flowchart LR
A{Spike} -.-> |missing counterfactual| B(Abandons criminal life)
A --> |observed| C(Does one more heist)
C --> D[35 years in prison]
B -.-> E[Fame & Fortune]
classDef grey fill:#fff
class E,B grey

Ice-T and Spike

  • What would need to be true for us to draw a causal conclusion?
  • Can we really conclude that Spike’s life would have turned out exactly like Ice-T’s if he had made the exact same choices as Ice-T?

In practice

  • We could conduct an experiment where we randomize many individuals to leave criminal life (or not) and see how this impacts their outcomes on average
  • This randomized trial would present some ethical issues, so we may need to look to observational studies to help answer this question
  • In observational studies, we must rely on statistical techniques to help construct these unobservable counterfactuals

Does chocolate ice cream make you happier than vanilla?

Happiness Simulation

  • Suppose some happiness index exists that ranges from 1 to 10
  • We want to assess whether eating chocolate ice cream, rather than vanilla, increases happiness

Happiness Simulation (🔮)

What is the average causal effect?

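library(tibble)
library(dplyr)
library(gt)

# In this simulation we (unrealistically) know both potential outcomes
# for each of 10 people: happiness after chocolate and after vanilla
data <- tibble(
  id = 1:10,
  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),
  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)
)

# Placeholder column for the individual causal effects, left blank for now
data <- data |>
  mutate(causal_effect = " ")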

data |>
  gt() |>
  cols_label(
    id = "ID",
    y_chocolate = md("$Y_{\\text{id}}(\\text{chocolate})$"),
    y_vanilla = md("$Y_{\\text{id}}(\\text{vanilla})$"),
    causal_effect = md("$Y_{\\text{id}}(\\text{chocolate}) - Y_{\\text{id}}(\\text{vanilla})$")
  ) |>
  fmt_markdown(
    columns = c(y_chocolate, y_vanilla, causal_effect)
  ) |>
  tab_header(
    title = md("**Potential Outcomes and Causal Effect**")
  ) |>
  tab_spanner(
    label = "Potential Outcomes",
    columns = c(y_chocolate, y_vanilla)
  ) |>
  tab_spanner(
    label = "Causal Effect",
    columns = causal_effect
  )
Potential Outcomes and Causal Effect

| ID | \(Y_{\text{id}}(\text{chocolate})\) | \(Y_{\text{id}}(\text{vanilla})\) | \(Y_{\text{id}}(\text{chocolate}) - Y_{\text{id}}(\text{vanilla})\) |
|----|----|----|----|
| 1 | 4 | 1 | |
| 2 | 4 | 3 | |
| 3 | 6 | 4 | |
| 4 | 5 | 5 | |
| 5 | 6 | 5 | |
| 6 | 5 | 6 | |
| 7 | 6 | 8 | |
| 8 | 7 | 6 | |
| 9 | 5 | 3 | |
| 10 | 6 | 5 | |
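The table leaves the causal effect column blank. Because this is a simulation in which both potential outcomes are known, a few lines of base R can fill it in; the vectors below simply repeat the values in `data`.

```r
# Same potential outcomes as in `data` above
y_chocolate <- c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6)
y_vanilla   <- c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)

# Individual causal effect for each person
causal_effect <- y_chocolate - y_vanilla
causal_effect                        # 3 1 2 0 1 -1 -2 1 2 1

# Average causal effect: the mean of the individual effects ...
mean(causal_effect)                  # 0.8

# ... which equals the difference in mean potential outcomes (5.4 - 4.6)
mean(y_chocolate) - mean(y_vanilla)
```

Note that the individual effects range from -2 to 3: chocolate is not better for everyone, even though it helps on average.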

Happiness Simulation 🌫

What is the average causal effect?

## we are doing something *random* so let's
## set a seed so we always observe the
## same result each time we run the code
set.seed(11)
data_observed <- data |>
  mutate(
    # randomly assign the exposure: each person independently gets
    # chocolate with probability 0.5 (a Bernoulli draw via rbinom)
    exposure = if_else(
      rbinom(n(), 1, 0.5) == 1, "chocolate", "vanilla"
    ),
    observed_outcome = case_when(
      exposure == "chocolate" ~ y_chocolate,
      exposure == "vanilla" ~ y_vanilla
    )
  )

avg_chocolate <- data_observed |>
  filter(exposure == "chocolate") |>
  pull(observed_outcome) |>
  mean()

avg_vanilla <- data_observed |>
  filter(exposure == "vanilla") |>
  pull(observed_outcome) |>
  mean()

data_observed |>
  mutate(
    # hide the potential outcome that was not realized
    y_chocolate = if_else(exposure == "chocolate", y_chocolate, NA_real_),
    y_vanilla = if_else(exposure == "vanilla", y_vanilla, NA_real_),
    causal_effect = NA_real_
  ) |>
  select(-observed_outcome, -exposure) |>
  gt() |>
  cols_label(
    id = "ID",
    y_chocolate = md("$Y_{\\text{id}}(\\text{chocolate})$"),
    y_vanilla = md("$Y_{\\text{id}}(\\text{vanilla})$"),
    causal_effect = md("$Y_{\\text{id}}(\\text{chocolate}) - Y_{\\text{id}}(\\text{vanilla})$")
  ) |>
  fmt_markdown(columns = c(y_chocolate, y_vanilla, causal_effect)) |>
  sub_missing(
    columns = c(y_chocolate, y_vanilla, causal_effect),
    missing_text = "---" # show unobserved values as a dash
  ) |>
  tab_header(
    title = md("**Potential Outcomes and Hidden Causal Effect**")
  ) |>
  tab_spanner(
    label = "Potential Outcomes",
    columns = c(y_chocolate, y_vanilla)
  ) |>
  tab_spanner(
    label = "Causal Effect",
    columns = causal_effect
  )
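Potential Outcomes and Hidden Causal Effect

| ID | \(Y_{\text{id}}(\text{chocolate})\) | \(Y_{\text{id}}(\text{vanilla})\) | \(Y_{\text{id}}(\text{chocolate}) - Y_{\text{id}}(\text{vanilla})\) |
|----|----|----|----|
| 1 | --- | 1 | --- |
| 2 | --- | 3 | --- |
| 3 | 6 | --- | --- |
| 4 | --- | 5 | --- |
| 5 | --- | 5 | --- |
| 6 | 5 | --- | --- |
| 7 | --- | 8 | --- |
| 8 | --- | 6 | --- |
| 9 | 5 | --- | --- |
| 10 | --- | 5 | --- |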
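Once only one potential outcome per person is revealed, the individual causal effect column is entirely missing. A base-R sketch of the same randomization makes this concrete and shows that the group averages are still computable. It reuses `set.seed(11)` and one `rbinom()` draw per person, so the assignment should line up with `data_observed`, though that correspondence is an assumption worth checking.

```r
# Same potential outcomes as in `data` above
y_chocolate <- c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6)
y_vanilla   <- c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)

# Randomize: each person independently gets chocolate with probability 0.5
set.seed(11)
chocolate <- rbinom(length(y_chocolate), 1, 0.5) == 1

# Only the potential outcome for the assigned flavor is revealed
seen_chocolate <- ifelse(chocolate, y_chocolate, NA)
seen_vanilla   <- ifelse(!chocolate, y_vanilla, NA)

# Every individual causal effect is now missing ...
individual_effect <- seen_chocolate - seen_vanilla
all(is.na(individual_effect))  # TRUE

# ... but each group's average observed outcome can still be computed
mean(seen_chocolate, na.rm = TRUE)
mean(seen_vanilla, na.rm = TRUE)
```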

Happiness Simulation 🕵️‍♀️

data_observed |>
  group_by(exposure) |>
  summarise(avg_outcome = mean(observed_outcome))
# A tibble: 2 × 2
  exposure  avg_outcome
  <chr>           <dbl>
1 chocolate        5.33
2 vanilla          4.71

Why did that (approximately) work?
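One way to probe this question is to rerun the experiment many times. The sketch below (base R; the seed and the 10,000 repetitions are arbitrary choices for illustration) re-randomizes the same 10 people, computes the difference in observed group means each time, and compares the average of those estimates with the true average causal effect of 0.8. A single run, like the 5.33 - 4.71 = 0.62 above, misses the truth, but across many randomizations the estimates center on it.

```r
# Same potential outcomes as in `data` above
y_chocolate <- c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6)
y_vanilla   <- c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)
true_ate <- mean(y_chocolate - y_vanilla)  # 0.8, known only because we simulated both

# One randomized experiment: assign flavors, reveal one potential outcome per
# person, and return the difference in observed group means
one_experiment <- function() {
  chocolate <- rbinom(length(y_chocolate), 1, 0.5) == 1
  # If everyone lands in the same group, the difference is undefined
  if (all(chocolate) || !any(chocolate)) return(NA_real_)
  mean(y_chocolate[chocolate]) - mean(y_vanilla[!chocolate])
}

set.seed(2024)
estimates <- replicate(10000, one_experiment())

# Individual estimates scatter widely, but their average sits close to true_ate
summary(estimates)
mean(estimates, na.rm = TRUE)
```

Conditional on both groups being nonempty, random assignment makes every person equally likely to land in either group, which is why the difference in means is centered on the average causal effect even though no individual effect is ever observed.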