Estimating counterfactuals

Lucy D’Agostino McGowan

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth
— Robert Frost

Potential outcomes

Prior to some “cause” occurring, the potential outcomes are all of the potential things that could occur depending on what you end up exposed to

Potential outcomes

Let’s assume an exposure has two levels:
- \(X=1\) if you are exposed
- \(X=0\) if you are not exposed

Potential outcomes

Under this simple scenario, there are two potential outcomes:
- \(Y(1)\) the potential outcome if you are exposed
- \(Y(0)\) the potential outcome if you are not exposed

Potential outcomes

Only one of these potential outcomes will actually be realized
It is important to remember here that these exposures are defined at a particular instance in time, so only one can happen to any individual
In the case of a binary exposure, this leaves one potential outcome as observable and one missing

Potential outcomes

Our causal effect of interest is often some difference in potential outcomes \(Y(1) - Y(0)\), averaged over a particular population

Counterfactuals

Early causal inference methods were often framed as missing data problems
We need to make certain assumptions about the missing counterfactuals, the value of the potential outcome corresponding to the exposure(s) that did not occur
We wish we could observe the conterfactual outcome that would have occurred in an alternate universe

Counterfactuals

To do this, we attempt to control for all factors that are related to an exposure and outcome such that we can construct (or estimate) such a counterfactual outcome.

Ice-T and Spike

Split Decision: Life Stories

Award-winning actor, rapper, and producer Ice-T unveils a compelling memoir of his early life robbing jewelry stores until he found fame and fortune—while a handful of bad choices sent his former crime partner down an incredibly different path.

Vicky, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

Ice-T and Spike

flowchart LR
A{Ice-T} --> |observed| B(Abandons criminal life)
A -.-> |missing counterfactual| C(Does one more heist)
C -.-> D[35 years in prison]
B --> E[Fame & Fortune]

classDef grey fill:#fff
class D,C grey

flowchart LR
A{Spike} -.-> |missing counterfactual| B(Abandons criminal life)
A --> |observed| C(Does one more heist)
C --> D[35 years in prison]
B -.-> E[Fame & Fortune]
classDef grey fill:#fff
class E,B grey

Ice-T and Spike

What would need to be true for us to draw a causal conclusion?
Can we really conclude that Spike’s life would have turned out exactly like Ice-T’s if he had made the exact same choices as Ice-T?

In practice

We could conduct an experiment where we randomize many individuals to leave criminal life (or not) and see how this impacts their outcomes on average
This randomized trial seems to present some ethical issues, perhaps we need to look to observational studies to help answer this question
We must rely on statistical techniques to help construct these unobservable counterfactuals

Does chocolate ice cream make you happier than vanilla?

Happiness Simulation

Some happiness index exists that ranges from 1-10
We want to assess whether eating chocolate ice cream versus vanilla will increase happiness

Happiness Simulation (🔮)

What is the average causal effect?

Code

data <- tibble(
  id = 1:10,
  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),
  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)
)

data <- data |>
  mutate(causal_effect = " ")

data |>
  gt() |>
  cols_label(
    id = "ID",
    y_chocolate = md("$Y_{\\text{id}}(\\text{chocolate})$"),
    y_vanilla = md("$Y_{\\text{id}}(\\text{vanilla})$"),
    causal_effect = md("$Y_{\\text{id}}(\\text{chocolate}) - Y_{\\text{id}}(\\text{vanilla})$")
  ) |>
  fmt_markdown(
    columns = c(y_chocolate, y_vanilla, causal_effect)
  ) |>
  tab_header(
    title = md("**Potential Outcomes and Causal Effect**")
  ) |>
  tab_spanner(
    label = "Potential Outcomes",
    columns = c(y_chocolate, y_vanilla)
  ) |>
  tab_spanner(
    label = "Causal Effect",
    columns = causal_effect
  )

Potential Outcomes and Causal Effect
ID	Potential Outcomes		Causal Effect
ID	\(Y_{\text{id}}(\text{chocolate})\)	\(Y_{\text{id}}(\text{vanilla})\)	\(Y_{\text{id}}(\text{chocolate}) - Y_{\text{id}}(\text{vanilla})\)
1	4	1
2	4	3
3	6	4
4	5	5
5	6	5
6	5	6
7	6	8
8	7	6
9	5	3
10	6	5

Happiness Simulation 🌫

What is the average causal effect?

Code

## we are doing something *random* so let's
## set a seed so we always observe the
## same result each time we run the code
set.seed(11)
data_observed <- data |>
  mutate(
    # change the exposure to randomized, generated from
    # a binomial distribution with a probability of 0.5 for
    # being in either group
    exposure = if_else(
      rbinom(n(), 1, 0.5) == 1, "chocolate", "vanilla"
    ),
    observed_outcome = case_when(
      exposure == "chocolate" ~ y_chocolate,
      exposure == "vanilla" ~ y_vanilla
    )
  )

avg_chocolate <- data_observed |>
  filter(exposure == "chocolate") |>
  pull(observed_outcome) |>
  mean()

avg_vanilla <- data_observed |>
  filter(exposure == "vanilla") |>
  pull(observed_outcome) |>
  mean()

data_observed |>
  mutate(
    y_chocolate = if_else(exposure == "chocolate", y_chocolate, NA),
    y_vanilla = if_else(exposure == "vanilla", y_vanilla, NA),
    causal_effect = NA_real_
  ) |>
  select(-observed_outcome, -exposure) |>
  gt() |>
  cols_label(
    id = "ID",
    y_chocolate = md("$Y_{\\text{id}}(\\text{chocolate})$"),
    y_vanilla = md("$Y_{\\text{id}}(\\text{vanilla})$"),
    causal_effect = md("$Y_{\\text{id}}(\\text{chocolate}) - Y_{\\text{id}}(\\text{vanilla})$")
  ) |>
  fmt_markdown(columns = c(y_chocolate, y_vanilla, causal_effect)) |>
  sub_missing(
    columns = c(y_chocolate, y_vanilla, causal_effect),
    missing_text = md("---") # Format missing values as blank
  ) |>
  tab_header(
    title = md("**Potential Outcomes and Hidden Causal Effect**")
  ) |>
  tab_spanner(
    label = "Potential Outcomes",
    columns = c(y_chocolate, y_vanilla)
  ) |>
  tab_spanner(
    label = "Causal Effect",
    columns = causal_effect
  )

Potential Outcomes and Hidden Causal Effect
ID	Potential Outcomes		Causal Effect
ID	\(Y_{\text{id}}(\text{chocolate})\)	\(Y_{\text{id}}(\text{vanilla})\)	\(Y_{\text{id}}(\text{chocolate}) - Y_{\text{id}}(\text{vanilla})\)
1	—	1	—
2	—	3	—
3	6	—	—
4	—	5	—
5	—	5	—
6	5	—	—
7	—	8	—
8	—	6	—
9	5	—	—
10	—	5	—

Happiness Simulation 🕵️‍♀️

data_observed |>
  group_by(exposure) |>
  summarise(avg_outcome = mean(observed_outcome))

# A tibble: 2 × 2
  exposure  avg_outcome
  <chr>           <dbl>
1 chocolate        5.33
2 vanilla          4.71

Why did that (approximately) work?