Continuous exposures and g-computation

The Parametric G-Formula

  1. Fit a model for y ~ x + z where z is all confounders (maybe even y ~ x*z to allow for heterogeneous treatment effects)
  2. Create a duplicate of your dataset for each level of x
  3. Set x to a single value in each cloned dataset
  4. Make predictions using the model on each cloned dataset
  5. Calculate the estimate you want, e.g. mean(x_1) - mean(x_0)
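The five steps can be sketched end to end in base R on simulated data (the variables `x`, `z`, and `y` here are illustrative, not the park data):

```r
# a minimal sketch of the five steps on simulated data,
# where the true effect of x on y is 2
set.seed(1)
n <- 1000
z <- rnorm(n)                         # confounder
x <- rbinom(n, 1, plogis(z))          # exposure depends on z
y <- 1 + 2 * x + z + rnorm(n)         # outcome depends on x and z
df <- data.frame(x, z, y)

# 1. fit the outcome model
fit <- lm(y ~ x + z, data = df)

# 2 & 3. clone the dataset and set the exposure
df_1 <- transform(df, x = 1)
df_0 <- transform(df, x = 0)

# 4. predict in each cloned dataset
pred_1 <- predict(fit, newdata = df_1)
pred_0 <- predict(fit, newdata = df_0)

# 5. contrast the average predictions; this recovers roughly 2
mean(pred_1) - mean(pred_0)
```

Because this outcome model is linear with the exposure entering additively, the contrast equals the coefficient on `x` exactly; the cloning machinery starts to pay off once the model includes interactions or nonlinear terms.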

Advantages of the parametric G-formula

  • Often more statistically precise than propensity score-based methods
  • Incredibly flexible: works for binary, continuous, and time-varying exposures

Does having Extra Magic Morning hours influence the average posted wait time for Seven Dwarfs Mine Train between 9 and 10am?

The Data

park_date   park_extra_magic_morning  park_ticket_season  park_close  park_temperature_high  wait_minutes_posted_avg
2018-01-01  0                         peak                23:00:00    58.63                  60.00000
2018-01-02  0                         peak                24:00:00    53.65                  60.00000
2018-01-03  0                         peak                24:00:00    51.11                  60.00000
2018-01-04  0                         regular             24:00:00    52.66                  68.88889
2018-01-05  1                         regular             24:00:00    54.29                  70.55556
2018-01-06  0                         regular             23:00:00    56.25                  33.33333
2018-01-07  0                         regular             21:00:00    65.21                  46.36364
2018-01-08  0                         value               20:00:00    70.77                  69.54545
2018-01-09  0                         value               20:00:00    75.15                  64.28571
2018-01-10  0                         value               20:00:00    74.25                  74.28571

1. Fit the outcome model

gcomp_model <- lm(
  wait_minutes_posted_avg ~ 
    park_extra_magic_morning +
    park_ticket_season + 
    park_close + 
    park_temperature_high,
  data = seven_dwarfs_9
)

The model must include all variables in the adjustment set from the causal diagram.

2 & 3. Clone and set exposure

# set all park dates to have Extra Magic Morning
exposed <- seven_dwarfs_9 |> 
  mutate(park_extra_magic_morning = 1)

# set all park dates to have no Extra Magic Morning
unexposed <- seven_dwarfs_9 |> 
  mutate(park_extra_magic_morning = 0)

Each park date is “cloned” twice — once into a world where Extra Magic Morning occurred, and once where it did not. Covariates stay the same; only the exposure changes.

4. Make predictions

library(broom)  # for augment()

pred_exposed <- gcomp_model |> 
  augment(newdata = exposed) |> 
  select(pred_exposed = .fitted)

pred_unexposed <- gcomp_model |> 
  augment(newdata = unexposed) |> 
  select(pred_unexposed = .fitted)

5. Calculate the estimate

bind_cols(pred_exposed, pred_unexposed) |> 
  summarize(
    mean_exposed   = mean(pred_exposed),
    mean_unexposed = mean(pred_unexposed),
    difference     = mean_exposed - mean_unexposed
  )
# A tibble: 1 × 3
  mean_exposed mean_unexposed difference
         <dbl>          <dbl>      <dbl>
1         74.2           68.1       6.16

Or use {marginaleffects}

library(marginaleffects)
avg_comparisons(
  gcomp_model,
  variables = "park_extra_magic_morning"
)

 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     6.16       2.26 2.73  0.00636 7.3  1.74   10.6

Term: park_extra_magic_morning
Type:  response 
Comparison: 1 - 0

Continuous exposures

We recommend g-computation over propensity score methods for continuous exposures: the weights depend on estimated exposure densities and are often unstable, with extreme values wherever the density is small.

Continuous exposure

Do posted wait times at 8am affect actual wait times at 9am?

Set up the data

eight <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 8) |>
  select(-wait_minutes_actual_avg)

nine <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 9) |>
  select(park_date, wait_minutes_actual_avg)

wait_times <- eight |> 
  left_join(nine, by = "park_date") |> 
  drop_na(wait_minutes_actual_avg)

1. Fit the outcome model

fit_wait <- lm(
  wait_minutes_actual_avg ~ 
    splines::ns(wait_minutes_posted_avg, df = 3) +
    park_extra_magic_morning + 
    park_ticket_season + 
    park_close + 
    park_temperature_high,
  data = wait_times
)

We use a natural spline to flexibly model the potentially nonlinear relationship between posted and actual wait times.
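As a quick standalone illustration of what the spline is doing (not the park model): `ns()` expands one variable into `df` basis columns, which `lm()` then fits linearly, producing a smooth piecewise-cubic curve overall.

```r
library(splines)

x <- seq(0, 100, length.out = 101)
basis <- ns(x, df = 3)   # natural cubic spline basis

dim(basis)            # 101 rows, 3 columns: one coefficient per basis function
attr(basis, "knots")  # interior knots, placed at quantiles of x
```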

2 & 3. Clone and set exposure

# what if posted wait was 30 minutes for everyone?
low  <- wait_times |> mutate(wait_minutes_posted_avg = 30)

# what if posted wait was 60 minutes for everyone?
high <- wait_times |> mutate(wait_minutes_posted_avg = 60)

For a continuous exposure, we choose specific values to compare — here, 30 vs 60 minutes posted wait.

4 & 5. Predict and estimate

pred_high <- fit_wait |> augment(newdata = high) |> select(pred_high = .fitted)
pred_low  <- fit_wait |> augment(newdata = low)  |> select(pred_low  = .fitted)

bind_cols(pred_high, pred_low) |> 
  summarize(
    mean_high  = mean(pred_high),
    mean_low   = mean(pred_low),
    difference = mean_high - mean_low
  )
# A tibble: 1 × 3
  mean_high mean_low difference
      <dbl>    <dbl>      <dbl>
1      29.8     40.7      -10.8

Or use {marginaleffects}

avg_comparisons(
  fit_wait,
  variables = list(wait_minutes_posted_avg = c(30, 60))
)

 Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
    -10.8       8.13 -1.33    0.182 2.5 -26.8   5.09

Term: wait_minutes_posted_avg
Type:  response 
Comparison: 60 - 30
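Nothing restricts us to two values. A sketch on simulated data (variable names illustrative): loop over a grid of exposure values to trace a full dose-response curve, the average counterfactual outcome at each level.

```r
# sketch: a g-computation dose-response curve on simulated data
set.seed(2)
n <- 500
z <- rnorm(n)
x <- 40 + 10 * z + rnorm(n, sd = 5)   # continuous exposure
y <- 20 + 0.3 * x + z + rnorm(n)      # outcome
df <- data.frame(x, z, y)

fit <- lm(y ~ splines::ns(x, df = 3) + z, data = df)

# set everyone's exposure to each grid value, then average the predictions
grid <- seq(20, 60, by = 10)
dose_response <- sapply(grid, function(value) {
  mean(predict(fit, newdata = transform(df, x = value)))
})
data.frame(exposure = grid, mean_outcome = dose_response)
```

Pairs of grid values reproduce contrasts like the 60-vs-30 comparison above; plotting the whole curve is often more informative than any single contrast.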

Uncertainty quantification

Write a bootstrap function

fit_gcomp <- function(.split, ...) {
  .df <- as.data.frame(.split)
  
  # fit outcome model
  fit <- lm(
    wait_minutes_actual_avg ~ 
      splines::ns(wait_minutes_posted_avg, df = 3) +
      park_extra_magic_morning + park_ticket_season + 
      park_close + park_temperature_high,
    data = .df
  )
  
  # clone and set exposure
  low  <- .df |> mutate(wait_minutes_posted_avg = 30)
  high <- .df |> mutate(wait_minutes_posted_avg = 60)
  
  # predict
  pred_low  <- fit |> augment(newdata = low)  |> pull(.fitted)
  pred_high <- fit |> augment(newdata = high) |> pull(.fitted)
  
  # estimate
  tibble(
    term     = "difference",
    estimate = mean(pred_high) - mean(pred_low)
  )
}

Bootstrap with {rsample}

library(rsample)
boot_results <- bootstraps(wait_times, times = 1000, apparent = TRUE) |>
  mutate(results = map(splits, fit_gcomp))

int_pctl(boot_results, results) |> filter(term == "difference")
# A tibble: 1 × 6
  term       .lower .estimate .upper .alpha .method   
  <chr>       <dbl>     <dbl>  <dbl>  <dbl> <chr>     
1 difference  -30.3     -9.86   12.1   0.05 percentile
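The essential point is that the whole pipeline, model fit included, runs inside each resample. A base-R sketch on simulated data (no {rsample}) makes that loop explicit:

```r
# percentile bootstrap for a g-computation estimate, base R,
# on simulated data where the true effect is 2
set.seed(4)
n <- 300
z <- rnorm(n)
x <- rbinom(n, 1, plogis(z))
y <- 1 + 2 * x + z + rnorm(n)
df <- data.frame(x, z, y)

boot_once <- function(df) {
  # resample rows, then REFIT the outcome model -- refitting inside
  # the loop is what propagates the model's uncertainty
  boot_df <- df[sample(nrow(df), replace = TRUE), ]
  fit <- lm(y ~ x + z, data = boot_df)
  mean(predict(fit, newdata = transform(boot_df, x = 1))) -
    mean(predict(fit, newdata = transform(boot_df, x = 0)))
}

boots <- replicate(500, boot_once(df))
quantile(boots, c(0.025, 0.975))   # percentile interval
```

Bootstrapping only the predictions from a single fitted model would understate the uncertainty; the model must be refit on every resample, exactly as `fit_gcomp()` does above.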

What about model misspecification and poor overlap?

G-computation and positivity

  • When overlap is poor, IPW weights blow up (the problem is visible!)
  • G-computation stays stable, but extrapolates from the outcome model into regions where one exposure group is rare
  • This can mask positivity violations rather than flag them

What about model misspecification and poor overlap?

If the outcome model is correctly specified, extrapolation works fine. If it’s misspecified, the bias is concentrated exactly where we have the least data to detect it.
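A simulated sketch of this failure mode (illustrative only): the exposure is nearly deterministic at one extreme of the confounder, the outcome model misses the curvature in that same region, and yet g-computation returns one clean-looking number with no warning.

```r
# sketch: a silent positivity problem on simulated data.
# exposure x = 1 becomes vanishingly rare as the confounder z grows,
# and the outcome model omits the curvature in exactly that region
set.seed(3)
n <- 2000
z <- rnorm(n)
p <- plogis(-4 * z)                    # P(x = 1 | z): tiny for large z
x <- rbinom(n, 1, p)
y <- x + z + pmax(z, 0)^2 + rnorm(n)   # nonlinear in z where exposed units are rare
df <- data.frame(x, z, y)

# IPW would make the problem visible: some implied weights are enormous
max(1 / p)

# g-computation with a misspecified (linear-in-z) outcome model
# still returns a single stable-looking number, with no warning at all
fit <- lm(y ~ x + z, data = df)
mean(predict(fit, newdata = transform(df, x = 1))) -
  mean(predict(fit, newdata = transform(df, x = 0)))
```

The estimate may be badly off in the sparse region, but nothing in the output says so; checking overlap remains the analyst's job.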

Two recipes, same estimand

IPW

  • Model the exposure
  • Reweight to balance confounders
  • Often used for binary exposures (although methods exist for other types)
  • Positivity violations are visible (extreme weights)

G-computation

  • Model the outcome
  • Predict under counterfactual scenarios
  • Can handle any type of exposure
  • Positivity violations are silent (stable but potentially biased)

Remember! Neither method can conjure causal information that isn’t in the data.