Continuous exposures and g-computation

The Parametric G-Formula

  1. Fit a model for y ~ x + z where z is all confounders (maybe even y ~ x*z to allow for heterogeneous treatment effects)
  2. Create a duplicate of your dataset for each level of x
  3. Set x to a single value in each cloned dataset
  4. Make predictions using the model on each cloned dataset
  5. Calculate the estimate you want, e.g. mean(x_1) - mean(x_0)
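The five steps can be sketched end to end in base R on simulated data (the variables `x`, `z`, and `y` here are illustrative, not the park data):

```r
# a minimal sketch of the five steps on simulated data,
# where the true effect of x on y is 2
set.seed(1)
n <- 1000
z <- rnorm(n)                         # confounder
x <- rbinom(n, 1, plogis(z))          # exposure depends on z
y <- 1 + 2 * x + z + rnorm(n)         # outcome depends on x and z
df <- data.frame(x, z, y)

# 1. fit the outcome model
fit <- lm(y ~ x + z, data = df)

# 2 & 3. clone the dataset and set the exposure
df_1 <- transform(df, x = 1)
df_0 <- transform(df, x = 0)

# 4. predict in each cloned dataset
pred_1 <- predict(fit, newdata = df_1)
pred_0 <- predict(fit, newdata = df_0)

# 5. contrast the average predictions; this recovers roughly 2
mean(pred_1) - mean(pred_0)
```

Because this outcome model is linear with the exposure entering additively, the contrast equals the coefficient on `x` exactly; the cloning machinery starts to pay off once the model includes interactions or nonlinear terms.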

Advantages of the parametric G-formula

  • Often more statistically precise than propensity score-based methods
  • Incredibly flexible: works for binary, continuous, and time-varying exposures

Does having Extra Magic Morning hours influence the average posted wait time for Seven Dwarfs Mine Train between 9 and 10am?

The Data

park_date   park_extra_magic_morning  park_ticket_season  park_close  park_temperature_high  wait_minutes_posted_avg
2018-01-01  0                         peak                23:00:00    58.63                  60.00000
2018-01-02  0                         peak                24:00:00    53.65                  60.00000
2018-01-03  0                         peak                24:00:00    51.11                  60.00000
2018-01-04  0                         regular             24:00:00    52.66                  68.88889
2018-01-05  1                         regular             24:00:00    54.29                  70.55556
2018-01-06  0                         regular             23:00:00    56.25                  33.33333
2018-01-07  0                         regular             21:00:00    65.21                  46.36364
2018-01-08  0                         value               20:00:00    70.77                  69.54545
2018-01-09  0                         value               20:00:00    75.15                  64.28571
2018-01-10  0                         value               20:00:00    74.25                  74.28571

1. Fit the outcome model

gcomp_model <- lm(
  wait_minutes_posted_avg ~ 
    park_extra_magic_morning +
    park_ticket_season + 
    park_close + 
    park_temperature_high,
  data = seven_dwarfs_9
)

The model must include all variables in the adjustment set from the causal diagram.

2 & 3. Clone and set exposure

# set all park dates to have Extra Magic Morning
exposed <- seven_dwarfs_9 |> 
  mutate(park_extra_magic_morning = 1)

# set all park dates to have no Extra Magic Morning
unexposed <- seven_dwarfs_9 |> 
  mutate(park_extra_magic_morning = 0)

Each park date is “cloned” twice — once into a world where Extra Magic Morning occurred, and once where it did not. Covariates stay the same; only the exposure changes.

4. Make predictions

library(broom)  # for augment()

pred_exposed <- gcomp_model |> 
  augment(newdata = exposed) |> 
  select(pred_exposed = .fitted)

pred_unexposed <- gcomp_model |> 
  augment(newdata = unexposed) |> 
  select(pred_unexposed = .fitted)

5. Calculate the estimate

bind_cols(pred_exposed, pred_unexposed) |> 
  summarize(
    mean_exposed   = mean(pred_exposed),
    mean_unexposed = mean(pred_unexposed),
    difference     = mean_exposed - mean_unexposed
  )
# A tibble: 1 × 3
  mean_exposed mean_unexposed difference
         <dbl>          <dbl>      <dbl>
1         74.2           68.1       6.16

Or use {marginaleffects}

library(marginaleffects)
avg_comparisons(
  gcomp_model,
  variables = "park_extra_magic_morning"
)

 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     6.16       2.26 2.73  0.00636 7.3  1.74   10.6

Term: park_extra_magic_morning
Type:  response 
Comparison: 1 - 0

Continuous exposures

We recommend g-computation over propensity score methods for continuous exposures: the weights depend on estimated exposure densities and are often unstable, with extreme values wherever the density is small.

Continuous exposure

Do posted wait times at 8am affect actual wait times at 9am?

Set up the data

eight <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 8) |>
  select(-wait_minutes_actual_avg)

nine <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 9) |>
  select(park_date, wait_minutes_actual_avg)

wait_times <- eight |> 
  left_join(nine, by = "park_date") |> 
  drop_na(wait_minutes_actual_avg)

1. Fit the outcome model

fit_wait <- lm(
  wait_minutes_actual_avg ~ 
    splines::ns(wait_minutes_posted_avg, df = 3) +
    park_extra_magic_morning + 
    park_ticket_season + 
    park_close + 
    park_temperature_high,
  data = wait_times
)

We use a natural spline to flexibly model the potentially nonlinear relationship between posted and actual wait times.
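As a quick standalone illustration of what the spline is doing (not the park model): `ns()` expands one variable into `df` basis columns, which `lm()` then fits linearly, producing a smooth piecewise-cubic curve overall.

```r
library(splines)

x <- seq(0, 100, length.out = 101)
basis <- ns(x, df = 3)   # natural cubic spline basis

dim(basis)            # 101 rows, 3 columns: one coefficient per basis function
attr(basis, "knots")  # interior knots, placed at quantiles of x
```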

2 & 3. Clone and set exposure

# what if posted wait was 30 minutes for everyone?
low  <- wait_times |> mutate(wait_minutes_posted_avg = 30)

# what if posted wait was 60 minutes for everyone?
high <- wait_times |> mutate(wait_minutes_posted_avg = 60)

For a continuous exposure, we choose specific values to compare — here, 30 vs 60 minutes posted wait.

4 & 5. Predict and estimate

pred_high <- fit_wait |> augment(newdata = high) |> select(pred_high = .fitted)
pred_low  <- fit_wait |> augment(newdata = low)  |> select(pred_low  = .fitted)

bind_cols(pred_high, pred_low) |> 
  summarize(
    mean_high  = mean(pred_high),
    mean_low   = mean(pred_low),
    difference = mean_high - mean_low
  )
# A tibble: 1 × 3
  mean_high mean_low difference
      <dbl>    <dbl>      <dbl>
1      29.8     40.7      -10.8

Or use {marginaleffects}

avg_comparisons(
  fit_wait,
  variables = list(wait_minutes_posted_avg = c(30, 60))
)

 Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
    -10.8       8.13 -1.33    0.182 2.5 -26.8   5.09

Term: wait_minutes_posted_avg
Type:  response 
Comparison: 60 - 30
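Nothing restricts us to two values. A sketch on simulated data (variable names illustrative): loop over a grid of exposure values to trace a full dose-response curve, the average counterfactual outcome at each level.

```r
# sketch: a g-computation dose-response curve on simulated data
set.seed(2)
n <- 500
z <- rnorm(n)
x <- 40 + 10 * z + rnorm(n, sd = 5)   # continuous exposure
y <- 20 + 0.3 * x + z + rnorm(n)      # outcome
df <- data.frame(x, z, y)

fit <- lm(y ~ splines::ns(x, df = 3) + z, data = df)

# set everyone's exposure to each grid value, then average the predictions
grid <- seq(20, 60, by = 10)
dose_response <- sapply(grid, function(value) {
  mean(predict(fit, newdata = transform(df, x = value)))
})
data.frame(exposure = grid, mean_outcome = dose_response)
```

Pairs of grid values reproduce contrasts like the 60-vs-30 comparison above; plotting the whole curve is often more informative than any single contrast.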

Uncertainty quantification

Write a bootstrap function

fit_gcomp <- function(.split, ...) {
  .df <- as.data.frame(.split)
  
  # fit outcome model
  fit <- lm(
    wait_minutes_actual_avg ~ 
      splines::ns(wait_minutes_posted_avg, df = 3) +
      park_extra_magic_morning + park_ticket_season + 
      park_close + park_temperature_high,
    data = .df
  )
  
  # clone and set exposure
  low  <- .df |> mutate(wait_minutes_posted_avg = 30)
  high <- .df |> mutate(wait_minutes_posted_avg = 60)
  
  # predict
  pred_low  <- fit |> augment(newdata = low)  |> pull(.fitted)
  pred_high <- fit |> augment(newdata = high) |> pull(.fitted)
  
  # estimate
  tibble(
    term     = "difference",
    estimate = mean(pred_high) - mean(pred_low)
  )
}

Bootstrap with {rsample}

library(rsample)
boot_results <- bootstraps(wait_times, times = 1000, apparent = TRUE) |>
  mutate(results = map(splits, fit_gcomp))

int_pctl(boot_results, results) |> filter(term == "difference")
# A tibble: 1 × 6
  term       .lower .estimate .upper .alpha .method   
  <chr>       <dbl>     <dbl>  <dbl>  <dbl> <chr>     
1 difference  -30.3     -9.86   12.1   0.05 percentile
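The essential point is that the whole pipeline, model fit included, runs inside each resample. A base-R sketch on simulated data (no {rsample}) makes that loop explicit:

```r
# percentile bootstrap for a g-computation estimate, base R,
# on simulated data where the true effect is 2
set.seed(4)
n <- 300
z <- rnorm(n)
x <- rbinom(n, 1, plogis(z))
y <- 1 + 2 * x + z + rnorm(n)
df <- data.frame(x, z, y)

boot_once <- function(df) {
  # resample rows, then REFIT the outcome model -- refitting inside
  # the loop is what propagates the model's uncertainty
  boot_df <- df[sample(nrow(df), replace = TRUE), ]
  fit <- lm(y ~ x + z, data = boot_df)
  mean(predict(fit, newdata = transform(boot_df, x = 1))) -
    mean(predict(fit, newdata = transform(boot_df, x = 0)))
}

boots <- replicate(500, boot_once(df))
quantile(boots, c(0.025, 0.975))   # percentile interval
```

Bootstrapping only the predictions from a single fitted model would understate the uncertainty; the model must be refit on every resample, exactly as `fit_gcomp()` does above.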

What about model misspecification and poor overlap?

G-computation and positivity

  • When overlap is poor, IPW weights blow up (the problem is visible!)
  • G-computation stays stable, but extrapolates from the outcome model into regions where one exposure group is rare
  • This can mask positivity violations rather than flag them

What about model misspecification and poor overlap?

If the outcome model is correctly specified, extrapolation works fine. If it’s misspecified, the bias is concentrated exactly where we have the least data to detect it.
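A simulated sketch of this failure mode (illustrative only): the exposure is nearly deterministic at one extreme of the confounder, the outcome model misses the curvature in that same region, and yet g-computation returns one clean-looking number with no warning.

```r
# sketch: a silent positivity problem on simulated data.
# exposure x = 1 becomes vanishingly rare as the confounder z grows,
# and the outcome model omits the curvature in exactly that region
set.seed(3)
n <- 2000
z <- rnorm(n)
p <- plogis(-4 * z)                    # P(x = 1 | z): tiny for large z
x <- rbinom(n, 1, p)
y <- x + z + pmax(z, 0)^2 + rnorm(n)   # nonlinear in z where exposed units are rare
df <- data.frame(x, z, y)

# IPW would make the problem visible: some implied weights are enormous
max(1 / p)

# g-computation with a misspecified (linear-in-z) outcome model
# still returns a single stable-looking number, with no warning at all
fit <- lm(y ~ x + z, data = df)
mean(predict(fit, newdata = transform(df, x = 1))) -
  mean(predict(fit, newdata = transform(df, x = 0)))
```

The estimate may be badly off in the sparse region, but nothing in the output says so; checking overlap remains the analyst's job.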

Two recipes, same estimand

IPW

  • Model the exposure
  • Reweight to balance confounders
  • Often used for binary exposures (although methods exist for other types)
  • Positivity violations are visible (extreme weights)

G-computation

  • Model the outcome
  • Predict under counterfactual scenarios
  • Can handle any type of exposure
  • Positivity violations are silent (stable but potentially biased)

Remember! Neither method can conjure causal information that isn’t in the data.