Exercises in R

Part 1

Causal Diagrams in R

Why Causal Diagrams?

  • Associations don’t necessarily imply causation; a causal diagram lets us be explicit about our assumptions
  • A Directed Acyclic Graph (DAG) encodes causal structure visually
  • DAGs help us identify confounders to adjust for, colliders to avoid conditioning on, and whether our causal question is even answerable
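The collider point in the last bullet can be demonstrated directly with dagitty’s d-separation test; a minimal sketch using hypothetical variables x, y, and a collider m:

```r
library(dagitty)

# x and y are independent causes of the collider m
collider_dag <- dagitty("dag { x -> m <- y }")

# x and y are d-separated: no open path between them...
dseparated(collider_dag, "x", "y")       # TRUE
# ...but conditioning on the collider m opens one
dseparated(collider_dag, "x", "y", "m")  # FALSE
```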

Example

Does regular exercise cause lower systolic blood pressure?

Exposure: exercise, whether the patient exercises regularly (yes/no)

Outcome: systolic_bp, systolic blood pressure (mmHg)

Proposed confounders:

Variable         Column
Age              age
BMI              bmi
Diet quality     diet_quality
Smoking status   smoking

Setup

library(tidyverse)
library(ggdag)
library(dagitty)

set.seed(1)

Specify a DAG with dagify()

bp_dag <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  labels = c(
    exercise    = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age         = "Age",
    bmi         = "BMI",
    diet_quality = "Diet\nQuality",
    smoking     = "Smoking"
  )
)

Note

Each formula reads as “this thing is caused by these things”

Plot with ggdag()

ggdag(bp_dag, use_labels = "label", text = FALSE) +
  theme_dag()

Time-Ordered Layout

bp_dag_to <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  coords   = time_ordered_coords(),        # <-- new!
  labels = c(
    exercise = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age = "Age", bmi = "BMI",
    diet_quality = "Diet\nQuality",
    smoking = "Smoking"
  )
)
ggdag(bp_dag_to, use_labels = "label", text = FALSE) + theme_dag()

Open Paths with ggdag_paths()

ggdag_paths(
  bp_dag,
  use_labels = "label",
  text = FALSE
) +
  theme_dag()

Adjustment Sets

What do we need to control for to close all backdoor paths?

ggdag_adjustment_set(
  bp_dag,
  use_labels = "label",
  text = FALSE
) +
  theme_dag()

Adjustment Sets in Text

dagitty::adjustmentSets(bp_dag)
{ age, bmi, diet_quality, smoking }

Note

If there is more than one adjustment set, adjusting for any one of them blocks all backdoor paths.
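By default, adjustmentSets() returns only minimal sets; dagitty can also enumerate every valid set (a sketch, assuming bp_dag from earlier):

```r
# All valid adjustment sets, not just the minimal ones
dagitty::adjustmentSets(bp_dag, type = "all")
```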

What if a Variable is Unmeasured?

bp_dag_latent <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise", outcome = "systolic_bp",
  latent   = "diet_quality",          # <-- unmeasured!
  labels = c(exercise = "Regular\nExercise", systolic_bp = "Systolic BP",
             age = "Age", bmi = "BMI",
             diet_quality = "Diet\nQuality", smoking = "Smoking")
)
ggdag_adjustment_set(bp_dag_latent, use_labels = "label", text = FALSE) +
  theme_dag()

Part 1 Takeaways

  • Use dagify() to write down your causal assumptions
  • ggdag() and ggdag_paths() reveal open (confounding) paths
  • ggdag_adjustment_set() tells you what to control for
  • time_ordered_coords() makes DAGs cleaner and more readable

Part 2

Propensity Scores

What is a Propensity Score?

The propensity score is the probability of receiving the exposure given observed covariates:

\[e(X) = P(\text{Exercise} = 1 \mid X)\]

  • Collapses all confounders into a single number
  • Patients with the same propensity score are (in expectation) balanced on all observed confounders
  • We estimate it with logistic regression

Fit a Propensity Score Model

library(broom)

propensity_model <- glm(
  exercise ~ age + bmi + diet_quality + smoking,
  data   = health_study,
  family = binomial()
)

tidy(propensity_model)
# A tibble: 6 × 5
  term             estimate std.error statistic p.value
  <chr>               <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)      -0.139      0.851    -0.163  0.870  
2 age               0.00878    0.0104    0.846  0.398  
3 bmi              -0.0712     0.0258   -2.76   0.00580
4 diet_qualitygood  0.530      0.259     2.05   0.0404 
5 diet_qualitypoor  0.00984    0.300     0.0328 0.974  
6 smoking          -0.969      0.322    -3.01   0.00258

Add Propensity Scores with augment()

health_ps <- propensity_model |>
  augment(type.predict = "response", data = health_study)

health_ps |>
  select(id, exercise, .fitted) |>
  head(6)
# A tibble: 6 × 3
     id exercise .fitted
  <int>    <int>   <dbl>
1     1        0   0.163
2     2        0   0.108
3     3        0   0.130
4     4        0   0.158
5     5        0   0.168
6     6        0   0.136

Note

.fitted is each person’s estimated probability of exercising

Visualise the Propensity Score Distribution

ggplot(health_ps,
       aes(x = .fitted, fill = factor(exercise))) +
  geom_histogram(bins = 30, alpha = 0.7, position = "identity") +
  scale_fill_manual(values = c("#E69F00", "#56B4E9"),
                    labels = c("No exercise", "Exercises")) +
  labs(x = "Propensity Score", fill = NULL) +
  theme_minimal()

Matching with MatchIt

library(MatchIt)

matched_health <- matchit(
  exercise ~ age + bmi + diet_quality + smoking,
  data    = health_study,
  link    = "linear.logit",
  caliper = 0.2
)

matched_df <- get_matches(matched_health, id = "i")
nrow(matched_df)
[1] 192

ATE Weights with wt_ate()

library(propensity)

health_ps <- health_ps |>
  mutate(
    w_ate = wt_ate(.fitted, exercise),
    w_atm = wt_atm(.fitted, exercise)
  )

health_ps |> select(id, .fitted, exercise, w_ate, w_atm) |> head(4)
# A tibble: 4 × 5
     id .fitted exercise      w_ate      w_atm
  <int>   <dbl>    <int> <psw{ate}> <psw{atm}>
1     1   0.163        0   1.194443  0.1944430
2     2   0.108        0   1.121614  0.1216139
3     3   0.130        0   1.149312  0.1493122
4     4   0.158        0   1.187937  0.1879374

Mirror Histogram: Before & After Weighting

library(halfmoon)

ggplot(health_ps, aes(.fitted, fill = factor(exercise))) +
  geom_mirror_histogram(bins = 40, alpha = 0.5) +
  geom_mirror_histogram(aes(weight = w_ate), bins = 40, alpha = 0.5) +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9"),
                    labels = c("No exercise", "Exercises")) +
  scale_y_continuous(labels = abs) +
  labs(x = "Propensity Score", fill = NULL) +
  theme_minimal()

Part 2 Takeaways

  • Propensity scores are estimated with glm(..., family = binomial())
  • augment() from broom appends fitted probabilities to your data
  • MatchIt::matchit() + get_matches() creates a matched sample
  • wt_ate() and wt_atm() from propensity compute IPW weights

Part 3

Propensity Score Diagnostics

Why Diagnose?

  • A propensity score model can be technically correct but causally wrong
  • We need to check that the weighted/matched sample is balanced

Key diagnostics:

  • Standardised Mean Differences (SMDs) — Love plots
  • Empirical CDFs (ECDFs) — distributional balance
  • Weighted tables — publication-ready summaries

Standardised Mean Differences

smds <- health_ps |>
  mutate(age = as.numeric(age),
         bmi = as.numeric(bmi)) |>
  tidy_smd(
    .vars  = c(age, bmi, diet_quality, smoking),
    .group = exercise,
    .wts   = c(w_ate, w_atm)
  )

smds

# A tibble: 12 × 4
   variable     method   exercise      smd
   <chr>        <chr>    <chr>       <dbl>
 1 age          observed 1        -0.0497 
 2 bmi          observed 1         0.287  
 3 diet_quality observed 1         0.274  
 4 smoking      observed 1         0.380  
 5 age          w_ate    1        -0.0141 
 6 bmi          w_ate    1         0.0291 
 7 diet_quality w_ate    1         0.107  
 8 smoking      w_ate    1         0.0477 
 9 age          w_atm    1         0.0131 
10 bmi          w_atm    1         0.00273
11 diet_quality w_atm    1         0.0199 
12 smoking      w_atm    1        -0.00225

Love Plot

ggplot(smds,
       aes(x = abs(smd), y = variable,
           group = method, color = method)) +
  geom_love() +
  labs(x = "|SMD|", y = NULL, color = NULL) +
  theme_minimal()

Note

A rule-of-thumb threshold is |SMD| < 0.1 after weighting
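That rule of thumb can be applied programmatically by filtering the SMD table (a sketch, assuming the smds tibble computed earlier):

```r
# Flag covariates still imbalanced after weighting
smds |>
  filter(method != "observed", abs(smd) >= 0.1)
```

With the values shown earlier, only diet_quality under w_ate (SMD ≈ 0.107) would be flagged.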

ECDF: Unweighted

ggplot(health_ps,
       aes(x = age, color = factor(exercise))) +
  geom_ecdf() +
  scale_color_manual("Exercise", values = c("#E69F00", "#56B4E9"),
                     labels = c("No", "Yes")) +
  labs(x = "Age", y = "Proportion ≤ x") +
  theme_minimal()

ECDF: Weighted

ggplot(health_ps,
       aes(x = age, color = factor(exercise))) +
  geom_ecdf(aes(weights = w_ate)) +
  scale_color_manual("Exercise", values = c("#E69F00", "#56B4E9"),
                     labels = c("No", "Yes")) +
  labs(x = "Age", y = "Proportion ≤ x (weighted)") +
  theme_minimal()

Unweighted Table

library(gtsummary)

health_ps |>
  select(exercise, age, bmi, diet_quality, smoking) |>
  tbl_summary(by = exercise) |>
  add_difference(everything() ~ "smd") |>
  modify_column_hide(conf.low) 

Characteristic   0, N = 503¹         1, N = 97¹          Difference²
age              48 (41, 56)         47 (41, 54)         -0.05
bmi              27.5 (24.3, 30.5)   26.1 (23.2, 29.5)   0.29
diet_quality                                             0.27
    fair         222 (44%)           38 (39%)
    good         135 (27%)           38 (39%)
    poor         146 (29%)           21 (22%)
smoking          144 (29%)           13 (13%)            0.38
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference

Weighted Table

library(survey)
library(gtsummary)

health_ps |>
  select(exercise, age, bmi, diet_quality, smoking, w_atm) |>
  svydesign(ids = ~1, data = _, weights = ~w_atm) |>
  tbl_svysummary(
    by      = exercise,
    include = -w_atm
  ) |>
  add_difference(everything() ~ "smd") |>
  modify_column_hide(conf.low) 

Characteristic   0, N = 97¹          1, N = 97¹          Difference²
age              48 (42, 56)         47 (41, 54)         0.01
bmi              26.2 (23.3, 29.2)   26.1 (23.2, 29.5)   0.00
diet_quality                                             0.02
    fair         38 (39%)            38 (39%)
    good         39 (40%)            38 (39%)
    poor         20 (21%)            21 (22%)
smoking          13 (13%)            13 (13%)            0.00
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference

Part 3 Takeaways

  • tidy_smd() + geom_love() give you Love plots to assess balance
  • geom_ecdf() with the weights aesthetic shows distributional balance
  • svydesign() + tbl_svysummary() produce weighted summary tables
  • Look for |SMD| < 0.1 and overlapping ECDFs after weighting

Part 4

Outcome Models & Bootstrapped CIs

Fitting the Weighted Outcome Model

Once we have balance, we fit a weighted linear model:

lm(
  systolic_bp ~ exercise,
  data    = health_ps,
  weights = w_ate
) |>
  tidy(conf.int = TRUE)
# A tibble: 2 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)   161.       0.541     298.  0           160.     162.  
2 exercise       -8.76     0.768     -11.4 2.14e-27    -10.3     -7.25

Why Bootstrap?

Standard errors from lm() ignore the uncertainty in the propensity score estimation step.

We need to bootstrap the entire pipeline:

  1. Fit propensity model
  2. Compute weights
  3. Fit outcome model

Write a Bootstrap Function

library(rsample)

fit_ipw <- function(split, ...) {
  .df <- analysis(split)

  # Step 1: propensity model
  ps_model <- glm(
    exercise ~ age + bmi + diet_quality + smoking,
    data = .df, family = binomial()
  )

  # Step 2: ATE weights
  .df <- ps_model |>
    augment(type.predict = "response", data = .df) |>
    mutate(w_ate = wt_ate(.fitted, exercise, exposure_type = "binary"))

  # Step 3: weighted outcome model
  lm(systolic_bp ~ exercise, data = .df, weights = w_ate) |>
    tidy()
}

Run 1,000 Bootstrap Resamples

set.seed(1)

ipw_results <- bootstraps(health_study, times = 1000, apparent = TRUE) |>
  mutate(boot_fits = map(splits, fit_ipw))

Distribution of Bootstrap Estimates

ipw_results |>
  mutate(
    estimate = map_dbl(boot_fits,
      \(.fit) .fit |> filter(term == "exercise") |> pull(estimate))
  ) |>
  ggplot(aes(estimate)) +
  geom_histogram(fill = "#2c7bb6", color = "white", alpha = 0.8) +
  labs(x = "Effect Estimate (mmHg)", y = "Count") +
  theme_minimal()

Compute the Confidence Interval

boot_estimate <- int_t(ipw_results, boot_fits) |>
  filter(term == "exercise")

boot_estimate
# A tibble: 1 × 6
  term     .lower .estimate .upper .alpha .method  
  <chr>     <dbl>     <dbl>  <dbl>  <dbl> <chr>    
1 exercise  -10.7     -8.79  -6.82   0.05 student-t

G-Computation

For a continuous exposure, g-computation fits one flexible outcome model and then predicts under fixed exposure levels. Here posted_bp_8am enters through a natural spline:

library(splines)  # provides ns()

std_model <- lm(
  systolic_bp ~
    ns(posted_bp_8am, df = 2) * exercise +
    age + bmi + diet_quality + smoking,
  data = health_study
)

Clone & Predict

low_bp  <- health_study |> mutate(posted_bp_8am = 110)
high_bp <- health_study |> mutate(posted_bp_8am = 145)

pred_low  <- std_model |>
  augment(newdata = low_bp)  |> select(low_posted  = .fitted)

pred_high <- std_model |>
  augment(newdata = high_bp) |> select(high_posted = .fitted)

bind_cols(pred_low, pred_high) |>
  summarise(
    mean_low   = mean(low_posted),
    mean_high  = mean(high_posted),
    difference = mean_high - mean_low
  )
# A tibble: 1 × 3
  mean_low mean_high difference
     <dbl>     <dbl>      <dbl>
1     155.      159.       3.48

Part 4 Takeaways

  • Fit a weighted outcome model: lm(outcome ~ exposure, weights = w_ate)
  • Bootstrap the entire pipeline for correct standard errors
  • Use int_t() from rsample for bootstrap confidence intervals
  • For continuous exposures, use g-computation: fit one model, clone data at each exposure level, predict, and take means

Part 5

Sensitivity Analyses

The Unmeasured Confounder Problem

  • We can only adjust for measured variables
  • What if there’s an important confounder we didn’t capture?
  • A tipping-point analysis asks: how strong would an unmeasured confounder need to be to explain away our result?

Our Estimate

From Part 4, we found that exercise reduces systolic BP by approximately:

# Our bootstrapped estimate and upper CI bound from Part 4
effect   <- -8.79 # point estimate (mmHg)
upper_ci <- -6.82 # upper bound of 95% CI

We’ll use the upper bound, the bound closest to the null, as the estimate to tip.

tip_coef() from {tipr}

library(tipr)

tip_coef(
  effect_observed = upper_ci,        # the estimate to tip
  exposure_confounder_effect = 2     # assumed confounder-exposure association
)
# A tibble: 1 × 4
  effect_observed exposure_confounder_effect confounder_outcome_effect
            <dbl>                      <dbl>                     <dbl>
1           -6.82                          2                     -3.41
# ℹ 1 more variable: n_unmeasured_confounders <dbl>

Note

This tells us how large the outcome-confounder effect would need to be to nullify the result, assuming the confounder-exposure association is 2

Scanning a Range of Assumptions

confounder_effects <- seq(0.5, 5, by = 0.5)

tips <- map_dfr(confounder_effects, \(ec) {
  tip_coef(upper_ci, exposure_confounder_effect = ec) |>
    mutate(exposure_confounder_effect = ec)
})

ggplot(tips, aes(x = exposure_confounder_effect,
                 y = confounder_outcome_effect)) +
  geom_line(color = "#2c7bb6", linewidth = 1.2) +
  geom_point(color = "#2c7bb6", size = 2) +
  labs(x = "Exposure - Confounder effect",
       y = "Confounder - Outcome effect needed to tip") +
  theme_minimal()

Interpreting the Sensitivity Analysis

  • If the tipping-point values seem unrealistically large, your result is robust
  • If they seem plausible, you should be cautious
  • This doesn’t invalidate your study — it contextualises the uncertainty
  • Always report alongside your primary estimate

Part 5 Takeaways

  • tip_coef() from tipr computes the tipping-point for unmeasured confounding
  • Use the confidence bound closest to the null (here, the upper bound) as a conservative input
  • Scanning over a range of assumptions gives a fuller picture
  • Sensitivity analyses are a requirement, not an optional extra