Exercises in R

Part 1

Causal Diagrams in R

Why Causal Diagrams?

  • Associations don’t necessarily imply causation; a causal diagram lets us be explicit about our assumptions
  • A Directed Acyclic Graph (DAG) encodes causal structure visually
  • DAGs help us identify confounders to adjust for, colliders to avoid conditioning on, and whether our causal question is even answerable
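The collider point in the last bullet can be demonstrated directly with dagitty’s d-separation test; a minimal sketch using hypothetical variables x, y, and a collider m:

```r
library(dagitty)

# x and y are independent causes of the collider m
collider_dag <- dagitty("dag { x -> m <- y }")

# x and y are d-separated: no open path between them...
dseparated(collider_dag, "x", "y")       # TRUE
# ...but conditioning on the collider m opens one
dseparated(collider_dag, "x", "y", "m")  # FALSE
```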

Example

Does regular exercise cause lower systolic blood pressure?

Exposure: exercise, whether the patient exercises regularly (yes/no)

Outcome: systolic_bp, systolic blood pressure (mmHg)

Proposed confounders:

Variable         Column
Age              age
BMI              bmi
Diet quality     diet_quality
Smoking status   smoking

Setup

library(tidyverse)
library(ggdag)
library(dagitty)

set.seed(1)

Specify a DAG with dagify()

bp_dag <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  labels = c(
    exercise    = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age         = "Age",
    bmi         = "BMI",
    diet_quality = "Diet\nQuality",
    smoking     = "Smoking"
  )
)

Note

Each formula reads as “this thing is caused by these things”

Plot with ggdag()

ggdag(bp_dag, use_labels = "label", text = FALSE) +
  theme_dag()

Time-Ordered Layout

bp_dag_to <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  coords   = time_ordered_coords(),        # <-- new!
  labels = c(
    exercise = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age = "Age", bmi = "BMI",
    diet_quality = "Diet\nQuality",
    smoking = "Smoking"
  )
)
ggdag(bp_dag_to, use_labels = "label", text = FALSE) + theme_dag()

Open Paths with ggdag_paths()

ggdag_paths(
  bp_dag,
  use_labels = "label",
  text = FALSE
) +
  theme_dag()

Adjustment Sets

What do we need to control for to close all backdoor paths?

ggdag_adjustment_set(
  bp_dag,
  use_labels = "label",
  text = FALSE
) +
  theme_dag()

Adjustment Sets in Text

dagitty::adjustmentSets(bp_dag)
{ age, bmi, diet_quality, smoking }

Note

If there is more than one adjustment set, adjusting for any one of them blocks all backdoor paths.
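By default, adjustmentSets() returns only minimal sets; dagitty can also enumerate every valid set (a sketch, assuming bp_dag from earlier):

```r
# All valid adjustment sets, not just the minimal ones
dagitty::adjustmentSets(bp_dag, type = "all")
```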

What if a Variable is Unmeasured?

bp_dag_latent <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise", outcome = "systolic_bp",
  latent   = "diet_quality",          # <-- unmeasured!
  labels = c(exercise = "Regular\nExercise", systolic_bp = "Systolic BP",
             age = "Age", bmi = "BMI",
             diet_quality = "Diet\nQuality", smoking = "Smoking")
)
ggdag_adjustment_set(bp_dag_latent, use_labels = "label", text = FALSE) +
  theme_dag()

Part 1 Takeaways

  • Use dagify() to write down your causal assumptions
  • ggdag() and ggdag_paths() reveal open (confounding) paths
  • ggdag_adjustment_set() tells you what to control for
  • time_ordered_coords() makes DAGs cleaner and more readable

Part 2

Propensity Scores

What is a Propensity Score?

The propensity score is the probability of receiving the exposure given observed covariates:

\[e(X) = P(\text{Exercise} = 1 \mid X)\]

  • Collapses all confounders into a single number
  • Patients with the same propensity score are (in expectation) balanced on all observed confounders
  • We estimate it with logistic regression

Fit a Propensity Score Model

library(broom)

propensity_model <- glm(
  exercise ~ age + bmi + diet_quality + smoking,
  data   = health_study,
  family = binomial()
)

tidy(propensity_model)
# A tibble: 6 × 5
  term             estimate std.error statistic p.value
  <chr>               <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)      -0.139      0.851    -0.163  0.870  
2 age               0.00878    0.0104    0.846  0.398  
3 bmi              -0.0712     0.0258   -2.76   0.00580
4 diet_qualitygood  0.530      0.259     2.05   0.0404 
5 diet_qualitypoor  0.00984    0.300     0.0328 0.974  
6 smoking          -0.969      0.322    -3.01   0.00258

Add Propensity Scores with augment()

health_ps <- propensity_model |>
  augment(type.predict = "response", data = health_study)

health_ps |>
  select(id, exercise, .fitted) |>
  head(6)
# A tibble: 6 × 3
     id exercise .fitted
  <int>    <int>   <dbl>
1     1        0   0.163
2     2        0   0.108
3     3        0   0.130
4     4        0   0.158
5     5        0   0.168
6     6        0   0.136

Note

.fitted is each person’s estimated probability of exercising

Visualise the Propensity Score Distribution

ggplot(health_ps,
       aes(x = .fitted, fill = factor(exercise))) +
  geom_histogram(bins = 30, alpha = 0.7, position = "identity") +
  scale_fill_manual(values = c("#E69F00", "#56B4E9"),
                    labels = c("No exercise", "Exercises")) +
  labs(x = "Propensity Score", fill = NULL) +
  theme_minimal()

Matching with MatchIt

library(MatchIt)

matched_health <- matchit(
  exercise ~ age + bmi + diet_quality + smoking,
  data    = health_study,
  link    = "linear.logit",
  caliper = 0.2
)

matched_df <- get_matches(matched_health, id = "i")
nrow(matched_df)
[1] 192

ATE Weights with wt_ate()

library(propensity)

health_ps <- health_ps |>
  mutate(
    w_ate = wt_ate(.fitted, exercise),
    w_atm = wt_atm(.fitted, exercise)
  )

health_ps |> select(id, .fitted, exercise, w_ate, w_atm) |> head(4)
# A tibble: 4 × 5
     id .fitted exercise      w_ate      w_atm
  <int>   <dbl>    <int> <psw{ate}> <psw{atm}>
1     1   0.163        0   1.194443  0.1944430
2     2   0.108        0   1.121614  0.1216139
3     3   0.130        0   1.149312  0.1493122
4     4   0.158        0   1.187937  0.1879374

Mirror Histogram: Before & After Weighting

library(halfmoon)

ggplot(health_ps, aes(.fitted, fill = factor(exercise))) +
  geom_mirror_histogram(bins = 40, alpha = 0.5) +
  geom_mirror_histogram(aes(weight = w_ate), bins = 40, alpha = 0.5) +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9"),
                    labels = c("No exercise", "Exercises")) +
  scale_y_continuous(labels = abs) +
  labs(x = "Propensity Score", fill = NULL) +
  theme_minimal()

Part 2 Takeaways

  • Propensity scores are estimated with glm(..., family = binomial())
  • augment() from broom appends fitted probabilities to your data
  • MatchIt::matchit() + get_matches() creates a matched sample
  • wt_ate() and wt_atm() from propensity compute IPW weights

Part 3

Propensity Score Diagnostics

Why Diagnose?

  • A propensity score model can be technically correct but causally wrong
  • We need to check that the weighted/matched sample is balanced

Key diagnostics:

  • Standardised Mean Differences (SMDs) — Love plots
  • Empirical CDFs (ECDFs) — distributional balance
  • Weighted tables — publication-ready summaries

Standardised Mean Differences

smds <- health_ps |>
  mutate(age = as.numeric(age),
         bmi = as.numeric(bmi)) |>
  tidy_smd(
    .vars  = c(age, bmi, diet_quality, smoking),
    .group = exercise,
    .wts   = c(w_ate, w_atm)
  )

smds

# A tibble: 12 × 4
   variable     method   exercise      smd
   <chr>        <chr>    <chr>       <dbl>
 1 age          observed 1        -0.0497 
 2 bmi          observed 1         0.287  
 3 diet_quality observed 1         0.274  
 4 smoking      observed 1         0.380  
 5 age          w_ate    1        -0.0141 
 6 bmi          w_ate    1         0.0291 
 7 diet_quality w_ate    1         0.107  
 8 smoking      w_ate    1         0.0477 
 9 age          w_atm    1         0.0131 
10 bmi          w_atm    1         0.00273
11 diet_quality w_atm    1         0.0199 
12 smoking      w_atm    1        -0.00225

Love Plot

ggplot(smds,
       aes(x = abs(smd), y = variable,
           group = method, color = method)) +
  geom_love() +
  labs(x = "|SMD|", y = NULL, color = NULL) +
  theme_minimal()

Note

A rule-of-thumb threshold is |SMD| < 0.1 after weighting
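That rule of thumb can be applied programmatically by filtering the SMD table (a sketch, assuming the smds tibble computed earlier):

```r
# Flag covariates still imbalanced after weighting
smds |>
  filter(method != "observed", abs(smd) >= 0.1)
```

With the values shown earlier, only diet_quality under w_ate (SMD ≈ 0.107) would be flagged.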

ECDF: Unweighted

ggplot(health_ps,
       aes(x = age, color = factor(exercise))) +
  geom_ecdf() +
  scale_color_manual("Exercise", values = c("#E69F00", "#56B4E9"),
                     labels = c("No", "Yes")) +
  labs(x = "Age", y = "Proportion ≤ x") +
  theme_minimal()

ECDF: Weighted

ggplot(health_ps,
       aes(x = age, color = factor(exercise))) +
  geom_ecdf(aes(weights = w_ate)) +
  scale_color_manual("Exercise", values = c("#E69F00", "#56B4E9"),
                     labels = c("No", "Yes")) +
  labs(x = "Age", y = "Proportion ≤ x (weighted)") +
  theme_minimal()

Unweighted Table

library(gtsummary)

health_ps |>
  select(exercise, age, bmi, diet_quality, smoking) |>
  tbl_summary(by = exercise) |>
  add_difference(everything() ~ "smd") |>
  modify_column_hide(conf.low) 

Characteristic   0, N = 503¹         1, N = 97¹          Difference²
age              48 (41, 56)         47 (41, 54)         -0.05
bmi              27.5 (24.3, 30.5)   26.1 (23.2, 29.5)   0.29
diet_quality                                             0.27
    fair         222 (44%)           38 (39%)
    good         135 (27%)           38 (39%)
    poor         146 (29%)           21 (22%)
smoking          144 (29%)           13 (13%)            0.38
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference

Weighted Table

library(survey)
library(gtsummary)

health_ps |>
  select(exercise, age, bmi, diet_quality, smoking, w_atm) |>
  svydesign(ids = ~1, data = _, weights = ~w_atm) |>
  tbl_svysummary(
    by      = exercise,
    include = -w_atm
  ) |>
  add_difference(everything() ~ "smd") |>
  modify_column_hide(conf.low) 

Characteristic   0, N = 97¹          1, N = 97¹          Difference²
age              48 (42, 56)         47 (41, 54)         0.01
bmi              26.2 (23.3, 29.2)   26.1 (23.2, 29.5)   0.00
diet_quality                                             0.02
    fair         38 (39%)            38 (39%)
    good         39 (40%)            38 (39%)
    poor         20 (21%)            21 (22%)
smoking          13 (13%)            13 (13%)            0.00
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference

Part 3 Takeaways

  • tidy_smd() + geom_love() give you Love plots to assess balance
  • geom_ecdf() with the weights aesthetic shows distributional balance
  • svydesign() + tbl_svysummary() produce weighted summary tables
  • Look for |SMD| < 0.1 and overlapping ECDFs after weighting

Part 4

Outcome Models & Bootstrapped CIs

Fitting the Weighted Outcome Model

Once we have balance, we fit a weighted linear model:

lm(
  systolic_bp ~ exercise,
  data    = health_ps,
  weights = w_ate
) |>
  tidy(conf.int = TRUE)
# A tibble: 2 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)   161.       0.541     298.  0           160.     162.  
2 exercise       -8.76     0.768     -11.4 2.14e-27    -10.3     -7.25

Why Bootstrap?

Standard errors from lm() ignore the uncertainty in the propensity score estimation step.

We need to bootstrap the entire pipeline:

  1. Fit propensity model
  2. Compute weights
  3. Fit outcome model

Write a Bootstrap Function

library(rsample)

fit_ipw <- function(split, ...) {
  .df <- analysis(split)

  # Step 1: propensity model
  ps_model <- glm(
    exercise ~ age + bmi + diet_quality + smoking,
    data = .df, family = binomial()
  )

  # Step 2: ATE weights
  .df <- ps_model |>
    augment(type.predict = "response", data = .df) |>
    mutate(w_ate = wt_ate(.fitted, exercise, exposure_type = "binary"))

  # Step 3: weighted outcome model
  lm(systolic_bp ~ exercise, data = .df, weights = w_ate) |>
    tidy()
}

Run 1,000 Bootstrap Resamples

set.seed(1)

ipw_results <- bootstraps(health_study, times = 1000, apparent = TRUE) |>
  mutate(boot_fits = map(splits, fit_ipw))

Distribution of Bootstrap Estimates

ipw_results |>
  mutate(
    estimate = map_dbl(boot_fits,
      \(.fit) .fit |> filter(term == "exercise") |> pull(estimate))
  ) |>
  ggplot(aes(estimate)) +
  geom_histogram(fill = "#2c7bb6", color = "white", alpha = 0.8) +
  labs(x = "Effect Estimate (mmHg)", y = "Count") +
  theme_minimal()

Compute the Confidence Interval

boot_estimate <- int_t(ipw_results, boot_fits) |>
  filter(term == "exercise")

boot_estimate
# A tibble: 1 × 6
  term     .lower .estimate .upper .alpha .method  
  <chr>     <dbl>     <dbl>  <dbl>  <dbl> <chr>    
1 exercise  -10.7     -8.79  -6.82   0.05 student-t

G-Computation

For a continuous exposure, g-computation fits one flexible outcome model and then predicts under fixed exposure levels. Here posted_bp_8am enters through a natural spline:

library(splines)  # provides ns()

std_model <- lm(
  systolic_bp ~
    ns(posted_bp_8am, df = 2) * exercise +
    age + bmi + diet_quality + smoking,
  data = health_study
)

Clone & Predict

low_bp  <- health_study |> mutate(posted_bp_8am = 110)
high_bp <- health_study |> mutate(posted_bp_8am = 145)

pred_low  <- std_model |>
  augment(newdata = low_bp)  |> select(low_posted  = .fitted)

pred_high <- std_model |>
  augment(newdata = high_bp) |> select(high_posted = .fitted)

bind_cols(pred_low, pred_high) |>
  summarise(
    mean_low   = mean(low_posted),
    mean_high  = mean(high_posted),
    difference = mean_high - mean_low
  )
# A tibble: 1 × 3
  mean_low mean_high difference
     <dbl>     <dbl>      <dbl>
1     155.      159.       3.48

Part 4 Takeaways

  • Fit a weighted outcome model: lm(outcome ~ exposure, weights = w_ate)
  • Bootstrap the entire pipeline for correct standard errors
  • Use int_t() from rsample for bootstrap confidence intervals
  • For continuous exposures, use g-computation: fit one model, clone data at each exposure level, predict, and take means

Part 5

Sensitivity Analyses

The Unmeasured Confounder Problem

  • We can only adjust for measured variables
  • What if there’s an important confounder we didn’t capture?
  • A tipping-point analysis asks: how strong would an unmeasured confounder need to be to explain away our result?

Our Estimate

From Part 4, we found that exercise reduces systolic BP by approximately:

# Our bootstrapped estimate and upper CI bound from Part 4
effect   <- -8.79 # point estimate (mmHg)
upper_ci <- -6.82 # upper bound of 95% CI

We’ll use the upper bound, the bound closest to the null, as the estimate to tip.

tip_coef() from {tipr}

library(tipr)

tip_coef(
  effect_observed = upper_ci,        # the estimate to tip
  exposure_confounder_effect = 2     # assumed confounder-exposure association
)
# A tibble: 1 × 4
  effect_observed exposure_confounder_effect confounder_outcome_effect
            <dbl>                      <dbl>                     <dbl>
1           -6.82                          2                     -3.41
# ℹ 1 more variable: n_unmeasured_confounders <dbl>

Note

This tells us how large the outcome-confounder effect would need to be to nullify the result, assuming the confounder-exposure association is 2

Scanning a Range of Assumptions

confounder_effects <- seq(0.5, 5, by = 0.5)

tips <- map_dfr(confounder_effects, \(ec) {
  tip_coef(upper_ci, exposure_confounder_effect = ec) |>
    mutate(exposure_confounder_effect = ec)
})

ggplot(tips, aes(x = exposure_confounder_effect,
                 y = confounder_outcome_effect)) +
  geom_line(color = "#2c7bb6", linewidth = 1.2) +
  geom_point(color = "#2c7bb6", size = 2) +
  labs(x = "Exposure - Confounder effect",
       y = "Confounder - Outcome effect needed to tip") +
  theme_minimal()

Interpreting the Sensitivity Analysis

  • If the tipping-point values seem unrealistically large, your result is robust
  • If they seem plausible, you should be cautious
  • This doesn’t invalidate your study — it contextualises the uncertainty
  • Always report alongside your primary estimate

Part 5 Takeaways

  • tip_coef() from tipr computes the tipping-point for unmeasured confounding
  • Use the confidence bound closest to the null (here, the upper bound) as a conservative input
  • Scanning over a range of assumptions gives a fuller picture
  • Sensitivity analyses are a requirement, not an optional extra