Does regular exercise cause lower systolic blood pressure?

- **Exposure** (`exercise`): does the patient exercise regularly? (yes/no)
- **Outcome** (`systolic_bp`): systolic blood pressure (mmHg)
Proposed confounders:
| Variable | Column |
|---|---|
| Age | age |
| BMI | bmi |
| Diet quality | diet_quality |
| Smoking status | smoking |
Write down the DAG with `dagify()`:

```r
bp_dag <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise ~ age + bmi + diet_quality + smoking,
  smoking ~ diet_quality,
  exposure = "exercise",
  outcome = "systolic_bp",
  labels = c(
    exercise = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age = "Age",
    bmi = "BMI",
    diet_quality = "Diet\nQuality",
    smoking = "Smoking"
  )
)
```

Note

Each formula reads as “this thing is caused by these things.”
Adding `time_ordered_coords()` lays the nodes out in causal order, and `ggdag()` plots the DAG:

```r
bp_dag_to <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise ~ age + bmi + diet_quality + smoking,
  smoking ~ diet_quality,
  exposure = "exercise",
  outcome = "systolic_bp",
  coords = time_ordered_coords(), # <-- new!
  labels = c(
    exercise = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age = "Age", bmi = "BMI",
    diet_quality = "Diet\nQuality",
    smoking = "Smoking"
  )
)

ggdag(bp_dag_to, use_labels = "label", text = FALSE) + theme_dag()
```

`ggdag_paths()` visualizes every path between exposure and outcome. What do we need to control for to close all backdoor paths?
Note
If there is more than one adjustment set, you can adjust for any one of these sets and you’ve blocked all backdoor paths.
```r
bp_dag_latent <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise ~ age + bmi + diet_quality + smoking,
  smoking ~ diet_quality,
  exposure = "exercise", outcome = "systolic_bp",
  latent = "diet_quality", # <-- unmeasured!
  labels = c(
    exercise = "Regular\nExercise", systolic_bp = "Systolic BP",
    age = "Age", bmi = "BMI",
    diet_quality = "Diet\nQuality", smoking = "Smoking"
  )
)

ggdag_adjustment_set(bp_dag_latent, use_labels = "label", text = FALSE) +
  theme_dag()
```

Recap:

- `dagify()` to write down your causal assumptions
- `ggdag()` and `ggdag_paths()` reveal open (confounding) paths
- `ggdag_adjustment_set()` tells you what to control for
- `time_ordered_coords()` makes DAGs cleaner and more readable

The propensity score is the probability of receiving the exposure given observed covariates:
\[e(X) = P(\text{Exercise} = 1 \mid X)\]
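A minimal, self-contained sketch of estimating \(e(X)\) with logistic regression, using simulated stand-in data (the real `health_study` data are not included here) and base R’s `predict()`:

```r
# Simulated stand-in for the workshop's health_study data (assumed names)
set.seed(1)
n <- 600
age <- rnorm(n, 48, 10)
bmi <- rnorm(n, 27, 4)
diet_quality <- factor(sample(c("fair", "good", "poor"), n, replace = TRUE))
smoking <- rbinom(n, 1, 0.25)
exercise <- rbinom(n, 1, plogis(-0.5 - 0.07 * (bmi - 27) - smoking))

# Propensity model: e(X) = P(exercise = 1 | X)
ps_model <- glm(
  exercise ~ age + bmi + diet_quality + smoking,
  family = binomial()
)

e_x <- predict(ps_model, type = "response")  # one probability per person
summary(e_x)
```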
The tidied propensity model coefficients:

```
# A tibble: 6 × 5
  term             estimate std.error statistic p.value
  <chr>               <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)      -0.139      0.851     -0.163 0.870
2 age               0.00878    0.0104     0.846 0.398
3 bmi              -0.0712     0.0258    -2.76  0.00580
4 diet_qualitygood  0.530      0.259      2.05  0.0404
5 diet_qualitypoor  0.00984    0.300      0.0328 0.974
6 smoking          -0.969      0.322     -3.01  0.00258
```
`augment()` appends the fitted values to the data:

```
# A tibble: 6 × 3
     id exercise .fitted
  <int>    <int>   <dbl>
1     1        0   0.163
2     2        0   0.108
3     3        0   0.130
4     4        0   0.158
5     5        0   0.168
6     6        0   0.136
```
Note
`.fitted` is each person’s estimated probability of exercising
After matching with MatchIt, or weighting with `wt_ate()` and `wt_atm()` from propensity:

```
# A tibble: 4 × 5
     id .fitted exercise      w_ate      w_atm
  <int>   <dbl>    <int> <psw{ate}> <psw{atm}>
1     1   0.163        0   1.194443  0.1944430
2     2   0.108        0   1.121614  0.1216139
3     3   0.130        0   1.149312  0.1493122
4     4   0.158        0   1.187937  0.1879374
```
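These weights follow standard closed forms: the ATE weight is 1/e(X) for the exposed and 1/(1 - e(X)) for the unexposed, and the matching (ATM) weight divides min(e(X), 1 - e(X)) by the probability of the treatment actually received. A base-R check against the rounded `.fitted` values above (small discrepancies trace to that rounding):

```r
ps <- c(0.163, 0.108, 0.130, 0.158)  # .fitted, rounded, from the table above
a  <- c(0, 0, 0, 0)                  # exercise

# ATE weight: inverse probability of the treatment received
w_ate <- a / ps + (1 - a) / (1 - ps)

# ATM (matching) weight
w_atm <- pmin(ps, 1 - ps) / (a * ps + (1 - a) * (1 - ps))

round(w_ate, 4)  # close to the w_ate column above
round(w_atm, 4)  # close to the w_atm column above
```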
```r
library(halfmoon)

ggplot(health_ps, aes(.fitted, fill = factor(exercise))) +
  geom_mirror_histogram(bins = 40, alpha = 0.5) +
  geom_mirror_histogram(aes(weight = w_ate), bins = 40, alpha = 0.5) +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  scale_fill_manual(
    values = c("#E69F00", "#56B4E9"),
    labels = c("No exercise", "Exercises")
  ) +
  scale_y_continuous(labels = abs) +
  labs(x = "Propensity Score", fill = NULL) +
  theme_minimal()
```

Recap:

- `glm(..., family = binomial())` fits the propensity model
- `augment()` from broom appends fitted probabilities to your data
- `MatchIt::matchit()` + `get_matches()` creates a matched sample
- `wt_ate()` and `wt_atm()` from propensity compute IPW weights

Standardized mean differences before and after weighting:

```
# A tibble: 12 × 4
   variable     method   exercise      smd
   <chr>        <chr>    <chr>       <dbl>
 1 age          observed 1        -0.0497
 2 bmi          observed 1         0.287
 3 diet_quality observed 1         0.274
 4 smoking      observed 1         0.380
 5 age          w_ate    1        -0.0141
 6 bmi          w_ate    1         0.0291
 7 diet_quality w_ate    1         0.107
 8 smoking      w_ate    1         0.0477
 9 age          w_atm    1         0.0131
10 bmi          w_atm    1         0.00273
11 diet_quality w_atm    1         0.0199
12 smoking      w_atm    1        -0.00225
```
Note
A rule-of-thumb threshold is |SMD| < 0.1 after weighting
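For a single covariate, the SMD is the difference in group means divided by a pooled standard deviation (conventions vary; a common choice averages the two group variances). A base-R sketch with simulated data:

```r
# SMD with the pooled-variance convention sqrt((v1 + v0) / 2)
smd_hand <- function(x, g) {
  (mean(x[g == 1]) - mean(x[g == 0])) /
    sqrt((var(x[g == 1]) + var(x[g == 0])) / 2)
}

set.seed(2)
grp <- rep(0:1, each = 100)
x <- rnorm(200, mean = 27 + 2 * grp, sd = 4)  # built-in imbalance

smd_hand(x, grp)  # well above the 0.1 rule of thumb
```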
Observed (unmatched) sample:

| Characteristic | 0, N = 503¹ | 1, N = 97¹ | Difference² |
|---|---|---|---|
| age | 48 (41, 56) | 47 (41, 54) | -0.05 |
| bmi | 27.5 (24.3, 30.5) | 26.1 (23.2, 29.5) | 0.29 |
| diet_quality | | | 0.27 |
| fair | 222 (44%) | 38 (39%) | |
| good | 135 (27%) | 38 (39%) | |
| poor | 146 (29%) | 21 (22%) | |
| smoking | 144 (29%) | 13 (13%) | 0.38 |

Abbreviation: CI = Confidence Interval
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference
Matched sample:

| Characteristic | 0, N = 97¹ | 1, N = 97¹ | Difference² |
|---|---|---|---|
| age | 48 (42, 56) | 47 (41, 54) | 0.01 |
| bmi | 26.2 (23.3, 29.2) | 26.1 (23.2, 29.5) | 0.00 |
| diet_quality | | | 0.02 |
| fair | 38 (39%) | 38 (39%) | |
| good | 39 (40%) | 38 (39%) | |
| poor | 20 (21%) | 21 (22%) | |
| smoking | 13 (13%) | 13 (13%) | 0.00 |

¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference
Recap:

- `tidy_smd()` + `geom_love()` give you Love plots to assess balance
- `geom_ecdf()` with `weights =` shows distributional balance
- `svydesign()` + `tbl_svysummary()` produce weighted summary tables

Once we have balance, we fit a weighted linear model:

```
# A tibble: 2 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)   161.       0.541     298.  0          160.      162.
2 exercise       -8.76     0.768     -11.4 2.14e-27   -10.3      -7.25
```
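The outcome model is just weighted least squares. A self-contained base-R sketch of the same pattern, on simulated data with a true effect of -8 (so the numbers differ from the slide’s):

```r
set.seed(3)
n <- 500
exercise <- rbinom(n, 1, 0.3)
w_ate <- ifelse(exercise == 1, 1 / 0.3, 1 / 0.7)  # toy ATE-style weights
systolic_bp <- 161 - 8 * exercise + rnorm(n, sd = 10)

fit <- lm(systolic_bp ~ exercise, weights = w_ate)
coef(fit)[["exercise"]]  # recovers roughly -8
```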
Standard errors from lm() ignore the uncertainty in the propensity score estimation step.
We need to bootstrap the entire pipeline:
```r
library(rsample)

fit_ipw <- function(split, ...) {
  .df <- analysis(split)

  # Step 1: propensity model
  ps_model <- glm(
    exercise ~ age + bmi + diet_quality + smoking,
    data = .df, family = binomial()
  )

  # Step 2: ATE weights
  .df <- ps_model |>
    augment(type.predict = "response", data = .df) |>
    mutate(w_ate = wt_ate(.fitted, exercise, exposure_type = "binary"))

  # Step 3: weighted outcome model
  lm(systolic_bp ~ exercise, data = .df, weights = w_ate) |>
    tidy()
}
```

Standardization: predict each person’s outcome with the exposure set to each level of interest, then average and contrast:

```r
low_bp <- health_study |> mutate(posted_bp_8am = 110)
high_bp <- health_study |> mutate(posted_bp_8am = 145)

pred_low <- std_model |>
  augment(newdata = low_bp) |> select(low_posted = .fitted)
pred_high <- std_model |>
  augment(newdata = high_bp) |> select(high_posted = .fitted)

bind_cols(pred_low, pred_high) |>
  summarise(
    mean_low = mean(low_posted),
    mean_high = mean(high_posted),
    difference = mean_high - mean_low
  )
```

```
# A tibble: 1 × 3
  mean_low mean_high difference
     <dbl>     <dbl>      <dbl>
1     155.      159.       3.48
```
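The same resample-and-refit-everything idea in self-contained base R (simulated data; a percentile interval stands in for rsample’s `int_t()` so the sketch runs without extra packages):

```r
set.seed(4)
n <- 400
age <- rnorm(n, 48, 10)
exercise <- rbinom(n, 1, plogis(-0.5 + 0.02 * (age - 48)))
systolic_bp <- 150 + 0.4 * age - 8 * exercise + rnorm(n, sd = 8)
dat <- data.frame(age, exercise, systolic_bp)

boot_est <- replicate(200, {
  d <- dat[sample(n, replace = TRUE), ]
  # Step 1: refit the propensity model on the resample
  ps <- fitted(glm(exercise ~ age, data = d, family = binomial()))
  # Step 2: recompute ATE weights
  d$w <- d$exercise / ps + (1 - d$exercise) / (1 - ps)
  # Step 3: refit the weighted outcome model
  coef(lm(systolic_bp ~ exercise, data = d, weights = w))[["exercise"]]
})

quantile(boot_est, c(0.025, 0.975))  # CI reflecting PS-estimation uncertainty
```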
Recap:

- `lm(outcome ~ exposure, weights = w_ate)` fits the weighted outcome model
- `int_t()` from rsample for bootstrap confidence intervals

From Part 4, we found that exercise reduces systolic BP by approximately:
We’ll use the upper confidence bound, the bound closest to the null, as the value to tip.
`tip_coef()` from {tipr}:

```
# A tibble: 1 × 4
  effect_observed exposure_confounder_effect confounder_outcome_effect
            <dbl>                      <dbl>                     <dbl>
1            -4.2                          2                      -2.1
# ℹ 1 more variable: n_unmeasured_confounders <dbl>
```
Note
This tells us how large the confounder-outcome effect would need to be to nullify the result, assuming the exposure-confounder association is 2.
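For a single unmeasured confounder, the arithmetic behind this is simple: the observed coefficient tips to zero when the exposure-confounder effect times the confounder-outcome effect equals the observed effect. Checking the numbers above (this mirrors, but is not, the {tipr} internals):

```r
effect_observed <- -4.2          # upper CI bound, closest to the null
exposure_confounder_effect <- 2  # assumed exposure-confounder association

# confounder-outcome effect needed to tip the result to zero
confounder_outcome_effect <- effect_observed / exposure_confounder_effect
confounder_outcome_effect  # -2.1, matching the tip_coef() output above
```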
```r
confounder_effects <- seq(0.5, 5, by = 0.5)

tips <- map_dfr(confounder_effects, \(ec) {
  tip_coef(upper_ci, exposure_confounder_effect = ec) |>
    mutate(exposure_confounder_effect = ec)
})

ggplot(tips, aes(x = exposure_confounder_effect,
                 y = confounder_outcome_effect)) +
  geom_line(color = "#2c7bb6", linewidth = 1.2) +
  geom_point(color = "#2c7bb6", size = 2) +
  labs(x = "Exposure - Confounder effect",
       y = "Confounder - Outcome effect needed to tip") +
  theme_minimal()
```

Recap:

- `tip_coef()` from tipr computes the tipping point for unmeasured confounding

Slides by Dr. Lucy D’Agostino McGowan