Exercises in R

Part 1

Causal Diagrams in R

Why Causal Diagrams?

  • Association does not necessarily imply causation, so we need to be explicit about our causal assumptions
  • A Directed Acyclic Graph (DAG) encodes causal structure visually
  • DAGs help us identify confounders to adjust for, colliders to avoid conditioning on, and whether our causal question is even answerable
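
As a minimal sketch of the last point, the toy DAG below has one confounder, z, and one collider, k. The nodes x, y, z, and k are hypothetical and are not part of the blood-pressure example that follows. Assuming the dagitty package is installed, adjustmentSets() should report that adjusting for z is enough, and it should leave k out.

```r
library(dagitty)

# Hypothetical toy DAG: z is a common cause of x and y (a confounder);
# k is a common effect of x and y (a collider).
toy_dag <- dagitty("dag {
  z -> x
  z -> y
  x -> y
  x -> k
  y -> k
}")

# Minimal sets of variables that close every back-door path from x to y
toy_sets <- adjustmentSets(toy_dag, exposure = "x", outcome = "y")
print(toy_sets)
```

Conditioning on k would open a non-causal path between x and y, which is why it should not appear in any valid adjustment set.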

Example

Does regular exercise cause lower systolic blood pressure?

Exposure: exercise (does the patient exercise regularly? yes/no)

Outcome: systolic_bp (systolic blood pressure, mmHg)

Proposed confounders:

Variable        Column
Age             age
BMI             bmi
Diet quality    diet_quality
Smoking status  smoking

Setup

library(tidyverse)
library(ggdag)
library(dagitty)

set.seed(1)

Specify a DAG with dagify()

bp_dag <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  labels = c(
    exercise    = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age         = "Age",
    bmi         = "BMI",
    diet_quality = "Diet\nQuality",
    smoking     = "Smoking"
  )
)

Note

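Each formula reads as “the variable on the left is caused by the variables on the right”. For example, smoking ~ diet_quality says that diet quality causes smoking.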
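
One way to check this reading is to ask the DAG for a node's direct causes. The sketch below uses a hypothetical three-node DAG (x, y, z are illustrative names) and assumes, as the ggdag documentation describes, that dagify() returns a dagitty object, so dagitty::parents() can be applied to it.

```r
library(ggdag)
library(dagitty)

# Hypothetical toy DAG: y is caused by x and z; x is caused by z
toy_dag <- dagify(
  y ~ x + z,
  x ~ z
)

# parents() returns the direct causes of a node, i.e. the right-hand side
# of that node's formula
y_causes <- dagitty::parents(toy_dag, "y")
x_causes <- dagitty::parents(toy_dag, "x")

print(y_causes)
print(x_causes)
```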

Plot with ggdag()

ggdag(bp_dag, use_labels = "label", text = FALSE) +
  theme_dag()

Time-Ordered Layout

bp_dag_to <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  coords   = time_ordered_coords(),        # <-- new!
  labels = c(
    exercise = "Regular\nExercise",
    systolic_bp = "Systolic BP",
    age = "Age", bmi = "BMI",
    diet_quality = "Diet\nQuality",
    smoking = "Smoking"
  )
)
ggdag(bp_dag_to, use_labels = "label", text = FALSE) + theme_dag()
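
Called with no arguments, time_ordered_coords() lets ggdag infer the ordering from the DAG itself. If you prefer to state the ordering yourself, the function also accepts a list with one character vector per time point. The sketch below assumes that interface and uses an illustrative ordering (background covariates, then smoking, then exercise, then blood pressure); bp_dag_manual and coords are hypothetical names.

```r
library(ggdag)

# Assumed temporal ordering (illustrative only): background covariates first,
# then smoking, then the exposure, then the outcome
coords <- time_ordered_coords(list(
  c("age", "bmi", "diet_quality"),  # time point 1
  "smoking",                        # time point 2
  "exercise",                       # time point 3
  "systolic_bp"                     # time point 4
))

bp_dag_manual <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp",
  coords   = coords
)

ggdag(bp_dag_manual) + theme_dag()
```

Stating the time points explicitly makes the assumed ordering visible to readers rather than leaving it to the layout algorithm.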


Open Paths with ggdag_paths()

ggdag_paths(
  bp_dag,
  use_labels = "label",
  text = FALSE
) +
  theme_dag()
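
The same paths can also be listed as text, which is handy when the plot gets crowded. This is a sketch assuming dagitty::paths() defaults to the exposure and outcome stored in the DAG; the DAG is redefined without labels so the block runs on its own, and bp_paths is an illustrative name.

```r
library(ggdag)
library(dagitty)

bp_dag <- dagify(
  systolic_bp ~ exercise + age + bmi + diet_quality + smoking,
  exercise    ~ age + bmi + diet_quality + smoking,
  smoking     ~ diet_quality,
  exposure = "exercise",
  outcome  = "systolic_bp"
)

# All paths between the exposure and the outcome, with no adjustment
bp_paths <- dagitty::paths(bp_dag)

# One row per path; `open` is TRUE when the path can transmit association
print(data.frame(path = bp_paths$paths, open = bp_paths$open))
```

With nothing adjusted for, every path here should be open: the direct causal path plus the back-door paths through age, BMI, diet quality, and smoking.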
