Lab 03 - Using Propensity Scores

Getting Started

Clone the repository
Go to our class’s GitHub organization, sta-779-s25.
Find the GitHub repository for this lab, lab-03-using-propensity-scores-YOUR-GITHUB-HANDLE. This repo contains a template you can build on to complete your assignment.

Packages

In this lab we will work with the following packages:

tidyverse, a collection of packages for doing data analysis in a “tidy” way
halfmoon, a package for examining balance
gtsummary, for creating our Table 1s
survey, which allows us to create weighted tables
propensity, for calculating propensity score weights

If you’d like to run your code in the Console as well, you’ll need to load the packages there too. To do so, run the following in the Console.

library(tidyverse) 
library(gtsummary)
library(survey)
library(halfmoon)
library(propensity)

Note that these packages are also loaded with the same commands in your Quarto document.

Exercises

For the following exercises, be sure to include a written explanation of your results (in full sentences) in addition to any R output. All figures should be “publication ready”: they should have correct axis labels, legends, etc. Labels should be words, not variable names (i.e., with spaces, not underscores).

You were contacted by the Coffee Company to assess whether people who live in Coffee Town consume more coffee than people who live in a neighboring town. The Coffee Company provided you with the following DAG to describe the assumed relationships among the variables they collected in the two towns. They think that smokers are more likely to live in Coffee Town, that more people with difficult jobs live in Coffee Town, and that age also predicts which town you live in. Additionally, they suspect that age, whether you smoke, and the difficulty of your job influence the number of cups of coffee an individual drinks. They’ve asked you to calculate the average causal effect for the “equipoise” population, that is, the population of individuals who reasonably could live in either town.
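
The relationships described above can be written down in code, which makes it easy to check which variables need adjusting for. The following is only a sketch: it assumes the dagitty package (not one of the packages listed for this lab), the node names are hypothetical rather than the actual column names in the data, and the arrow from town to coffee consumption is the assumed effect of interest.

```r
library(dagitty)

# Hypothetical node names; the arrows follow the verbal description above.
coffee_dag <- dagitty("dag {
  age -> coffee_town
  smoker -> coffee_town
  job -> coffee_town
  age -> cups_of_coffee
  smoker -> cups_of_coffee
  job -> cups_of_coffee
  coffee_town -> cups_of_coffee
}")

# Minimal sets of variables that close all backdoor paths
# between the exposure and the outcome under this DAG
adjustment_sets <- adjustmentSets(
  coffee_dag,
  exposure = "coffee_town",
  outcome = "cups_of_coffee"
)
adjustment_sets
```

If the DAG you were given differs from this description, edit the arrows accordingly before relying on the result.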

1. Read in the coffee_town_df.csv data frame. Describe the data: what are the columns, how many observations are there, how many are in the exposed group, and is there any missing data?
2. Create an unweighted Table 1 by exposure group for these data. Describe what you see.
3. Fit a propensity score model using the DAG provided. Examine the distribution of propensity scores by exposure group. What do you see?
4. The Coffee Company researchers have asked you to calculate the average causal effect for the “equipoise” population, that is, the population of individuals who reasonably could live in either town. Calculate an appropriate weight based on this question. Describe the causal estimand you will be estimating using this weight.
5. Create a weighted Table 1 using the weight from Exercise 4. Compare this to the Table 1 from Exercise 2.
6. Examine the distribution of propensity scores in the two groups, weighted by the weight chosen in Exercise 4. Create a plot to show this. Make sure your plot is “presentation ready”: include axis labels and a clear legend or labels describing the histograms, and mirror the histograms if they overlap too much. What do you notice? Describe the plot.
7. Create a Love Plot comparing the weighted and unweighted standardized mean differences. Describe what you see.
8. Create unweighted and weighted eCDF plot(s) for all continuous confounders. Describe what you see.
9. Based on Exercises 7 and 8, refit your propensity score model if necessary. Recreate the weighted histograms, Love Plot, and eCDF plots for your new propensity score model, iterating until you are satisfied with the result. If you don’t think you need to make any changes, describe why not.
10. Estimate the average causal effect using your final propensity score model and weight. Explain what this means in words.
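
To make the mechanics behind these exercises concrete, here is a minimal base R sketch of the general workflow: fit a propensity score model, turn the scores into weights, and compare standardized mean differences before and after weighting. It runs on simulated data, and every column name in it is hypothetical, not taken from coffee_town_df.csv. It shows the standard formulas for three common estimands without choosing among them; deciding which one matches the Coffee Company's question is part of the exercise. The propensity package provides helpers such as wt_ate() and wt_ato() for the same calculations; check its documentation for the exact arguments.

```r
# Simulated stand-in data; column names are hypothetical.
set.seed(779)
n <- 500
sim <- data.frame(
  age = rnorm(n, mean = 40, sd = 10),
  smoker = rbinom(n, 1, 0.3)
)
# Exposure depends on both confounders
sim$exposed <- rbinom(n, 1, plogis(-2 + 0.04 * sim$age + 0.8 * sim$smoker))

# Propensity score: estimated P(exposed = 1 | confounders)
ps_mod <- glm(exposed ~ age + smoker, data = sim, family = binomial())
sim$ps <- predict(ps_mod, type = "response")

# Standard weights written as functions of exposure (x) and propensity score (p)
x <- sim$exposed
p <- sim$ps
sim$w_ate <- x / p + (1 - x) / (1 - p)   # average treatment effect
sim$w_att <- x + (1 - x) * p / (1 - p)   # effect among the treated
sim$w_ato <- x * (1 - p) + (1 - x) * p   # effect among the overlap population

# Standardized mean difference: difference in (weighted) group means,
# scaled by the pooled unweighted standard deviation
smd <- function(v, x, w = rep(1, length(v))) {
  m1 <- weighted.mean(v[x == 1], w[x == 1])
  m0 <- weighted.mean(v[x == 0], w[x == 0])
  s <- sqrt((var(v[x == 1]) + var(v[x == 0])) / 2)
  (m1 - m0) / s
}

# SMD for age: unweighted, then under each set of weights
weighted_smds <- sapply(
  sim[c("w_ate", "w_att", "w_ato")],
  function(w) smd(sim$age, sim$exposed, w)
)
round(c(unweighted = smd(sim$age, sim$exposed), weighted_smds), 3)
```

The same smd() helper can be applied to each confounder in turn; the halfmoon functions used in the lab automate this and produce the Love Plot directly.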

BONUS: The average causal effect in Exercise 10 is a point estimate. Ultimately, we also want to quantify its uncertainty. Describe how you might estimate the uncertainty bounds for this estimate.