Not So Standard Deviations is currently my writing muse (see my inspiration from their 28th Episode). This week, Roger Peng was discussing traffic data (listen here), inspiring me to solicit some input on a weird result my colleague Ryan and I found this summer.
This past August we won the Government Statistics Section JSM Data Challenge. We found that in mid-sized cars, vehicles with lower star ratings that were in accidents were less likely to have injured passengers (Figures 2 and 4). We delve into this further with an interactive scatterplot (Figure 5), allowing the user to examine the specific make and star rating of each car, along with the % of accidents with at least one injured passenger. You will notice that the data are sparse in these lower-star regions in terms of types of vehicles; however, they do represent a decent proportion of the weighted sample (201,239.8 (8.1%)).
We are hoping this will pique your interest. Perhaps we missed an important confounder, or maybe our data were too sparse to be conclusive, or maybe you think this whole thing is silly and we definitely shouldn’t have won. Either way, we’d like your input! All of the data can be found here along with our raw analysis files. I would love to collaborate on this. Continue reading for more detailed information about our analysis.
The weight is the product of the inverse of the probabilities of selection at each of the three stages in the sampling process.
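As a small illustration of the weight definition above, here is a sketch in Python. The function name and the three probabilities are made up for the example; they are not taken from the actual sample.

```python
from math import prod

def sampling_weight(selection_probs):
    """Design weight: product of the inverse selection probabilities,
    one probability per sampling stage."""
    if any(not (0 < p <= 1) for p in selection_probs):
        raise ValueError("each selection probability must be in (0, 1]")
    return prod(1.0 / p for p in selection_probs)

# Hypothetical selection probabilities for the three stages (illustrative only).
example_weight = sampling_weight([0.5, 0.1, 0.25])
print(example_weight)
```

A record with this weight would stand in for roughly that many accidents in the population.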
For more information on the sample scheme: click here.
To obtain correct standard error estimates for a subpopulation, we need to properly account for the weights and the design.
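To make the subpopulation point concrete, here is a minimal sketch in Python of one common approach: keep every record in the design and mark subpopulation membership with an indicator, rather than deleting the other rows, so that every primary sampling unit (PSU) still contributes to the variance. It assumes a stratified design with PSUs treated as sampled with replacement and uses a Taylor linearization for the proportion. The field names (`stratum`, `psu`, `weight`, `y`, `size`) and the toy data are hypothetical; this is not our actual analysis code.

```python
from collections import defaultdict
from math import sqrt

def domain_proportion(records, in_domain):
    """Weighted proportion of y == 1 within a subpopulation, with a
    linearized standard error that uses all PSUs in the design."""
    records = list(records)
    d = [1.0 if in_domain(r) else 0.0 for r in records]
    n_hat = sum(r["weight"] * dk for r, dk in zip(records, d))
    if n_hat == 0:
        raise ValueError("subpopulation is empty")
    p_hat = sum(r["weight"] * dk * r["y"] for r, dk in zip(records, d)) / n_hat

    # Sum the linearized variable within each PSU. Rows outside the
    # subpopulation add 0.0, but still register their PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r, dk in zip(records, d):
        u = r["weight"] * dk * (r["y"] - p_hat) / n_hat
        psu_totals[r["stratum"]][r["psu"]] += u

    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in totals.values())
    return p_hat, sqrt(variance)

# Toy data (made up): PSU 4 has no mid-size vehicles but still counts.
toy = [
    {"stratum": "A", "psu": 1, "weight": 10, "y": 1, "size": "mid"},
    {"stratum": "A", "psu": 1, "weight": 10, "y": 0, "size": "mid"},
    {"stratum": "A", "psu": 2, "weight": 20, "y": 0, "size": "mid"},
    {"stratum": "A", "psu": 2, "weight": 20, "y": 1, "size": "large"},
    {"stratum": "B", "psu": 3, "weight": 5, "y": 1, "size": "mid"},
    {"stratum": "B", "psu": 3, "weight": 5, "y": 0, "size": "large"},
    {"stratum": "B", "psu": 4, "weight": 15, "y": 0, "size": "large"},
]
p_hat, se = domain_proportion(toy, lambda r: r["size"] == "mid")
print(p_hat, se)
```

Dropping the non-mid-size rows before estimating would remove PSU 4 entirely and leave stratum B with a single PSU, which is exactly the situation the indicator approach avoids.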
0: No injury
1: Some injury (ranging from possible to fatal)
5 Stars: Injury risk for this vehicle is much less than average
4 Stars: Injury risk for this vehicle is less than average to average
2 or 3 Stars: Injury risk for this vehicle is average to greater than average (3 stars) or greater than average (2 stars)
There were no one-star vehicles in our data set.
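The rating categories above can be written as a small recoding function. This is only a sketch in Python; the function name and labels are chosen for illustration.

```python
def rating_group(stars):
    """Collapse an NHTSA star rating into the three groups used here."""
    if stars == 5:
        return "5 stars"
    if stars == 4:
        return "4 stars"
    if stars in (2, 3):
        return "2 or 3 stars"
    # One-star vehicles did not occur in the data, so no group is defined.
    raise ValueError(f"no rating group defined for {stars!r} stars")

print([rating_group(s) for s in (5, 4, 3, 2)])
```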
Adjusted for:
Fit a logistic model:
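One way to write such a model, consistent with the injury and rating coding above, is sketched here. The choice of 5 stars as the reference category and the symbols $\beta$, $\boldsymbol{\gamma}$, and $\mathbf{x}_i$ (the vector of adjustment covariates) are assumptions made for illustration.

```latex
$$
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0
  + \beta_1\,\mathbb{1}[\text{4 stars}_i]
  + \beta_2\,\mathbb{1}[\text{2 or 3 stars}_i]
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_i
$$
```

Here $Y_i = 1$ means at least some injury in vehicle $i$, and $\exp(\beta_2)$ would be the adjusted odds ratio of injury for 2- or 3-star vehicles relative to 5-star vehicles.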
Maximum Injury in Vehicle | Weighted frequency (%) |
---|---|
No injury | 2,170,141.59 (83.6%) |
Injury | 425,048.1 (16.4%) |
NHTSA Safety Rating | Weighted frequency (%) |
---|---|
5-stars | 936,457.93 (36.1%) |
4-stars | 1,448,491.93 (55.8%) |
2- or 3-stars | 201,239.8 (8.1%) |
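Weighted frequencies like the ones in these tables are sums of the sampling weights within each category, with each percentage taken relative to the weighted total. The Python sketch below shows the calculation; the function name is hypothetical, and for brevity it feeds in the two published injury totals as if they were two records.

```python
from collections import defaultdict

def weighted_frequencies(categories, weights):
    """Return {category: (weighted total, percent of weighted total)}."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    grand_total = sum(totals.values())
    return {c: (t, 100.0 * t / grand_total) for c, t in totals.items()}

# Weighted totals from the injury table above, treated as two "records".
injury_table = weighted_frequencies(
    ["No injury", "Injury"], [2170141.59, 425048.1]
)
for label, (total, pct) in injury_table.items():
    print(f"{label}: {total:,.2f} ({pct:.1f}%)")
```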
Figures 1, 2, and 3 show the injury and safety ratings by weight class. Figure 2 shows the concerning result that 2-3 star rated cars in the mid-size weight class have proportionally fewer injuries. This result held true after adjusting for the covariates (Figure 4). Here is where we would love your input! Perhaps we missed an important confounder. All of the data can be found here along with our raw analysis files.
To delve into this further, we have an interactive scatter plot, which allows the user to examine the specific make and star rating of each car, along with the % of accidents with at least one injured passenger (Figure 5).
We have some evidence that ratings contribute to whether or not passengers are injured; however, this relationship is complex.