Projects
Professional/Academic Work
Master's Thesis: Evaluating the Representation of Snow Sampling Sites in the Western U.S. and Alaska
The full text can be found here; the abstract is below.
- Snowpack is a critical water resource threatened by climate change. This is particularly true in the western United States. NASA’s SnowEx program measured snow cover at numerous sites in the western United States and Alaska in preparation for future space-based missions. These snow cover sites were chosen largely based on snow cover classes created using subjectively defined thresholds. However, there has not been a systematic classification of snow cover in the United States or of SnowEx sites in terms of variables that affect snow water equivalent (SWE). Random Forest is a machine learning method that uses groups of decision trees to create robust predictions and identify important variables. SHAP (SHapley Additive exPlanations) is a modeling framework that uses game theory to evaluate the local importance of each predictor in a model by estimating its contribution across different coalitions of predictors. Using these advanced machine learning methods, I have created new snow cover classifications for the western United States and Alaska based on key predictor variables of peak SWE for Water Years 1993-2020 and assessed the representativeness of SnowEx sites in terms of these classes. These new snow classes are compared with the snow cover classification system created by Sturm and Liston (2021). This work will help NASA identify data gaps and enhance future snow monitoring efforts.
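To make the coalition idea in the abstract concrete, here is a minimal sketch of an exact Shapley value calculation on a toy model. Everything in it is an assumption made for illustration: the function `toy_model`, the predictor names `precip`, `elevation`, and `temp`, and the numbers are hypothetical and are not taken from the thesis. The SHAP framework estimates these values efficiently for real models such as Random Forests; this sketch simply enumerates every coalition, which is only practical for a handful of predictors.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical stand-in for a fitted model (NOT the thesis model):
    # two additive terms, one negative term, and one interaction.
    return 2.0 * x["precip"] + x["elevation"] - x["temp"] + x["precip"] * x["elevation"]


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction.

    Predictors outside a coalition are replaced by their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = set(subset)
                total += weight * (value(without | {name}) - value(without))
        phi[name] = total
    return phi


# Hypothetical observation and reference point.
instance = {"precip": 1.0, "elevation": 2.0, "temp": 3.0}
baseline = {"precip": 0.0, "elevation": 0.0, "temp": 0.0}
phi = shapley_values(toy_model, instance, baseline)
```

In this toy case the interaction term is split evenly between `precip` and `elevation`, and the three values sum to the difference between the prediction for `instance` and the prediction for `baseline`.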
Personal Projects
Vortex of Accuracy
As part of my passion for sports analytics, I made what I call the Vortex of Accuracy (VoA). I call it that because I think it’s fun and silly, but believe me, it is a serious statistical modeling effort. The VoA is a series of team strength models that I’ve created for FBS College Football and the NFL. These models take a combination of season summary stats and play-by-play data and attempt to approximate the projected win margin for each team against a hypothetical average team in its competition on a neutral field. The code for these models is written in R and Stan.
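For readers unfamiliar with the idea of a rating expressed as a margin against an average team, here is a minimal sketch in Python. It is not the VoA: the real models are written in R and Stan and use season summary stats and play-by-play data, while this stand-in fits a plain least-squares rating from final margins alone. The function name `fit_ratings`, the team labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def fit_ratings(games, iterations=200):
    """Fit one rating per team from neutral-field results.

    games: list of (team_a, team_b, margin), margin = team_a points - team_b points.
    Ratings are centered at zero, so each rating reads as the expected
    margin against an average team on a neutral field.
    """
    results = defaultdict(list)
    for team_a, team_b, margin in games:
        results[team_a].append((team_b, margin))
        results[team_b].append((team_a, -margin))

    ratings = {team: 0.0 for team in results}
    for _ in range(iterations):
        # Each team's rating becomes its average of (margin + opponent rating),
        # using the freshest opponent ratings available.
        for team, played in results.items():
            ratings[team] = sum(m + ratings[opp] for opp, m in played) / len(played)
        # Re-center so the average team sits at zero.
        mean = sum(ratings.values()) / len(ratings)
        for team in ratings:
            ratings[team] -= mean
    return ratings


# Hypothetical three-team schedule.
games = [("A", "B", 7), ("B", "C", 7), ("A", "C", 14)]
ratings = fit_ratings(games)
```

With this made-up schedule the margins are perfectly consistent, so team A comes out about 7 points better than average, team B about average, and team C about 7 points worse.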