I am a Ph.D. student in CS at the University of Wisconsin–Madison, where I am advised by Fred Sala. This past summer, I completed an internship with the Physics of AGI research group at Microsoft Research led by Sébastien Bubeck, where I worked on large language models. Before starting my Ph.D., I had the pleasure of working with Ameet Talwalkar and Zack Lipton during my M.S. in the Machine Learning Department at Carnegie Mellon University. As an undergraduate, I was extremely fortunate to work with both Sanjoy Dasgupta and Gary Cottrell at the University of California, San Diego. Prior to all of this, I was a community college student at Fresno City College, where I was lucky enough to learn calculus, linear algebra, and C++ from Greg Jamison.
My research is motivated by the need to democratize machine learning and foundation models to handle the long tail of emerging ML tasks in the sciences and beyond. To apply these models to high-impact scientific problems, my work addresses two main challenges:
1. determining what additional data to provide these models and understanding how it interacts with their pretraining data, and
2. automating the process of adapting them to new problems.
To address these challenges, I focus on the intersection of data-centric ML (which targets challenge 1) and automated machine learning (AutoML; which targets challenge 2), or, more concisely, data-centric AutoML. Motivated by these challenges, my work on developing the foundations of data-centric AutoML emphasizes diverse ML tasks far afield from standard ML domains, including problems such as solving PDEs, protein folding, and climate modeling.