Anne Bauer (PhD ’08, Physics)

What did you study at Yale? What is your current profession/job?

I studied physics at Yale; my Ph.D. thesis was in astrophysics, regarding active galactic nuclei in the Palomar-QUEST Survey. Now I am a lead data scientist at The New York Times. I’ve worked with numerous departments across the company, and now I lead the team in charge of algorithmic content recommendations.

What do you like most about your current role? What do you find most challenging and/or rewarding?

My role is an interesting and challenging mix of technical and communicative work. I build machine learning models and software infrastructure, but I also work closely with nontechnical people to identify the best opportunities for data science and to integrate it into their workflows. The subject matter I’ve worked on has been very broad, spanning the newsroom, marketing, advertising, and print departments. It has been really interesting to see how these different parts of the business work, and challenging to integrate predictive modeling smoothly into their varied processes.

How did your time at Yale shape your career trajectory?

The work I did at Yale concretely prepared me for a career in data science, since the coding and presentation skills at the core of both jobs overlap a great deal. More broadly, it was inspiring to work with professors who were devoting their careers to what they considered the most important problems to understand. Their clear enjoyment of their work as a pursuit of intellectual curiosity about important questions gave me a model for a highly satisfying career, one that has served me well even after leaving the academic field in which we worked together.

What are the main skills that you acquired as a PhD student which help make you successful in your current career?

My thesis work involved writing code to analyze terabytes of images of the sky, understand the idiosyncrasies of the data, clean it up, and measure how certain galaxies’ properties changed over time. I enjoyed this work not only because astrophysics is cool, but also because it sat at the intersection of physics and computer science, the two subjects I double-majored in during college. The big data analysis pipelines I wrote at Yale directly prepared me for the big data analysis problems I face at work. Perhaps more important than the academic and technical skills I learned was what I’m sure every Ph.D. student develops: the ability to approach large, difficult problems, break them down into pieces, learn about them from scratch, and work them through to the end. While data science is much more applied than my physics research, it also involves investigating questions that we don’t know offhand how to answer, or even whether they can be answered. The ability to make sense of such situations is what makes data scientists so valuable to companies these days.

Did you acquire any professional experience related to your line of work while in graduate school?

No, I had no intention of becoming a data scientist while in grad school. In fact, I don’t think the job existed yet. After getting my Ph.D. I did two postdocs and enjoyed working in research. It was only then that I realized that data science was a good career that matched my skills and interests.

What advice would you offer PhDs who are interested in your line of work?

While Ph.D. research can prepare you well for data science, in some ways the two are extremely different. In grad school you delve very deeply into one question. In data science you usually work on diverse topics using a range of approaches, and each project might last only a few months. There is a lot of variety in the projects that data scientists do. When deciding what type of data science job I wanted, I found it helpful to think hard about which aspects of my research jobs I liked most and least, and where I was most and least successful: coding, understanding the data, mathematical analysis? This helped me narrow down the flavor of data science job I would probably enjoy. And, of course, talking with people who work in the field is always extremely helpful.