Teaching Machines How to Learn: An Interview with Animashree Anandkumar

New Caltech faculty member Animashree (Anima) Anandkumar is researching ways to make machine learning fast and practical for real-world use. A Bren Professor of Computing and Mathematical Sciences in the Division of Engineering and Applied Science, she develops efficient techniques to speed up the optimization algorithms that underpin machine-learning systems. Born in Mysore, India, Anandkumar received her B.S. in electrical engineering from the Indian Institute of Technology Madras and her Ph.D. from Cornell University. She was a postdoctoral researcher at MIT from 2009 to 2010 and an assistant professor at UC Irvine from 2010 to 2016, as well as a visiting researcher at Microsoft Research New England in 2012 and 2014. Since 2016, she has been a principal scientist at Amazon Web Services, working on the practical aspects of deploying machine learning at scale on cloud infrastructure. Recently, Anandkumar answered a few questions about her research and the future of machine learning at Caltech.

Even now that you are at Caltech, you are continuing your work with Amazon Web Services. What is the importance of a strong connection between industry and academia?

Bridging the gap between industry and academia is really important. It is a big part of what brought me to Caltech. The sooner we can take theory and deploy it practically, the faster innovation moves and the more impact it can have. You want to ask, "Is the theory that I'm working on relevant?" For example, I love math and I love to study math for its own sake. That's what I think academia is great for: understanding the fundamental theory and creating theoretical underpinnings that support broader work. But often it can be a bit of a leap to say, "Can I make this algorithm practical? When does it work? When does it not work?" By having these kinds of partnerships with industry, we can make that process a lot more seamless.

What are you working on right now?

One of the things I've been interested in understanding is how we can train machine-learning models more quickly and more accurately. Theorists tend to be pessimistic about this because you can always find worst-case scenarios where it's very hard to do. So this is where I love the practitioners: they take the most basic algorithm, run it, and with enough tuning can get practical performance out of it.

So my question is, why is this happening? Can we fundamentally rethink the theory? That is, stop focusing on worst-case scenarios and instead turn it around and ask: under what conditions do we expect simple algorithms to succeed? Instead of designing complex algorithms that cover every possible case, can we come up with simple, practical algorithms that we can deploy at scale and make work across a wide range of domains?

I am also interested in more effectively exploiting structures present in data. Data usually have multiple dimensions and modalities. My research involves using mathematical techniques to encode such multi-dimensional data processing efficiently at scale.
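As one hedged illustration of what "exploiting structure in multi-dimensional data" can mean (a generic tensor example, not Anandkumar's specific algorithms), the sketch below builds a small three-way data tensor with low-rank structure and shows that an unfolding of the tensor reveals that structure. The data here are synthetic and the dimensions are arbitrary.

```python
import numpy as np

# Hypothetical example: a small 3-way data tensor, e.g. users x items x time.
rng = np.random.default_rng(0)

# Build an exactly rank-2 tensor as a sum of two outer products.
a = rng.normal(size=(10, 2))
b = rng.normal(size=(8, 2))
c = rng.normal(size=(6, 2))
T = np.einsum('ir,jr,kr->ijk', a, b, c)

# Mode-1 unfolding: reshape the tensor into a matrix whose rows index mode 1.
T1 = T.reshape(10, 8 * 6)

# The rank of the unfolding exposes the low-dimensional structure
# that a flat, unstructured view of the same numbers would obscure.
print(np.linalg.matrix_rank(T1))
```

The point of the sketch is only that treating the data as a tensor, rather than flattening everything into one long vector, lets low-rank structure across modes be detected and exploited efficiently.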

At CAST (Caltech's Center for Autonomous Systems and Technologies), I'm excited about teaching robots to think. Let's say you want a robot to walk in an autonomous way. Right now, you have to manually feed in the parameters for each type of terrain on which the robot is expected to walk. This is where we need reinforcement learning, which I work on, because then the robot is continuously recording data as it walks. And based on that, it can adapt very quickly—not just to avoid falls, but also to develop a good navigation strategy. So that's one of the immediate goals that I see: teaching robots to learn on the fly as the environment changes, and to change their strategies accordingly.
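The idea of adapting on the fly from streaming data can be sketched with a toy reinforcement-learning example (my own minimal illustration, not the robots described above): a two-armed bandit whose better arm switches partway through, standing in for terrain that changes under the robot. A constant step size lets old observations fade, so the agent tracks the change.

```python
import random

random.seed(0)

def reward(arm, t):
    # Hypothetical environment: before t=500 arm 0 pays more; afterwards arm 1 does.
    best = 0 if t < 500 else 1
    return 1.0 if arm == best else 0.2

q = [0.0, 0.0]          # running value estimates for each arm
alpha, eps = 0.1, 0.1   # constant step size lets stale data fade; eps = exploration rate

for t in range(1000):
    # Epsilon-greedy: usually exploit the current best estimate, sometimes explore.
    if random.random() < eps:
        arm = random.randrange(2)
    else:
        arm = max((0, 1), key=lambda a: q[a])
    r = reward(arm, t)
    q[arm] += alpha * (r - q[arm])   # incremental update from streaming data

# After the environment shifts at t=500, the estimates shift too,
# and the greedy choice moves to the newly better arm.
print(max((0, 1), key=lambda a: q[a]))
```

The design choice doing the work is the constant step size: with a decaying step size the agent would converge to the pre-shift environment and stop adapting, whereas here recent data always carries weight.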

What do you enjoy most about being at Caltech?

I have to say the people. Caltech has this very collegial, tight-knit community of extremely smart people, and that includes highly motivated students. So, I think that is what I value the most here.