Data Science at Spotify with Boxun Zhang

“I normally try to sit together or very close to a product team or engineering team. And by doing so, I get very close to the source of all kinds of challenging problems.”

Spotify is a streaming music service that uses data science and machine learning both to implement product features, such as recommendation systems and music categorization, and to answer internal questions.

Boxun Zhang is a data scientist at Spotify, where he focuses on understanding user behavior within the product.

Questions

What is the overlap between distributed systems and data science?

How has Spotify’s big data architecture evolved over time?

As a data scientist, do you need to understand this big data architecture well?

What were the benefits of starting to use Kafka?

What kinds of data science problems do you tackle at Spotify?

Could you describe what a random forest is?

Why are there so many streaming systems, and what do you use at Spotify?