This Data Scientist Is BuzzFeed’s Secret Weapon

Ky Harlin has been called BuzzFeed’s secret weapon. The viral darling’s director of data science has his finger on the pulse of the Internet like few others. And as one of the company’s original 25 employees, he’s had it there for a long time.

BuzzFeed’s successes keep coming. Its latest content experiment, quizzes, commonly generates over a million views per quiz, with some quizzes racking up tens of millions of shares.

Those are numbers most content creators would kill for, but it’s not luck. Science—data science in particular—drives a lot of BuzzFeed’s content strategy.

Intrigued, we talked to Harlin to learn about his scientific game plan.

What are some of the most interesting ways BuzzFeed uses data science or machine learning?

Here are a couple of examples. First, think about this: A list is in itself a mini pool of content. We treat each individual item in a list almost like its own article. So we’ll try to really figure out what people are engaging with and turn a list of 45 items into a list of 25 items without the duds, reordered to make it most likely to be shared.
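To make the list-trimming idea concrete, here is a minimal sketch in Python. The interview does not say how BuzzFeed actually scores list items, so the `ListItem` fields, the `share_rate` metric, and the sample numbers are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ListItem:
    """One entry in a list post, with hypothetical engagement counts."""
    title: str
    views: int
    shares: int


def share_rate(item: ListItem) -> float:
    # Assumed metric: shares per view. Items with no views score 0.
    return item.shares / item.views if item.views else 0.0


def prune_and_reorder(items: list[ListItem], keep: int) -> list[ListItem]:
    # Sort by the assumed metric, best first, and drop the "duds".
    ranked = sorted(items, key=share_rate, reverse=True)
    return ranked[:keep]


# Made-up sample data, not real BuzzFeed numbers.
items = [
    ListItem("Cat in a box", views=1000, shares=120),
    ListItem("Blurry sunset", views=1000, shares=5),
    ListItem("Dog on a skateboard", views=800, shares=160),
    ListItem("Stock photo of an office", views=0, shares=0),
]

best = prune_and_reorder(items, keep=2)
print([item.title for item in best])
```

In this sketch the weakest items simply fall off the end of the ranking. A real system would probably also need to account for where an item sits in the list, since items near the top tend to get seen more often.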

A more recent experiment: We’re working on clustering user characteristics and content. It’s the same idea you see on other sites, like Netflix, with a ton of content and recommendations. The idea is to cluster articles in buckets, and it’s really interesting because that reveals these latent topic interests people have. Like, we’ll find clusters that show people who are interested in Jennifer Lawrence are also interested in penguins.

What are some of the areas where data science fits into BuzzFeed?

One part is the whole consumer side. Within that, there is stuff you can look at before you publish an article and stuff you can do after.

Here’s what you can do before: Data can help you answer questions like what to write about. There are all these APIs out there on sites such as Facebook, Twitter, or Google. We can use them to identify things people care about and figure out how saturated the coverage is. We also look at editorial best practices: What are some characteristics of a good headline, a good thumbnail, a good length for a post? Looking at historical data can tell us what works.

Now here’s what you can do after publishing: Optimize how you promote content.

We do this by following methods borrowed from biology. BuzzFeed is known as a viral content company, and one of the basic statistics people look at with viruses is the reproduction rate. So for every one person who has some disease, how many more get it directly as a result? And there are obvious parallels with sharing. For content, we can tell within an hour or so of publishing what type of stuff we should put prominently on the home page, promote on Twitter, things like that. That’s where data science can be really useful.
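The reproduction-rate analogy can be sketched in a few lines of Python. This is an illustration only: the interview does not describe BuzzFeed’s actual formula, so the split into "seed" views (from direct promotion) and "viral" views (arriving through shares), the function name, and the sample figures are all assumptions.

```python
def reproduction_rate(seed_views: int, viral_views: int) -> float:
    """Views arriving through shares per directly promoted view.

    This is an assumed stand-in for the epidemiological idea:
    for every reader we "infect" directly, how many more arrive
    because someone shared the post?
    """
    if seed_views == 0:
        return 0.0
    return viral_views / seed_views


# Hypothetical first-hour numbers: post -> (seed_views, viral_views).
posts = {
    "quiz-a": (2000, 3000),
    "list-b": (5000, 1000),
    "news-c": (1000, 1200),
}

# Rank posts by the assumed rate, highest first, to decide what to promote.
ranked = sorted(
    posts,
    key=lambda name: reproduction_rate(*posts[name]),
    reverse=True,
)
print(ranked)
```

Under these assumptions, a rate above 1 means each promoted view brings in more than one additional view through sharing, which is the kind of post worth putting on the home page early.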

We also use it to train our models. The way something goes viral on Facebook now versus how it went viral before tells us what to change. We’ve gotten really good at predicting what the 10 biggest posts will be on a given day. The one thing that is hard to predict, and the reason people scoff at this approach, is the magnitude.

One of the other things we do is try to make data accessible to our writers, giving them feedback on how content is doing in a consistent and regular way. That way, we all see the same things and can try to figure out why something does well on certain platforms and things like that.

What about different teams, like HR and Sales?

There’s a lot of work we do to figure out how we can use data to tell stories that will compel people to advertise with us.

On the HR side, we’ve done analysis of how we’ve been hiring in the past, and we’ve come up with ways to measure the productivity of certain editors and certain teams in editorial groups. A lot of it is figuring out what’s important for certain content—like somebody creating celebrity content vs. politics content. Politics is less about raw magnitude and more about whether it’s being noticed in specific niches of the Internet.

Ad content vs. editorial content: How are they optimized differently?

In general, we try to take the approach that we have this great editorial team that is creating great content and always experimenting. We can use the things we learned through them and apply them to advertiser content.

There are more constraints around types of content you can create. It varies by the client, but at the same time we try to take the same approach since, ultimately, we’re trying to create BuzzFeed content.