Predicting the Number of Likes on a Facebook Status With Statistical Keyword Analysis

What makes people click the Like button on Facebook Statuses? Does the status say something funny? Does it contain a cool link or a funny photo? Does the status actively engage its readers?

Needless to say, brands on Facebook want to know the answers to these questions. The more people that Like their statuses on their Facebook Pages, the more exposure they gain, and the more money they earn. Many, many companies have been created for the sole purpose of maximizing the number of Facebook Likes.

By 2013, most brands have realized that photo posts and statuses that ask a question will generate much higher numbers of Likes. Personally, that’s not why I like a given Facebook status: I Like statuses that contain new and exciting things that are relevant to my interests. The first thing I read in a status is the Message: if the message is boring and uninteresting, then there’s no reason for me to Like the status needlessly.

Can the keywords and language in a Facebook Status predict the number of Likes the status receives?

Running multivariate regressions on keywords from thousands of Facebook statuses, I’ve discovered that the language used in a status does help predict the number of Likes a status receives, and actually could help brands learn much more about their fans.

Setup

In order to draw accurate conclusions from the analysis, large amounts of data are needed (and there’s no such thing as too much data).

I decided to analyze the Facebook Pages of 3 extremely popular news sources: CNN, the New York Times, and BBC World News. All three Pages have millions of fans, post the same types of Facebook statuses, and post statuses extremely frequently to Facebook (around 8-9 times a day). The statuses analyzed are from June 1st 2012 to June 1st 2013, in order to both gather a large sample size (~3,000 statuses from each Page) and ensure a consistent apples-to-apples comparison between the three sources.

The raw data, code, and a detailed technical explanation of the statistical techniques used to process the data can be found in this GitHub repository. (tl;dr, it’s an optimized least squares regression)
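To make the idea concrete, here is a minimal sketch of a keyword regression in Python with NumPy. It is not the code from the repository: it uses plain ordinary least squares rather than the optimized version, the function names (keyword_matrix, fit_keyword_regression) are hypothetical, and the statuses and Like counts are made up purely for illustration. Each status becomes a row of 0/1 flags marking which keywords it contains, and the regression estimates how many Likes each keyword adds or removes relative to a baseline status containing none of them.

```python
import re
import numpy as np

def keyword_matrix(statuses, keywords):
    """Return a 0/1 matrix: one row per status, one column per keyword."""
    rows = []
    for text in statuses:
        # Lowercase and split into word tokens so matching ignores case.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        rows.append([1.0 if kw.lower() in tokens else 0.0 for kw in keywords])
    return np.array(rows)

def fit_keyword_regression(statuses, likes, keywords):
    """Ordinary least squares of Likes on keyword-presence indicators."""
    X = keyword_matrix(statuses, keywords)
    # Prepend a column of ones so the model has an intercept (baseline Likes).
    X = np.column_stack([np.ones(len(statuses)), X])
    coef, *_ = np.linalg.lstsq(X, np.asarray(likes, dtype=float), rcond=None)
    return coef[0], dict(zip(keywords, coef[1:]))

# Made-up toy data (NOT real Facebook data), built so that
# Likes = 100 + 50*Monday + 30*Obama - 20*Check.
statuses = [
    "Obama speaks on Monday",
    "Check this out",
    "Monday morning news",
    "Obama visits Ohio",
    "Nothing special today",
    "Check the Monday headlines",
]
likes = [180, 80, 150, 130, 100, 130]
keywords = ["Monday", "Obama", "Check"]

baseline, effects = fit_keyword_regression(statuses, likes, keywords)
print(f"baseline Likes: {baseline:.1f}")
# Rank keywords from most positive to most negative estimated effect.
for kw, effect in sorted(effects.items(), key=lambda item: -item[1]):
    print(f"{kw}: {effect:+.1f}")
```

A positive coefficient means the keyword predicts more Likes than the baseline, and a negative one predicts fewer. With real data the fit would not be exact, and with thousands of candidate keywords some form of keyword selection or regularization would likely be needed.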

Can the keywords and language in a Facebook Status predict the number of Likes the status receives? We’re ready to test this hypothesis.

Analysis: CNN

Bourdain refers to Anthony Bourdain, who very recently began a food and travel show on CNN. I assume that targets CNN’s primary demographic.

The presence of Barack (Barack Obama) predicts more Likes than just Obama.

The presence of Monday predicts more Likes than most other keywords. Garfield would be disappointed.

The presence of North (North Korea) predicts a decrease in Likes from the average. Is North Korea boring?

The presence of Check (“Check this out!”) also predicts a decrease in Likes from the average. This is interesting because this call-to-action is usually associated with the other keywords. Perhaps it’s not necessary?

Analysis: New York Times

The most impactful keywords for NYTimes are political keywords and current events, such as Hurricane Sandy and the London Olympics.

Mills refers to Doug Mills, photographer for the New York Times. The power of photos!

You and See are impactful call-to-action keywords, even though they don’t ask any questions. (Interestingly, Are, which implicitly asks a question, predicts a decrease in Likes. Maybe people don’t Like questions?)

Analysis: BBC World News

The most impactful keywords for BBC World News are keywords describing news around the world. Unlike CNN and NYTimes, there are very few impactful keywords related to domestic politics (such as Prime Minister David Cameron), and even Barack is more impactful than Cameron.

Malala refers to Malala Yousafzai, a Pakistani activist who survived an assassination attempt.

Implicit question keywords such as Let, Do, and Would are all very effective.

The simple salutation of Hi predicts a decrease in Likes. There’s a British joke here somewhere.

The language used in Facebook Statuses can be very useful in identifying what words Fans like, and what words will be most useful in generating the most exposure. While my analysis can’t predict the exact number of Likes a Status receives, and I likely broke a few rules of statistics in the process of making this post, the impact of language and specific keywords on social media interaction is an endeavor worth pursuing.
