HBS Digital Initiative builds community and expertise around digital transformation and tech at Harvard Business School and beyond. We manage this forum to gather and share perspectives from the HBS student community.

When Machine Learning Influences Your Vote: Lessons from Cambridge Analytica in the 2016 US Presidential Elections

Machine learning is changing the landscape of political advertising. Consumers need greater transparency into how their personal data influences the political messages they see online.

Following the 2016 presidential election in the United States, a major scandal broke at Facebook that revealed a new frontier of machine learning in American electoral politics. Multiple investigative reports indicated that Cambridge Analytica (CA), a UK-based data analytics firm backed by conservative funding from the United States, had acquired access to the personal data of more than 50 million Facebook users without authorization.[1] This data was used to “identify the personalities of American voters and influence their behavior” through targeted “psychographic” digital ads.[2][3] The ensuing scandal led Cambridge Analytica to close in 2018.

Electoral politics is a high-stakes information arms race to understand voter sentiment and influence voter behavior. Cambridge Analytica correctly recognized that voters’ ongoing use of social media provided thousands of relevant data points that could be mined for insights. In October 2016, Cambridge Analytica’s CEO, Alexander Nix, discussed his firm’s strategy with Sky News, stating that “Cambridge Analytica is a political tech company that is delivering hypertargeted – and hyperpersuasive – messages to people on social media.” CA’s machine learning effort typically began with a detailed survey, often presented as a psychological test. Survey responses were combined with other data – for example, Facebook likes. CA then applied machine learning to that data to “identify clusters of people who care about a particular issue… and [then] nuance the messaging of [an] advert according to how people see the world, according to their personalities.”[4]
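
The clustering step described above can be illustrated with a toy sketch. The sources quoted here do not say which algorithm CA used, so plain k-means stands in as an assumption, and the two features (a survey score and a share of likes on one issue) and all values are hypothetical, invented only for illustration.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of each cluster (keep old centroid if empty).
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # Assignments have stopped changing.
        centroids = new_centroids
    return labels, centroids


# Hypothetical respondents: (survey score, share of likes on one issue).
respondents = [
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75),
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
]
# Seed the two clusters with one respondent from each end of the scale.
labels, centroids = kmeans(respondents, [respondents[0], respondents[3]])
print(labels)
```

In this toy data the first three respondents end up in one cluster and the last three in the other. A real pipeline would involve far more features and respondents, and the message tailoring that Nix describes would happen downstream of a grouping step like this one.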

Investigations revealing improper data access led CA to close in 2018 despite the firm’s denial of any wrongdoing.[5] Yet the lesson from CA’s strategy is clear: in a social media landscape with ever-growing stores of personal data, short-term backlash against CA’s unethical methods will not prevent other actors from trying out this successful strategy in the future. Competitive advantage in the political big data industry comes from having the most innovative machine learning tools for identifying and influencing voters. In the next 2-10 years, we should expect other players in the politics industry to develop ever more sophisticated methods for acquiring and mobilizing data on voter sentiment.

Despite the risks, it is a smart business strategy for firms like CA to target voters through data mining. To avoid the backlash that led to CA’s closure, other actors in this space must be careful to identify themselves as politically interested parties and adhere to privacy guidelines. But political actors will find many opportunities within these constraints. Social media users enjoy taking surveys and many have established party preferences. These traits can be used to drive voluntary engagement that will provide data for machine learning algorithms. Algorithms can be designed to provide insights into voter trends by issues, geographies, and demographic segments. In an ideal world, this would enable political platforms to be more responsive to voter needs.

The story of Cambridge Analytica’s success and subsequent closure raises important questions for the social media and politics industries. How should social media platforms be required to report on data collection and advertising related to political campaigns? What mechanisms should be established to prevent foreign parties from influencing elections through social media?

The machine learning arms race to drive voter engagement is now a permanent feature of electoral politics around the world. As analytics firms and political organizations innovate new ways to understand and engage with constituents, individual voters must develop a stronger awareness of who is trying to influence them online and how their personal data determines which political messages they see. In the high-stakes game of electoral politics, companies seeking to mine voter data to inform political strategies are incentivized to test the boundaries of ethical data collection. It would be valuable to increase regulatory pressure on social media platforms to build transparency and accountability into their sales of political ad space, so that consumers understand how their personal data may be used to influence their behavior at the polling station.

This is a very interesting and controversial topic, especially given the recent midterm elections and unscrupulous Russian influence over political ad messaging. I agree that there needs to be much stricter regulation around the applications of machine learning and data mining in politics. I’d even go as far as allowing users to completely opt out of seeing any politically charged content on their social media feeds. I think there’s just too much potential for abuse in the system, both from domestic and foreign entities.

The case of Cambridge Analytica’s rise and fall is fascinating and I will be curious to see who replaces their efforts in future elections. It is particularly interesting that the general technology for targeting specific users was created for marketing/advertising applications and has been going on for years with little user concern. The true danger occurs when political advertisements masquerade as objective news. You asked great questions; however, in an age where people are influenced by a picture, tweet, or brief article headline, I wonder what effect increased transparency would have. I imagine an arms race, with companies such as CA developing new and improved tools to stay one step ahead of attempts at regulation. It seems that the only way to prevent voters from being influenced in this manner is through education on the topic and for them to take responsibility for their media consumption.

I think the example of Cambridge Analytica is just the tip of the iceberg for the exploitation of data to influence an election. We humans all think of ourselves as very individualistic and not prone to outside influences, particularly if this outside influence is geared towards the masses. However, our brain functions very much like a computer that can be hacked over time. That is why I believe technology that enables machine learning will have an even bigger impact in future elections of any kind. Then again, maybe my brain was hacked as well and I could be very wrong about this.

I think this is a very well written article. The Cambridge Analytica case brought together quite a few important topics for society to consider, such as privacy and ethical grounds regarding the usage of virtual data, political advertising, etc. As mentioned in the article, we have to be very cautious in designing and modeling machine learning tools, as it is difficult to teach a machine what is right or wrong unless a human teaches it. Elections are an emotional phenomenon for many people around the world, and hence teaching the machine the right thing to do is critical.

This is an exceptionally controversial and terrifying topic (and thank you for writing about it!) In the largely under-regulated space of consumer data protection (at least in the U.S.), firms using consumer data to influence or reinforce opinions and drive certain behaviors is a hugely important issue. Particularly when technologies can be used to influence things like voting behavior (and, if you take an aggressive extension of this, effectively controlling how people think), our democratic society stands on the brink of collapse. I’m taking a somewhat hyperbolic stance here, but only because I’m worried if this issue isn’t addressed immediately through regulation, restrictions, and overall customer awareness, it will take a catastrophic event for us to do something about it. And, by then, will it be too late?

I think the fourth paragraph underestimates how this Cambridge Analytica scandal, as well as other recent data exposure scandals and data breaches (Equifax, Target, etc.), will impact how people think about sharing personal information and other data with companies. It will not happen overnight, but over the longer term people are going to be more careful with what they share as they become aware of how their data is used (thanks to articles like this) and how easily it can be hacked by nefarious actors. Less accessible data could potentially mute the benefits of machine learning for consumer data analysis.

Very interesting topic with potentially scary outcomes. As a citizen of a country where the media and a lot of data collection platforms are controlled by the government, I am frightened of how they can use my user data against me. A lot of my friends are not on Facebook, never let the browser remember any of their data, and use incognito mode all the time. I thought they were paranoid, but as time passes, I start to understand their reasons. For me the big question is how can we protect this valuable set of data? And who will protect it? And to what extent is it OK to listen to people’s conversations and read their messages (from a national security perspective, for example)? Is it OK at all? With machine learning this behavior gets easier, opening a lot of doors.

Very well written and a very relevant topic that has been in the headlines for the past year and the topic of most conversations I have had recently. I see the reason for calling for stronger regulations around the use of users’ data, and how not allowing political ads on social media could be a good solution too. But what about people being able to know for themselves, and putting more effort into validating the ads they watch, see, and read? I know how damaging these ads can be, but we must also trust people’s awareness and ability to make the right judgment about what they see and what makes them vote. Talking about such topics more will increase people’s ability to be more skeptical of what they see in such ads.

Thanks for raising this topic! I’ve been following the Cambridge Analytica story for quite some time throughout the election season and our current President’s term in office. As I’ve discussed this topic with peers and friends, what is unnerving is how analogous this approach of using “psychographic digital ads” and “identifying personalities to influence behavior” is to how businesses approach marketing. For example, many e-commerce companies hire data scientists and ad tech experts to achieve the same drilldown in categorizing your demographics as a user, identifying your browsing behavior on other platforms, and surfacing ads to influence your next purchase — it’s quite intentional that the pair of shoes you searched for on Google is now being presented in a Facebook ad or web banner, a day later on your mobile phone. What businesses are doing today to optimize your conversion to click on ‘BUY’ is a similar vein of experimenting on human behavior. Influencing purchasing behavior vs. voting behavior…can we spot the difference? Quite unsettling.

This was a great post! To your first question, I think that social media companies should be forced to report out exactly what information they are sharing with advertisers, political parties, etc. across all of their platforms. As you highlight in your case, machine learning is a very powerful tool that can be used to target those most susceptible to “fake news” and inflammatory stories. All social media companies have a responsibility to our country and the political system which has allowed them to even come into existence so I don’t think it is an unreasonable request to have them be transparent about how they are using our data and who they are selling it to. As it relates to Machine Learning, I think we must be thoughtful about whether mining consumer data to sell more products and spread political messages is really the best use of this revolutionary technology (vs. using it to cure diseases, support military operations, etc.).