The author is a Forbes contributor. The opinions expressed are those of the writer.

Artist Heather Dewey-Hagborg's 'Stranger Visions', comprising 3D-printed faces extracted from DNA taken from discarded cigarette butts and chewing gum, is displayed at the Big Bang Data exhibition at Somerset House on December 2, 2015 in London, England, showcasing how technology and "big data" are radically reshaping our ability to capture and understand the world around us. (Peter Macdiarmid/Getty Images for Somerset House)

American campaigning in the 21st century is all about data. The 2008 and 2012 Obama campaigns are often held up as the models that remade the modern political campaign into the data-driven behemoth that it is today. In the place of the coarse demographically-defined geographic boundaries of the past, Obama’s campaign used precision targeting to build rich data-driven profiles of every potential voter in the United States, using this data to precisely tune its messaging at the person level.

One of the campaign’s highly-touted innovations was a breakthrough partnership that allowed it to peer into Americans’ living rooms through their DVRs to see what each person was actually watching on their television, building high-resolution individual-level profiles of television viewing habits.

The Cruz campaign has recently attracted press for its use of “psychographic targeting,” in which teams of mathematicians and psychologists build the campaign equivalent of the Myers-Briggs personality test for America’s voters. The data underlying the system comes in part from a survey of more than 150,000 US households that scored voters along the psychological traits of “openness, conscientiousness, extraversion, agreeableness and neuroticism.” This was merged with more than “50,000 data points gathered from voting records, popular websites and consumer information such as magazine subscriptions, car ownership and preferences for food and clothing.”

Yet one of the most powerful data sources in the system is a massive dataset, created by researchers from Cambridge University for their company Cambridge Analytica, that incorporates private profile data from tens of millions of Facebook users. To collect this data, the company used Mechanical Turk to recruit users to take a personality test that also required access to the user’s Facebook profile. The test then downloaded key demographic data from the user’s profile, including “names, locations, birthdays, genders – as well as their Facebook ‘likes’.” The same data was captured for each of the user’s friends; a typical user had 340 friends in 2014, when the dataset was created. The company subsequently touted the dataset as encompassing over 40 million Americans.

One academic likened the dataset to “packaging voters like they’re consumers,” arguing “It’s one thing for a marketer to try to predict if people like Coke or Pepsi, but it’s another thing for them to predict things that are much more central to our identity and what’s more personal in how I interact with the world in terms of social and cultural issues.”

In other words, even though such profiling has become a gold standard in the world of advertising, there are still concerns about the technology being applied to other areas of people’s lives like political preferences.

In fact, even simple lists of email addresses are so valuable to the data engines of campaigns that there’s a “black market” for the email addresses of campaign donors, in which many of the major campaigns rent their email lists to each other in revenue sharing agreements. Rubio’s single biggest campaign expenditure has been nearly $1M for list rental, while Cruz has spent more than $2M. Even Donald Trump recently executed the necessary agreement to access the RNC’s master voter file, while the NRCC recently launched a debranded website touted as “the future of politics” that, according to Politico, is designed to raise not just funds, but also email addresses and other contact information.

All of this data collected by the campaigns has been in the news over the last few days after a database totaling more than 300GB and containing the voter registration records for more than 191 million Americans was found available for open download on the web due to an unsecured website. The database contains each “voter's full name (first, middle, last), their home address, mailing address, a unique voter ID, state voter ID, gender, date of birth, date of registration, phone number, a yes/no field for if the number is on the national do-not-call list, political affiliation, and a detailed voting history since 2000 [as well as] fields for voter prediction scores.”

Despite the utility of such data for identity theft and fraud, the majority of the fields in the database come from fully open public government data. In fact, the data in question is considered of such rudimentary value compared to the kinds of sophisticated psychographic data powering modern campaigns that one of the companies that distributes it notes that it makes the data available free of charge to all campaigns.

What made the breach all the more remarkable is that, despite more than two days of exhaustive searching, the security researcher who found the vulnerable website was unable to determine whom it belonged to in order to alert the owner that it was publicly accessible. Put another way, not only was the complete dataset of US voter records available online for the taking to anyone in the world, but it was impossible to figure out whose database it actually was in order to get it taken down. Only after federal law enforcement was contacted was the website eventually disabled.

This is the future of political campaigning in the 21st century, in which we are all just data points in giant psychographic models that attempt to figure out how best to make us vote a certain way. Welcome to our big data future.