Database configuration issues expose 191 million voter records

A misconfigured database has led to the disclosure of 191 million voter records. The database, discovered by researcher Chris Vickery, doesn't seem to have an owner; it's just sitting in the public – waiting to be discovered by anyone who happens to be looking.

What's in the database?

Vickery shared his findings with Databreaches.net.

The two attempted to locate the owner of the database based on the records it housed and other details. However, their attempts didn't pan out, so they came to Salted Hash for assistance. Never one to shy away from a puzzle, I agreed to help. The best place to start looking was the database itself. That's when Vickery sent me my personal voter record from the database. It was current based on the elections listed. My personal information was accurate too. Vickery discovered his own record as well, so I asked him about his initial reaction.

"My immediate reaction was disbelief," Vickery said. "I needed to know if this was real, so I quickly located the Texas records and ran a search for my own name. I was outraged at the result. Sitting right in front of my eyes, in a strange, random database I had found on the Internet, were details that could lead anyone straight to me. How could someone with 191 million such records be so careless?"

The database contains a voter's full name (first, middle, last), their home address, mailing address, a unique voter ID, state voter ID, gender, date of birth, date of registration, phone number, a yes/no field for if the number is on the national do-not-call list, political affiliation, and a detailed voting history since 2000. In addition, the database contains fields for voter prediction scores.

All voter information, except for a few elements protected by law in some states, is public record. For example, in Ohio, voter records are posted online. Other states make obtaining voter records a bit more challenging or outright expensive, but they're still available. For the most part, voter data is restricted to non-commercial purposes. However, each state has its own rules for such data.

Case in point: in Alaska, Arkansas, and Colorado, voter data has no restrictions placed on it. In California, however, voter data may only be used for political purposes and may not be made available to persons outside of the U.S. South Dakota has a law that is directly related to this article's focus: "...the voter registration data obtained from the statewide voter registration database may not be used or sold for any commercial purpose and may not be placed for unrestricted access on the internet."

The database discovered by Vickery doesn't contain Social Security Numbers or driver license numbers, but it's still a massive collection of data. Again, most states or data brokers require that anyone obtaining voter data affirm that they're not going to use it for commercial gain and that they'll follow all related state laws. Yet, because the information Vickery discovered is in a database available to anyone on the Internet who knows how to find it, it's essentially unrestricted data.

I shared my personal voting file with a few election sources and experts. One of them offered a simple explanation as to why it exists, and what a database such as this could be used for during an election season. "This file has all the basic information that a voter file would have on you: your address, date of birth, every election you did or didn't vote in, and some basic demographic information. Campaigns use all of [this] information to target their messages more efficiently: to make sure they're targeting not just the right people, but people who will actually end up voting. Most of this data is public record, with the caveat that it can only be used for campaign purposes," explained Maclen Zilber, a Democratic political consultant with the firm Shallman Communications.

"Some major voting data companies will give each voter a rating of how likely they are to turn out and vote, how likely they are to support a given political party, and even more niche questions such as how likely they are to support a specific issue. The prediction score row suggests that this file is from a company selling voter data, not just a file from a government database."

Who owns the database?

Salted Hash reached out to several political data firms in an effort to locate the owner of the exposed database. Dissent (admin of Databreaches.net) did the same thing. However, none of our efforts were successful. The following firms were contacted by Salted Hash for this story: Catalist, Political Data, Aristotle, L2 Political, and NGP VAN. Databreaches.net reached out to Nation Builder. Speaking to Dissent, Nation Builder said that the IP address hosting the database wasn't one of theirs, and it wasn't an IP address for any of their hosted clients.

As for the firms contacted by Salted Hash, each of them denied that the database was theirs, and in the case of NGP VAN, the technical aspects of the infrastructure (Linux vs. Windows) ruled them out because they're a Windows shop and the data is housed as part of a Linux build. A later attempt to contact i360, another political data firm, was unsuccessful. In addition, DSPolitical, TargetSmart, and Data Trust were also contacted about the database.

Conversations between TargetSmart and Salted Hash went as expected by this point; the database isn't theirs and they are not using that IP address. If DSPolitical and Data Trust respond to questions, this story will be updated. Data Trust has since reached out to confirm that the database isn't theirs.

How was this database compiled?

For the last week, Salted Hash has attempted to discover not only who owns the database that's been exposed to the public, but also how it was compiled. The hope was that if the owner couldn't be determined, then knowing the source of the data could be useful, as the vendor might be able to contact a customer and alert them to the problem.

As it turns out, researching this story was a bit complicated because of the Sanders / Clinton / NGP VAN voter database incident. Many of those contacted by Salted Hash assumed the two stories were somehow connected. To be perfectly clear, this story is not related to the Sanders / Clinton incident at all.

The NGP VAN incident involving the Sanders and Clinton campaigns centered on a software configuration error that resulted in the Sanders campaign seeing client scores from the Clinton camp. There were no voter records exposed, just client scores. In fact, the Sanders and Clinton campaigns share the exact same DNC voter database. The information exposed was added by one campaign, and the glitch allowed the other campaign to see it.

What Vickery has discovered is worse, because the data he discovered isn't a client score – it's a complete voter record for 191 million registered voters. The problem is, no one seems to care that this database is out there and no one wants to claim ownership.

As it turns out, many state and county elections offices charge for access to voter data. Sometimes, voter data is free, but when there's a cost involved, the total paid can be extreme. For example, in 2012, the fee to obtain 3 million voter registration records in Alabama was just over $29,000. Such costs can really cut into the budget of a political campaign, so campaign managers will turn to various political data firms and purchase the information needed at a lower cost.

One of the places campaigns turn to is Nation Builder. When Vickery first discovered the voter database, he and Dissent identified Nation Builder as the possible source of the data. However, as mentioned, Nation Builder denied that the IP address was theirs. They also said the IP wasn't being used by any of their hosted clients.

Digital maps and Big Data

But did the data in the exposed voter database come from Nation Builder? Based on the database schema and formatting, yes, it did. The personal voter file given to me by Vickery is clearly from a Nation Builder data set.

In the U.S., few vendors maintain a national voter file. For those vendors that do, each voter file has signature components that are unique to that particular vendor – similar to a digital fingerprint.

To distinguish one voter file source from another, one can compare the file structure: how the vendor chooses to name various fields, as well as the order in which they appear in the file. Another clear distinguishing factor is the unique voter ID, the code that the vendor assigns to each voter in the country. Each vendor that deals with national voter files has its own distinct approach to creating unique identifiers for voters.

In my voter record, the voter ID and the field names point directly to Nation Builder as the source of the data that's been exposed. When you compare my voter record to the file structure published by Nation Builder, there are clear similarities including the nbec_precinct_code. This code is unique to Nation Builder. It's shorthand for Nation Builder Election Center Precinct Code. In my case, that code is: 18097-Marion-Center (Marion County, Center Township). As for the voter ID, my voter record uses a voter ID code consisting of 32 letters and numbers separated by dashes: 058a902b-4e1d-4989-8fdb-4976f48fbfb6
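The fingerprint check described above can be sketched in a few lines of Python. This is only an illustration, not the method any of the firms actually used: the `voter_id` field name, the `VENDOR_PREFIXES` table, and both helper functions are assumptions made for the example. Only the `nbec_precinct_code` field name and the two sample values come from my voter record.

```python
import uuid

# Hypothetical lookup table: field-name prefixes mapped to the vendor they suggest.
# Only the "nbec_" prefix is taken from the article; a real table would need more.
VENDOR_PREFIXES = {"nbec_": "Nation Builder"}


def looks_like_uuid(value):
    """Return True if value is a dashed, 32-hex-digit ID such as 058a902b-...."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def guess_vendors(record):
    """Return the set of vendors whose field-name prefixes appear in the record."""
    return {
        vendor
        for field in record
        for prefix, vendor in VENDOR_PREFIXES.items()
        if field.startswith(prefix)
    }


# Sample record: values from the article, "voter_id" key name assumed.
record = {
    "voter_id": "058a902b-4e1d-4989-8fdb-4976f48fbfb6",
    "nbec_precinct_code": "18097-Marion-Center",
}

id_is_uuid = looks_like_uuid(record["voter_id"])
vendors = guess_vendors(record)
```

A match on field names alone is suggestive rather than conclusive, which is why the firms I spoke with looked at the ID format and the field names together.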

Multiple firms questioned about the digital fingerprints in my voter record (UID / NBEC code) quickly concluded that Nation Builder was the source of the data, and one said that this would be clear to anyone who has ever viewed Nation Builder before. But is Nation Builder to blame? Not really...

So while Nation Builder denied any claim to the IP and the leaked database, it's entirely possible they might know who developed it – but that would require an extensive records check. This is because a developer or campaign wishing to access the Nation Builder Election Center would need to register their contact details, such as name and email address.

However, Nation Builder is under no obligation to identify customers, and once the data has been obtained, they cannot control what happens to it. In short, while they provided the data that's in my newly leaked voter record, they're not liable in any way for it being exposed.

And to be clear, I don't blame Nation Builder for my leaked record either; I blame the person(s) who developed the database and poorly configured its hosting. I'm just not sure who they are yet. Either way, I'm just one individual. There are 191 million records in this database. So if you're a registered voter in the U.S., you should know your data has been exposed.

Moreover, there is no way to know for sure how long this database has existed online, and for some of you – that's a problem. Case in point: the law enforcement officer who spoke to Dissent about their leaked voter file. Based on the voter count and some of the records, the database appears to be from Nation Builder's 2014 update from February or March, but unless the database owner is contacted and confirms, there's no way to prove that conclusion.

The concern is the potential for abuse. Stalking and the exposure of people who normally don't share their personal information are certainly issues. There are other long-term issues too. The personal information in this database, including political affiliation and date of birth, could be used to construct a targeted phishing campaign. While most people are aware of financially based phishing attacks, or those focused on retail or shipping, a targeted list based on politics might have a higher level of success, especially at this time of year, heading into the 2016 election cycle.

Vickery and Dissent have reached out to federal law enforcement for assistance in locating the database's owner or removing it from public view. In addition, they've contacted the California Attorney General. At the time this article was written and published, the database was still live. It's worth noting that MacKeeper earlier exposed the personal data of 13 million users.