Trump’s Voter Fraud Commission Is Facing A Tough Data Challenge

Every state in the union was sent a letter last week seeking data from its voter rolls, including names, addresses, dates of birth, political party affiliation and the last four digits of voters’ Social Security numbers. The request came from Kris Kobach, vice chairman of the Presidential Advisory Commission on Election Integrity. States have responded in a variety of ways: About 20 have agreed to send publicly available data, including Kansas, where Kobach is secretary of state. Other states have said the commission can have that data only if it buys it. A few, including California and Mississippi, have said they won’t be complying with the request at all.

Just a few years ago, compliance with a letter like this would likely have been impossible. Until recently, many states did not have voter data stored in a centralized location or collected in a uniform way. In most cases, the records existed only on paper. “Ten years ago, if you wanted to get voter data from Pennsylvania, you had to call every single county,” said Samantha Luks, managing director of scientific research for the polling organization YouGov.

And although states have made huge strides in internal record-keeping consistency and accessibility — largely because of funding and rules connected to the Help America Vote Act of 2002 — experts say that the way data is collected and stored still varies widely from state to state and that the information is not available in an easily shareable format. “This letter was almost identical to the type of letter I’ve seen sent out by graduate students who don’t know about election administration,” said Charles Stewart, professor of political science at MIT and co-director of the Voting Technology Project. “It’s like, ‘Oh, gee, there’s 50 states and D.C. and it’s probably on a file somewhere and they’ll just put it on a thumb drive and send it to me. How hard can it be?’”

Pretty hard, it turns out. The advisory commission letter did not spell out what it intends to do with the state-level data, beyond making it public in some way. That, combined with concerns about the security of the data storage, led the nonprofit Electronic Privacy Information Center to ask a U.S. district judge to issue a temporary restraining order to block the commission from gathering any state’s data.

In an email, Kobach’s state office in Kansas said it couldn’t comment on the commission or its plans. However, statements from Kobach suggest that the goal is to merge the entire country’s voter data into a single, searchable database that could then be cross-referenced with other sources, such as immigration records, to identify people who cannot legally vote or who are registered, and possibly voting, in multiple states. This has never been done before.

A national comparison of voter data is possible, experts said, but doing it right would require many hours of labor, large investments in data security and analysis systems, and many decisions about how to match and merge data in ways that will reduce the number of false positives (voter records that look like they belong to the same person, but don’t). The Electronic Registration Information Center (a nonprofit cooperative of 21 states and Washington, D.C., that share and analyze voter data in much the same way Kobach has proposed to do nationally) took almost three years just to set up its technical infrastructure, design privacy protection and member bylaws, and formalize a system for data input. The advisory commission is supposed to report results to the president by sometime next year. Stewart, Luks and others are concerned that the commission does not understand or appreciate the scale of its task.

In general, the idea of cross-checking and cleaning up state voter registries makes a great deal of sense, experts said, because those rolls are all but guaranteed to contain dead people, people who have moved and people whose names have changed.

From academic research and state records, we also know that there are relatively small numbers of people registered to vote who should not be — such as noncitizens or people who’ve been convicted of felonies. The state of Virginia, for instance, removed 404 noncitizens from its voter registration records from 2015 to 2016. (Among noncitizens who are registered to vote, a small subset make it to the polls and successfully cast a ballot. No one knows what the exact number is because there’s no centralized database of voter fraud cases.)

But figuring out who should be removed from voter rolls is no easy task. Experts who have worked with voter data say the commission would need to address three key problems.

First, the data is typically dirty, said Jan Leighley, a professor of government at American University who studies voting and voter behavior. By that, she means it is riddled with typos and complicated by people who are not easy to distinguish from one another. For instance, voter registration forms don’t always ask, and people don’t always note, whether they share a name with a relative. In a 2010 press conference before his election as Kansas secretary of state, Kobach cited as an example of voter fraud the case of Albert K. Brewer, who he said was a dead man still voting. But the Topeka Capital-Journal found that Brewer was alive. Kobach’s staff had confused him with his father, also named Albert Brewer but with a different middle initial. The elder Brewer had indeed died, in 1996, on his son’s birthday.

A second problem is that there’s no such thing as a universal voter registration form. Some states collect data that other states don’t, and they store it in software systems that don’t necessarily play well together. Also, what information is publicly available varies widely from state to state, and even when states collect the same data, they call it different things in their databases, said Shane Hamlin, executive director of the Electronic Registration Information Center. That creates problems when it comes time to use a computer to compare the files. The center requires participating states to edit records before submitting them so that every state submits the same categories of data under the same terminology. The advisory commission letter did not make that kind of request. Because of that, Leighley said, the commission will have to make its own decisions about how to reclassify data and constrain or expand the information available. If the commission’s data rules and methodology aren’t spelled out and transparent, it could be impossible for anyone to replicate the results.
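
To make the terminology problem concrete, here is a minimal Python sketch of mapping differently labeled state fields onto one shared set of names. It is an illustration only, not ERIC’s actual process: the column names and the alias table are hypothetical, invented for this example.

```python
# Hypothetical aliases: real state voter files use their own labels,
# which are not documented here.
FIELD_ALIASES = {
    "first_name": ("first_name", "fname", "name_first"),
    "last_name": ("last_name", "lname", "name_last"),
    "birth_date": ("birth_date", "dob", "date_of_birth"),
}


def harmonize(record: dict) -> dict:
    """Return a new record keyed by the shared field names.

    Keys are compared case-insensitively. Fields with no known alias
    are dropped, and shared fields missing from the input become None.
    """
    lowered = {key.strip().lower(): value for key, value in record.items()}
    harmonized = {}
    for shared_name, aliases in FIELD_ALIASES.items():
        value = None
        for alias in aliases:
            if alias in lowered:
                value = lowered[alias]
                break
        harmonized[shared_name] = value
    return harmonized


# Two made-up records that describe the same kind of data differently.
state_a_row = {"FNAME": "Ann", "LNAME": "Lee", "DOB": "1980-01-02", "Party": "X"}
state_b_row = {"name_first": "Ann", "name_last": "Lee"}

print(harmonize(state_a_row))
print(harmonize(state_b_row))
```

Every choice in the alias table, including which fields to drop and how to treat missing ones, is the sort of decision that would need to be published for anyone else to replicate the results.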

Finally, once all the data is together, records will have to be matched, either state to state or to records in other databases, to find the people who are registered in multiple states or the people who show up on lists of felons and also on voter rolls. When it does this, ERIC uses an algorithm developed by IBM and previously used by the CIA. The choice of algorithm matters a lot: That choice, along with the amount of information available on each voter, can have a big impact on results. That’s because this kind of search produces potential matches, not actual matches. Any attempt at using an algorithm to match voter records between databases will yield some false positives. Luks has worked on the Cooperative Congressional Election Study, one of the largest voter surveys in the U.S., and has used several of these programs to match people in that survey with actual voter records. None of them was perfect, she said.
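
The difference between a potential match and an actual match is easier to see in code. The Python sketch below is not the IBM algorithm that ERIC uses, whose details are not described here. It is a simple stand-in built on the standard library’s `difflib`, with a hypothetical scoring rule, a hypothetical threshold and made-up records.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity between two strings, from 0.0 to 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Average first-name and last-name similarity.

    Assumption for this sketch: records with different birth dates
    are never treated as candidates, so they score 0.0.
    """
    if rec_a["birth_date"] != rec_b["birth_date"]:
        return 0.0
    first = name_similarity(rec_a["first_name"], rec_b["first_name"])
    last = name_similarity(rec_a["last_name"], rec_b["last_name"])
    return (first + last) / 2


def potential_matches(state_a: list, state_b: list, threshold: float = 0.85) -> list:
    """Return (record_a, record_b, score) for every pair at or above threshold."""
    pairs = []
    for a in state_a:
        for b in state_b:
            score = match_score(a, b)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


# Made-up records for illustration.
kansas = [{"first_name": "Jon", "last_name": "Smith", "birth_date": "1970-05-01"}]
virginia = [
    {"first_name": "John", "last_name": "Smith", "birth_date": "1970-05-01"},
    {"first_name": "Jon", "last_name": "Smith", "birth_date": "1982-11-30"},
]

for a, b, score in potential_matches(kansas, virginia):
    print(a["first_name"], "~", b["first_name"], round(score, 2))
```

The pair that clears the threshold is only a candidate. Nothing in the score says whether "Jon" and "John" are one person with a typo in one file or two different people, which is why flagged pairs need further checking.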

But some are more prone to false positives than others. In addition to his other roles, Kobach runs Interstate Crosscheck, a free service that looks for matches between voter records. Twenty-eight states participate in the service, Samantha Poetter, Kobach’s director of public information, said in an email. Although similar in concept to ERIC, Interstate Crosscheck has a reputation for producing high numbers of false positives; its own documentation acknowledges that false positives are an issue. Research analyzing the accuracy of matching based on just first name, last name and birth date — Crosscheck’s method — suggests that this methodology is likely to turn up 200 false positives for every discovery of a true double voter. (That paper is still going through the process of peer review.) Virginia — a state that Stewart applauded for its high-quality voter registration data cleanup efforts — is a member of both ERIC and Interstate Crosscheck. In its 2016 voter data report, Virginia noted that Crosscheck was particularly prone to false positives. “The need to greatly refine and analyze Crosscheck data has required significant … staff resources that are not accounted for when proponents claim the program is ‘free.’”
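
A back-of-the-envelope calculation shows why matching on name and birth date alone invites coincidental matches. The Python sketch below is not the method of the research cited above and does not reproduce its estimate. It assumes birth dates are spread evenly across a span of years, which real birth dates are not, and the numbers fed into it are hypothetical.

```python
from math import comb


def expected_coincidental_pairs(people_with_name: int, birth_year_span: int) -> float:
    """Expected number of pairs of different people who share a full birth date.

    Simplifying assumptions: every person has the same first and last
    name, birth dates are uniform over birth_year_span * 365 days,
    and leap days are ignored.
    """
    possible_dates = birth_year_span * 365
    return comb(people_with_name, 2) / possible_dates


# Hypothetical scenario: 1,000 registered voters share one common name,
# with birth years spread over 60 years.
print(round(expected_coincidental_pairs(1000, 60), 1))  # 22.8
```

Under those assumptions, about 23 pairs of different people would share both a name and a birth date by chance alone, and each pair would look like one person registered twice.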

And false positives can have a big impact on actual voters. States are not allowed to block people from voting based just on an algorithmic database match, Stewart said. In most cases, states send verification letters to people suspected of wrongly being registered to vote, who then have to document by a certain date that they are eligible. But the process can still end with legitimate voters’ being denied their rights. That happened in Florida in 2000, when at least 1,100 people were wrongly dropped from the voter registration rolls because a state analysis had incorrectly determined that they were convicted felons. The Palm Beach Post reported that the state had asked the company it contracted with to use broad parameters — in other words, to set up the data analysis so that it prioritized finding as many felons as possible, rather than limiting false positives.
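
The trade-off behind "broad parameters" can be shown with a toy example. The Python sketch below does not reconstruct the Florida analysis. The scores and labels are made up, and the two thresholds are hypothetical settings chosen to contrast a broad search with a strict one.

```python
def evaluate(scored_pairs: list, threshold: float) -> tuple:
    """Count true and false positives among pairs flagged at a threshold.

    scored_pairs holds (score, is_same_person) tuples. A pair is flagged
    when its score is at or above the threshold.
    """
    flagged = [is_same for score, is_same in scored_pairs if score >= threshold]
    true_positives = sum(flagged)
    false_positives = len(flagged) - true_positives
    return true_positives, false_positives


# Made-up match scores, each labeled with whether the two records
# really belong to the same person.
toy_pairs = [
    (0.99, True),
    (0.95, True),
    (0.90, False),
    (0.82, True),
    (0.80, False),
    (0.75, False),
    (0.70, False),
]

print(evaluate(toy_pairs, 0.70))  # broad setting: (3, 4)
print(evaluate(toy_pairs, 0.93))  # strict setting: (2, 0)
```

In this toy data, the broad setting catches all three true matches but also flags four people wrongly, while the strict setting flags no one wrongly and misses one true match. Which error to prefer is a policy decision, not a technical one.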

Given the very real issues with out-of-date and poorly maintained voter registries, experts I spoke with didn’t want to see the issue ignored. Rather, they said the presidential commission is going about it the wrong way. To Stewart, the first step should have been a survey to find out what states are already doing to clean up their registries and what the federal government could do to make that task easier. Virginia, for instance, noted in the 2016 voter data report that the Help America Vote Act funding that helps support its efforts is dwindling. And Stewart noted that there is already an organization doing what the commission would like to do, but with well-established infrastructure. “A better use of resources would be to buy every state a subscription to ERIC and have the matches done in a highly professional way,” he said.

Maggie Koerth-Baker is a senior science writer for FiveThirtyEight. @maggiekb1