Researchers undercut anonymity of voting, test-taking

Princeton researchers have shown that Scantron-style forms can be de-anonymized.

At some point in your life you've probably been asked to take out a #2 pencil and fill in a series of numbered ovals. This method for gathering standardized data is widely used in elections, tests, and surveys, and it's generally considered to be anonymous: if you don't put your name at the top, you don't expect that your answers can be traced back to you.

New research from Princeton University calls that assumption into question. A team led by computer science professor (and current Chief Technologist of the Federal Trade Commission) Ed Felten has demonstrated software techniques for re-identification of respondents using only images of their filled-in bubbles. Their technology has both benign uses—detecting cheating in standardized tests—and malicious ones like undermining the secret ballot.

Co-author Will Clarkson described the group's findings in a Tuesday blog post. The researchers obtained copies of surveys completed by 92 different high school students, which they scanned with a high-resolution scanner. A labeled subset of the bubbles—12 bubbles from each respondent—was used to train a classifier that used a combination of machine learning techniques described in the research paper.

This classifier was then given the remainder of the bubbles—8 from each respondent—and the classifier was asked to re-identify them. It was surprisingly accurate. It got the right answer on the first try (out of 92 options) more than half the time. And the correct answer was on its top ten list more than 90 percent of the time.
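The paper describes a combination of machine-learning techniques; the experimental protocol itself, though, is simple to sketch. Below is a toy stand-in (synthetic "style" features instead of real bubble images, and a plain nearest-centroid classifier, neither of which is the paper's method) that follows the same shape: train on 12 labeled bubbles per respondent, then rank all 92 candidates for each respondent's 8 held-out bubbles.

```python
import numpy as np

rng = np.random.default_rng(0)

n_resp, d = 92, 16        # 92 respondents, as in the study; d is a made-up feature size
n_train, n_test = 12, 8   # labeled vs. held-out bubbles per respondent

# Pretend each respondent's marking "style" is a point in feature space,
# and each filled-in bubble is a noisy observation of that style.
styles = rng.normal(size=(n_resp, d))
train = styles[:, None, :] + 0.8 * rng.normal(size=(n_resp, n_train, d))
test = styles[:, None, :] + 0.8 * rng.normal(size=(n_resp, n_test, d))

# "Train" a nearest-centroid classifier: average the 12 labeled bubbles.
centroids = train.mean(axis=1)              # shape (92, d)

# Re-identify: average each respondent's 8 held-out bubbles and rank all
# 92 candidates by distance to that query.
queries = test.mean(axis=1)                 # shape (92, d)
dists = np.linalg.norm(queries[:, None, :] - centroids[None, :, :], axis=2)
order = dists.argsort(axis=1)               # candidate ranking per respondent
rank_of_truth = np.array([int(np.where(order[i] == i)[0][0])
                          for i in range(n_resp)])

top1 = float(np.mean(rank_of_truth == 0))
top10 = float(np.mean(rank_of_truth < 10))
print(f"top-1: {top1:.2f}  top-10: {top10:.2f}")
```

The synthetic data here is far more separable than real pen strokes, so the numbers it prints are not meaningful; the point is the protocol: rank the candidates, then report top-1 and top-10 hit rates, which is how the figures above are defined.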

The technique has a number of possible applications, both positive and negative. One negative application involves undermining the secret ballot. Some jurisdictions, such as Humboldt County, CA, offer digital images of all ballots cast in recent elections. If a third party obtained a sample of filled-in bubbles from a known Humboldt County voter—perhaps as part of an employment application—he could use the Princeton team's techniques to identify the voter's ballot.

In principle, this raises voter intimidation concerns. For example, an employer might threaten to fire employees who fail to vote for his preferred candidates. But Joe Calandrino, the study's lead author, concedes that the 51 percent accuracy rate "does leave some room for deniability" for a voter who faces such intimidation. The problem deserves further study, but Humboldt County voters shouldn't lose sleep over it.

A more positive application of the team's research is the detection of cheating. For example, a high school teacher whose students are taking a high-stakes test might be tempted to fill in some answers for his students after they have turned in their tests. The techniques described by Calandrino et al. could be used to scan a large number of documents, looking for evidence that the same person filled out bubbles on multiple tests.

Here too, there are questions about whether the algorithm is powerful enough to give useful results. But Calandrino argues that it can. First, in this application the algorithm would have many more samples to work with, which might improve its accuracy. More importantly, Calandrino says that he "sees our work as fitting in with other risk-based approaches, like answer analysis." By itself, the algorithm may not be able to definitively prove someone cheated, but it offers valuable, independent evidence of wrongdoing.
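As a sketch of how such a screening pass might work (this is hypothetical: synthetic features, a planted "extra hand," and an arbitrary threshold, none of it from the paper), one could flag pairs of answer sheets whose closest bubbles look like they came from the same hand:

```python
import numpy as np

rng = np.random.default_rng(1)

n_sheets, n_bubbles, d = 30, 20, 16   # made-up sizes
noise = 0.25

# Each sheet normally reflects one student's marking style...
students = rng.normal(size=(n_sheets, d))
sheets = students[:, None, :] + noise * rng.normal(size=(n_sheets, n_bubbles, d))

# ...but plant the same extra hand on the last 5 bubbles of sheets 0, 1 and 2,
# standing in for a proctor "correcting" several students' answers.
extra_hand = rng.normal(size=d)
for s in (0, 1, 2):
    sheets[s, -5:] = extra_hand + noise * rng.normal(size=(5, d))

def closest_bubbles(a, b):
    """Smallest feature distance between any bubble on sheet a and any on sheet b."""
    return float(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).min())

# Flag sheet pairs whose closest bubbles look like the same hand.
THRESHOLD = 1.5   # arbitrary; a real system would have to calibrate this
flagged = [(i, j)
           for i in range(n_sheets) for j in range(i + 1, n_sheets)
           if closest_bubbles(sheets[i], sheets[j]) < THRESHOLD]
print(flagged)
```

Because sheets 0, 1, and 2 share bubbles from the planted extra hand, the pairs (0, 1), (0, 2), and (1, 2) should appear in the flagged list. As the article notes, such a flag would be one piece of risk-based evidence to combine with other analysis, not proof of cheating.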

We covered related work in 2009. The same Princeton team found that variations in the structure of paper allowed it to be "fingerprinted" with a commodity scanner.

Disclosure: I'm a former member of Felten's research group. I reviewed an early draft of the study, but did not participate in the research.

26 Reader Comments

OK, so IF they have access to forms you filled out including personally identifying information, and can scan at very high resolution your bubbles, they can theoretically identify other bubbles you filled out on other forms, with "surprising accuracy", at least when the comparison pool is small, and when you use pencil instead of a bleeding ink or marker....

Facial recognition is highly accurate in a small pool, but try to find one face in a 100 million face database and the accuracy falls to near zero, at best finding a hundred or so "possible" matches. How you bubble is very much less accurate than handwriting recognition, and can even be affected by simply switching pencils between uses where one has a deformed tip vs the other... This is no security risk to anyone unless we're talking a very small pool of pre-screened people. Um, let's just use markers on Scantrons and solve the problem... a #2 pencil is no longer required.

Yet another tool for identifying people if you have enough data. Timing of keystrokes to how they fill in bubble sheets to netflix selections (correlated against IMDB and other sources). I wonder if there's any research in behavioral anonymity tools to defeat such analysis. Anybody got links?

It's pretty easy to tell by looking at the "method" of filling that was used, so in the case of the teacher they'd just have to mimic how the student bubbled closely enough to avoid detection across all tests.

My favorite was tight diagonal zig-zags followed by two or three circles around the edges to knock off points.

I dunno. In this part of California, the ballots have large ovals, and you use a felt-tip marker to fill them in. I use a more horizontal stroke with those, as opposed to a diagonal stroke when I'm filling in circles with a pencil.

Can you actually be fired for voting differently to your boss in the US? Or was that just an exaggeration to make a point?

Most of the US uses at-will employment.

Quote:

any hiring is presumed to be "at will"; that is, the employer is free to discharge individuals "for good cause, or bad cause, or no cause at all," and the employee is equally free to quit, strike, or otherwise cease work

As long as it's not explicitly stated that they terminated your employment because of your voting, then it's OK. If you can prove that the termination was because of politics, then you have a First Amendment case.

So I take it it's the way the bubbles are drawn that is, more or less, unique to each individual, not the answers he chooses. It wasn't very clear in the article, at least to me. I guess this falls in the same category as facial and handwriting recognition.

One problem I foresee here is that people can easily change their drawing styles (even more easily than their handwriting) to confuse the algorithm. The only downside is that they have to remain ever vigilant when filling out bubbles (lest their habits kick in and they start drawing in the same style).

I'm sorry, but the ability to come close when guessing, even to the point of "we can narrow the field to one ninth with 90% accuracy," doesn't come close to removing anonymity. One ninth is a small number if you start with a small enough sample, but otherwise it is near useless.

In other words, in any real-world application, this couldn't be used. In fact, as the sample size they deal with goes up, the little black dots would probably start to look a lot alike and its accuracy would likely suffer.

Sure, some people fill in an outline, then fill in the center. Some scribble up and down rather than left and right. But the larger the population size, the worse accuracy you will get, particularly when the sample size of answers is smaller (it's not like there are a lot of 25-answer Scantron vote sheets). Finally, you have to have a known sample to compare it to.

There is a reason that we usually use humans for handwriting analysis.
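The scaling intuition in these comments can be illustrated with the same kind of toy model (synthetic features and a simple nearest-centroid matcher, not the paper's classifier): with the number of known and unknown bubbles held fixed, first-guess accuracy should fall as the candidate pool grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def top1_accuracy(n_people, d=8, n_train=12, n_test=8, noise=1.5):
    """Fraction of people whose averaged held-out bubbles land closest
    to their own training centroid (toy nearest-centroid matcher)."""
    styles = rng.normal(size=(n_people, d))
    train = styles[:, None, :] + noise * rng.normal(size=(n_people, n_train, d))
    test = styles[:, None, :] + noise * rng.normal(size=(n_people, n_test, d))
    centroids = train.mean(axis=1)
    queries = test.mean(axis=1)
    dists = np.linalg.norm(queries[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(dists.argmin(axis=1) == np.arange(n_people)))

accs = {n: top1_accuracy(n) for n in (20, 200, 1000)}
for n, acc in accs.items():
    print(f"pool of {n:4d}: top-1 accuracy {acc:.2f}")
```

Under this model, a matcher that looks threatening against a pool of 92 degrades steadily as the pool grows, which is the commenters' point: a known sample and a small, pre-screened candidate pool are doing much of the work.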

Look at the caveats here: This works with some level of confidence when (a) the user uses the same implement (b) on the same day (c) in the same type of response sheet (d) with the same level of thought put into their mark and (e) with a highly-expressive implement (#2 pencil) chosen instead of a low-expression implement (marker or stamp).

I know that on different days I fill in "scantron" type ovals differently. I also tend to fill them in differently when I am saying "I ... think ..." in my head versus "oh, HELL yeah!", and of course there's not much escaping the fact that the "draw a bar in this rectangle" and the "fill this oval COMPLETELY with NO STRAY MARKS" different types of machines cause different fill-in patterns for me.

So, I think trying to correlate voting scantrons with casual surveys or benefit application sheets, etc, would be incredibly error-prone, giving a minor shade more data, but adding a lot of noise. Correlating multiple votes on the same ballot might be somewhat fruitful, but voting booths tend to be administered and guarded well enough that a partially-entered ballot is unlikely to get "filled out" by anyone else before it is read.

On a test, you really need to make sure you control for different types of questions, as a user's marks will vary based on how confident they are and how tired they are. Which, of course, would also be the ones the "scruple-free proctor" would target for re-marking, so I'm not sure how practical any "you filled in this oval differently" arguments might be.

This is very similar to handwriting analysis, were handwriting analysis completely restricted to the drawing of periods.

I know it's pedantic, but the "#2" grade of pencil is a US-specific designation. Almost everyone in Europe (and many other people across the world) would refer to this as an HB pencil.

Is this to clarify the issue for non-US readers? Simply calling it an "HB" isn't really precise enough anyway: which manufacturer's pencil are we describing here? There really isn't a standardized pencil grade between manufacturers.

What we care about is the "lead", not the pencil, and there IS a standard for that; HB is one of those standards. Specifically, it refers to how hard/soft the "lead" is; its thickness is simply measured in millimeters. (And since we're being pedantic, a "pencil" is just the layman's term for a lead holder, not the part that makes the actual marks.) It's called HB by the standard, and if you are a trained draftsman, engineer, artist, architect, etc. you'll be very familiar with the term no matter what part of the world you are from. A #2 pencil uses HB lead, but it had some additional definitions which nobody really cares about anymore. (It was part of an office equipment standardization in the US back when everything was done by hand.)

The reason they chose the #2 Pencil/HB lead type is because you can make a dark enough mark for the Scantron to read, without having to push so hard you tear the paper. Softer leads leave too much residue which tends to gunk up the reader faster and cause misreads.

As for the topic at hand, it's just like using handwriting analysis. You need a known sample from the person, and access to the original "anonymous" form or a very high-resolution copy. You also need a minimum number of ovals/squares in each sample in order to make any kind of accurate prediction, but they don't mention where that point is.

All that the grading means is that the Faber Castell HB is darker/softer than the Faber Castell F, and a Staedtler HB is going to be darker/softer than a Staedtler F. But that Faber Castell F might be the same as the Staedtler HB, or vice versa, or somewhere in between. The two manufacturers' HBs are probably pretty close, but there's absolutely no way to tell for sure, since grading is not standardized across manufacturers.

Everybody knows you can identify voters by their hanging chads.

While outside the scope of this article, it should be noted that the problem of "hanging chads" only occurred in a very small percentage of punch cards, and it could have been eliminated by the voters themselves if they had checked their cards before handing them in. It only became an issue in contested and/or possibly rigged elections, and over the years that I used that system I never once had a problem with it or felt it to be insecure (the way I do the touch-screen systems).

That aside, I don't care who really knows how I vote (although yes, I prefer the "anonymity" which supposedly exists in a secret ballot; they can always track back, via ballot number and sign-up sheets, exactly which ballot was yours). I have yet to run across anyone stupid enough to tell me how to vote, and I've been voting for over 40 years now.

Regarding the pencils... "#2" denotes a standard "medium" lead in the US; it may not be a formal standard, but it was easier for folks to understand than 4B, 3B, 2B, B, HB, H, 2H, 3H, 4H, etc., without counting special leads such as "mark sense".

Timothy B. Lee covers tech policy for Ars, with a particular focus on patent and copyright law, privacy, free speech, and open government. His writing has appeared in Slate, Reason, Wired, and the New York Times.