After we published our Q&A on the Connecticut racial profiling data, we got a handful of e-mails about the dataset, which looks at every police stop in the state, by town. Sharing the dataset allowed anyone to dig into the numbers and point out errors, and several readers wrote in with findings, some valid and some not.

But one e-mail caught our eye.

It was from Tom Frenaye, the former first selectman of Suffield, Connecticut, a town of about 16,000 people on the Massachusetts border. He said the data for his town was skewed because a state prison, the MacDougall-Walker Correctional Institution, is located in Suffield — and the data counted incarcerated people as part of the driving population.

Estimated driving populations are important because they allow analysts to see whether a given racial group is pulled over more frequently than another. The researchers used U.S. Census data to make these estimates. But, as it turns out, there are some pitfalls to using that methodology in towns with unique circumstances.
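
To make that comparison concrete, here is a minimal sketch in Python. The function name and every number in it are hypothetical and are not taken from the report, and the researchers’ actual method may differ. It simply sets a group’s share of traffic stops against its share of the estimated driving population.

```python
def stop_disparity(group_stops, total_stops, group_share_of_drivers):
    """Return the gap, in percentage points, between a group's share of
    stops and its share of the estimated driving population.

    group_share_of_drivers is expressed in percent (e.g. 10.0).
    """
    if total_stops <= 0:
        raise ValueError("total_stops must be positive")
    share_of_stops = 100.0 * group_stops / total_stops
    return share_of_stops - group_share_of_drivers


# Hypothetical town (illustrative numbers only): 150 of 1,000 stops
# involve minority drivers, who make up an estimated 10 percent of
# the driving population.
gap = stop_disparity(150, 1000, 10.0)
print(f"{gap:.1f} percentage points")
```

A positive gap means the group is stopped more often than its estimated share of drivers would suggest. If the driving population estimate is inflated, the gap shrinks and a real disparity can be hidden.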

Inmates can’t drive

In the report, the estimated minority driving population for Suffield is 8.78 percent, but Frenaye said the number was off. And he had data to back it up.

But a seven percent minority population — versus an 8.78 percent driving population — didn’t seem so off. It seemed plausible that the prison population was actually included in the data.

Flaws in the data

We contacted Ken Barone of Central Connecticut State University, who analyzed the data — and he confirmed Frenaye’s suspicions.

“We used Census data, and the Census counts [the inmates] as part of the town,” Barone said. “We’re going to go back and recalculate going forward.”

Barone said this was the second error caught in the data release. The other was a double-counting error in Granby; about 250 stops were in the database twice. That error was fixed. But this one, with the prison populations, will take some work. “We’ll have to spend some time gathering demographic populations for the prisons and pull them out,” Barone said.
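
The correction Barone describes amounts to subtracting the prison’s residents from the town’s Census counts before computing the minority share. A rough sketch in Python follows. The function name and all figures are hypothetical and are not Suffield’s actual numbers, and the analysts’ real recalculation may involve more steps, such as limiting the counts to people of driving age.

```python
def share_without_prison(town_total, town_minority,
                         prison_total, prison_minority):
    """Minority share (in percent) of a town's population after removing
    people who were counted at a correctional facility."""
    remaining_total = town_total - prison_total
    remaining_minority = town_minority - prison_minority
    if remaining_total <= 0 or remaining_minority < 0:
        raise ValueError("prison counts exceed town counts")
    return 100.0 * remaining_minority / remaining_total


# Hypothetical counts, for illustration only.
town_total, town_minority = 10_000, 1_000      # Census count for the town
prison_total, prison_minority = 1_000, 600     # people counted at the prison

before = 100.0 * town_minority / town_total
after = share_without_prison(town_total, town_minority,
                             prison_total, prison_minority)
print(f"before: {before:.2f}%  after: {after:.2f}%")
```

In this made-up case the minority share falls from 10 percent to about 4.4 percent once the prison is removed, which shows how a single facility can shift a small town’s estimate.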

Sharing data to clean data

Sharing the data with the public, Barone said, allows people to come forward and point out things the analysts didn’t consider, like the Suffield prison.

Barone said there are two points at which errors can be introduced into this dataset: 1) when an officer inputs the data after a stop, and 2) when a data vendor incorrectly codes a stop. (Municipalities use vendors to collect this data and submit it through the Criminal Justice Information System, or CJIS.)

The dataset is vast and analysts can’t account for all factors in every town, so Barone encourages people to contact him with anything in the data he should consider.

Did you also see something in the data? Tell us in the comments section.