Tag Archives: Data Valdez

In most parts of Europe, since the totalitarian governments of the inter-war period, pressure from governments to make their citizens legible has been hard to resist. Germany now has universal biometric ID cards for all adults, which police have the right to demand to see, irrespective of whether they have probable cause to suspect involvement in a crime; 24 of the 27 EU states have mandatory national ID cards.

Biometrics matter, because outside of science fiction, they can’t be changed. During refugee crises, deep anxieties – Who are these people? Why are they coming here? – induce governments to pin people down to an unchanging identity, like bugs in a biologist’s cabinet.

This is a fundamental difference between mainly-autochthonous and mainly-settler societies. Ideologically, the United States came into being through westward conquest, by people eager to refashion themselves away from the religious and social strictures of more settled societies. At Ellis Island, you could change your name; on the frontier, a white man could be whoever he declared himself to be. As Walt Whitman wrote, “Of every hue and caste am I, of every rank and religion, / A farmer, mechanic, artist, gentleman, sailor, quaker, / Prisoner, fancy-man, rowdy, lawyer, physician, priest. / I resist any thing better than my own diversity, / Breathe the air but leave plenty after me.” Settler societies are supposed to “leave plenty” of air to breathe for those who come to settle after them; they’re supposed to leave room for self-refashioning. Anonymity, pseudonymity and the ability to erase your tracks bolster your power against the state.

A former Middle East advisor to President Obama, Steven Simon, suggested in Saturday’s New York Times that the administration’s response to the Paris attacks was likely to include “Tighter border controls, more intensive surveillance in the U.S. and more outreach to local communities in the hope that extremists will be fingered by their friends and family. And a tightening of already intimate cooperation with European intelligence agencies.”

These proposals, if adopted, would be immensely counterproductive, and here’s why.

First, tighter border controls are irrelevant to this attack. It appears that all of the attackers so far identified were EU citizens; none were refugees from Syria.

Second, France already had a draconian mass surveillance law, which came into effect at the beginning of October. It did not thwart these attacks. The reason is the “false positives” problem. Any system that flags people based on demographics, metadata, or past behavior inevitably sweeps up a vast majority of innocent people, and diverts police and intelligence resources toward ruling them out. An LA Times study of “pre-crime” efforts to prevent violent crimes by US Army soldiers added every variable the researchers could find, and still, for every 15 people in their set of suspects who did in fact commit violence in a given year, 985 did not. Similarly, before the Boston Marathon attacks, the FBI had flagged Tamerlan Tsarnaev for interview; but they interview hundreds of flagged people every week, and have no way of knowing which among them will actually commit an attack. So it appears that six weeks before the attacks, France’s intelligence agencies buried themselves under an ocean of false positives, and could not pick out the genuinely suspicious communications from that traffic. They can’t be faulted for failing; it’s mathematically impossible. All mass surveillance allows is what’s happening now: going back into the system to see what you missed.
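The base-rate arithmetic behind the false-positives problem is worth making explicit. Here is a minimal sketch; the population size, number of real plotters, and test accuracy below are all illustrative assumptions, not figures from any study:

```python
# Why even an accurate screen drowns in false positives: a toy
# base-rate calculation. All numbers here are illustrative assumptions.

def screening_outcomes(population, real_threats, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives) for one pass of a screen."""
    innocents = population - real_threats
    true_positives = real_threats * sensitivity
    false_positives = innocents * false_positive_rate
    return true_positives, false_positives

# Suppose 100 real plotters in a population of 100 million, screened by
# a test that catches 99% of them and mis-flags only 0.1% of everyone else.
tp, fp = screening_outcomes(100_000_000, 100, 0.99, 0.001)
precision = tp / (tp + fp)
print(f"people flagged: {tp + fp:,.0f}")
print(f"share of flagged who are real threats: {precision:.3%}")
```

Even with an implausibly good screen, roughly a thousand innocent people are flagged for every real threat; the investigators’ time goes to ruling them out.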

Third, Muslim and black communities were already under very heavy pressure in France, and are already under very heavy pressure here from the FBI, through its “Countering Violent Extremism” program, to “finger friends and family”. CVE uses models of radicalization with no solid academic basis to identify people as potentially radical simply because they have changed their dietary habits or become more devout about their religion. To make their numbers, the FBI has even resorted, in case after case, to creating their own terrorists out of poor, mentally unstable young men, using confidential informants to lead them through every stage of devising a plot till they do something the FBI can arrest them for. We don’t need more of that either.

Last, if we react in this particular way, it also serves the ends of the violent criminals who committed this attack. Lacking resources themselves to wage war, they seek to provoke a backlash that will garner them support among the peaceful Muslim majority. Back in the day, the IRA posed as the defenders of the rights of peaceful Northern Irish Catholics against foreign oppression; today, the Islamic State poses as the defenders of the rights of peaceful Muslims against foreign oppression. A governmental backlash against Muslims in general will merely bolster their propaganda: See? We told you they’re out to get you! Come join us!

Instead, we should use the Constitution to solve the false positives problem. The Fourth Amendment bars mass surveillance, requiring, before surveillance is conducted, a warrant based on individualized probable cause of involvement in actual criminal activity. Imagine that the “TIDE” terrorist database, instead of holding 750,000+ names, were capped at one thousand, but that each of those thousand were investigated thoroughly on the basis of actual evidence. The surveillance agencies would waste a lot less time chasing fruitless leads, building data centers, or shoveling money to software vendors to try to solve this insoluble problem.

Foreign policy and economic solutions are beyond our remit, but it should be obvious that in order to drain the Islamic State of support, we have to provide those fleeing its rule with a credible chance at a better life. At the bare minimum, we should let them know that if they come to our country, they will be treated justly, not kept constantly under watch even if innocent of any crime.

In the hothouse of Congress, members have been sweating over the need to do something – anything – about “cybersecurity.” They were under pressure from the administration, the intelligence services, and the tech industry. But the latest news is that the Republican majority will be turning, in the few days left before the recess, from the contentious highways bill to a bill to defund Planned Parenthood, likely shifting the previously-catastrophically-urgent cybersecurity crisis through to the fall. So Congress, like my seven-year-olds in school assembly, can take a few deep breaths and imagine that they can smell a flower.

The truth is, there never was a “cybersecurity crisis.” Companies are already legally allowed to share information on hacking attempts with the government, and they usually do. This debate is not really about making US companies or the US government more secure; it’s about putting more of your information, that you have voluntarily shared with US companies, into the government’s hands, without companies being liable for violating their privacy policies for sharing personally identifiable information. All proposals on the table in Congress would immunize companies from suit in this way. In this sense, it would be perfectly all right for Congress to do nothing.

Nevertheless, there is a cybersecurity problem that is worth trying to solve. The government is not a good custodian of our data. Its networks are often poorly secured and vulnerable to outside intrusion. In the surveillance arena, there are now over five million people with security clearances, who are in a position to leak sensitive information. Cultivating a more disciplined approach to network protection and data retention would seem to be a good idea. That’s where the principle above comes in.

In this spirit, let’s calmly reflect on what a bill dealing with this real problem would look like.

The Atlantic picks up on a story from the Center for Investigative Reporting that in 2012, the LA County Sheriff’s Department secretly tested a civilian surveillance aircraft by flying it over a town in their jurisdiction and taking high-resolution footage of everything visibly happening there, over a period of up to six hours:

If it’s adopted, Americans can be policed like Iraqis and Afghans under occupation – and at bargain prices:

McNutt, who holds a doctorate in rapid product development, helped build wide-area surveillance to hunt down bombing suspects in Iraq and Afghanistan. He decided that clusters of high-powered surveillance cameras attached to the belly of small civilian aircraft could be a game-changer in U.S. law enforcement.

“Our whole system costs less than the price of a single police helicopter and costs less for an hour to operate than a police helicopter,” McNutt said. “But at the same time, it watches 10,000 times the area that a police helicopter could watch.”

A sergeant in the L.A. County Sheriff’s office compared the technology to Big Brother, which didn’t stop him from deploying it over a string of necklace snatchings.

The town they chose? Compton. Yes, that Compton, but it’s not the same Compton as yesteryear. Its boosters are now touting it as the hip, countercultural Brooklyn of the LA area. It has an inspirational new Millennial mayor, Aja Brown, who has garnered comparisons to Cory Booker. Its crime rate is down sixty percent, and it’s now majority-Latino. But it still has a median household income of $42,335, and still, even after all its struggles, somehow found itself the first city selected for mass surveillance, over, say, majority-white, tony Santa Clarita (median household income $91,450). Well, blow me down with a post-racial colorblind goddamn feather.

In related news, the NSA, under its MYSTIC and RETRO programs, was revealed last month to have been collecting the contents of the phone communications of an entire country (unnamed, but probably Iraq).

Believe it or not, this is the program’s actual logo.

These two stories are essentially the same. Developments in technology allow law enforcement surveillance to sweep past legal constraints intended for an era when collecting, storing and analyzing so much data was inconceivable. In luckless Compton, the Supreme Court’s 1989 decision in Florida v. Riley renders “wide area surveillance” presumptively constitutional. In luckless Iraq, the expansive powers of Executive Order 12333 and the FISA Amendments Act impose effectively no constraints on the NSA’s interception of the communications of foreign nations.

Massachusetts has two “fusion centers”, mostly state-funded, which aggregate enormous amounts of data on innocent Massachusetts residents, with the stated aim of preventing terrorist attacks. When you call the “See Something, Say Something” line, the information goes into “Suspicious Activity Reports.” The ACLU of Massachusetts documented that the Boston fusion center (“BRIC”) had actually spent its time harassing peaceful activists rather than thwarting terrorism, which is one of the reasons why there will be nationwide protests against fusion centers on April 10, including in Boston.

In response to the ACLU revelations, Rep. Jason Lewis (now the newly elected Sen. Jason Lewis) filed a fusion center reform bill on Beacon Hill. Disconcerted at the prospect of more sunshine on their work, the Commonwealth Fusion Center, the fusion center in Maynard, offered him and other legislators a courtesy tour of their facility, to try to explain what good work they were doing. As an example of that work, they cited their First Amendment-violating harassment of an Arlington man who was not actually planning any violent crime, but who had tweeted about it being a good idea to shoot statists. They also provided to Rep. Lewis copies of various policies that they follow, including their Privacy Policy (updated 06.13.2013) and their policy on First Amendment investigations. Rep. Lewis then asked Digital Fourth to evaluate the policies they had provided, to assess whether they were constitutional. We enthusiastically agreed, and the resulting report is here.

Just before Christmas, Muckrock and the ACLU of Massachusetts brought out excellent articles based on a full year of Muckrock’s investigative reporting into Boston PD’s use of automated license plate recognition technology.

ALPR systems automatically photograph and store in a police database the license plates of any car an ALPR-equipped police vehicle passes. The car may be parked or driving. It could be on the Pike, in a driveway, or anywhere a camera can reach. The question was, what does the Boston PD do with the mountain of data once it has it?

The NSA has just vigorously denied that its new Utah Data Center, intended for storing and processing intelligence data, will be used to spy on US citizens. The center will have a capacity of at least one yottabyte, and will provide employment for 100-200 people. With the most generous assumptions [200 employees, all employed only on reviewing the data, only one yottabyte of data, ten years to collect the yottabyte, 5GB per movie], each employee would be responsible on average for reviewing 500 million terabytes, or approximately 23 million years’ worth of Blu-ray quality movies, every year.

Must…keep…watching…my…country…needs…me
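The back-of-envelope figures above are easy to check. This sketch uses the bracketed assumptions from the text (200 employees, one yottabyte collected over ten years, 5GB per movie); the two-hour movie length is our own added assumption:

```python
# Back-of-envelope check of the Utah Data Center review workload.
# Assumptions from the text: 200 employees, 1 yottabyte gathered over
# 10 years, 5 GB per movie. Movie length (2 hours) is our assumption.

YOTTABYTE = 10**24   # bytes
TERABYTE = 10**12    # bytes
GIGABYTE = 10**9     # bytes

employees = 200
collection_years = 10
movie_bytes = 5 * GIGABYTE
movie_hours = 2
hours_per_year = 24 * 365

bytes_per_employee_year = YOTTABYTE / (employees * collection_years)
tb_per_employee_year = bytes_per_employee_year / TERABYTE
movies = bytes_per_employee_year / movie_bytes
movie_years = movies * movie_hours / hours_per_year

print(f"{tb_per_employee_year:,.0f} TB per employee per year")
print(f"~ {movie_years:,.0f} years of continuous movie-watching, per employee, per year")
```

Under these assumptions the arithmetic comes out to 500 million terabytes, or roughly 23 million movie-years, per reviewer per year.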

This astounding and continually increasing mismatch shows that we are well beyond the point where law enforcement is able to have a human review a manageable amount of the data in its possession potentially relating to terrorist threats. Computer processing power doubles every two years, but law enforcement employment is rising at a rate of about 7% every ten years, and nobody’s going to pay for it to double every two years instead. Purely machine-based review inevitably carries with it a far higher probability that important things will be missed, even if we were to suppose that the data was entirely accurate to begin with – which it certainly is not.

So why is anybody surprised that Tamerlan Tsarnaev, the elder of the Boston Marathon bombing suspects and one of around 750,000 people in the TIDE database, was not stopped at the border? That facial recognition software wasn’t able to flag him as a match for a suspect? That the fusion centers, intended to synthesize data into actionable “suspicious activity reports”, flag things too late for them to be of any use? That the Air Force is panicking a little at not having enough people to process the data provided by our drone fleet?

They are missing something very simple. We don’t need a terrorism database with 750,000 names on it. There are not 750,000 people out there who pose any sort of realistic threat to America. If the “terrorism watch list” were limited by law to a thousand records, then law enforcement would have to focus only on the thousand most serious threats. Given the federal government’s actual and foreseeable manpower, and the rarity of actual terrorism, that’s more than enough. If law enforcement used the power of the Fourth Amendment, instead of trying to find ways around it, it could focus more on the highest-probability threats.
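To see what list size means in practice, consider the attention budget per name. Only the two list sizes come from the text; the analyst pool and working hours below are hypothetical round numbers for illustration:

```python
# Rough attention budget per watch-list name. The analyst pool size and
# hours are hypothetical; only the list sizes come from the text.

ANALYSTS = 1_000                 # assumed dedicated counterterrorism analysts
HOURS_PER_ANALYST_YEAR = 2_000   # roughly one full-time working year

def hours_per_name(list_size):
    """Analyst-hours available per watch-list entry per year."""
    return ANALYSTS * HOURS_PER_ANALYST_YEAR / list_size

for size in (750_000, 1_000):
    print(f"{size:>7,} names -> {hours_per_name(size):,.1f} analyst-hours per name per year")
```

With 750,000 names, each entry gets under three analyst-hours a year; cap the list at a thousand, and each entry could get a full analyst-year of genuine investigation.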

Yes, they would miss stuff. That’s inevitable under both a tight and a loose system. But a tight system has the added advantages that it protects more people’s liberties, and costs a lot less.

UPDATE: With the help of a New Yorker fact-checker, the per-employee figure above has been corrected to “500 million terabytes”.

Just because one part of the government has a certain set of data, doesn’t mean that all other parts of the government should have it. Your tax return is kept privately within the IRS; records of your immigration applications stay with USCIS; Medicare keeps your health records private; and so on. This kind of data confidentiality used to be routine; but once again, in the service of terrorism, the normal limitations on government power are considered expendable.

This is what happened.

NCTC asked the Department of Homeland Security for access to a database on terror suspects. DHS gave NCTC the disks, on the condition that NCTC, within 30 days, remove information regarding “innocent US persons” (innocent non-US persons are apparently fair game).

Possibly terroristic non-US person Malala Yousafzai.

NCTC couldn’t do it. In fact, after 30 days they had barely been able to download the database from the disks. Even with another 30-day extension, they couldn’t do it, and in response to this failure, they have demanded, and gotten, even broader access to even more government databases. The only constraint is that any time they access a new database, they have to publish that fact in the Federal Register.

Their problems in removing “innocent US persons”’ data are completely understandable: NCTC was anxious that today’s innocent person might turn out not to be innocent tomorrow. How do you remove innocent people, when nobody is provably innocent?

Your government, protecting you. With science!

From a resource standpoint, how could NCTC possibly deploy enough skilled analysts to prove the innocence of the (at minimum) hundreds of thousands of people this one database contained? And that’s just one of the many databases to which they will now have access!

This is an example of what we used to call “Data Valdez” when I was interning at EFF in 2000. The amount of data being created is enormous and essentially impossible to analyze thoroughly. The federal government has, in its various parts, access to data on every part of our lives, but no matter how fast its computers, it will never, ever have the human resources necessary to process it properly. Demanding access to ever greater oceans of data is not going to help. It’s a processing problem, not a data problem.

That’s why, at Digital Fourth, we recognize the wisdom for law enforcement of aggressively applying the constraints identified in the Fourth Amendment. Even if you have the ability to collect more data, it works better to consciously commit to collecting less. Law enforcement should, for its own sanity and ours, collect, retain and use in investigations only data that is related to investigations of actual, well-defined crimes committed by previously identified people. Only then will the volume of data collected be low enough that law enforcement will be able to process it thoughtfully and intelligently. Yes, that means that connections will be missed that will only become apparent after the fact of an attack. But we cannot insure perfectly against the probability of future attacks. We have to invest our resources rationally, and we have vastly over-invested in preventing terrorist attacks relative to other things that kill many more Americans.