Data mining at the FBI: digging for terrorists, insurance scammers, and identity thieves

In a report sent to Congress this week, the Department of Justice describes …

Students who turn in research papers four months late are likely to be rewarded with a big fat zero; the Department of Justice, on the other hand, has to face the wrath of Sen. Patrick Leahy (D-VT), who chairs the Senate Judiciary Committee. Leahy was unhappy that the DoJ turned in its report on the FBI's use of data mining late, but he was even unhappier with the report's conclusions than with its tardiness.

"This report raises more questions than it answers and demonstrates just how dramatically the Bush Administration has expanded the use of this technology, often in secret, to collect and sift through Americans' most sensitive personal information," Leahy said in a statement. "Unfortunately, the Congress and the American public know very little about these and other data mining programs, making them ripe for abuse."

The American public does know a bit more about the data mining efforts now that the DoJ report has been submitted to Congress. According to those who have seen a copy of the document, it spells out six different ways that the FBI uses data mining in an attempt to find criminals. Sure, there's a component that attempts to "score" potential terrorists, but there are also programs in place to detect identity theft rings, health insurance fraud, problems at Internet pharmacies, fraudulent home purchases, and false insurance claims from car accidents that never happened (BusinessWeek has a nice rundown on the categories).

The most controversial system is likely to be the newest one, a terrorist ranking system known as the System to Assess Risk (STAR). The system won't pop up bright red screens that say "Terrorist!" or anything, but government officials do admit that it will score those considered to be a potential threat. A higher score could earn you a visit from the Feds, though no one's saying exactly how the algorithms work or what information is included in the databases. What is known is that STAR will focus on foreigners and that such things as country of origin will be taken into account by the ranking system. Here on a visit from Pakistan? +25 right there.

Several of these projects sound relatively innocuous. Does anyone object, for instance, to the FBI using public real estate transaction records to nail people making fraudulent home purchases? But the government invites suspicion whenever it rolls out such programs, especially ones that relate to national security, because of its repeated interest in efforts like the now-defunct Total Information Awareness program, which Congress shot down.

Despite Congress' obvious uneasiness with massive data mining projects that target American citizens, the government has plowed ahead with programs like ADVISE, CAPPS II, and ATS, screening systems whose details are largely secret and whose data cannot be checked for accuracy by the people they most affect. Several Senators and Congressmen have even ended up on TSA watch lists, further fueling fears that non-targeted data mining is likely to finger plenty of innocent people. The TSA, which runs several of these programs, didn't help the government's position when it lied about how it was using private data to test CAPPS II.

The FBI itself has come under plenty of recent criticism for its (mis)use of National Security Letters sent to ISPs and its apparently illegal use of "exigent letters" to demand information without a warrant or an NSL. NSLs have been controversial for some time, but they entered the news again this week as the EFF made public a massive trove of data on them gleaned from a Freedom of Information Act request to the FBI. The concern over NSL use at the Bureau is likely to take away any "benefit of the doubt" that Congressional oversight committees might have been tempted to grant the FBI with respect to its data mining programs.

The STAR program appears to take past criticisms to heart by targeting a narrow subset of people in the US, but it's clear that Congress wants to know more about what's being done and what sort of safeguards are in place. The Senate Judiciary Committee held a hearing on data mining back in January that focused on the privacy implications of such programs; at that time, Leahy claimed that "52 different federal agencies are currently using data mining technology, and there are at least 199 different government data mining programs operating or planned throughout the federal government." Now that he knows about six more programs, another hearing may be in the works.