MediaSentry weighed in the balance, found wanting

At the behest of the FSF, a computer science professor sets out to dismantle …

The RIAA has filed tens of thousands of lawsuits against suspected file-sharers—and that's just the US total. Worldwide, the music industry has done this so many times that its lawyers certainly know the drill and trust their own evidence collection procedures. But just how good are those procedures?

Yongdae Kim, a computer science professor at the University of Minnesota, raised that question again this week by filing an expert witness report (PDF) on behalf of Jammie Thomas, the Minnesota woman who was fined $222,000 for file-sharing in her first trial. As Thomas prepares for a scheduled June retrial, the Free Software Foundation paid $3,000 to Kim to weigh in on the evidence offered by investigator MediaSentry and by the RIAA's expert witness, Doug Jacobson of Iowa State University.

Kim's conclusion is uncompromising: "MediaSentry's claims of their ability to record activity on the FastTrack network [used by KaZaA] and identify individual computers used to commit copyright infringement are not only unproven, but highly unlikely to be accurate."

And yet, in his pretrial report (PDF), Jacobson said he was willing to testify that MediaSentry's techniques were good enough to show that "Defendant's Internet account and computer were used to download and upload Copyrighted music from the Internet using the KaZaA peer-to-peer network." What's going on here?

The claim

MediaSentry's basic technique was to connect to P2P networks like FastTrack and browse users' "share" folders looking for copyrighted material. If copyrighted songs appeared to turn up, the company would often download a few of them and take screenshots showing the whole lot of them. The IP address from this transfer was then used to look up the ISP involved, and the music industry would ask (or subpoena) the ISP to look through its logs for the user account which had been assigned that IP address at the time in question. The user was then sent a letter demanding settlement.
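The ISP-side step of that process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Lease` record, the `find_subscriber` helper, and every address, account name, and date below are hypothetical and do not describe Charter's or any other ISP's actual logging system. It shows why the timestamp matters as much as the address: under dynamic assignment, one IP address can map to different accounts at different times.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Lease:
    """One hypothetical entry in an ISP's address-assignment log."""
    ip: str
    modem_mac: str
    account: str
    start: datetime
    end: datetime


def find_subscriber(leases, ip, when) -> Optional[Lease]:
    """Return the lease that held `ip` at time `when`, or None if none matches."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease
    return None


# Made-up log: the same address is reassigned from one account to another.
LOG = [
    Lease("198.51.100.7", "00:00:5e:00:53:01", "account-A",
          datetime(2005, 1, 1, 0, 0), datetime(2005, 1, 20, 0, 0)),
    Lease("198.51.100.7", "00:00:5e:00:53:02", "account-B",
          datetime(2005, 1, 20, 0, 0), datetime(2005, 2, 15, 0, 0)),
]

early = find_subscriber(LOG, "198.51.100.7", datetime(2005, 1, 10, 12, 0))
late = find_subscriber(LOG, "198.51.100.7", datetime(2005, 1, 25, 12, 0))
print(early.account, late.account)
```

In this toy log, a timestamp that is off by a couple of weeks would point at a different subscriber, which is why the accuracy of the investigator's clock and the ISP's records both matter.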

At various moments in his testimony on the stand, Jacobson has shown that he knows the system isn't foolproof, but he appears to regard most of the other explanations offered by defense lawyers as exceedingly unlikely in most cases.

Jacobson is a professor at Iowa State, started his own computer security company, has testified before Congress on P2P software, and is certified in computer forensics. He has offered expert testimony in several RIAA lawsuits against file-sharers, perhaps because—in addition to being qualified—he can explain things like IP address assignment in a way that makes sense to judges and jurors. For instance, in the Thomas case, his expert report described Internet addressing using an extended analogy to the postal system, where mail drops are like IP addresses and ZIP codes are like networks.

Jacobson claims that offering music from a KaZaA share folder is like "putting a list of copyrighted music you have available in a public place and telling everyone they are welcome to stop by your house and pick up a copy of the song." MediaSentry's techniques for picking out these users are generally good, he thinks, not because they're foolproof, but because they produce results.

In the Thomas case, for instance, MediaSentry downloaded 11 songs from a particular IP address belonging to user "tereastarr@KaZaA." The RIAA then approached Charter, the ISP controlling that particular IP address. Charter responded that, at the time in question, the IP address given had been in use by Jammie Thomas, one of its customers. Further confirming the accuracy of this link, the MAC address of Jammie Thomas' cable modem matched that logged by Charter for the IP address in question at the time in question.

Thomas was later shown in court to have used the "tereastarr" name for various personal Internet accounts.

The critique

But Kim tells the court that it's really not that simple. There are all sorts of ways for someone else to have used that IP address. Thomas might have run an open wireless access point which anyone could access; in that case, their traffic would then travel through Thomas' cable modem and match her assigned IP address. (Thomas apparently did not use a wireless router, however.)

Someone on her node of the cable network might have been spoofing the modem's MAC address or the assigned IP address. Entire chunks of IP address space might have been "hijacked" through Border Gateway Protocol (BGP) spoofing. And a KaZaA supernode could be configured in such a way that it could "frame" a child node.

Kim also attacks Jacobson's postal system analogy on the grounds of near-total inaccuracy. "This analogy is not only flawed in several respects, but provides the illusion of intuitive understanding of Internet technologies that is simply false," he writes. "If we were to use that analogy, we must first assume that all letters travel in fully transparent envelopes. Second, that there are several postal stations between source and destination, and the postmaster at each station can rewrite the letter in any way without being detected. Furthermore, the postmaster at any intermediate location the letter visits would be able to write a new letter from scratch and send it to a destination, faking the return address."

In addition, IP addresses are not necessarily a "unique" identifier for a computer. Consider Network Address Translation, for instance, which allows a group of machines sitting behind a router to share a single IP address. Spoofing can also mean that two unrelated machines share a single IP address.
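The Network Address Translation point can be made concrete with a toy model. This is a minimal sketch, not a real router implementation: the `NatRouter` class, its port-numbering scheme, and the addresses are all hypothetical, chosen only to show that two machines behind one router present the same public IP address to an outside observer.

```python
class NatRouter:
    """Toy model of port-based NAT; illustrative only."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.table = {}  # (private_ip, private_port) -> public_port

    def translate(self, private_ip, private_port):
        """Map an inside (ip, port) pair to the (public_ip, port) outsiders see."""
        key = (private_ip, private_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        return self.public_ip, self.table[key]


router = NatRouter("203.0.113.10")  # hypothetical public address
laptop = router.translate("192.168.1.20", 51000)
desktop = router.translate("192.168.1.21", 51000)
print(laptop, desktop)
```

An observer outside the router sees only the shared public address and a port number. Telling the laptop from the desktop would require the router's own translation table, which an outside investigator does not have.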

Unfortunately for Thomas, none of these quite possible defenses indicate why someone would be both spoofing her IP address and doing so while using her KaZaA credentials. The question here appears not to be about what is absolutely possible, but what is within the realm of plausibility.

More plausible is the suggestion that Thomas might have left her computer running and unlocked when a friend or visitor downloaded the files in question through her KaZaA software. Kim does say it's likely that Thomas left the machine running without logging out or locking the screen, though jurors in her first trial did not appear to think much of the possibility.

And yet...

Given the "tereastarr" name, the match between Charter's records and the MAC address of Thomas' modem, and her lack of a wireless router, the case against her seems tight. Certainly the RIAA thinks so. When we last talked with the trade group's CEO, Cary Sherman, he professed total confidence in the technology being used.

"We're comforted by the fact that the technology that we used was actually examined by a group of engineers at the University of Washington, and they concluded that our technology was the best out there in terms of this approach," he said. "None of the false positives they found came from us, at all, and we got high marks from them. So we feel very comfortable that the technology is very accurate and very reliable, but we're also happy to have it examined to ensure that that's the case."

And yet the odd thing is, despite all these tools for verifying the link between a machine and an IP address pulled from a P2P program, mistakes happen. And they don't happen once, or twice, or three times. We've covered many of these odd cases on Ars, but Kim provides a quick list of the most egregious. "Furthermore, MediaSentry’s faulty data collection has lead to multiple embarrassing episodes," he writes, "such as lawsuits brought against persons who did not have Internet accounts at the time of the alleged infringement, persons whose homes were in a state of ruin at the time of the alleged infringement, and persons who do not even own computers. There are multiple reports from service providers about requests for release of information about IP addresses which are not even owned by the provider in question. In light of this evidence of improper error checking and lack of transparency, it is impossible to place any trust in evidence or testimony from MediaSentry."

He goes on to quote from a 2005 mailing list discussion among university IT leaders, a discussion that took place near the time of Thomas' alleged infringement. Thirteen network admins reported receiving incorrect and even impossible copyright infringement notices from the RIAA that referenced non-routable IP addresses or addresses that did not belong to the school in question.
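The two kinds of bad notice those admins described, non-routable addresses and addresses outside the school's allocation, are the sort of error a basic sanity check could catch. The sketch below uses Python's standard `ipaddress` module; the `triage_notice` function, its return labels, and the `CAMPUS_BLOCKS` allocation are hypothetical, and nothing here reflects how MediaSentry or any university actually processed notices.

```python
import ipaddress

# Address blocks that are never routed on the public Internet
# (RFC 1918 private ranges, loopback, link-local).
NON_ROUTABLE = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "127.0.0.0/8", "169.254.0.0/16",
)]

# Hypothetical block standing in for one school's allocation.
CAMPUS_BLOCKS = [ipaddress.ip_network("198.51.100.0/24")]


def triage_notice(ip_text, owned=CAMPUS_BLOCKS):
    """Classify the IP address cited in an infringement notice."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in NON_ROUTABLE):
        return "non-routable"  # could not have been seen across the Internet
    if not any(ip in net for net in owned):
        return "not ours"      # belongs to some other network
    return "plausible"         # worth checking against assignment logs


for cited in ("10.4.2.1", "203.0.113.5", "198.51.100.7"):
    print(cited, triage_notice(cited))
```

Only the last category says anything about a possible user, and even then it establishes no more than that the address falls inside the recipient's network.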

So despite Jacobson's comfort in the process (which might well be justified in Thomas' case), the basic question about how this sort of evidence is gathered and used has a complex answer. Kim isn't the first to raise such questions, either. In 2008, Dutch professor and P2P researcher J.A. Pouwelse testified as an expert witness (PDF) in a separate trial, and he also took Jacobson to task for his postal analogy and for his claim that an IP address always belongs to only one machine in the world. He also faulted Jacobson for apparently taking no more than an hour to write his report.

It still matters

The whole issue might seem unimportant at this point, except to Thomas and the handful of other defendants actually moving through trials. That's because last year the RIAA said it was abandoning its general lawsuit strategy in favor of a voluntary "graduated response" deal with ISPs that could eventually see users kicked off the Internet for copyright violations.

As Sherman made clear, though, the basic technology used to identify P2P violations remains the same, even though MediaSentry was ditched back in January in favor of a European firm. Deciding what evidence should truly "count" for a graduated response scheme will remain controversial.

Sherman also professed his willingness to have the technology examined, and we hope the RIAA lives up to this commitment. One of the prominent complaints from Pouwelse and Kim is that MediaSentry's work is opaque, from specific data collection techniques to retention policies. Nothing makes academics more suspicious than a "black box" process that produces "just trust us" results.