3.9M or 7M? Behind the UK’s dodgy file-sharing numbers

The government says that 7 million UK residents share files illegally online. …

Seven million of the roughly 60 million UK residents are dirty P2P pirates, says the government. The number sounds exact, scientific, and authoritative—but where did it come from? And, more importantly, can it be trusted, especially when it is used as a ground for action?

BBC Radio 4 show "More or Less," which devotes itself to "the powerful, sometimes beautiful, often abused but ever ubiquitous world of numbers," looked into the question. What it found should serve as a reminder that such numbers soon take on a life of their own in public debate, even as their (sometimes dubious) origins retreat into shadow. An honest conversation about the effects of copyright infringement needs good numbers, but they are hard to come by, while bad numbers are routinely used to drive fear-based policies that move copyright law in one direction only: tougher sanctions and longer terms.

The show followed up on a listener e-mail from "Paul in Nottinghamshire," who wanted to know about the origin of the "seven million illegal file-sharers" figure. The number was quoted in a recent government report, but it's not a government number; it turns out that the government commissioned a report from the CIBER research group at University College London, and that report contained the number. CIBER's report cited the number four times, noting that it came from yet another report, this one by consultancy Forrester.

Still with me? Get ready to go down the rabbit hole, because it's here that things get weirder. The Forrester report in question does not in fact contain the "seven million" number, despite the CIBER citation. The number actually comes from a separate piece of research called the Jupiter Industry Losses Project, which attempted to quantify losses for the recording industry due to things like P2P usage. And who paid for the Industry Losses Project? The British recording industry, of course.

BPI, which represents the major labels, wouldn't turn over the complete Industry Losses Report to Radio 4, but it did supply some of the numbers from the piece. In addition, Radio 4 talked to the author of the report, Mark Mulligan.

The report estimates that there are 6.7 million illegal file-sharers in the UK, a number that was generated by multiplying two other numbers: the total number of Internet users in the UK and the percentage of the population engaged in file-sharing.

Both numbers used in this calculation turn out to be controversial. The UK government, for instance, says that 33.9 million people are online, while the Industry Losses Report said that around 40 million were online. As for the estimate of the piracy percentage, that comes from a 2008 survey of 1,176 UK households. The survey actually found that 11.6 percent of respondents admitted to using file-sharing software, but Mulligan adjusted this upwards to 16.3 percent to account for "underreporting" (i.e., the fact that some people were lying).

The differences in all these numbers are tremendous. If the lower numbers are used instead (33.9 million and 11.6 percent), it turns out that only 3.93 million UK residents are dirty pirates—a mere 60 percent of the original 6.7 million number.
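The arithmetic behind these estimates is simple enough to reproduce. The sketch below is only an illustration, not the report's actual method: it multiplies each online-population figure quoted above by each file-sharing percentage. The names (`ONLINE_POPULATION`, `SHARE_FILE_SHARING`, `estimate_file_sharers`) are made up for this example, and because the report's population figure is only given as "around 40 million," the high estimate here will not exactly reproduce the 6.7 million number.

```python
# Inputs quoted in the article. The industry figure is "around 40 million",
# so 40,000,000 is an assumption made for illustration only.
ONLINE_POPULATION = {
    "government": 33_900_000,
    "industry_report": 40_000_000,
}

# Share of survey respondents using file-sharing software:
# the raw survey result and the upward-adjusted figure.
SHARE_FILE_SHARING = {
    "survey_raw": 0.116,
    "adjusted": 0.163,
}


def estimate_file_sharers(online_users, share):
    """Estimated file-sharers = online population x share who file-share."""
    return online_users * share


def all_estimates():
    """Return the estimate for every combination of the two inputs."""
    return {
        (pop_name, share_name): estimate_file_sharers(users, share)
        for pop_name, users in ONLINE_POPULATION.items()
        for share_name, share in SHARE_FILE_SHARING.items()
    }


for (pop_name, share_name), total in all_estimates().items():
    print(f"{pop_name:>16} x {share_name:<10} = {total / 1e6:.2f} million")
```

With these inputs, the four combinations range from about 3.93 million (lower population, raw survey figure) to about 6.52 million (higher population, adjusted figure), which is the spread the article describes.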

As Radio 4 correctly points out, it has no evidence that this calculation is any more accurate than the original, though the reporter does offer this takeaway message: "the number of offenders varies enormously depending upon the assumptions you make about consumer behavior and about the size of the online population."

75 percent of all statistics are bogus

The problem isn't that such calculations are done; they can serve as useful tools for industries and even for policymakers. But problems develop when the numbers are ripped from their original, provisional context by repetition and citation, eventually taking on the force of Fact. When such "facts" end up being used to make policy, the problems are compounded.

It's a disturbingly common scenario, though. Lawmakers can't be experts in all topics and are susceptible to the terrifying numbers thrown at them by lobbyists who want increased state protection of their industries. Case in point: lawmakers and even an FCC Commissioner have had a bad habit recently of citing two-decade-old statistics about how much money the US loses each year to "piracy." But when Ars took a hard look at the numbers, they turned out to be even harder to pin down than the UK numbers, and much dodgier.

As we pointed out in that piece, "These statistics are brandished like a talisman each time Congress is asked to step up enforcement to protect the ever-beleaguered US content industry. And both, as far as an extended investigation by Ars Technica has been able to determine, are utterly bogus."

The motion picture industry had some bogus numbers of its own about university P2P use, numbers that it had been using in Washington to make the case for stronger federal regulation of higher education. Those numbers turned out to be wrong by a factor of three.

William Patry is one of the foremost copyright authorities of our time, having authored a definitive set of books on copyright law. In his new popular book, Moral Panics and the Copyright Wars, Patry talks about both our article and the MPAA fiasco and concludes: "The point in the Copyright Wars is not to point to real data—quite the opposite, the data is fake—the point is to create a sense of siege, of urgency, of a clear and present danger that must be eliminated either by Congress or the courts."

It's a worldwide problem, too, one we have seen in the US, the UK, and in Canada.