Only 0.3% of files on BitTorrent confirmed to be legal

Nearly all of the files available on BitTorrent are of the illegal variety, …

The large majority of content found on BitTorrent is illegal, a new study out of the University of Ballarat in Australia has confirmed. Researchers from the university's Internet Commerce Security Laboratory scraped torrents from 23 trackers and looked up the content of each file to determine whether it was copyrighted. They found that 89 percent of the files they sampled were confirmed to be illegally shared, and most of the remaining 11 percent was likely to be infringing as well.

The total sample consisted of 1,000 torrent files—a random selection from the most active seeded files on the trackers they used. Each file was manually checked to see whether it was being legally distributed. Only three cases—0.3 percent of the files—were determined to be definitely not infringing, while 890 files were confirmed to be illegal.

Additionally, 16 files were of ambiguous origin and 91 files were pornographic; the legal status of the latter was unclear because such files are often mislabeled. "[M]any files were tagged as amateur (suggesting no copyright infringement) but further inspection revealed that they were in fact infringing," wrote the researchers.
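For readers who want to check the arithmetic, here is a minimal sketch that tallies the four groups reported above and converts each to a share of the sample. The counts come straight from the article; the category labels and the `share` helper are our own naming, not the researchers'. It does not try to reproduce the study's 97.9 percent figure, since the article does not spell out how that number was derived.

```python
# Counts reported in the article for the 1,000-torrent sample.
# The label strings are illustrative names, not the study's own terms.
counts = {
    "confirmed infringing": 890,
    "confirmed legal": 3,
    "ambiguous origin": 16,
    "pornographic (unclear)": 91,
}

total = sum(counts.values())


def share(n, total):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {label: share(n, total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:<24} {counts[label]:>4} files  {pct:>5.1f}%")

# Everything that was not confirmed infringing, including the three legal files.
unconfirmed = total - counts["confirmed infringing"]
print(f"not confirmed infringing: {unconfirmed} files ({share(unconfirmed, total)}%)")
```

The four groups add up to the full 1,000 files, and the 110 files not confirmed as infringing account for the "remaining 11 percent" mentioned earlier.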

The 89 percent figure, then, is a baseline for infringing files. The three most shared categories were movies, music, and TV shows, and among those categories there were zero legal files being shared. Assuming all 16 files of ambiguous legality were in fact legal, the researchers said that the overall figure would be 97.9 percent infringing content being distributed on BitTorrent.

This report echoes similar results out of Princeton that were published earlier this year. Though the top categories were slightly different—Princeton found that movies and TV were the most popular, while music fell behind games/software, pornography, and unclassifiable files—that study found that all of the movie, TV, and music content being shared was indeed infringing. Overall, Princeton said that 99 percent of the content on BitTorrent was illegal.

The University of Ballarat said that just four percent of torrents were responsible for 80 percent of the seed population. And the top 10 most seeded files were all Hollywood films (save for Lady Gaga's album, The Fame Monster, at number 7), so it's clear that Linux distros weren't exactly dominating the charts here. Copyright holders have one consolation, however: P2P users seem to buy more content than the average person, so there's still some chance of earning those users' money after all.

Update: There are a number of criticisms about the study that have popped up since Friday. Most notably, TorrentFreak raises questions about the categorization of files, the use of older data, and the numbers being used by the researchers. We have contacted those behind the study for comment.

Update x2: Paul Watters, one of the researchers behind the study, has responded:

Thank you for your enquiry regarding our research report "Investigation into the extent of infringing content on BitTorrent networks". As researchers, we not only stand by the findings that we have arrived at, but - having made our methodology public - we are providing other bona fide researchers [the opportunity] to replicate and/or dispute our findings. Their results can in turn be assessed through the peer review process; this is the process that normal research activity takes.

You have raised some interesting points that are fundamental to the validity of any study in this area: the sampling strategy; verification of results and so on. We believe that our methodology was rigorously applied to the sample that we obtained. Over time, we will replicate the sampling process, so that we will gain better estimates of the population results. This is the fundamental tenet of statistical sampling.

Jacqui Cheng
Jacqui is an Editor at Large at Ars Technica, where she has spent the last eight years writing about Apple culture, gadgets, social networking, privacy, and more. Email: jacqui@arstechnica.com // Twitter: @eJacqui