The web has a lot of porn, but putting an exact number on it can be a tricky exercise.

The era of big data is growing faster than expected. In 2008 Wired proclaimed that we are living in the ‘petabyte age’, an era of data deluge caused by society’s ability to produce and upload more and more data. Cisco and the NSA made Wired’s proclamation obsolete less than five years later: Cisco with a white paper about the dawn of the zettabyte era; the NSA with the construction and opening of a zettabyte-sized data center.

While a large portion of the data created in these two eras is scientific in nature, much of the rest is frivolous, thanks to the popularity of YouTube’s cat videos and vlogs. With the popularity of the tube site came the creation of its X-rated cousins.

Initially, these lascivious tube sites contained amateur fare: exhibitionist couples wanting to show the world their thing (or things). As the sites evolved they became repositories for professional studio-made smut, aided by the fact that most of these sites are incorporated in and operated from tax havens and other locales where intellectual property law is not a high-ranking concern of officialdom.

But, surprisingly, despite the prevalence of porn on the web, nobody has an exact figure for how much there is. Part of the confusion is the lack of uniform standards of measurement. In trying to determine this figure, should one count the number of porn-themed URLs? Or perhaps count the pages? Or try to measure the volume of data?

This lack of uniformity in measurement means a variety of statistics are thrown around when the porn question comes up. One of the more common statistics is 37%. This comes from a 2010 press release from a Swedish web filtering company called Optenet and was derived by counting porn-themed URLs in a “representative sample” of 4 million URLs. However, this figure may not be accurate: the BBC reports that a recent academic study, based on a count of websites rather than URLs, suggested that only 4% of the world’s websites are porn. The BBC also notes that while a porn site may have a lot of pages attached to it, this isn’t necessarily indicative of its audience, as such sites tend to have a long tail of content that is seldom watched.

Another commonly quoted number comes from ExtremeTech, which attempted to make the calculation based on traffic numbers. ET concluded that 30% of all online traffic is generated by porn, but as the BBC points out, there are problems with the site’s math. In its calculations, ET put total Internet traffic per day at around half an exabyte. Cisco claims otherwise, putting the figure at 1.4 exabytes. Without an accurate base statistic, it’s impossible to derive reliable figures from it.
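
To see how much the base figure matters, here is a rough sketch of the arithmetic in Python. It assumes, purely for illustration, that ET's absolute estimate of porn traffic is held fixed and only the daily total is swapped for Cisco's figure; neither ET nor the BBC presents the calculation this way, and the function and variable names are invented for this example.

```python
def share_of_traffic(part_eb_per_day: float, total_eb_per_day: float) -> float:
    """Return part / total as a fraction of daily traffic (both in exabytes)."""
    if total_eb_per_day <= 0:
        raise ValueError("total traffic must be positive")
    return part_eb_per_day / total_eb_per_day


# Figures quoted in the article.
ET_TOTAL_EB = 0.5      # ExtremeTech's assumed total daily traffic
ET_SHARE = 0.30        # ExtremeTech's claimed porn share
CISCO_TOTAL_EB = 1.4   # Cisco's figure for total daily traffic

# Assumption: the porn volume implied by ET's own numbers stays the same.
implied_porn_eb = ET_SHARE * ET_TOTAL_EB

# The same volume measured against Cisco's larger total.
rescaled_share = share_of_traffic(implied_porn_eb, CISCO_TOTAL_EB)

print(f"Implied porn traffic: {implied_porn_eb:.2f} EB/day")
print(f"Share against Cisco's total: {rescaled_share:.1%}")
```

Under that assumption, the same volume of traffic works out to roughly 11% of the total rather than 30%, which shows how far the headline percentage moves when only the denominator changes.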

Regardless of whether the figure is 4% or 30%, this episode reveals that there is no uniform way to track and calculate traffic on the web. One person’s calculations may be based on flawed information, and without uniformity in statistics the end result is useless.