Mountain View says it filters copyright-infringing content from its autocomplete function, but not all search filters are created equal.

In a recently released paper, "How Google Fights Piracy" (available here as a Google Doc), the search giant claims that fans intent on pirating media use pirate sites rather than search. Despite Google's insistence that its search function is not the preferred doorway to pirated content, Mountain View says it "has taken steps to prevent terms closely associated with piracy from appearing in Autocomplete and Related Search."

The exact whats and hows of Google's search algorithm have long been shrouded in mystery, but I found them to be particularly foggy when it comes to searching for specific terms related to the copyright-free lifestyle.

On a pirate hunt

Last week the British music industry trade group BPI asked Google to block a number of URLs from its search function, citing copyright infringement. Included in the list of rogue sites were several URLs from the torrent-search site The Pirate Bay.

Of the 2056 URLs that BPI asked to have removed--including 171 from The Pirate Bay--only one had no action taken against it: The Pirate Bay's current homepage, www.piratebay.sx.

Why? Google doesn't say specifically, but it's probably because although the TPB homepage allows users to search for torrents that no doubt contain pirated property, The Pirate Bay's front page itself does not actually list or contain any copyright-protected information.

"It is our policy to respond to clear and specific notices of alleged copyright infringement. Upon review, we may discover that one or more URLs specified in a copyright removal request clearly did not infringe copyrights. In those cases we will decline to remove those URLs from Search."

This isn't especially newsworthy, as BPI is one of the most active senders of DMCA requests, and The Pirate Bay is one of the most frequently targeted sites in DMCA takedown notices. (According to Google, nearly 460,000 TPB URLs have been requested for removal.) However, despite the fact that Google has determined that The Pirate Bay's front page should remain indexable and available, Mountain View has constructed a strange maze of semantic filters regarding "The Pirate Bay" and other pirate sites within its autocomplete function.

Autocomplete anarchy

In 2009, Google completely removed The Pirate Bay from its search index, only to eventually return TPB's homepage to searchability. The Pirate Bay is currently available through Google, but remains somewhat hidden by Google's autocomplete function. For example, on the US Google site, searching for the term "The Pirate Bay" will not bring up the site's front page in Google's autocomplete results.

The autocomplete for the conjoined URL phrase "thepiratebay", for its part, brings up some sketchy sites (that you should NOT click on) rather than TPB's official site. However, if you add a period to your search ("thepiratebay."), The Pirate Bay's homepage will magically appear in the main search field below the bar.

Searches for phrases like "the pirate bay eminem", "the pirate bay ironman" or "the pirate bay sisterhood of the travelling pants 2" will all return relevant torrent searches within The Pirate Bay. While the results will not appear in the main search field until you click the suggestion, this is a curious loophole in Mountain View's search filtering algorithm.

Meanwhile, the top-level URLs for sites included in BPI's most recent DMCA request, such as mp3chief.com, will autocomplete on the US site when you search for the terms "MP3 Chief" or "mp3chief", with associated links on the results page below the search bar.

In fact, Google autocomplete worked for many (but not all) URLs included in BPI's DMCA request, including nakido.com (under the search term "nakido"), free-albums.net ("free-albums"), and indowebster.com ("indowebster").

Autocompletely hit-and-miss

According to Google's autocomplete FAQ, the autocomplete terms are the algorithm-driven result of "search activity from all web users and the content of web pages indexed by Google." Google also readily acknowledges that the autocomplete function does not roam completely unchecked. In addition to filtering out "objectionable" material such as pornography, violence, or hate speech in the autocomplete function, the company filters out terms that "are frequently used to find content that infringes copyrights."

Google's position on its autocomplete function becomes even more curious when you consider how it relates to porn searches. If you search for the domain name of a well-known pornography site--a category Google explicitly says it filters from autocomplete--it will not turn up in autocomplete. This is true for a number of sites I tried with seemingly innocuous URL names. Unlike with The Pirate Bay searches, you will not receive a proper autocomplete even if you conjoin the words and include the period afterwards. Try it if you're at home and not on a work computer--assuming you know the name of at least one adult site.

We have contacted Google for clarification on its autocomplete functionality and will update this story if and when we receive a response.

This story, "Exploring Google Search's tenuous relationship with pirated content," was originally published by TechHive.