the quotation problem

Jan 9, 2017

Finding the source of a quotation has always been a problem. Before the internet, if a quotation wasn't in the Oxford Dictionary of Quotations, you had little chance of tracking it down unless you were a journalist or an academic with access to well-indexed reference material.

The WWW has helped with searching, but it comes with a problem of its own: as it becomes ever easier to publish, more and more people are publishing material which contains errors, if not outright lies. This has been in the news recently with the discovery of “factories” for publishing fake news (mostly about American politics) in Russia, but it's a problem which has been building for years.

As an example: on a recent trip to Paris I saw this fly-post in Montmartre:

The words sounded familiar so I googled them. The top hit on Google is from a site called azquotes.com, attributing this to Pablo Picasso. Although the wording is slightly different (“The urge to destroy is also a creative urge”), several other sites back this up as a quote from Picasso. Unfortunately, this attribution is simply wrong. Further searching eventually revealed that this quotation, with the “also”, is from an 1842 work by Mikhail Bakunin, whom I hadn't heard of.

Reading up about him was very interesting, but this experience is absolutely typical of my attempts to track down quotations on the web. There are hundreds of sites publishing snippets of quotations; lots of them have completely wrong information on them, and even the correctly attributed quotations almost never come with a reference to the work cited, just the author's name, making it insanely difficult to verify the quotation. And Google seems entirely unable to return the most reliable sites as the top “organic” (not paid for) results in a search.

Google Search is an amazing invention which has revolutionized the world, but in recent years I've noticed the quality of its results deteriorating as the sheer volume of the WWW continues to soar. In its early days, it was able to give higher rank to public institutions and universities, and therefore return organic search results that you could have some degree of confidence in. Nowadays (and this is a problem for all web search engines when faced with a thousand websites, none of which has any measurable credibility) the only possible metrics are things like how many people clicked on them, and how long they stayed on the site before returning to search again. Unfortunately, this rewards entertainment value over the truth, so that credible-looking websites like AZ Quotes can beat sites which actually make an effort to curate and check the information they publish. This is exactly the same problem which search engines face in identifying the “fake news” sites which have been uncovered recently.

So the ease with which anyone can publish anything is making it increasingly difficult to find the truth on the web. Crowd-sourcing information, which is probably how AZ Quotes and many other quotations websites get their content, is unreliable unless you also have either a paid staff to do fact-checking (which hurts profits), or a volunteer staff to do it, in other words a community. This has worked reasonably well for Wikipedia and a few other wiki-style sites, but online communities need a certain critical mass to work properly and avoid becoming dominated by an agenda-driven clique, and there is a chicken-and-egg problem inherent in building such communities.

Apparently, Mikhail Bakunin has come to be regarded as one of the “founding fathers of anarchism”. The quotation above illustrates his belief that the old order must be pulled down to build the new. There is a striking parallel here with Google itself, which has a history of destroying value in existing companies in order to further its own advertising business. One of the ways it destroys value is by offering services for free, supported by advertising, which thanks mainly to Google's success has become the dominant model for publishing content on the web. This is exactly what has led to the problem with tracking down accurate quotations on the web, among many other kinds of research. In the absence of any measure of authority, or any way of indicating that a site has been carefully curated, entertainment value becomes the only available measure for organic search to use.

I've felt for several years that this trend can't continue, with web search becoming less and less useful. The web needs ways to indicate when information has been carefully curated; and because anyone can stamp “curated” on their website even when it's a lie, we need ways to apply trust networks to the web so that searches can prioritise content which has been reviewed by a community of trusted experts in a field.
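To make the idea a little more concrete, here is a toy sketch of what trust-aware ranking might look like next to ranking on engagement alone. Everything in it is an assumption for illustration: the `Site` fields, the scoring functions, the reviewer names and the example sites are all made up, and no real search engine is claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    clicks: int            # hypothetical: how many searchers clicked the result
    dwell_seconds: float   # hypothetical: average time before returning to search
    endorsed_by: set = field(default_factory=set)  # who vouches for the site


def engagement_score(site):
    # Engagement-only signal: popularity and stickiness, nothing about accuracy.
    return site.clicks * site.dwell_seconds


def trust_score(site, trusted_reviewers):
    # Count only endorsements from reviewers in the trusted set,
    # so a self-applied "curated" stamp earns nothing.
    return len(site.endorsed_by & trusted_reviewers)


def rank(sites, trusted_reviewers=frozenset()):
    # Sort by trusted endorsements first, with engagement as the tie-breaker.
    return sorted(
        sites,
        key=lambda s: (trust_score(s, trusted_reviewers), engagement_score(s)),
        reverse=True,
    )


sites = [
    # This site endorses itself, which should count for nothing.
    Site("popular-quotes.example", clicks=9000, dwell_seconds=40.0,
         endorsed_by={"popular-quotes.example"}),
    Site("curated-quotes.example", clicks=300, dwell_seconds=90.0,
         endorsed_by={"librarian_a", "historian_b"}),
]
trusted = {"librarian_a", "historian_b"}  # hypothetical community of experts

by_engagement = [s.name for s in rank(sites)]
by_trust = [s.name for s in rank(sites, trusted)]
print(by_engagement)
print(by_trust)
```

With no trusted reviewers, the entertaining site wins on clicks and dwell time; once the endorsements of trusted reviewers are counted, the curated site comes first. The hard part, of course, is deciding who belongs in `trusted`, which is exactly the trust network problem.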

Since its IPO, Google seems to care more about increasing profits every quarter than solving problems like this which don't affect its bottom line (yet). I hope someone will crack the curation/trust network problems and disrupt web search soon. We're up to our eyes in lies and errors on the web, and there must be a tipping point soon. I'm sure I can't be the only patient, quiet researcher who thinks so.