Google Search and the Primary Source Revolution

A while ago this article appeared on the website of The Age, one of the Melbourne dailies. It discusses how this image found its way onto the official final high school History exam for Victorian students in 2012:

Now, looking carefully, you will see a giant robot assisting the Bolshevik forces from behind an ornate fence. Correct me if I’m wrong, but I don’t recall any Cybermen or Transformers being part of the curriculum when I took this unit on The Age of Revolutions. (OK, it was a long time ago, but I doubt *that* kind of new material has since come to light!)

In this case, we’re assured that students (a) weren’t asked any questions to which this interloper in the image could have introduced confusion (which rather implies they weren’t asked any questions about the image, and that leads me to wonder why the hell it was there in the first place… but aaaaaanyway), and (b) if any confusion or distraction attributable to the image is ‘detected’ in student answers, somehow something will be done to make sure they are not disadvantaged. God knows how *that’s* supposed to work.[1] I remember I once applied for my own final history exam to be re-marked (I had received a D, and for the record, it went up to A- upon review) but there wasn’t any sort of confusion on the paper, merely in the examiner’s mind: let’s hope this bunch were more on the ball.

The online news on the day was having a giggle at the expense of the examinations board at the Victorian Curriculum and Assessment Authority (VCAA), and we, the readers, were invited to scoff along (‘How silly! What kind of morons do they hire there these days? *Snort* Fancy not noticing a giant robot!’) before moving on with our self-satisfied surfing. Indeed, Twitter briefly lit up with guffaws and virtual finger-pointing. And to an extent, yes: people whose job it is to set exams, one would assume, have something of a moral obligation to ensure that they are providing tests which give accurate information, and ask questions to which answers can reasonably be expected. The obligation of examination boards presumably extends to not just shoving in an image they’ve scooped from the top of a list generated by the black box of their online search engine of choice. Yet that is what we must suspect occurred in this case. According to The Age, “a search for the image in Google brings up the robot version as the first result”.[2] This incident has therefore prompted me to reflect on the often unrecognized influence of search algorithms, Google’s in particular, on our thinking.[3]

Google famously bases its searches and suggestions on a combination of viewing frequency and interconnectedness, which means that sites with more hits and links are rated more highly in search results. More on this below. But before there are results there are the searches themselves. Google (like other engines) also incorporates autocomplete suggestions for search terms, based on the frequency of the terms themselves. Journalists from The New York Times, reprinted in The Age here, have been reporting on research into how autocomplete functions reflect what users ask on the web. Given the stereotypes, rather predictably the most common ways of ending such queries as “why are Americans…?” included “fat”, “stupid” and “patriotic”. More (or equally, depending on your point of view) disturbingly, ‘for “Chinese” the autocompletes include “skinny,” “rude” and “smart”.’ Remembering that Google is geosensitive, I repeated this test with my own browser and turned up the common questions: why are Americans… so stupid? so loud? so ignorant? and called yanks? For Chinese [people], my results were… so rude, so rich, so cheap and so smart? (Apparently Australian Google users aren’t particularly curious about American patriotism, or about the slimming regimes of our northern Asian neighbours, although I’m not confident the questions we do apparently ask cast us in any better light.)[4]
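For readers who like to see the "more links, higher rank" idea spelled out, here is a minimal sketch in the spirit of the publicly described PageRank approach. It is not Google's actual ranking, which is proprietary and draws on many more signals. The function name `link_rank`, the five made-up sites in `toy_web` and the damping value are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Toy link-based ranking: a page scores highly when well-linked pages link to it.

    `links` maps each (hypothetical) page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: nobody links to the specialist archive.
toy_web = {
    "popular_image_blog": ["encyclopedia"],
    "encyclopedia": ["popular_image_blog"],
    "forum": ["popular_image_blog", "encyclopedia"],
    "student_site": ["popular_image_blog"],
    "archive_of_translated_sources": [],
}

scores = link_rank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy web the heavily linked blog comes out on top, while the unlinked archive sits near the bottom however good its contents might be.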

The news report is focussed principally on the disturbing racist and other antisocial tendencies revealed by the findings, but my attention was grasped by another possibility. That is, to what extent do autocomplete options create and not merely reflect the searching practices of the ‘reasonable’ web user? When the mythical ‘reasonable person’ begins to type a search term into the unassuming little search space, to what degree do the autocomplete options determine what he/she ends up searching for? The worryingly probable answer is ‘substantially’: and that, my friends, is one way in which these autocomplete options remain the most common ones. Once they appear at or near the top of the list, they become more likely to stay there, shaping thoughts by reinforcing stereotypes rather than opening them up to questioning.[5]
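The self-reinforcing loop I am worried about can be sketched as a small simulation. This is purely illustrative: I have no knowledge of how any real autocomplete system counts queries. The function `simulate_autocomplete`, the starting counts, the "true interest" shares, the proportion of users who simply accept a suggestion (`follow_rate`) and the weights given to each list position are all invented assumptions.

```python
def simulate_autocomplete(true_interest, initial_counts, rounds=20,
                          searches_per_round=1000.0, follow_rate=0.6,
                          position_weights=(0.6, 0.3, 0.1)):
    """Toy model of suggestions feeding back into their own popularity.

    Each round, a fraction `follow_rate` of searches simply accept one of the
    displayed suggestions (favouring the top slot); the remainder reflect what
    people would have asked anyway (`true_interest`, shares summing to 1).
    """
    counts = dict(initial_counts)
    for _ in range(rounds):
        # Suggestions are displayed in order of their current query counts.
        shown = sorted(counts, key=counts.get, reverse=True)[:len(position_weights)]
        independent = searches_per_round * (1.0 - follow_rate)
        for term, share in true_interest.items():
            counts[term] += independent * share
        guided = searches_per_round * follow_rate
        for term, weight in zip(shown, position_weights):
            counts[term] += guided * weight
    return counts


# Hypothetical numbers: "so religious" is what most people would ask unprompted,
# but "so superstitious" happens to start with a handful more queries.
true_interest = {"so superstitious": 0.3, "so cruel": 0.3, "so religious": 0.4}
initial_counts = {"so superstitious": 105.0, "so cruel": 100.0, "so religious": 95.0}

guided = simulate_autocomplete(true_interest, initial_counts, follow_rate=0.6)
unguided = simulate_autocomplete(true_interest, initial_counts, follow_rate=0.0)

guided_ranking = sorted(guided, key=guided.get, reverse=True)
unguided_ranking = sorted(unguided, key=unguided.get, reverse=True)
print("with suggestions:   ", guided_ranking)
print("without suggestions:", unguided_ranking)
```

Under these assumptions the early leader stays at the top when most users accept a suggestion, whereas with no suggestions at all the term reflecting underlying interest overtakes it.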

Beyond the realms of autocomplete, and to return to the specific issues raised by the VCAA robot debacle, lies the question of how prioritized search results similarly shape the ideas of the searcher. The order of results fundamentally skews what we find when we look, especially since most people are not given to examining search results beyond the first screen view. In fact, according to recent figures, the top three results in any search account for between 68% and 90% of ‘clicks’ depending on whether the search involved a brand.[6] How worried should we be that this phenomenon risks producing a narrowing rather than widening of scholarship as we increasingly turn to electronic means to locate our sources? Does this mean we, as scholars, should be paying more attention to how companies like Google, Yahoo, & co. process search terms? Do we have a vested interest in how the information of a search is processed in order to generate results? I think the answer is yes, although I’m not entirely sure that I am personally equipped to engage in the debate at an informed level.

Of course, there’s nothing inherently wrong with clicking on one of the top three results in a search if it turns out that the search has truly identified what you were looking for. But the combination of a tendency to accept the first results generated by (potentially) autocompleted searches and prioritized search results produced according to (assumed) relevance built on an invisible reasoning algorithm becomes especially concerning when paired with inexpert search design. This is particularly true of students setting out to use online tools in academic ways for the first time.

I began to notice effects which I presume can be attributed to search engine-driven phenomena in my classes in 2012 when students were giving eight-minute talks on set topics in tutorials throughout the latter half of semester. As I was teaching five groups in a row, I had the opportunity to gather a reasonably significant sample of student responses to the same task. What I noticed was that among the many students who chose to use PowerPoint in their presentations, most had identified the same image(s) to illustrate their points. For instance, in the week in which we covered the Black Death, almost everyone who used PowerPoint showed this image:

A common image used to illustrate Black Death presentations among an unscientific sample of history undergraduates.

As it happens, a much nicer, coloured version of this image is available if one should care to scroll further down the list of results, but most students don’t seem to look further than the first screen. The more adventurous ones found this:

Another illustration of the Black Death, generally only used by those willing to scroll down the page…

A similar phenomenon can be observed with the primary sources students use for essays. To be fair, we encourage them to look online for easily accessible translations of medieval sources, and I *love* the fact that there are now so many to be found. If you know where and how to look (which is basically the issue here) there are more and more primary sources for medieval history turning up online every week. But most of them are not highly visited sites with lots of embedded links. It takes skillful search design and prior knowledge to bring them to the top of Google’s results; they are hard for the neophyte to locate. This means that students end up relying on a narrow range of common sources, when the most cursory look at almost any book in the 940s of the University Library (only a two-minute stroll from the department) would have furnished them with infinitely more examples and a richer variety. (Not that I would ever condone reducing the process of marking to a momentary glance, but these practices are now so common that whether a student has used images from deeper in the common search results or identified source material in the library itself could almost function as a litmus test for their engagement with the task, if not the outcome itself.)

What is the result of all of this? We derive massive benefits from digitization of source material that would otherwise be difficult, time-consuming and costly to access – and this is especially true of Australasian scholars who face enormous burdens of time and expense to travel to European libraries and archives relative to their northern hemisphere colleagues. However, there are patterns emerging that cause concern. We need to develop strong search practices for ourselves, and we need to inculcate them in students if we expect them to make intelligent and pro-active use of this resource. And, as the VCAA robot episode demonstrates, it might not be only undergraduates who need this advice.

Finally, in what is perhaps the most serious development (although dealt with humorously by the immediate victims) an image used by medievalists.net to illustrate a post on ‘Sex in Medieval Iceland’ was blocked by Facebook, as they noted here. In this instance, online policies about images quite literally police the available information about medieval worlds. Hence, despite promising an e-republic of knowledge, the online medium has the capacity to narrow both what we search for and what we find, sometimes in ways which cannot be overcome simply by being willing to scroll down, or construct creative search strings.

I am all for making use of technologies to aid and advance learning and knowledge. But I do worry that rather than using tools, we are at risk of being led blindly by the nose. Whose medieval worlds will be revealed by this process?

[1] Turns out students who scored ‘significantly lower than expected’ on the Robot question in comparison to other assessments had their marks adjusted. This apparently applied to 130 students. See: http://www.theage.com.au/victoria/vce-scores-changed-over-battle-tech-marauder-confusion-20130208-2e2qn.html

[2] This is no longer the case, but was true when I checked on the day the article first appeared online.

[3] Google is my focus here because of its market dominance. According to a 2012 survey published online by PewInternet, 83% of internet users use Google as their main or preferred search engine. See http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx

[4] Intriguingly, Yahoo gives a different set of answers to this test. Results for “Why are Americans…?” were: ‘so stupid’, ‘in Iraq’ and ‘stupid’. Results for “Why are Chinese…?” were: ‘businesses so successful’ and ‘people leaving China’. I am not certain whether the geographical zoning is the same between the two search engines.

[5] Out of interest, I repeated the autocomplete test with the question “why were medieval people…?” and obtained the results “so superstitious”, “so cruel” and “so religious”. No autocomplete suggestions emerged in Yahoo, implying that people who are curious about the medieval world in my geographical zone just don’t use Yahoo for their queries… Interesting!

[6] See http://searchenginewatch.com/article/2200730/Organic-vs.-Paid-Search-Results-Organic-Wins-94-of-Time

Yahoo uses a different type of information management from Google to draw its answers/hit results, so it should have different answers. It used to be substantially different, but alas, no longer. There is a reason why I use four different search engines that I know search and compile information very differently from each other: to make sure that I’ve got breadth to my search, in particular if I want to avoid YouTube hits and Google affiliate entries. Especially when Google is pissing me the hell off and returning stuff that it thinks I want and ignoring the actual terms I put into it. On those days, I rage and either draw up a different login of mine that runs different information, change my IP and log out of everything affiliated, or go sit with DuckDuckGo for a while.

Considering I know (and have helped) people stack Google so that their webpages come up at the top of the pile, and this was on an individual level, the level of control over information is quite frightening. Medieval or not.