“You can’t believe everything you read” is one of those aphorisms that we learn early on. “Caveat lector” was already a maxim long before the advent of the digital age and the World Wide Web. In that earlier era, though, barriers to publication were significant, and much print material went through gatekeeping procedures intended to ensure credibility, accuracy, and reliability. For academic work in particular, the editorial and peer-review processes of publishing companies, journals, and professional societies imparted authority to published works. The fact that a book or resource was available in libraries gave you some confidence that the author’s claims and positions had been vetted as credible and reliable.

Today, peer review and editorial oversight remain important. But the Web has made it possible for anyone to publish anything, and search engines provide immediate and unfiltered access to the abundance of online information. As Clay Shirky has noted, our paradigm now is “publish, then filter.” Now, more than ever, the reader (or viewer or listener) is responsible for assessing the claims of authors and evaluating documents for reliability and accuracy. Of course, faculty help guide their students toward critical thinking and the development of a discerning judgment about sources and materials. A big part of a prof’s job is to convey to students a sense for what’s reputable and what’s not in his or her field. Still, even reputable sources such as scientific journals and mainstream news outlets can reflect biases and misrepresentations of data, or even fall victim to outright fraud.

Technologist Howard Rheingold has argued that “crap detection” is one of the core literacies we need to cultivate. What does that look like when it comes to searching the web and evaluating what we find? I think we can approach this in several steps:

Clarifying in our minds exactly what we are searching for. Are we after a specific bit of information, or a more general introduction to or background on a topic?

Composing our search query with the terms and filters that best match what we are looking for.

Knowing how to read and interpret the search engine results page.

Knowing how to assess a given site, page, document, or resource that the search returns to us.

“Search literacy” is an important basis for crap detection. One of the best resources I have found to improve search literacy is this free self-directed course, Power Searching with Google. In a series of short videos, “search anthropologist” Dan Russell explains basic and advanced search techniques and concepts. Obviously, you should think carefully about the appropriate terms of your query; results could be biased due to poor or imprecise phrasing. Russell emphasizes that “every word matters” and that “word order is important.” Focus on the keywords and terms that are most essential to your topic, and disregard auxiliary words and phrases that you might use in normal conversation. The terms, structure, and filters that you put in your search query will determine what you get back. Use quotes around a phrase to return only pages that contain that exact phrase, and use the minus sign in front of terms you want to exclude from your results. Use the site: and filetype: operators to narrow the search to specific web domains and types of documents if that is useful.
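To make the operators concrete, here is a small Python sketch that assembles a query from its parts. The helper names (build_query, search_url) are my own and purely illustrative, and the URL simply uses Google's q= search parameter; the operators themselves (quotes, the minus sign, site:, filetype:) are the ones described above.

```python
from urllib.parse import urlencode


def build_query(keywords, phrases=(), exclude=(), site=None, filetype=None):
    """Assemble a search query from its parts (hypothetical helper).

    keywords: essential terms, kept in the order given (word order matters)
    phrases:  exact phrases, wrapped in double quotes
    exclude:  terms to leave out, each prefixed with a minus sign
    site:     restrict results to a domain or top-level domain, e.g. "gov"
    filetype: restrict results to a document type, e.g. "pdf"
    """
    parts = list(keywords)
    parts += [f'"{p}"' for p in phrases]
    parts += [f"-{term}" for term in exclude]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Percent-encode the query into Google's q= search parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query(["coral", "bleaching"],
                    phrases=["ocean acidification"],
                    exclude=["aquarium"],
                    site="gov", filetype="pdf")
    print(q)  # coral bleaching "ocean acidification" -aquarium site:gov filetype:pdf
    print(search_url(q))
```

Typing the assembled string straight into the search box works just as well; the point is simply that each part of the query is a deliberate choice.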

Search engine results are returned in “rank order” according to how well a webpage matches your query. But rank order is not the same as credibility or authoritativeness. Thus, the highest results of a search are not necessarily the most credible or useful; they are simply those that best match the query that you entered. It is not the search engine’s job to assess the accuracy of facts or the soundness of arguments that might be found on a web page. That’s why it is so important to formulate your query as carefully as possible, and to be aware of how specific terms might influence the results. For example, searching for information about “Falkland Islands” would not produce the same results as a search for “Islas Malvinas,” the Spanish name for the same islands.
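A toy example shows why rank order says nothing about credibility. The sketch below is a deliberately simplified stand-in, not how any real search engine works (they weigh many more signals); it scores two hypothetical pages purely by how often the query words appear in them. Notice that the scoring never asks whether a page is true, and that the two names for the same islands pull up different pages.

```python
import re


def tokenize(text):
    """Lowercase text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, pages):
    """Toy ranking: order pages by how many of their words match query terms.

    The score measures word overlap only; nothing here checks whether a
    page is accurate, current, or trustworthy.
    """
    terms = set(tokenize(query))
    scored = []
    for title, body in pages.items():
        score = sum(1 for word in tokenize(body) if word in terms)
        if score > 0:  # pages with no matching words are not returned at all
            scored.append((score, title))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best match first
    return [title for _, title in scored]


if __name__ == "__main__":
    # Two hypothetical pages about the same islands.
    pages = {
        "Page A": "The Falkland Islands are a British Overseas Territory.",
        "Page B": "Las Islas Malvinas son reclamadas por Argentina.",
    }
    print(rank("Falkland Islands", pages))  # ['Page A']
    print(rank("Islas Malvinas", pages))    # ['Page B']
```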

As you examine search results, consider visiting several sites to cross-check information and assess reliability. As with analog sources, you obviously want to ask some very basic questions:

Who is the author of this information? What person or organization is behind this document and site?

What evidence, such as verifiable credentials, is presented for the author’s competence with the subject matter?

What do other people say about the author?

What are the author’s sources? Are there citations and references to support the claims and arguments?

Are there feedback options, so that visitors can ask questions, engage in discussion, and publicly challenge erroneous or misleading information?

What are the outbound links from the page? Is the document or resource connected to other trustworthy sources?

Conversely, what are the inbound links, that is, what other pages are pointing to this page? What are those other sites saying about the page in question?

Does the site appear to be well maintained and up to date, or are there signs of neglect (e.g., broken links)?
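Checking for broken links by hand is tedious, and a short script can do a first pass. This is a minimal sketch using only Python's standard library; it assumes you already have a list of the page's link URLs, and the names http_status and find_broken are hypothetical. The fetch parameter exists only so the logic can be tested without going online.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status for url (after redirects), or None if unreachable."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 Not Found, 410 Gone, 500 Server Error
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def find_broken(urls, fetch=http_status):
    """Return (url, status) pairs for links that look broken.

    A link counts as broken if it could not be reached at all or answered
    with a 4xx or 5xx status. `fetch` can be swapped out for testing.
    """
    broken = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Some servers refuse HEAD requests and answer with 405, which this sketch would count as broken, so it is worth double-checking any flagged link in a browser before concluding that a site is neglected.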

Rheingold gives the example of a search for information about Martin Luther King, Jr. Among the top results for this query from most search engines is the site titled “Martin Luther King, Jr.: A True Historical Examination.” The URL for this site, http://martinlutherking.org, looks legitimate enough, but upon closer examination the site is revealed to be a front for a white supremacist organization called Stormfront.

An important tool for checking the background of a website or domain is WHOIS, an internet query-and-response protocol for domain registration records. If you are suspicious about the legitimacy of a site, run a WHOIS lookup to reveal information about who owns and operates a given domain on the web, where the site is hosted, contact information, and so on. Just enter a domain to see who’s behind a site. For example, here’s some background info about martinlutherking.org:
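Under the hood, WHOIS is a very simple protocol (RFC 3912): a client opens a TCP connection to port 43 on a WHOIS server, sends the query followed by a carriage return and line feed, and reads the text reply until the server closes the connection. The Python sketch below shows that exchange. It assumes that IANA's server, whois.iana.org, replies with a refer: line naming the registry's own WHOIS server; if it doesn't, the sketch simply returns IANA's answer. Response formats vary by registry, and the function names are my own. In everyday use, the whois command on macOS and Linux, or a web lookup form, does the same job.

```python
import socket

WHOIS_PORT = 43  # WHOIS runs over TCP port 43 (RFC 3912)


def whois_query(server, query, timeout=10):
    """Send one WHOIS query to server and return the full text reply.

    Per RFC 3912: send the query terminated by CRLF, then read until the
    server closes the connection.
    """
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        # The "idna" codec converts internationalized names to ASCII (punycode).
        sock.sendall(query.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(text):
    """Collect 'Key: value' lines into a dict mapping lowercase keys to lists."""
    fields = {}
    for line in text.splitlines():
        if line.lstrip().startswith(("%", "#")):
            continue  # comment lines used by some WHOIS servers
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


def lookup(domain):
    """Ask IANA which server handles the domain's TLD, then query that server.

    Assumes IANA's reply contains a 'refer:' line; if not, IANA's own
    reply is returned unchanged.
    """
    iana_reply = whois_query("whois.iana.org", domain)
    referral = parse_fields(iana_reply).get("refer")
    if not referral:
        return iana_reply
    return whois_query(referral[0], domain)
```

With network access, print(lookup("martinlutherking.org")) would print the registry's record for the domain.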

You can examine a website’s outbound links to see how it references other sources and documents. Hyperlinks on a site may be internal or external. “Internal” links point to other pages within the same domain, while “external” links point to pages outside of the domain. It’s the external, or “outbound,” links that will give you a sense of how the resource or page is situated within the larger web of sources on a topic. Conversely, “inbound” links can also be very telling as to a site’s legitimacy. But that information is somewhat harder to get at. Google used to have a link: operator as part of its search toolbox, but that seems to have been deprecated. The best resource I have found to discover incoming links to a site is this backlink checker. Here are the top results for a backlink check on http://martinlutherking.org:

Incoming links give us information in two ways: where the link is coming from, and the reason or purpose of the link. In the example given above, it is clear that many links pointing to http://martinlutherking.org are doing so precisely in order to call attention to its racist misrepresentations.
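The internal/external distinction is easy to automate for a page you are examining. This sketch, again standard-library Python with hypothetical function names, pulls every a href from a page's HTML, resolves relative links against the page's address, and sorts them by whether they stay on the same site. Treating www.example.org and example.org as one site is a simplifying assumption on my part; other subdomains count as external. Note that it can only reveal outbound links: inbound links live on other people's pages, which is why a backlink service is needed for those.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def site_of(url):
    """Hostname of url, treating 'www.example.org' and 'example.org' as one site."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify_links(html, page_url):
    """Split a page's links into internal (same site) and external (outbound)."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    home = site_of(page_url)
    internal, external = [], []
    for link in collector.links:
        if urlparse(link).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        (internal if site_of(link) == home else external).append(link)
    return internal, external
```

To use it on a live page, fetch the HTML first (for example with urllib.request.urlopen) and pass it in along with the page's URL.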

Beyond the “detecting” of biased, misleading, unreliable, scam, and hoax websites, there is the further possibility of “correcting” these sites, or at least of drawing attention to their shortcomings. If a site allows feedback and commentary, you can engage the issues there; it could be that an author is simply uninformed, and may be willing to revise their materials. Sites such as FactCheck.org and Snopes.com are dedicated to examining questionable claims and setting the record straight on a wide range of topics from politics to science. You can report websites to these services, or search their archives.

A further level of “crap correction” has been made possible by the development of web annotation tools such as Hypothes.is. An outstanding example of this approach is the work of climate scientists at the Climate Feedback project. This organization of experts monitors the web for articles related to climate change and offers feedback on the scientific accuracy of the information presented by annotating the articles themselves.

Of course, in order to see these annotations, viewers of the site would need to be familiar with the Hypothes.is platform. While that may not be likely at this point, we can hope that, going forward, the practice of webpage annotation will become more familiar and widespread.
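For the curious, Hypothes.is also offers a public web API that returns annotations as JSON, so annotations on a page can be read without installing anything. The sketch below assumes the https://api.hypothes.is/api/search endpoint with its uri and limit parameters, and a response whose rows each carry user and text fields; as I understand it, requests without an API token return only public annotations. The function names are my own, and the current Hypothes.is API documentation is the place to confirm the details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"  # assumed public endpoint


def annotations_url(page_url, limit=20):
    """Build a search request for annotations on one page."""
    return SEARCH_ENDPOINT + "?" + urlencode({"uri": page_url, "limit": limit})


def summarize(payload):
    """Reduce the JSON response to (user, text) pairs, one per annotation."""
    return [(row.get("user", ""), row.get("text", ""))
            for row in payload.get("rows", [])]


def fetch_annotations(page_url, limit=20):
    """Fetch and summarize public annotations for page_url (needs network access)."""
    with urlopen(annotations_url(page_url, limit), timeout=10) as response:
        return summarize(json.load(response))
```

With network access, calling fetch_annotations on an article's URL would list whatever public annotations, such as scientists' comments, have been attached to it.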

These are just a few things to keep in mind as you fact-check and analyze information on the web. Join us this week at our workshops for further conversation and demonstration.