This study examines the degree to which Wikipedia entries cite or reference research and scholarship, and whether that research and scholarship is generally available to readers. Working on the assumption that where Wikipedia provides links to research and scholarship that readers can readily consult, it increases the authority, reliability, and educational quality of this popular encyclopedia, this study examines Wikipedia’s use of open access research and scholarship, that is, peer-reviewed journal articles that have been made freely available online. This study demonstrates that among a sample of 100 Wikipedia entries, which included 168 sources or references, only two percent of the entries provided links to open access research and scholarship. However, it proved possible to locate, using Google Scholar and other search engines, relevant examples of open access work for 60 percent of a sub-set of 20 Wikipedia entries. The results suggest that much more can be done to enrich and enhance this encyclopedia’s representation of the current state of knowledge. To assist in this process, the study provides a guide to help Wikipedia contributors locate and utilize open access research and scholarship in creating and editing encyclopedia entries.

There is considerable bravado to Wikipedia’s tag line which identifies it as “the free encyclopedia that anyone can edit” [1]. What might otherwise seem a crippling point of editorial weakness is now widely recognized as a point of remarkable strength. Since 2001, Wikipedia has used the principle “that anyone can edit” to create the world’s largest contemporary encyclopedia, involving at this point some 2.9 million people who are registered as contributors, although a much smaller core of Wikipedians is responsible for most of the entries and edits, with 4.6 million articles now available in 100 languages [2]. That it is “the free encyclopedia” has contributed greatly to the fact that more people turn to it than to any other encyclopedia, reference work, or news source. There are only a dozen Web sites of any sort (e.g., Yahoo, MSN, Google) that people turn to more often than Wikipedia on the Internet. It appears that people are looking things up in a big way. I, as a professor of education and former (but not reformed) school teacher, could not be happier, although I have to admit a little sheepishly that it simply would not have occurred to me that such a thing was possible if I had not seen it develop before my eyes [3].

In that sense, Wikipedia stands alongside open source software (such as Linux and Apache) as a new breed of impossible public goods that have been made possible by the Internet [4]. What is impossible about open source software is that a large number of talented software programmers have freely given of their talents to create a computer operating system and a good number of other pieces of software that are open to modification and free distribution. What distinguishes Wikipedia’s unlikely quality as a public good is its particular openness to the very creation of a free online encyclopedia. It is a public good with a particularly educational feel to it, and one that has caught many educators, such as myself, a little off guard, as we might otherwise have thought such a venture impractical and impossible [5].

The question I pose here, by way of trying to support this education initiative, concerns whether contributors are making the best possible use of online references and sources in writing and editing entries to Wikipedia. My interest is not only in the quality of the encyclopedia’s entries, but also in how this very popular encyclopedia serves as a gateway to a larger world of knowledge. For I believe that Wikipedia contributors could do more to strengthen the scholarly quality of their entries, while at the same time throwing open those gates a little farther for readers to go on in pursuing what is introduced in Wikipedia. At issue in both these measures is the growing public access to a body of research and scholarship that in a parallel public good development is being made freely available online, most commonly under the rubric of open access (Willinsky, 2006).

In this paper, I attempt to assess the degree to which Wikipedia is taking advantage of a growing open access to research, which enables readers to freely access what is at this point roughly 20 percent of the 2.5 million peer-reviewed articles published annually in scholarly journals (Harnad, 2005). I also look at the potential for this freely available research to make a much greater contribution to the authority and educational quality of Wikipedia. This paper is also about the need and value of developing stronger connections among the new online sources of knowledge’s public sphere, represented not only by Wikipedia and open access research, but also by open source software, open biology (e.g., Human Genome Project), open data, Creative Commons, Gutenberg Project, Open Content Alliance, and the Stanford Encyclopedia of Philosophy, to name just a few. Caught in the enthusiastic grip of these developments, Cass Sunstein calls it the dawning of an infotopia based on this potential aggregation of information, led by what he terms “open source science,” which is committed to freely sharing data, software tools, and research results [6]. By way of considering what Wikipedia could gain by greater aggregation, I have examined Wikipedia’s approach to authorities and citations, as well as its current practices and its potential use of open access research and scholarship.

Wikipedias authority

The modern encyclopedia has typically been the work of a team of knowledgeable editors who carefully divide up the world of learning into discrete topics and solicit articles on these topics from established authorities in the field [7]. The very principle of Wikipedia – that anyone can initiate or edit an entry and that they can do so anonymously – runs more than a little counter to how encyclopedias have worked. It was not that Wikipedia’s founders Jimmy Wales and Larry Sanger set out with an entirely contrarian approach as their guiding principle. Their original idea for an online encyclopedia, Nupedia, was based on solicited articles going through a seven-step review process [8].

After Nupedia foundered with only 200 articles in the first year, Wales and Sanger switched to wiki software which allowed direct entry and editing by anyone online, while tracking the history of the editing and supporting a discussion of the process. Over the next 12 months, some 20,000 articles were added to the encyclopedia in what does appear to be a spontaneous, unsolicited expression of people’s interest in creating an open public record of what they know and what they can discover. Wales has gone on to put in a number of checks, to be discussed below, on that spontaneous formation, but has otherwise embraced it, as is clear from Wikipedia’s tagline – the free encyclopedia that anyone can edit [9]. Wikipedia’s achievement is a result of not just anyone editing but nearly everyone with an interest in seeing things set out properly; the idea works “because so many minds are involved,” as Sunstein succinctly puts it [10]. It is a new form of collective expertise, dynamic and semi-anonymous, but also cumulative and continually under review and open to updated citation and references.

The accuracy of Wikipedia’s entries has been subject to a number of studies [11]. For example, Roy Rosenzweig, a history professor at George Mason University, reported on a number of shortcomings in the area of American history in a review of Wikipedia, which he later observed were corrected following the publication of the review (Read, 2006) [12]. However, the best known of these assessments, to date, was conducted by Nature in 2005, finding that in comparing Wikipedia to Encyclopædia Britannica, Wikipedia’s accuracy was “surprisingly good” (“Wiki’s Wild World,” 2005). Nature ended up advising that “researchers should read Wikipedia cautiously and amend it enthusiastically,” while Britannica went on to challenge the rigor and reliability of the Nature study (“Fatally Flawed,” 2006) [13].

Wikipedias policies

Wikipedia provides a series of policies intended to set standards for composing the entries that make up the encyclopedia. For example, the entry “Wikipedia: Verifiability” carries a statement at the top declaring that “the page is an official policy,” which means that, with Wikipedia’s non-authoritarian approach, what follows has “wide acceptance among editors and is considered a standard that all users should follow.” These policies are literally as susceptible to continuous revision as any other entry (there is an “edit this page” tab). The policies that I quote from below have undoubtedly been modified in the meantime. However, the general tenor and focus of the policies that evolved over the early years of Wikipedia have been relatively consistent and steady over the last few years, which in Internet life is as steady as it gets.

The policies make it clear that the role to be played by contributors is to assemble what others have reviewed and established. The central policy in this regard is found in the “Wikipedia: Verifiability,” an entry which plainly states, “the threshold for inclusion in Wikipedia is verifiability, not truth.” What is verifiable, in Wikipedia’s terms, is what has been published elsewhere and published well: “Facts, viewpoints, theories, and arguments may only be included in articles if they have already been published by reliable and reputable sources.” Its policy is “no original research,” in the sense that original research is published elsewhere – where it can be properly vetted – and not in Wikipedia; whereas original research can be drawn on and cited to verify the content of Wikipedia entries. The guidelines point out, for example, that edits to a Wikipedia entry must be backed by verifiable sources, or otherwise “any material that is challenged and has no source may be removed by any editor.”

What is asked of contributors to Wikipedia, by way of qualifications, is that they be careful readers and good students on the topics they would write about. They should consult multiple sources, detect dubious sources, and eschew self-published sites. No other expertise is required for playing a part in shaping the knowledge represented in Wikipedia. It is literacy’s ultimate democracy [14].

Wikipedia’s guideline on the use of “reliable sources” for entries makes further reference to published sources, as the new will build on tradition. It goes on to provide an extensive set of strategies and means for ensuring the reliability of sources, including the recommendation to use peer-reviewed scientific publications for entries bearing on the physical sciences, mathematics, and medicine. However, in the guideline on “citing sources,” the one-line summary of what sources to use states a preference for “credible third-party, peer-reviewed English-language sources.” There are entries, such as the one on “polio vaccine,” which include a number of statements for which editors have entered a superscripted “citation needed” at the end, which hyperlinks to the style guideline on citing sources: “Many medical researchers have expressed fear that rare human cancers may be linked to contamination by the monkey virus SV40 of a proportion of polio vaccines administered in the 1950s and 1960s [citation needed].” Also, for example, the entry for “copyright” includes the cautionary statement at the top, “This article’s section called ‘History of Copyright’ does not cite its references or sources.”

In advising contributors on how to find “good sources,” the guideline entitled “Reliable Sources” starts with a general recommendation to use the local public library, but goes on to recommend online databases, such as Google’s Book Search and the Internet Archive’s Million Book Project, which provide access to full text sources only in the case of books for which the copyright has expired (70 years after the death of the author in the case of the United States). When the guideline points out that “peer-reviewed publications are considered to be the most reliable,” it carefully notes that this peer review process is capable of making errors, and draws attention to a couple of cases, including the Sokal hoax, in which the physicist Alan Sokal was able to get a pastiche of mock-postmodern nonsense published in the peer-reviewed cultural studies journal, Social Text.

With the sciences, the guideline on “Reliable Sources” advises contributors to present the “community consensus” or “scientific consensus,” while making “readers aware of any uncertainty or controversy.” It states that “a well-referenced article will point to specific journal articles or specific theories proposed by specific researchers.” For entries that have historical content, contributors are advised that history journals are among the sources to consult, while referencing JSTOR (http://www.jstor.org/), which offers digitized collections of complete back-issue sets for hundreds of titles, although this database is available only through subscribing libraries and organizations. At the same time, the guideline may be steering contributors needlessly away from strong open access sources, such as arXiv.org (http://arxiv.org/), when it advises that works found there “should be considered to be self-published” (in “Wikipedia: Reliable Sources/Examples”). ArXiv.org serves as an open access repository in which researchers have deposited what amounts to a good proportion of the peer-reviewed literature in high energy physics, as well as papers in mathematics, computer science, and quantitative biology, most with their publication status indicated (as soon as they are accepted for publication and updated on publication). More generally, the guideline on “Reliable Sources” recommends that contributors check the figures and statistics they cite by referring to published sources. Finally, Wikipedia advises against using wikis, as well as blogs and personal Web sites, as secondary sources.

It is among the complaints against Wikipedia that students are citing its articles and doing so, as Alan Y. Liu, professor of English at the University of California at Santa Barbara has put it, “with no awareness that there was a need to read primary work or even a critical work” [15]. Liu points out that even as students should steer clear of citing any encyclopedia, it is all the more troubling as Wikipedia’s text is under a state of permanent editing. All of this points, in my mind, to the question of such a work leading into a body of knowledge (by citation and links) and not standing in as the whole of it, or as Stacy Schiff notes in her New Yorker review of Wikipedia, “the facts may be sturdy, but the connective tissue is either anemic or absent; and citation is hit or miss” [16].

It is the connective tissue provided by citation to scholarly work that this study considers. It does so at a time when Wikimedia Foundation employee Danny Wool has said that the current emphasis with Wikipedia is on improving the quality of the entries, and many contributors, as Ann Kirschner discovered, are looking for ways to enhance existing entries in form and substance (Wool in Kirschner, 2006). Also, recently, the Wikimedia Foundation has held a “Wikimedia open access chat” to address the question of “what types of open access content are potentially relevant to Wikimedia projects” (Wikimedia Open Access Chat, 2006).

Method for this study

A sample of 100 Wikipedia entries for this study was generated using the site’s “Random Article” button. Of the 100 entries, ranging from “Aeronautics” to “Windom Earle” (“villain in the American TV series Twin Peaks”), 68 of them had been edited within the previous two months, reflecting the constant state of revision and, presumably, refinement of the entries. Also among the 100 entries, 35 of them had been labeled as “Stubs” by the editors, which carries a warning and invitation at the top of the entry: “This article about [X] is a stub. You can help Wikipedia by expanding it.” The editors have judged, according to Wikipedia’s statement on stubs, that such entries “do not yet contain sufficient information” [17]. In addition to the 35 stubs, two of the 100 entries were identified as needing “cleanup” and three were slotted “to be merged” with other entries. The entries designated as stubs, in need of cleanup, and to be merged were retained in the sample, as they form a vital part of this dynamic text.

A typical Wikipedia entry will have hyperlinks from keywords in the text to the entries on those words. The entry will also contain links within its text to Web pages outside of Wikipedia. This study focused exclusively on references in entries that were listed under “Sources,” “Notes,” “Related Research,” or “External Links,” which typically formed the final section of the entry. These sources were categorized as print or online, with those that were online further identified as either open access (freely available to online readers) or not.
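The coding scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the instrument used in the study: the `Source` record, the `categorize` and `tally` helpers, and the example records are all hypothetical names and data introduced here to show how each listed reference falls into one of three categories (print, online and open access, online but not open access).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One reference listed at the end of a Wikipedia entry (hypothetical record)."""
    entry: str         # title of the Wikipedia entry that lists the source
    online: bool       # False when the source is available in print only
    open_access: bool  # only meaningful when online is True


def categorize(source: Source) -> str:
    """Assign a source to one of the three categories described in the text."""
    if not source.online:
        return "print"
    if source.open_access:
        return "online, open access"
    return "online, not open access"


def tally(sources) -> Counter:
    """Count how many sources fall into each category."""
    return Counter(categorize(s) for s in sources)


# Hypothetical records for illustration only; these are not data from the study.
example = [
    Source("Entry A", online=False, open_access=False),
    Source("Entry A", online=True, open_access=True),
    Source("Entry B", online=True, open_access=False),
]
print(tally(example))
```

Keeping the entry title on each record means the same list can be regrouped by entry, which is how a per-entry table of sources would be built.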

A second step in this study was to create a smaller random sample (n = 20) out of the original sample of 100 that was used to establish the degree to which relevant open access research and scholarship could be found and added to the entries as references. The search for open access materials was conducted first by entering the title of the entry in Wikipedia from this smaller sample, and then by conducting subsequent searches using key concepts from the entry, using a number of search engines [18]. However, it became clear in the course of the study that Google Scholar was able to locate all of the open access resources found with the other search engines, although Google Scholar does not yet provide a ready means of identifying open access materials on the Web [19].
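Drawing the smaller sample can be illustrated with Python’s standard `random` module. The sketch below is an assumption about how such a draw might be reproduced, not a record of how it was done in the study; the `draw_subsample` helper, the placeholder titles, and the seed value are hypothetical.

```python
import random


def draw_subsample(entries, k=20, seed=None):
    """Draw k distinct entries at random, without replacement.

    Passing a seed makes the draw repeatable, so that the same
    sub-sample can be regenerated later for checking.
    """
    rng = random.Random(seed)
    return rng.sample(entries, k)


# Placeholder titles standing in for the 100 sampled entries (hypothetical).
entries = [f"entry-{i:03d}" for i in range(1, 101)]

subsample = draw_subsample(entries, k=20, seed=2007)
print(len(subsample), "entries drawn,", len(set(subsample)), "distinct")
```

Sampling without replacement guarantees that no entry appears twice in the sub-sample, which matters when the result is reported as a proportion of 20 entries.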

The final step in the method pursued for this study is a rather unusual one for a research venture. Rather than just report my findings, I have taken advantage of Wikipedia’s very ease of editing and its welcoming approach – if you think it can be improved, then fix it – to implement them. The open access references that we were able to locate for the smaller sample of 20 entries in the course of the study have now been added to the relevant Wikipedia articles by Sarah Munro, who served as a research assistant on this study, and clearly marked with a link to the “open access copy.” In my experience, this is a rare, if small, step in demonstrating how research can directly enhance the object that it studies. Of course, the addition of these sources will need to be taken into account, in relation to the figures presented here, if readers should consult the relevant Wikipedia entries.

Cited sources

Number and nature of sources

On the initial question of how well Wikipedia’s entries are sourced, the results seem a little thin in light of this work’s policies. Among my sample of 100, close to half (43) did not offer a source for the entry, although these entries would have links to related materials and other entries (Table 1). The entries that the editors had labeled “stubs,” as they were judged to be in need of more information, had a slightly better record with 61 percent of these entries citing at least one source.

Table 1: Number of sources per Wikipedia entry (n = 100)

| Sources per entry | Entries (Stubs) | Entries w/ print source(s) | Entries w/ OA source(s) |
|---|---|---|---|
| 0 | 43 (14) | n/a | n/a |
| 1 | 29 (14) | 4 | 0 |
| 2-5 | 21 (6) | 6 | 1 |
| 6-21 | 7 (2) | 2 | 1 |

Notes: The 168 sources were found listed in the 100 Wikipedia entries under “Sources,” “External Links,” “Notes,” or “References”; Stubs refer to entries that editors have designated as needing more information; OA refers to “open access” research studies or scholarly works published in journals that can be reached and read at no cost.
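The figures in Table 1 can be cross-checked with a short tally. The sketch below simply re-enters the table’s counts as printed (leaving out the stub counts) and adds them up; the variable names are introduced here for illustration, and `None` stands in for the table’s “n/a” cells.

```python
# Rows of Table 1 as printed:
# (sources per entry, entries, entries with print sources, entries with OA sources)
rows = [
    ("0", 43, None, None),  # None marks the "n/a" cells
    ("1", 29, 4, 0),
    ("2-5", 21, 6, 1),
    ("6-21", 7, 2, 1),
]

total_entries = sum(entries for _, entries, _, _ in rows)
unsourced = rows[0][1]
with_source = total_entries - unsourced
with_print = sum(p for _, _, p, _ in rows if p is not None)
with_oa = sum(oa for _, _, _, oa in rows if oa is not None)

print("entries in sample:", total_entries)
print("entries with at least one source:", with_source)
print("entries with a print source:", with_print)
print("entries with an open access source:", with_oa)
```

The totals agree with the figures given in the text: a sample of 100 entries, a dozen entries with print sources, and two entries with open access sources.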

With entries, such as “Mister Jip” (“an evil fictional character belonging to the Marvel Comics universe”), the absence of cited sources might seem entirely understandable. The 600-word article on Mister Jip reflects a fine example of original research, which Wikipedia advises against as “one of three content-governing policies” (as the entry “No Original Research” puts it). The “Mister Jip” entry establishes not just the special powers of this popular-culture figure but also assembles a biography that is drawn from a knowledge of the span of his comic-book life. That is, the expertise on Mister Jip appears to lie entirely in the hands of the 13 contributors to this entry, without need for outside sources. The particular strength of Wikipedia in dealing with themes of contemporary popular culture, in a way that clearly runs contrary to earlier encyclopedic traditions, is attested to by the sample in this study, which includes entries on “Hyperion” (“character in the Marvel Comics series Supreme Power”), “Static” (“album by hard rock band Mr. Big”), “Children of Time” (“episode of Star Trek: Deep Space 9”), and “Sprockets” (“fictional television talk show skit”). None of the contributors to these entries felt the need to add a source or reference.

On the other hand, a number of the 43 entries lacking a single reference were of a decidedly scholarly nature, such as “Fermat Curve” (“the algebraic curve”), “Marcus Vipsanius Agrippa” (“Roman statesman and general”), and “Sinsharishkun” (“one of the last kings of the Assyrian empire”). While these entries generally contained internal references to other Wikipedia articles, the contributors in these cases did not feel the need to connect readers directly to the scholarly work that is being done on these topics, work which the contributors were most likely familiar with to some degree.

A large share of the entries examined (29) offered a single source, such as the entry “Sócrates Rizzo,” a Mexican politician affiliated to the Institutional Revolutionary Party, which cites the Diccionario Biográfico del Gobierno Mexicano as its source. Only four out of the 100 entries relied exclusively on print sources (and they were single-source entries), while print sources turned up in a dozen entries in total. Print did play a big part, for example, in the entry “History of Trinidad and Tobago.” It has references to 14 books on the topic, including the nineteenth-century Stark’s Guide-Book and History of Trinidad; the entry also cites such online sources as the CIA World Factbook. Still, among the 168 cited sources and references in this study’s sample, only 32 are print sources, with at least one reaching back as far as 1716 with the citing of Martin Martin’s A Description of the Western Isles of Scotland (second edition; London: printed for A. Bell) for “Rocabarraigh” (“phantom rock or island”). Online sources were clearly favored among contributors, as the greater interconnectivity which the Internet represents, compared to print culture, also forms part of Wikipedia’s quality as an instrument of knowledge and learning.

The best example of how the realms of print and digital literacy are being bridged can be found in the five entries among my sample that note that “this article incorporates text from the Encyclopædia Britannica, Eleventh Edition, a publication that is now in the public domain.” This famous edition of the Britannica from 1911 has been made freely available online at a number of sites, including Wikisource, which has copied the text, in turn, from the Gutenberg Project.

The entry with the greatest number of cited sources in this sample was “Louisiana Baptist University,” with 21 endnote references, which were principally to the university’s Web site and related documents, as well as to other sites on higher education. The entry on this controversial institution is largely devoted to documenting its loss of accreditation, through an act of the Louisiana Board of Regents in 1998, in response to “diploma mill allegations.” On this last point, the discussion page for “Louisiana Baptist University” includes a note to the effect that Wikipedia’s entry for LBU was nominated for deletion on 27 November 2005, but no consensus was reached on the matter, and so it remains. The efforts to have the entry deleted may have at least added to the drive for more extensive documentation.

Open access references

Of the two entries that included open access references to research and scholarship, Wikipedia’s entry for “Dictator Game,” which is a strategy used in experimental economics to test the nature of altruism among research subjects, has references in the entry’s “Notes” to the book, Foundations of Human Sociality: Economic Experiments and Ethnographic Evidence from Fifteen Small-Scale Societies (Henrich et al., 2004), and two journal articles. For the first of the cited journal articles, “Dictator game giving: Rules of fairness versus acts of kindness” (Bolton et al., 1998), the contributor provides two links to the article. The first link is to the abstract at the publisher’s site, where Springer-Verlag offers access to the full article published in 1998 in the International Journal of Game Theory for $US30. At the same time, the contributor also provides a link in the entry to a free version of the article, indicated by the icon for Adobe PDF files, in its published form that one of the authors, Elena Katok, has made available on her Pennsylvania State University Web site (where she offers, in exemplary fashion, open access links to 18 of her published papers and 10 of her working papers).

By the same token, a link in the final footnote of the “Dictator Game” entry leads to the publisher’s copy of “Experimental Economics and the Artificiality of Alteration” (Bardsley, 2005), which can be purchased for $US22. The pay-per-view system does make the work more accessible than having to subscribe to International Journal of Game Theory for $US726 per year, where the article first appeared, and easier to obtain than visiting a university library. But there is also an open access copy of Bardsley’s paper freely available through the ePrints Soton archive at the University of Southampton.

Wikipedia’s entry on “Dictator Game” includes, in a later addition, an informal description of the experiments of John A. List, an economist at University of Chicago, which amount to a critique of the degree of altruism that has been established through dictator game experiments. However, by accessing the Bolton, Zwick, and Katok article, readers can learn of the earlier experiments, review the laboratory protocol for such an experiment, and see how conclusions are drawn from people’s behaviors in these laboratory settings. The entry’s very accessible summary of List’s more naturalistic approach to assessing altruism is convincing, but being able to link through to a published instance of the work on the dictator game gives the reader, I would argue, a more important lesson on how economists work with the idea of a game-experiment, and how they then set up, record, analyze, and draw conclusions from such experiments.

The story with the entry on “Information commons” is more complicated. Here is a topic that deals directly with the access question. It includes an internal link to the entry on “Open access” in Wikipedia. Yet the entry, although it is not marked as a stub, amounts to a little more than a few lines, with links to other Wikipedia entries, which are far more substantial, from commons to privacy. The entry speaks to the restrictions to “our shared knowledge-base” and to how “some believe” that “increasing control and commodification of information restricts our ability to encourage and foster positive developments in our cultural, academic, and economic growth,” but it seems a perfunctory, skeletal review of the issue, with no immediate reference or example. However, the entry is redeemed by its “Bibliography” and “External Links.”

In the bibliography for the entry, there are three books, each with its ISBN leading to a “Book sources” page in Wikipedia. This page allows you to search for the book in libraries around the world, with a search of WorldCat revealing how many miles away the closest library is with a copy of the book. Wikipedia is able in this way to tie in to public and university libraries, which have formed critical points of access for the reading public. Wikipedia’s library links thus establish a wonderful bridge between one of the principal spheres of public knowledge in the nineteenth and twentieth centuries and the one that is now opening up through the twenty-first century.

The entry on the “Information commons” also provides links to two freely available online reports – one on the information commons (Kranich, 2004) and the other on fair use (Heins and Beckles, 2005) – from the Brennan Center for Justice at New York University School of Law. Links to these two substantial reports (of some 70 pages each) represent the immediate points of access to very detailed analyses of the issues, especially with their extended references to a good deal of equally open literature that can be found online. And while this is, indeed, the ideal Wikipedia entry to conclude on in this study’s sample, the value of these links is lost to a degree by the shortcomings of the entry itself. It offers a cautionary demonstration of how the quality of the sources is a necessary but not sufficient condition for what I imagine as the educational follow-through, which moves from an informative and intriguing entry to a larger world of inquiry on the topic that is made possible by the ability of all online readers to tap into open access sources of research and scholarship.

Wikipedia’s open access potential

In the second phase to this study, a random sample of 20 entries, created out of the original 100, was used to check the proportion of entries that could potentially be equipped at this point with at least one related open access study. The 20 entries in this smaller sample had not, as it turned out, cited any open access sources. A search for relevant open access research resulted in at least one relevant open access instance of scholarly work – whether a peer-reviewed article, scholarly book, or research report – being found for 12 of the 20 entries (see Table 2). To count as a relevant and related study, the open access work had to be either an inquiry into or an application of one of the principal ideas in the entry. Of course, the designation of a study as relevant or related to the subject of a Wikipedia entry is a judgment call. The results reported here should be regarded as suggestive rather than definitive, when it comes to considering the educational contribution that open access research and scholarship could make to Wikipedia.

Table 2: The availability of open access research references for Wikipedia entries (n = 20).

| Number | Wikipedia entry | References | Example of available open access research or [if none available, a description of the topic] |
|---|---|---|---|
| | | | Martha P. Nochimson, 1997. The Passion of David Lynch: Wild at Heart in Hollywood. Austin: University of Texas Press, at http://books.google.ca |
| 13. | Antone Smith | 0 | [Tailback, Florida State University NCAA football team (c. 1987-)] |
| 14. | Island of Terror | 1 | [British horror film released in 1966 by Planet Film Productions] |
| 15. | John Morressy | 3 | [Science fiction writer and professor at Franklin Pierce College (1930-2006)] |
| 16. | Kurtka | 0 | [Generic word for a jacket in a number of European languages] |
| 17. | Lotus 27 | 0 | [Version of Lotus 25 F1 car for the 1963 formula junior season] |
| 18. | Mount Jefferson | 4 | [The highest peak in the Tobacco Root Mountains in Montana] |
| 19. | Stephen R. Speed | 1 | [Non-partisan incumbent Mayor of Dover, Delaware (1963-)] |
| 20. | Thomas Fincke | 1 | [Danish mathematician at University of Copenhagen (1561-1656)] |

Not surprisingly, there was no shortage of open access research available for Wikipedia entries that were as clearly academic as “Fermat Curve,” among the sample used in this case. It was easy to come up with well over 30 open access papers that were made freely available through the arXiv.org database at Cornell University Library, with many other papers on the closely related themes of algebraic curves and Fermat’s last theorem. While Wikipedia’s entry “Fermat Curve” includes no external references, many of the terms used in writing the entry were linked to other Wikipedia entries, some of which were remarkably rich in open access references. If one clicks on “Fermat’s Last Theorem” in “Fermat Curve,” one finds the theorem is provided with links to eight open access papers, a blog, and a bluffer’s guide. The first reference links to two open access versions of Andrew Wiles’ famous Annals of Mathematics paper that provides the proof for the long-standing problem posed by Fermat’s promise that there was such a theorem (Wiles, 1995). “Fermat’s Last Theorem” provides a striking, if relatively rare, demonstration of the power of open access to expand the scientific quality of Wikipedia, all the more so with the reference entitled “A Bluffer’s Guide to Fermat’s Last Theorem” leading to a trove of open access papers for those with a strong mathematical interest but not the professional expertise in algebraic geometry necessary for the proof provided for the theorem [20].

At the less-than-academic end of the spectrum among the smaller sample of 20 Wikipedia entries, “Higglytown Heroes” provides coverage of the Disney children’s television series that was launched in 2004 and is based on Matryoshka nesting dolls, which are used to introduce young children to the plumbers, dentists, gardeners, firefighters, and others that make up a city. Among the resources that could be made freely available to readers for this entry is the review article “Children, Adolescents, and the Media: Issues and Solutions,” by Victor C. Strasburger and Edward Donnerstein from a 1999 issue of Pediatrics [21]. The article provides limited coverage of young children who would be of an age to watch Higglytown Heroes, as most of its focus is on studies looking at youth. It does review one study on how children tend to imitate an “attractive role model,” which bodes well for a series that focuses on the helping professions. The Pediatrics article also provides a list of articles, all of them freely available through Highwire Press, that have cited this piece since 1999, which includes work on how television affects sleeping schedules (Thompson and Christakis, 2005), the American Academy of Pediatrics’ Guidelines for Children’s Media Use (Gentile, et al., 2004), and the effects of television on child health (Bar-on, 2000). In this way, a Wikipedia article on a single children’s series can lead to an education on children and television, starting from a single open access instance.

In our sample of 20 articles, biographies made up three of the 12 entries for which open access research references were found. Wikipedia’s entry for Norman W. Walker (1896-1985) describes an English-American businessman and a “pioneer in the field of vegetable juicing and nutritional health.” Walker’s entry bears a warning label at the top that reads, “The neutrality of this article is disputed.” The brief note on the discussion page that accompanies this entry mentions the absence of “real facts and scientific research on these topics, nor is there any discussion of whether Mr. Walker’s ideas have any merit.” Wikipedia’s entry for “Norman W. Walker” has two external links, both to Norwalk juicers, betraying, one suspects, the business interests behind this entry. A search of Google Scholar on “raw food diet” and “vegetable juicing” provides access to open access research on health benefits for pregnant women (Koebnick, et al., 2001), which leads, in turn, to articles on the benefits of raw foods for those suffering from fibromyalgia syndrome (Donaldson, et al., 2001) and from high cholesterol, although with some caution over its relation to coronary heart disease (Koebnick, et al., 2005). To see these and other studies added to the references would mean that readers could begin to judge for themselves the nutritional value of the diet by which Norman W. Walker lived.

The entry for Sócrates Rizzo (1945-), a former Institutional Revolutionary Party mayor of Monterrey and governor of Nuevo León in Mexico, cites, as noted earlier, the Diccionario Biográfico del Gobierno Mexicano, but it might well have been supplemented by M. Baher El-Hifnawi’s open access paper “Modeling the Determinants of Automobile Ownership in Developing Cities: The Case of Monterrey, Mexico” (1998). El-Hifnawi’s paper describes the impact of Rizzo’s political leadership, as indicated in the opening footnote which thanks “Sócrates Rizzo, who as Governor of Nuevo Leone initiated the project that resulted in this paper.” The paper itself provides no further mention of Rizzo, and thus offers no insight into why, for example, as Wikipedia’s entry on Rizzo puts it, “he resigned from the post on April 18, 1996 after several political scandals involving some of his closest cabinet members.” The example provided by El-Hifnawi’s paper is also limited in that it does not lead off, through its bibliography, on a trail of other open access instances, as was the case with the open access references found for the “Higglytown Heroes” and “Norman W. Walker” entries [22].

Finally, with two of the entries, relevant passages could be found through Google Book Search, which is digitizing the contents of a number of major research libraries. For example, Thomas Babington Macaulay’s History of England, from the Accession of James II (1849) proved an interesting and freely available source for the entry “Godolphin/Ministry of Churchill” in Wikipedia. The entry consists of nothing more than a table that establishes the position of Sidney Godolphin (1645-1712), 1st Earl of Godolphin, as First Lord of the Treasury in the Ministry of John Churchill (1702-1704). While “Godolphin/Ministry of Churchill” offers no sources, it does link to the entry “Sidney Godolphin,” which draws exclusively on the Encyclopædia Britannica, Eleventh Edition, with no other references provided. This is where Macaulay might well come in. Google Book Search is able to provide online readers complete access to Macaulay’s History (because of its publication date), and a search on Godolphin turns up King Charles II’s alleged but telling observation that “Sidney Godolphin is never in the way, and never out of the way” [23]. While more recent scholarship on the 1st Earl of Godolphin can be found online through the journal archive JSTOR (which is not open to the public), Google Book Search also provides access to the key historical work on Godolphin, Roy Sundstrom’s Sidney Godolphin: Servant of the State (1992). Access is limited, however, due to copyright restrictions, to a “book preview” consisting of a few pages from each chapter, although diligent readers can search the text and consult the pages on which the search term comes up.

In the case of “Windom Earle,” Wikipedia’s entry on this character in David Lynch’s television series Twin Peaks might have been supported by a Google Book Search of Martha P. Nochimson’s book on Lynch with multiple illuminating references to Earle: “Cooper’s former FBI partner, Windom Earle (Kenneth Welsh), whose fall into insanity through his contact with extraterrestrial mysteries suggest a distinctly un-Lynchian attribution of peril to the crossing of the lines between imagination and logic” [24].

Among the other eight entries in the smaller sample, for which no open access sources could be found, it may not be surprising that the subjects of entries such as the 25-year-old tailback for the Florida State University NCAA football team, Antone Smith, or the 33-year-old mayor of Dover, Delaware, Stephen R. Speed, are not yet the focus of open access research, even as they attest to Wikipedia’s ability to think locally. What is less obvious is why that is the case with Thomas Fincke. Fincke was one of the most significant scientists in Denmark during the seventeenth century, a mathematician, astrologer, and physician at the beginning of modern science (Schonbeck, 2004). While he has earned the gratitude of many a high school student, I am sure, for introducing the terms tangent and secant into mathematics, little can be found on him online, apart from Schonbeck’s article (for which only the abstract has been translated from the German).

What needs to be done

Wikipedia exemplifies “the writable Web,” to use Yochai Benkler’s term; it provides a leading instance of Benkler’s claim that “the networked information economy will democratize the public sphere” [25]. Part of the democratic thrust of this networked information economy is that readers can readily consult the primary sources, the evidence behind the explanations and claims made – whether they be government documents or research studies – with the result that, as Benkler puts it, “public discourse can rely on ‘see for yourself’ rather than ‘trust me’” [26]. For Benkler, “linking and ‘see for yourself’ represent a radically different and more participatory model of accreditation than typified by the mass media” [27]. My emphasis on what Wikipedia can draw from open access to research and scholarship is all about increasing Wikipedia’s “see for yourself” quality, but it may also have an impact on the research as well.

If Wikipedia were to form more of a public access point to this research and if public expectations around this “see for yourself” posture increase, then researchers and scholars may well have a greater incentive to make their published work open access. They may be more interested in taking advantage of the permission they have been granted by most scholarly journal publishers (see SHERPA http://www.sherpa.ac.uk/) to deposit their work in open access repositories, or they may wish to pay the fees that a number of publishers charge to provide open access to specific articles in subscription journals, or they may seek to publish in journals that make their contents open access some months after initial publication. In addition, researchers may be more inclined, in deciding where to publish, to check the Directory of Open Access Journals, which provides a guide to over 2,500 titles, representing only a portion of the open access journals that make their contents immediately available without charge to readers (http://www.doaj.org). This increased integration of research and scholarship into public sites, such as Wikipedia, could well lead to greater public support for government research allocations. At the same time, this new public awareness and interest may threaten at times to overrun long-term research agendas. Yet the need for ongoing public and scholarly discussions of the relationship between research and democracy doesn’t seem like a necessarily dangerous or distracting side-effect of greater access to this body of knowledge.

My concrete and immediate proposal is that contributors to Wikipedia be encouraged and supported in seeking out and providing links to open access research, and that researchers and scholars do what is now well within their power to assist in this process by making as much of their work open access as possible. To assist with the Wikipedia side of this recommendation, I provide an example in the Appendix of a “Wikipedean’s Guide to Open Access Research and Scholarship,” for the use of contributors. For the longer term, I would propose that the Wikimedia Foundation consider introducing “reading tools” for Wikipedia entries, that allow readers to readily call up related studies, media reports, government documents, teaching materials, discussion groups, and other materials that are being made freely available online and that would lead to a richer engagement with the topic at hand [28].

Wikipedia, in this way, can begin to act as more of a gateway to learning and knowledge, in addition to being a ready reference source. To further link these parallel ways of contributing to knowledge’s public sphere speaks to nothing less than the human right to know what is known. Finding ways of bringing these new approaches to knowledge into closer proximity and association can only strengthen and extend that commons in both its democratic and educational dimensions.

About the author

John Willinsky is Pacific Press Professor of Literacy and Technology at the University of British Columbia where he directs the Public Knowledge Project (http://pkp.sfu.ca), which has developed Open Journal Systems and other open source software that support open access formats for research and scholarship, and where open access copies of his papers have been made available.
Web: http://www.lled.educ.ubc.ca/faculty/willinsky.htm
Email: john [dot] willinsky [at] ubc [dot] ca

Acknowledgements

I wish to thank Sarah Munro for her skillful assistance with the research and preparation of this paper.

Notes

1. All quotations from Wikipedia are drawn from its Web site as of 30 September–15 October 2006.

2. Under “Wikipedia: Size Comparisons” in Wikipedia, the Chinese encyclopedia Siku Quanshu or Imperial Collection of Four (1773–1782) is the one work listed that exceeds Wikipedia, as it is said to have 19,337 volumes or 800 million words, compared to Wikipedia’s 511 million. As for degrees of participation, Cass Sunstein reports that half of the edits of Wikipedia are executed by 0.7 percent of the users while two percent of users, “fewer than fifteen hundred people, have done almost three-quarters of all edits” (2006, p. 152).

3. A recent Pew Internet and American Life Project survey found that 87 percent of online users have turned to the Internet as a research tool, although more still turn to TV for news and information about science (Horrigan, 2006).

5. That is to say, I would not have guessed that even my best students would, on leaving their school days behind them, spontaneously gather around topics that interest them and take on responsibility for researching, creating, editing, and debating encyclopedia articles for no reason other than the pleasure of making such a contribution, as they might otherwise update their MySpace profiles, personal blogs, or trade music, photos, and electronic files online.

7. When the great polymath Denis Diderot edited the Encyclopédie (1751–1780), Voltaire, Rousseau, and Montesquieu were among the contributors. The famous eleventh edition of the Encyclopædia Britannica was produced in association with Cambridge University, while the twelfth and thirteenth editions included contributions from Sigmund Freud, Albert Einstein, Marie Curie, Leon Trotsky, Harry Houdini, H.L. Mencken, and W.E.B. Du Bois.

9. Larry Sanger left the organization in 2002 over what he apparently felt was a disregard for expertise. He was particularly offended “that non-experts should be able to treat with disdain anything an expert says,” and has since gone on to found an online encyclopedia, Citizendium, which is to be vetted by experts (Read, 2006). Jimmy Wales, for his part, recognizes that academics are not used to this leveling of the playing field: “There have definitely been cases where there were academics who came to the site, made good contributions, and the rough-and-tumble of the process really turned them off” (ibid.). See, as well, Scholarpedia, in which articles in the fields of neuroscience and computational intelligence are peer-reviewed and placed under the control of a curator (Read, 2006a).

12. Rosenzweig (2006) also expresses concerns that the entry in Wikipedia on “Haym Solomon,” the eighteenth-century Philadelphia financier, repeats the myth, backed by historical research since proven wrong on this point, that his loans to America’s revolutionary government were not repaid, as do Britannica and Encarta. Wikipedia’s Solomon entry now carries a cautionary question mark icon at the top along with the statement, “the factual accuracy of this article or section is disputed,” with advice to consult the “Discussion” tab for the entry. It also cites Rosenzweig’s correction, with a footnoted link back to Rosenzweig’s Wikipedia review (2006), as well as more current historical work on Solomon. The discussion page for the Solomon entry addresses Rosenzweig’s concerns under the subheading “Peer Review” with the question of Rosenzweig’s own source resolved in the course of the discussion by reference to “the entry on Solomon in American National Biography by McManus (a professor of history at Queens College).”

13. More recently, Thomas Chesney (2006) conducted a study that asked 69 researchers to examine both an article in their area of expertise and a randomly selected one; errors were reported in 13 percent of the articles while the researchers found the articles in their area of expertise slightly more credible than the randomly selected ones assigned to them.

14. Wikipedia’s homage to the published author has a fascinating historical analogy in the editing of the Oxford English Dictionary: its principal editor, James Murray, himself an amateur lexicographer in the early days of this work, called on the reading public in Victorian England to assemble reliable, extremely well-documented sources (double-checked by Oxford researchers) for the definition of the English language on historical principles, based on no more than their reading of published sources (Willinsky, 1994).

17. Wikipedia: “Stubs are Wikipedia entries that have not received substantial attention from the editors of Wikipedia, and do not yet contain sufficient information on their subject matter. In other words, they are short or insufficient pieces of information and require additions to further increase Wikipedia’s usefulness. The community values stubs as useful first steps toward complete articles. Anyone can complete a stub.”

18. The search engines that were used to locate open access research and scholarship: (a) Google Scholar (http://scholar.google.com/), which searches “peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations”; (b) PubMed Central (http://www.pubmedcentral.nih.gov/), which is an index to open access literature in the life sciences; (c) OAIster (http://oaister.umdl.umich.edu/o/oaister/), which is a search engine for institutional repositories and database archives most of which are run by universities, with much of the open access material on a site in which authors can deposit a copy of their published (and unpublished) work; (d) DOAR (http://www.opendoar.org/), the Directory of Open Access Repositories, which enables the searching of institutional repositories; and, (e) PKP Harvester (http://pkp.sfu.ca/harvester2/demo/), with which a number of open access journals and conferences are registered for indexing purposes.

19. See the Appendix for a method of using Google Scholar to identify open access research. Google Scholar does not allow the search to be limited, for example, to works that use the Creative Commons license (“free to use or share”), but this license is not yet commonly used for open access research, in part because the authors of this literature have signed over the copyright to their work to the publisher who then grants back to the author the right to post an open access copy of their article, typically after it has been revised but before it is in its final published PDF form.

20. Among the 13 other papers listed as references in “Fermat’s Last Theorem,” there is a “bluffer’s guide” prepared by Lek-Heng Lim, a doctoral student at Stanford, for those who “don’t have easy access to JSTOR [back-issue database] or the AMS [American Mathematical Society] periodicals” and who “would nevertheless like to know what the proof of the most celebrated theorem in Mathematics looks like.” The guide also provides links to “much more readable” papers on Fermat and number theory.

21. The review article from Pediatrics is one of the 1.4 million free articles made available by journals that publish online with Highwire Press, a division of the Stanford University Libraries.

22. This chain of open access is supported in large measure by Highwire Press, which allows readers of its articles (from among the 1,000 journals for which they serve as the online publisher) to have open access to any Highwire Press article that is listed in the references to the article they are reading or that cites that article.

24. Nochimson, 1997, p. 74. Just how far “fair use” within Google Book Search will enable readers to read is the subject of an ongoing lawsuit which the Association of American Publishers is pursuing against Google on this program’s alleged violation of copyright (Wyatt, 2005).

26. Benkler, 2006, p. 228. The Pew Internet and American Life Project survey discovered that 80 percent of those who have found science news online have checked its reliability, with 62 percent doing so online and 54 percent going to print sources, such as an encyclopedia or a journal (Horrigan, 2006).

M. Baher ElHifnawi, 1998. Modeling the determinants of automobile ownership in developing cities: The case of Monterrey, Mexico, Harvard Institute for International Development, Development Discussion Paper, number 668, at http://www.cid.harvard.edu/hiid/668.pdf/, accessed 23 January 2007.

Stevan Harnad, 2005. “Fast-forward on the green road to open access: The case against mixing up green and gold,” Ariadne, volume 42 (January), at http://www.ariadne.ac.uk/issue42/harnad/, accessed 23 January 2007.

Darcy A. Thompson and Dimitri A. Christakis, 2005. “The association between television viewing and irregular sleep schedules among children less than 3 years of age,” Pediatrics, volume 116, number 4, pp. 851–856, at http://pediatrics.aappublications.org/cgi/reprint/116/4/851/, accessed 23 January 2007.

Steven Weber, 2004. The success of open source. Cambridge, Mass.: Harvard University Press.

Appendix

Citing Open Access Research and Scholarship in Wikipedia

As scholarly publications, particularly journal articles, have increasingly appeared online over the last 10 years, a dual economy of access to this knowledge has emerged. There is the traditional fee-based access provided only to individual subscribers and members of a subscribing research library (supplemented by pay-per-view for a single article), and there is a second set of materials, including many of the same journal articles, that are made freely available to readers. Articles that are free to read, or open access as it is commonly called, may well include works that are published in fee-based journals, because the publishers of those journals now permit their authors to deposit copies of their articles in their university library repository (or on their Web site), thereby making a copy of it open access.

Although current estimates suggest that perhaps 20 percent of the 2.4 million or so articles published each year are open access (whether through library repositories or by the journals themselves), the proportion is growing (Harnad, 2005). As a result, Wikipedia contributors have a substantial body of work available to consult and cite, thereby enabling readers to not only see for themselves the sources on which Wikipedia entries have drawn, but also to pursue their educational interests in topics well beyond Wikipedia.

Finding open access materials

To locate relevant sources for Wikipedia entries that are open access, the best method at this point is to use Google Scholar (which is one of Google’s optional search engines). The title of the entry should be used as the initial search term (followed by searches on concepts, names or ideas that figure prominently in the entry). Each of these searches in Google Scholar will lead, in most cases, to a list of studies, some promising to be more relevant than others to the entry.

Among the more promising titles in the list, only a small proportion will be open access or freely available to readers. You can click on the title to see if it leads to an open access copy or to a copy that requires a password or a fee. However, one trick for identifying which studies are open access is to read the URL that appears under the title.

If the URL contains .edu, .org, or the name of an institution, or if the URL has .pdf at the end of it, then there is a good chance that the item is open access. A work is not likely to be open access if the URL listed below the title contains .com (e.g., questia.com; ingentaconnect.com) or if a publisher’s name appears (such as Taylor & Francis, Harvard University Press, Blackwell, Elsevier, etc.), or if JSTOR is named.
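For contributors who prefer to check a batch of URLs at once, the rule of thumb above can be roughly sketched in Python. This is only an illustrative sketch, not a tool used in the study: the function name likely_open_access and the short list of publisher hints are assumptions made for the example, the check for “the name of an institution” is left out because it cannot be reduced to a simple pattern, and a result of False means only that none of the hints applied.

```python
from urllib.parse import urlparse

# Hypothetical hint list for illustration; real publisher hosts vary.
CLOSED_HINTS = ("jstor", "elsevier", "blackwell")


def likely_open_access(url: str) -> bool:
    """Guess from the URL alone whether a listing is likely open access."""
    # Google Scholar often shows URLs without a scheme, so add one if needed.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower().rstrip("/")

    # A publisher or JSTOR name, or a .com host, suggests restricted access.
    if any(hint in host for hint in CLOSED_HINTS) or host.endswith(".com"):
        return False
    # .edu or .org hosts, or a path ending in .pdf, suggest an open copy.
    return host.endswith((".edu", ".org")) or path.endswith(".pdf")


if __name__ == "__main__":
    for example in ("adsabs.harvard.edu/", "http://www.ingentaconnect.com/x"):
        print(example, likely_open_access(example))
```

The sketch checks the “not likely” hints first, so that a .org host carrying a publisher or JSTOR name is still treated as restricted. Any listing flagged either way should still be confirmed by clicking through to the copy itself.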

This Google Scholar listing is likely to be open access:

This one is not open access:

Also, individual titles in a Google Scholar search may be immediately followed by “- group of 3 ».” Clicking on this leads to multiple versions of the paper (three in this case), one of which may be open access; each version is, again, best evaluated by checking the URL before clicking on the title.

After clicking on the “- group of 3 »” hyperlink, the three online instances of this article will appear. In this case, the first link goes to the publisher’s site for Physical Review E, where the article can be purchased for US$25.00, and the third listing leads to PubMed, which provides an abstract and link to the publisher’s site. However, the second listing leads to the Astrophysics Data System (see “adsabs.harvard.edu/” beneath the title) which has a link to an open access copy of the full text at arXiv.org (following the publisher’s policy permitting authors to self-archive their work):

Adding Open Access to a Reference

To add open access to a reference for a Wikipedia entry, simply capture the URL of the open access copy, and place it at the end of the reference, whether it is listed in the entry’s “References” or more suitably perhaps under “External Links,” or “Related Studies.” For the Capocci et al. study on Wikipedia, cited above, enter the following on “Edit this Page”: