Wikipedia has gained tremendous attention and success, not only among contributors and readers, but also among researchers. Amid a plethora of published scholarly studies, there is a great need to review and summarize the research conducted thus far. For the past couple of years we have therefore been working to systematically collect and analyze the research published during Wikipedia’s first ten years, 2001-2011. Four of us were working on this literature review together and one of us independently; since last September we have joined forces to provide as comprehensive a review as possible.

The team of four has employed the systematic literature review methodology, which rigorously plans and documents every step taken, including applying strict inclusion and exclusion criteria as to what may be included. The advantage has been to obtain a highly comprehensive view of the literature; the disadvantage is the need to apply sometimes arbitrary exclusion criteria in order to keep from being overwhelmed by the thousands of possible studies to include. By contrast, the other reviewer adopted a selective review approach, staying alert to and identifying any interesting and potentially valuable research project on Wikipedia. The advantage of this approach is that the range of included studies is much broader; the disadvantage is that many studies are missed. By combining the reviews produced with these two approaches, we believe we have identified probably the broadest range of research on Wikipedia ever assembled. We aim to offer a highly useful point of reference for the next decade of Wikipedia research and beyond.

By the time of Wikimania, we will have reviewed around 500 scholarly studies of Wikipedia, examining what research has been conducted, what research questions have been asked, and what research methodologies and approaches have been employed to study Wikipedia. We organize the literature into six main categories, each inquiring about a different aspect of Wikipedia, and discuss what is known and unknown about these categories and their subcategories. The main categories are as follows:

Infrastructure includes studies concerning the fundamental factors that enable Wikipedia to exist in its current form, such as its legal and technological infrastructure.

Participation concerns issues related to participation in the Wikipedia community, including studies of contributors who create or edit Wikipedia articles and studies of other collaborators who actively participate in the life of the online community, for example by voting for featured articles or resolving disputes among contributors.

Content includes studies related to Wikipedia content, including its growth, depth, breadth, and reliability.

Readership covers issues related to Wikipedia readers (as distinct from contributors): how they perceive and use Wikipedia, and the purposes of their use.

Corpus discusses research using Wikipedia as a textual corpus for various text analysis studies.

General covers studies that address Wikipedia very broadly, spanning a wide swath of aspects that cannot be confined to any of our other major categories.

The study dataset, including all the paper summaries organized within the category tree, is presented in a Semantic MediaWiki (SMW) wiki called WikiLit that is openly accessible and editable. The wiki will remain accessible to Wikipedia researchers and will hopefully aid future Wikipedia-related research, but our primary goal in releasing the dataset is to facilitate dissemination of the data to long-term research collection sites such as AcaWiki and WikiPapers, which have broader scopes than just Wikipedia research. We expect to submit our review for publication to a green open access scholarly journal later in 2012.

We propose one of two presentation formats, depending on the time allotted to us:

One-hour format: 10 minutes for overview and background of the project, 10 minutes for presentation of the WikiLit website, 10 minutes to review general research trends in ten years of Wikipedia research, 10 minutes to highlight particular research findings from each of the six categories that have immediate relevance for Wikipedia, 20 minutes for questions and answers.

25-minute format: 5 minutes for overview and background of the project, 5 minutes to review general research trends in ten years of Wikipedia research, 5 minutes to highlight particular research findings from each of the six categories that have immediate relevance for Wikipedia, 10 minutes for questions and answers.

We would appreciate more than 25 minutes if possible, but we hope to make a valuable presentation in 25 minutes nonetheless.

Track

Research, Analysis, and Education

Length of presentation/talk

One hour, if possible, including questions. If not, we'll do our best in 25 minutes.