A Wikipedia Literature Review - Owen S. Martin
This paper was originally designed as a literature review for a doctoral dissertation focusing on Wikipedia. This exposition gives the structure of Wikipedia and the latest trends in Wikipedia research.
Downloads: 8

Wikipedia word counts
A CDB database mapping each token to a little-endian unsigned integer giving the number of times that token appears in English Wikipedia.
Keywords: dataset
Downloads: 2
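As a minimal sketch of reading one of these values: the entry says the counts are little-endian unsigned integers but does not state their width, so the helper below uses `int.from_bytes`, which accepts any byte length. The name `decode_count` and the sample bytes are hypothetical, and the CDB lookup itself is left out.

```python
def decode_count(raw: bytes) -> int:
    """Interpret a raw database value as a little-endian unsigned integer."""
    return int.from_bytes(raw, byteorder="little", signed=False)


# Hypothetical stand-in for a value fetched from the database:
# the number 1234 packed into 4 little-endian bytes (the width is an assumption).
sample_value = (1234).to_bytes(4, byteorder="little")
sample_count = decode_count(sample_value)
```

Because the width is not hard-coded, the same helper would work whether the stored integers turn out to be 4 or 8 bytes wide.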

How Wikipedia Works - Phoebe Ayers; Charles Matthews; Ben Yates
"We cover Wikipedia from soup to nuts: for readers trying to understand what's in Wikipedia, how and why it got there, and how to analyze the quality of the content you might find on the site; for current and future editors, from basic editing techniques and wikisyntax to not-so-basic information on complicated syntax, referencing and researching content, and editing collaboratively and harmoniously; and finally for anyone interested in how Wikipedia's vibrant and complicated community comes tog...
Keywords: Wikipedia; Mediawiki; Wikimedia; documentation; Wikipedia--Handbooks, manuals, etc; encyclopedias; Social media; User-generated content
Downloads: 2,228

Scientific citations in Wikipedia - Finn Aarup Nielsen
The Internet-based encyclopaedia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the "Wikipedia risks". The present work describes a simple assessment of these aspects by examining the outbound links from Wikipedia articles to articles in scientific journals with a ...
Downloads: 3

Edit wars in Wikipedia - Róbert Sumi
We present a new, efficient method for automatically detecting severe conflicts ("edit wars") in Wikipedia and evaluate this method on six different language Wikipedias (WPs). We discuss how the number of edits and reverts, the length of discussions, and the burstiness of edits and reverts in such pages deviate from those following the general workflow, and argue that earlier work has significantly over-estimated the contentiousness of the Wikipedia editing process.
Downloads: 19

Entity Ranking in Wikipedia - Anne-Marie Vercoustre
The traditional entity extraction problem lies in the ability to extract named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named entities; we are interested in entity ranking in the field of information retrieval...
Downloads: 31

Collaborative Development in Wikipedia - Gerald C. Kane
Using 16,068 articles in Wikipedia's Medicine Wikiproject, we study the relationship between collaboration and quality. We assess whether certain collaborative patterns are associated with information quality in terms of self-evaluated quality and article viewership. We find that the number of contributors has a curvilinear relationship to information quality, more contributors improving quality but only up to a certain point...
Downloads: 10

Wikipedia Redirects 2010-12-06 - Eric Normand (derived from the Wikipedia Database Dumps)
Cleaned-up redirect data from Wikipedia. Tab-delimited lines with two columns: the page id from the Wikipedia data dump and the title of the page redirected to. Note: some links are recursive, and some could be cyclical.
Keywords: wikipedia; redirects; semantic; data
Downloads: 55
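Since the note warns that some redirects are recursive and some may be cyclical, anyone consuming this file would probably want to follow each chain to its end while guarding against loops. The following is a rough sketch under one assumption: the page ids have already been joined with a page-id-to-title table, so that `redirects` maps a source title to its target title. The function name, the `max_hops` limit, and the sample titles are all hypothetical.

```python
def resolve_redirect(title, redirects, max_hops=32):
    """Follow a redirect chain to its final target.

    Returns the final title, or None when the chain loops back on itself
    or exceeds max_hops (an arbitrary safety limit chosen for this sketch).
    """
    seen = {title}
    current = title
    while current in redirects:
        current = redirects[current]
        if current in seen or len(seen) > max_hops:
            return None  # cyclical (or suspiciously long) chain
        seen.add(current)
    return current


# Hypothetical sample data: one two-step chain and one cycle.
sample_redirects = {
    "Title A": "Title B",
    "Title B": "Title C",
    "Loop X": "Loop Y",
    "Loop Y": "Loop X",
}
```

Returning `None` for a cycle, rather than raising, lets a bulk pass over the whole file skip bad chains and keep going.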

Wikipedia Titles 2010-12-06 - Eric Normand (derived from the Wikipedia Database Dumps)
Titles of all Wikipedia pages (including redirects) from the Wikipedia database dump. UTF-8 format, one entry per line. An entry consists of the page id, a tab character, and the title of the page.
Keywords: wikipedia; semantic data; titles; synonym
Downloads: 61
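The line format described here (page id, tab, title) can be read with a few lines of code. This is only an illustrative sketch: the function name `parse_titles` and the sample lines are made up, and it assumes the page id is always a decimal integer.

```python
def parse_titles(lines):
    """Build a {page_id: title} mapping from 'id<TAB>title' lines."""
    titles = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        # Split on the first tab only, in case a title itself contains a tab.
        page_id, title = line.split("\t", 1)
        titles[int(page_id)] = title
    return titles


# Hypothetical sample lines in the described format.
sample_lines = ["1\tExample title\n", "2\tAnother title\n"]
sample_titles = parse_titles(sample_lines)
```

For the real file, one would pass an open file object instead of a list, for example `parse_titles(open(path, encoding="utf-8"))`, where `path` is wherever the download was saved.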

Wikipedia Administration Logs - July 2006
Captured as part of a multi-year research project, this archive contains logs of a variety of Wikipedia administration IRC channels. Most points of discussion are out-of-band meta-issues not considered ideal to include on the discussion pages of Wikipedia itself.
Downloads: 5

Product/Brand extraction from WikiPedia - K. Massoudi
In this paper we describe the task of extracting product and brand pages from Wikipedia. We present an experimental environment and setup built on top of a dataset of Wikipedia pages we collected. We introduce a method for recognition of product pages modelled as a boolean probabilistic classification task. We show that this approach can lead to promising results and we discuss alternative approaches we considered.
Downloads: 7

Making Math Searchable in Wikipedia - Moritz Schubotz
Wikipedia, the world's largest encyclopedia, contains a lot of knowledge that is expressed as formulae exclusively. Unfortunately, this knowledge is currently not fully accessible by intelligent information retrieval systems. This immense body of knowledge is hidden from value-added services, such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users to perform a combined text and fully unlock the potential benefits.
Downloads: 12

Evolution of Wikipedia's Category Structure - Krzysztof Suchecki
Wikipedia, as a social phenomenon of collaborative knowledge creation, has been studied extensively from various points of view. The category system of Wikipedia, introduced in 2004, has attracted relatively little attention. In this study, we focus on the documentation of knowledge, and the transformation of this documentation with time. We take Wikipedia as a proxy for knowledge in general and its category system as an aspect of the structure of this knowledge...
Downloads: 17

Synonym search in Wikipedia: Synarcher - A. Krizhanovsky
The program Synarcher was developed to search for synonyms (and related terms) in a text corpus with a special structure (Wikipedia). The search results are presented as a graph, which can be explored and searched interactively. The paper presents an adapted HITS algorithm for synonym search, the program architecture, and an evaluation of the program on test examples...
Downloads: 6

Dynamics of conflicts in Wikipedia - Taha Yasseri
In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies...
Downloads: 18

Time evolution of Wikipedia network ranking - Young-Ho Eom
We study the time evolution of ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003-2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to be stabilized for 2007-2011. Special emphasis is placed on the ranking of Wikipedia personalities and universities...
Downloads: 16

Use of Wikipedia Categories in Entity Ranking - James A. Thom
Wikipedia is a useful source of knowledge that has many applications in language processing and knowledge representation. The Wikipedia category graph can be compared with the class hierarchy in an ontology; it has some characteristics in common as well as some differences. In this paper, we present our approach for answering entity ranking queries from Wikipedia. In particular, we explore how to make use of Wikipedia categories to improve entity ranking effectiveness...
Downloads: 33

Wikipedia: organisation from a bottom-up approach - Sander Spek
Wikipedia can be considered as an extreme form of a self-managing team, as a means of labour division. One could expect that this bottom-up approach, with the absence of top-down organisational control, would lead to chaos, but our analysis shows that this is not the case. In the Dutch Wikipedia, an integrated and coherent data structure is created, while at the same time users succeed in distributing roles by self-selection...
Downloads: 13

Clustering of scientific citations in Wikipedia - Finn Aarup Nielsen
The instances of templates in Wikipedia form an interesting data set of structured information. Here I focus on the cite journal template that is primarily used for citation to articles in scientific journals. These citations can be extracted and analyzed: non-negative matrix factorization is performed on an (article × journal) matrix, resulting in a soft clustering of Wikipedia articles and scientific journals, each cluster more or less representing a scientific topic.
Downloads: 4
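To illustrate the kind of factorization this abstract mentions, here is a small pure-Python sketch of non-negative matrix factorization using the standard Lee-Seung multiplicative updates for squared error. It is not the author's implementation: the update rule, the toy 4 × 4 citation-count matrix, the rank `k`, and all function names are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (m x n) into w (m x k) and h (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def squared_error(v, w, h):
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb))


# Toy (article x journal) citation counts with two obvious topic blocks.
toy_counts = [
    [5, 3, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
w, h = nmf(toy_counts, k=2)
```

Each row of `w` gives an article's weights on the `k` clusters and each column of `h` gives a journal's weights, which is what makes the clustering "soft": an item can belong partly to several clusters.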

Two-dimensional ranking of Wikipedia articles - A. O. Zhirov
The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. While PageRank highlights very well known nodes with many ingoing links, CheiRank highlights very communicative nodes with many outgoing links...
Downloads: 6
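The contrast between the two rankings can be seen on a tiny graph. The sketch below computes PageRank by plain power iteration and obtains CheiRank by running the same procedure on the graph with every link reversed. The damping factor of 0.85, the handling of dangling nodes, the toy graph, and the function names are assumptions for illustration, not details taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


def reverse_links(links):
    """Return the same graph with every link direction flipped."""
    reversed_graph = {}
    for source, targets in links.items():
        reversed_graph.setdefault(source, [])
        for t in targets:
            reversed_graph.setdefault(t, []).append(source)
    return reversed_graph


def cheirank(links, **kwargs):
    """CheiRank: PageRank computed on the link-reversed graph."""
    return pagerank(reverse_links(links), **kwargs)


# Toy graph: "hub" links out to many pages, and all of them link to "sink".
toy_links = {"hub": ["a", "b", "c"], "a": ["sink"], "b": ["sink"], "c": ["sink"]}
pr = pagerank(toy_links)
cr = cheirank(toy_links)
```

On this toy graph the node with many ingoing links (`sink`) comes out on top by PageRank, while the node with many outgoing links (`hub`) comes out on top by CheiRank.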

Assessing the Value of Cooperation in Wikipedia - Dennis M. Wilkinson
Since its inception six years ago, the online encyclopedia Wikipedia has accumulated 6.40 million articles and 250 million edits, contributed in a predominantly undirected and haphazard fashion by 5.77 million unvetted volunteers. Despite the apparent lack of order, the 50 million edits by 4.8 million contributors to the 1.5 million articles in the English-language Wikipedia follow strong overall regularities...
Downloads: 9