Limits of Wikipedia

If the result of the poll is unclear, because two favorites have the same number of votes or no one has pressed the button, a peaceful random generator is started.

Update 2018-10-16
The poll is over. “Limits of Wikipedia” has won. Here is the article.

Wikipedia’s next door

Sometimes Wikipedia is described as the standard on the internet and the best encyclopedia in the world. This description is right if Wikipedia is compared with other mainstream encyclopedias like Brockhaus or Encyclopedia Britannica. But if we focus on scientific knowledge, Wikipedia is not very advanced. To describe the problem in detail, I’ve found a simple but effective way to demonstrate the limits of today’s Wikipedia.

The search engine Google provides a tab called Shopping for searching commercially available products. If we enter the keyword “Encyclopedia” into the box and sort the list by most expensive products first, we find a huge list of commercially available encyclopedias which are not Wikipedia. All of them are created by academic publishing companies like Elsevier, Springer and Wiley. They are not general-purpose encyclopedias but are specialized on a topic from science. Here are some examples:

The list is much longer than only these 4 books; I would guess that at least 200 different encyclopedias are available. Each of them is large and expensive. The reason they are sold to libraries is that they are better than Wikipedia. They contain more keywords and the descriptions are more accurate. It is simply not true that Wikipedia is the best encyclopedia in the world. It is only the cheapest one, and the quality is not very high.

Explaining the difference between Wikipedia and the Elsevier/Springer encyclopedias is easy: the traffic of the keywords is different. Some of the specialized keywords are also part of Wikipedia, but they have very low usage statistics. Such a keyword is not a mainstream topic which generates 1000 hits per day; it may have only 2 visits per day. Wikipedia has only a few of these low-traffic keywords available. Somebody who needs such specialized information has to buy a Springer encyclopedia.

Let us estimate how expensive this would be. 200 encyclopedias at 8000 US$ each is 1.6 million US$. A huge price, but it makes sense to invest this amount. Most institute libraries in the world have done so, because they need the knowledge. They cannot switch to Wikipedia because Wikipedia doesn’t provide this specialized knowledge.
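The estimate can be written down as a small Python sketch. Both input figures are the rough guesses from the text (200 encyclopedias, 8000 US$ each), not measured prices, and the function and variable names are only illustrative.

```python
# Rough guesses taken from the text, not real price data.
ENCYCLOPEDIA_COUNT = 200
PRICE_PER_ENCYCLOPEDIA_USD = 8000


def total_cost(count: int, unit_price_usd: int) -> int:
    """Return the cost of buying `count` encyclopedias at one flat price."""
    return count * unit_price_usd


total = total_cost(ENCYCLOPEDIA_COUNT, PRICE_PER_ENCYCLOPEDIA_USD)
print(f"{total:,} US$")  # 1,600,000 US$
```

Changing either guess shows how sensitive the total is: at 4000 US$ per encyclopedia the same library would pay half as much.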

Bringing Wikipedia to the researchers

Wikipedia is recognized as a mainstream encyclopedia which provides knowledge about the latest Star Wars movie, Harry Potter books and reggae musicians. It is accepted by a non-scientific audience as a reference for getting information quickly without visiting websites which contain a lot of advertisements. The internal quality control of Wikipedia works great and prevents spam and biased information from being injected into the encyclopedia.

A researcher in a biology lab has two options. Either he can ask Wikipedia for help or he can search in a Springer encyclopedia. The Springer version provides much higher quality. I’m pointing to this fact because, right now, Wikipedia has only replaced general-purpose encyclopedias like Encyclopedia Britannica, but not the specialized versions which are written by scientists. If we compare, on an objective basis, the quality of Wikipedia with a Springer encyclopedia on a certain topic, we will notice that Wikipedia is weaker. In most cases the lemma has no entry, and if it is available in Wikipedia the article is too short. Springer is able to sell its own encyclopedias for thousands of US$ because Wikipedia is not able to provide the needed information.

I do not know how to solve this issue. But I can give a measure of whether it is solved or not. If Wikipedia has better content than a Springer encyclopedia, the issue is solved. To determine the progress, it is necessary to compare both sources: on the left we open the article in Wikipedia, and on the right we open the article in a Springer handbook. The difference is that a specialized version explains every detail of a subject. The audience is not the whole world, but a researcher who is interested in a concrete subject and has a lot of background knowledge. This kind of audience is not happy with today’s Wikipedia. The problem with Wikipedia is that it only provides general knowledge and has many missing topics in scientific subdisciplines.
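One possible way to turn this side-by-side comparison into a number is sketched below in Python. It is only an illustration under strong assumptions: article length in words is used as a crude stand-in for "better content", the function name `coverage_report` is my own, and the lemma titles and word counts are made-up placeholders, not data from Springer or Wikipedia.

```python
def coverage_report(reference: dict[str, int], wikipedia: dict[str, int]) -> dict[str, float]:
    """Compare two encyclopedias, each given as {lemma title: length in words}.

    Returns the share of reference lemmas that exist in Wikipedia and the
    share whose Wikipedia article is at least as long as the reference one.
    """
    if not reference:
        raise ValueError("reference encyclopedia must contain at least one lemma")
    # Compare titles case-insensitively.
    wiki = {title.casefold(): words for title, words in wikipedia.items()}
    present = 0
    long_enough = 0
    for title, ref_words in reference.items():
        wiki_words = wiki.get(title.casefold())
        if wiki_words is None:
            continue  # lemma has no Wikipedia entry
        present += 1
        if wiki_words >= ref_words:
            long_enough += 1
    total = len(reference)
    return {"present": present / total, "at_least_as_long": long_enough / total}


# Made-up placeholder data, only to show the shape of the input.
reference = {"Lemma A": 1200, "Lemma B": 900, "Lemma C": 1500, "Lemma D": 700}
wikipedia = {"lemma a": 1500, "Lemma B": 300}

report = coverage_report(reference, wikipedia)
print(report)
```

With the placeholder data, half of the lemmas exist in Wikipedia and a quarter are at least as long as the reference entry. Under this crude measure the issue would count as solved when both shares approach 1.0 for a real lemma list.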

To overcome the problem, it is necessary to create articles in Wikipedia which attract only a small number of visitors. These are specialized entries which are relevant for not more than 100 people worldwide and which will generate only 1-2 visits per day. Such subjects are not very attractive for Wikipedia authors, because an article that is not read by the public seems useless.

The good news is that the overall structure of Wikipedia doesn’t have to change. Specialized articles can be handled like any other article. The workflow of creating and evaluating the content is the same. The only new thing is that these kinds of articles will generate an ultra-low amount of traffic, so they may seem too specialized for a general-purpose encyclopedia. But in the end they will help to increase the acceptance of Wikipedia in the research community.

Let us examine some examples from the “Springer Encyclopedia of Algorithms”. None of the following lemmas is available in Wikipedia:

The reason is that these entries are very specialized. Apart from computer scientists, nobody will use these terms. But all of them are available in the Springer encyclopedia, and this is why the Springer version is used in an institute library but Wikipedia is not.

What do these lemmas have in common? They are three-word lemmas. The question is not what “approximation” means (this is explained in Wikipedia very well); the question is what a certain short phrase means. Wikipedia has only a handful of two-word and three-word lemmas in its database. For example, “Approximation error”, “Newton’s method” and “Tolerance relation” are all explained very well in Wikipedia. But there are many more lemmas which are more specialized and don’t have an article right now.
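Counting the words in a lemma title is easy to automate. The Python sketch below uses only the titles named in the paragraph above; the helper name `words_in_title` is my own, and splitting on whitespace is a simplifying assumption about what counts as a word.

```python
from collections import Counter


def words_in_title(title: str) -> int:
    """Count whitespace-separated words in a lemma title."""
    return len(title.split())


# Titles mentioned in the text above.
titles = ["Approximation", "Approximation error", "Newton's method", "Tolerance relation"]

# Histogram: number of words -> number of lemma titles with that length.
histogram = Counter(words_in_title(title) for title in titles)
print(dict(histogram))  # {1: 1, 2: 3}
```

Run over the full title list of an encyclopedia, the same histogram would show how many of its lemmas are one-word, two-word or three-word titles.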

What Wikipedia can learn from Springer

Springer has a unique position among researchers. The company is perceived as close to their problems, which means a Springer book fulfills the needs of a researcher. What is the secret? The secret behind every Springer book is that it is focused on one detailed problem. A handbook about nanotechnology is specialized on only this topic but describes it in detail. The Springer encyclopedias are domain-specific encyclopedias too. They are not written for a broad audience but for experts in the field.

Is it possible to transfer this concept into the Wikipedia ecosystem? Yes, it is possible, but it is hard. The main problem is that today’s Wikipedia authors are not experts in their field but have general knowledge. They have much in common with generalist librarians from a public library, who know a bit about every subject but nothing in detail. In contrast, the Springer encyclopedias were written by experts who bring in strong background knowledge. This makes the content so relevant for the readers.

Wikipedia has tried to become more important to researchers in the past but failed in doing so. It was not possible to motivate existing researchers to contribute content. Instead, Wikipedia has its strength in topics of general interest, for example movies, sports and political information. Nearly every aspect of everyday life is available in Wikipedia, but that is not enough for a scientific encyclopedia. The future vision is to enrich Wikipedia with more specialized information which goes very deep into a subject.

I think Wikipedia can’t learn anything from classical encyclopedias like Encyclopedia Britannica or Brockhaus. Both are dead today. But Wikipedia can learn a lot from Springer. The people there know more about creating an encyclopedia than the authors and admins at Wikipedia. And they are experts in the specialized knowledge which is taught in universities.

On the other hand, Springer can learn something from Wikipedia, namely how to reach a huge audience. Wikipedia has the top #1 rank in Google and is read by millions of people. Springer doesn’t have that kind of traffic. A normal Wikipedia article has around 100 visits a day. In one year that is about 36,500 visits. Wikipedia is a mass medium, while Springer is a specialized medium. If Springer wants to sell more books it needs Wikipedia, and if Wikipedia wants to get high-quality content it will need Springer.

Springer Link

Let us take a look at what the commercial publisher Springer has to offer. In the section “reference works”, encyclopedias and handbooks are listed. An encyclopedia is, similar to Wikipedia, an alphabetically ordered list of articles, while a handbook contains overview articles which are much longer. Each subject, like mathematics, engineering and physics, has a huge number of Springer reference works. It is possible to view example chapters, but full-text access is restricted to users who pay. This principle is well known under the term paywall.

What is unique in the Springer encyclopedias? They usually contain very complicated and specialized subjects. For example, these:

None of these keywords is available in Wikipedia. If a researcher needs them, he has to buy the Springer book. What they have in common is that they sound complicated and that they consist of more than a single word. They are 3-word and even 4-word lemma titles. Each one is a specialized entry for a specialized audience.

And this is the main difference between a mainstream encyclopedia like Wikipedia, which is read by the mainstream, and an academic encyclopedia from Springer Link, which is read by researchers.

What the researchers have done in the last 10 years is to build their own Wikipedia version which is protected behind a paywall. The researchers within universities are reading and contributing to the Springer encyclopedias but not to Wikipedia. Wikipedia, in contrast, is written by journalists, bloggers and amateurs. The Springer encyclopedias are written by real researchers with deep knowledge in their field.

Springer Link statistics

The Springer Link website consists of 24 categories like Biomedicine, Chemistry and Computer Science. Each category has around 50 different encyclopedias to offer, which are listed in the reference-work section. The total number of scientific encyclopedias from Springer Link is 24×50=1200. Each encyclopedia costs around 4000 US$ and provides around 4000 pages. The total number of printed pages is 1200×4000=4.8 million. Elsevier, a Springer competitor, also has many encyclopedias to offer. They are listed on the ScienceDirect website. The price tag is similar: a book with 2000 pages will cost around 2000 US$.

A size comparison with Wikipedia is possible. The printed Wikipedia has 7473 volumes with 700 pages each, https://en.wikipedia.org/wiki/Print_Wikipedia The number of pages is 5.2 million, while the Springer encyclopedias contain in total the above-mentioned 4.8 million pages.
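The page counts can be reproduced with a few lines of Python. All inputs are the rounded figures quoted in the text (24 categories, about 50 encyclopedias per category, about 4000 pages each, 7473 printed Wikipedia volumes of 700 pages); the constant names are only illustrative, and the result is an estimate, not an exact count.

```python
# Rounded figures quoted in the text.
SPRINGER_CATEGORIES = 24
ENCYCLOPEDIAS_PER_CATEGORY = 50
PAGES_PER_ENCYCLOPEDIA = 4000
PRINT_WIKIPEDIA_VOLUMES = 7473
PAGES_PER_VOLUME = 700

springer_titles = SPRINGER_CATEGORIES * ENCYCLOPEDIAS_PER_CATEGORY
springer_pages = springer_titles * PAGES_PER_ENCYCLOPEDIA
wikipedia_pages = PRINT_WIKIPEDIA_VOLUMES * PAGES_PER_VOLUME

# Size of the Springer collection relative to the printed Wikipedia.
ratio = springer_pages / wikipedia_pages

print(f"Springer:  {springer_titles} titles, {springer_pages:,} pages")
print(f"Wikipedia: {wikipedia_pages:,} pages")
print(f"Springer / Wikipedia: {ratio:.2f}")
```

Under these assumptions the Springer reference works alone come to roughly 92% of the printed Wikipedia's page count.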

Wikipedia vs. academic encyclopedias

Wikipedia’s strength is that the encyclopedia is cheap and covers mainstream topics. Its weakness is that specialized lemmas from scientific fields are missing. The commercial encyclopedias from Elsevier and Springer have the opposite profile. They are expensive, but provide specialized academic topics. The content is created by experts.

Having fun with Wikipedia

In the early days of the famous encyclopedia, it was easy to vandalize the project. Vandalizing means to destroy something, to rant against the admins and to make clear who the boss is. The usual method was to search for a high-traffic lemma, for example “Artificial neural network”, delete all the content and press the save button. Now the article is blank, and the world sees nothing when it needs information about the topic.

After 30 minutes or so, some admin is alarmed because we have deleted his work, and he is completely irritated. The admin doesn’t know what has happened to his encyclopedia, and he must first consult the manual to roll the article back to a previous state. During this time the article is offline, and we have won.

Unfortunately, times have changed. Modern admins are prepared for this kind of vandalism. They are better informed about how to use the MediaWiki system, and in the worst case they will block the attacker completely, which is a bad situation if we want to vandalize Wikipedia a bit more. What can we do if the aim is to have a bit of fun with the admins?

What a good vandal is doing is to upgrade his tools. Instead of simply clearing an article the better idea is to produce a non-sense article. A non-sense article has the advantage that automatic spam protection are not able to recognize it and sometimes it took weeks until an admin will detect the problem manually. The best way to create a nonsense article for Wikipedia is the Scigen generator, https://en.wikipedia.org/wiki/SCIgen It was invented with the aim to fool an academic journal but it works also for wikipedia.

The first step is visit the Scigen website and press “generate new paper”. Then the document has to be converted into the wikisyntax. If everything looks fine, it can be uploaded to wikipedia. The advantage over normal vandalism is, that on the first look the Wikipedia article is similar to a real article. The automatic incoming filter of Wikipedia which checks all the content will not be alarmed, because it is normal text, contains no plagiarism and provides references to other academic papers. To recognize the problem, somebody must read it in detail, but this is never done. Most admins are in hurry because each day around 700 articles are created from scratch. So our non-sense article can stay in the encyclopedia and we had a lot of fun during the break.

Sometimes, a wikipedia article with a high amount of traffic is blocked as default. But that is no problem, because many others can edited freely. Here is the list of most visited lemmas. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Computer_science/Popular_pages For example, the topic “Support vector machine” has over 2000 views per day, but it is not protected. So it is the ideal starting point to drop some nonsense. If the aim is, that the Scigen content stays longer in the wikipedia, it is good idea to search for a low traffic lemma. That is not observed carefully, and we can make edits without being interrupted by the admins.