This series is proving a lot more popular than I'd figured. Who would have thought so many people enjoy noodling around with Web servers? By popular demand, "Web Served" now enters the bonus round with two things I didn't think I was going to be able to get to: MediaWiki in this piece, and Etherpad Lite in the next.

Wikipedia is a staple of the World Wide Web, used by millions of folks every single day. From casual readers checking a quick fact to journalists who need to verify esoteric details of a story to students too lazy to go to the library and consult more reliable primary sources, it's the go-to crowdsourced information site on the Internet.

Wikipedia is powered by a PHP-based application called MediaWiki. The concept of a "wiki" is simple: MediaWiki provides a framework where anyone can create pages, which can be edited by anyone else. The usage isn't limited to an encyclopedia—MediaWiki can power any kind of collaborative environment. Want to set up something for a working team to quickly throw ideas against the wall? MediaWiki can do that. Want to set up a photo library or other document repository? MediaWiki can do that. Want to make your own documentation library, complete with version tracking? MediaWiki can do that.

It is by no means the only game in town—there are lots of different wiki applications, including DokuWiki (which I used for a while and very much like) and Foswiki—in fact, there's an excellent wiki comparison page here (though the fact that I'm linking a Wikipedia page should tell you something about what the dominant application is). DokuWiki is particularly nice, because it can be skinned to look very much like Wikipedia and it doesn't require a database, storing all of its pages as flat files.

However, MediaWiki is the big dog, and if you want to set up a wiki, it's the one you'll most likely want to go with.

Thoughts on security

[Image: Just a small sampling of the spam accounts a wiki can collect. Credit: Lee Hutchinson]

The very concept of a wiki is at odds with a lot of the normal ideas of security. In its purest form, a wiki should encourage even anonymous collaboration and shouldn't restrict the creation of accounts and the addition or modification of content by anyone. This can be seen with Wikipedia, where anyone really can edit anything (within certain limits and rules imposed by the gatekeeping editors). However, controlling spam accounts is difficult. I run a wiki for documenting the cool stuff folks have made on my Minecraft server, and spam account creation is an unstoppable force.

Fortunately, MediaWiki has a role-based security model, so you can require accounts to be added to a security group before they are allowed to post. This adds administrative overhead (as in, it gives you, the administrator, more stuff to do), but for a personal wiki it's not at all a problem.
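As a rough sketch of what that looks like once the wiki is installed: MediaWiki reads its permissions from the $wgGroupPermissions array in LocalSettings.php. The shell function below appends a policy under which only administrators and members of a vetted group can edit. The group name "trusted" and the function name restrict_editing are made up for illustration, and the path to LocalSettings.php depends on where your wiki lives.

```shell
#!/bin/sh
# restrict_editing FILE
# Appends an "approved editors only" policy to a MediaWiki LocalSettings.php.
# Hypothetical helper: the function name and the 'trusted' group are
# illustrative assumptions, not something the article prescribes.
restrict_editing() {
    settings=$1
    # Quoted delimiter ('EOF') keeps the shell from expanding the PHP variables.
    cat >> "$settings" <<'EOF'

# Anonymous visitors and ordinary accounts can read but not edit.
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = false;
# Administrators keep their edit rights.
$wgGroupPermissions['sysop']['edit'] = true;
# Accounts must be added to this (hypothetical) group before they can edit.
$wgGroupPermissions['trusted']['edit'] = true;
EOF
}

# Example (path is an assumption):
#   restrict_editing /path/to/wiki/LocalSettings.php
```

New spam accounts can still register under this policy, but they can't touch any pages until an administrator moves them into the group.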

MediaWiki's popularity also makes it a pretty big attack target, and a large number of the vulnerabilities MediaWiki sites get hit with come from unmaintained plug-ins. As with WordPress, you should only install a MediaWiki plug-in if you are absolutely sure you need it, and you should keep your plug-ins up to date to avoid vulnerabilities.

Prerequisites

MediaWiki works with a number of different databases, and since we've already got MySQL installed, we'll use that. You'll need to create a new user and database for MediaWiki to use. By now, you should be familiar with how to do this—if not, check part 5 or part 6 for the details. Creating a new database and user for each Web application is a good idea because it limits the amount of damage that can be done if the Web application is compromised—it helps keep an attacker's access limited only to the database controlled by the compromised application.
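If you'd like a refresher, the database and user creation can be sketched roughly as follows. The function name wikidb_sql is hypothetical, the password is a placeholder you should replace, and the sketch assumes MySQL is running on the same machine as the Web server. The function only prints the SQL, so you can read it over before piping it to the mysql client.

```shell
#!/bin/sh
# wikidb_sql DBNAME USERNAME PASSWORD
# Prints the SQL needed to create a dedicated database and a user that can
# touch only that database. Hypothetical helper for illustration; it does no
# escaping, so avoid quotes in the arguments.
wikidb_sql() {
    printf "CREATE DATABASE %s;\n" "$1"
    printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$2" "$3"
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost';\n" "$1" "$2"
}

# Example: review the statements, then run them as the MySQL root user.
#   wikidb_sql wikidb wikidb 'pick-a-real-password'
#   wikidb_sql wikidb wikidb 'pick-a-real-password' | mysql -u root -p
```

Because the grant is scoped to a single database, a compromised wiki can't read or modify the data belonging to your other applications.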

Creating a new database isn't always an option if you're using a Web hosting service—some give you only a single database to use among all your applications. Since we're self-hosting, we have no such restrictions.

After you've created a MediaWiki database and user (which for this tutorial I'll assume are both named "wikidb"), you'll need to install a collection of utilities called ImageMagick, if it's not already installed. MediaWiki (and other Web applications you might want to install in the future) uses ImageMagick's various utilities to modify the pictures you upload; most obviously, ImageMagick is used to resize images to provide thumbnails. Launch a root shell and install the ImageMagick package with aptitude:

aptitude install imagemagick

Installing MediaWiki

MediaWiki is available as a package you can install with aptitude, but the problem with installing an application like MediaWiki from the official curated sources is that it can take time—sometimes weeks or longer—for the official sources to be updated with new versions. Plus, those updates, when they come, are typically only done in response to security issues, not new features.

MediaWiki is a popular enough attack target that we want to make sure we always have the most current stable version installed, and to do that we need to install and maintain the application directly from the Wikimedia Foundation. It's possible to use Git (which we installed in part 6) to clone different MediaWiki releases to your server, but we're going to go for the regular old-fashioned tarball download.

Head to the MediaWiki download page in a browser and copy the target of the big prominent "download" link to your clipboard. This link will always point to the latest stable release of MediaWiki. As of this writing, that's version 1.20.2.

With that link on your clipboard, return to your terminal window, change to your Web root directory, and download the release using wget. After it's downloaded, decompress it with the tar command. This will create a destination directory for MediaWiki; as with previous Web apps, we'll need to change that directory's ownership to your local Nginx user. We're also going to rename the directory (with the mv command) so that its name is a little easier to remember, and then finally we'll delete the source archive file to keep our Web root directory clean.
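Those steps can be sketched as a small shell function. Treat this as an illustration under assumptions rather than the exact commands: the function name unpack_release is hypothetical, and the Web root path, the "wiki" directory name, the archive file name, and the www-data user in the example are guesses you should replace with your own values.

```shell
#!/bin/sh
# unpack_release ARCHIVE WEBROOT NAME OWNER
# Unpacks a downloaded release tarball into WEBROOT, renames the extracted
# directory to NAME, hands it to OWNER, and deletes the archive.
# Hypothetical helper; run it as root so that chown is permitted.
unpack_release() {
    archive=$1
    webroot=$2
    name=$3
    owner=$4
    # The tarball unpacks into a single versioned top-level directory.
    topdir=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
    [ -n "$topdir" ] || return 1
    tar -xzf "$archive" -C "$webroot" || return 1
    mv "$webroot/$topdir" "$webroot/$name" || return 1
    chown -R "$owner" "$webroot/$name" || return 1
    rm -- "$archive"
}

# Example (all values are assumptions; RELEASE_URL is the link you copied):
#   cd /usr/share/nginx/html
#   wget "$RELEASE_URL"
#   unpack_release mediawiki-1.20.2.tar.gz . wiki www-data
```

Reading the top-level directory name from the archive, instead of hard-coding the version number, means the same function should keep working when you upgrade to a later release.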

Lee Hutchinson
Lee is the Senior Technology Editor at Ars and oversees gadget, automotive, IT, and culture content. He also knows stuff about enterprise storage, security, and manned space flight. Lee is based in Houston, TX. Email: lee.hutchinson@arstechnica.com / Twitter: @Lee_Ars

61 Reader Comments

I'm VERY happy to see that you're featuring MediaWiki and Etherpad. These are two fantastic tools, especially for education, in that they support collaborative "knowledge building" among students. In my opinion, knowledge building is one of the "best practice" activities that technology in the classroom enables.

Better yet, there's lots of teachers (or at least, growing numbers of teachers) who are "tech noodlers" with fledgling tech skills who will really benefit from the thorough coverage you're giving. By extension, those teachers' students will benefit too. On behalf of all of them, thanks!

- "Install (or want) only what you use; use only what you need; need only what you require."

The last part may sound redundant, but it came to mind that some needs aren't really requirements. (You might "need" something because someone told you to install it, but it's not necessarily "required" for the functionality you want.) It's basically there to reinforce the fact that you should only install the bare minimum essentials you positively need/require to provide the functionality you want and actually use.

Any chance this Web Server Guide will be put into PDF or EPUB format for future reference? I'd like to stick this on my e-reader. I've been interested in learning basic web server administration and this would REALLY help in the long run.

Etherpad Lite is a real-time collaborative editing app. We actually use it here at Ars occasionally for editing--an editor and a writer will sit down with Etherpad, paste in a story, and work it over while talking through it. You can see each other's edits in real time. It works much like Google Wave used to, and in fact Etherpad Lite and Google Wave share some DNA. It's great when two or more folks want to work on a document at the same time, and it's much more real-time than Google Docs.

Quote:

Any chance this Web Server Guide will be put into PDF or EPUB format for future reference?

We're talking about it. The difficulty is in keeping things up to date. Some (or perhaps even most) of the config files and options discussed will eventually become outdated as all the different referenced applications get updated and updated and updated. It's not a problem now, but in a year or two, there'll be some noticeable drift.

Setting up the AMP stack on a Windows PC can be easy and fast if you use XAMPP, and slow and painful if you don't. That's personal experience, having set up several MediaWiki installations (as well as other wikis). XAMPP is a free download from SourceForge. It is your friend.

Etherpad Lite is a real-time collaborative editing app. We actually use it here at Ars occasionally for editing--an editor and a writer will sit down with Etherpad, paste in a story, and work it over while talking through it. You can see each other's edits in real time. It works much like Google Wave used to, and in fact Etherpad Lite and Google Wave share some DNA. It's great when two or more folks want to work on a document at the same time, and it's much more real-time than Google Docs.

How can something be more real time than the real time capabilities of Google docs?

Any chance this Web Server Guide will be put into PDF or EPUB format for future reference?

We're talking about it. The difficulty is in keeping things up to date. Some (or perhaps even most) of the config files and options discussed will eventually become outdated as all the different referenced applications get updated and updated and updated. It's not a problem now, but in a year or two, there'll be some noticeable drift.

Hmm... There is that. You could just do it the way textbooks and other guides do it - Title Here: 20xx Edition - or some such. On the other hand, I'm not sure if you would care to maintain an in-depth beginner's guide like this long-term. IT development of any type is a relatively fast-paced industry (especially with the current quick development cycles and updates many projects have adopted).

How relevant do you wish to keep this series for the future? I wouldn't mind getting a current PDF/EPUB version now and then grabbing an updated version as needed however many months down the line later. (For sure, you can't really afford a monthly refresh like Chrome or FF ;-)

Dokuwiki (https://www.dokuwiki.org/dokuwiki) doesn't require a database engine and is easier to install. It is on par with MediaWiki feature wise. The advantage of MediaWiki is the better performance for a large number of users but it doesn't make any difference with less than 100 users.

I'll bite. Any reason to back up your statement, or should I add you to my ban list for being a troll? Win95 was popular, as I understand it, for being an operating system that was (fairly) compatible with market-leader Win3.1 AND it ran market leader Microsoft Office. PHP's reasons for being popular are not, as I understand it, related at all.

We're talking about it. The difficulty is in keeping things up to date. Some (or perhaps even most) of the config files and options discussed will eventually become outdated as all the different referenced applications get updated and updated and updated. It's not a problem now, but in a year or two, there'll be some noticeable drift.

I'll point out that configuration drift will occur regardless of distribution mechanism. If the people want to read the series as an ebook, you should consider doing it. Are you really going to come back and regularly update these "live" articles?

Any chance of getting one of these articles about setting up a mail server?

I'd like to see something about setting up a mail server too, since it was mentioned at the end of the article, though the article does also say it's too in depth. If there is too much to it, could someone recommend some good open source/free email servers I could mess with and try to get working? I have looked myself but it's hard to see which are well thought of/active/stable/supported in the time I have to research it.

Also, since education was mentioned, I'd love to see an article about getting Moodle working on this setup. I don't know if it's possible (it's definitely aimed at Apache) and it might be a bit niche, but if you have the time...

I'll bite. Any reason to back up your statement, or should I add you to my ban list for being a troll? Win95 was popular, as I understand it, for being an operating system that was (fairly) compatible with market-leader Win3.1 AND it ran market leader Microsoft Office. PHP's reasons for being popular are not, as I understand it, related at all.

In the past PHP had some really badly built features, and to maintain backwards compatibility a lot of them still exist and are just labelled as deprecated on php.net (for example mysql_escape_string, which was replaced by mysql_real_escape_string, but both still exist). It also didn't introduce object-oriented code until version 3.0 and namespacing until version 5.3. Lastly, the low barrier to entry lets a lot of people who aren't familiar with good coding practices get jobs in the industry and build things which work but are a real nightmare to maintain for anyone else working on them.

It's improved a lot over the last few years, but it does still have issues with security and bloat (being a scripting language it's difficult to get around this). As a language, though, it's probably on par with other equivalent languages; it's more the developers working with it who cause problems. Or possibly Bengie is just a Python/Ruby fanboy who likes bitching about how inferior PHP is; that's not uncommon either.

How can something be more real time than the real time capabilities of Google docs?

Near-instant refresh, very little lag (obviously bound by the latency of your connections). Etherpad Lite is pretty darn close to actual-for-real instant realtime. Plus, changes are automatically annotated and tracked and can be played back and forth. It'd be correct to say that Google Docs is a text editor with the capability to do real-time document collaboration, whereas EPL is a real-time document collaboration tool. One can do it, the other is built for it.

agrouf wrote:

Dokuwiki (https://www.dokuwiki.org/dokuwiki) doesn't require a database engine and is easier to install. It is on par with MediaWiki feature wise. The advantage of MediaWiki is the better performance for a large number of users but it doesn't make any difference with less than 100 users.

I mention DokuWiki in the article as an alternative. I used DokuWiki for about six months and though I liked it, in the end I switched to MediaWiki because I found my users demanding MediaWiki-like functionality that DokuWiki couldn't provide, particularly with respect to laying out images. I worked with DokuWiki plugins and tried to approximate, but in the end I gave up and installed MediaWiki.

Honestly, DokuWiki could outperform MediaWiki in a single-server setup any day of the week, because it can take advantage of whole static objects being stored in a fast file system cache.

Muzos wrote:

Any chance of getting one of these articles about setting up a mail server?

A few months ago, I would have said to simply use the free version of Google Apps...but that's dead now. The tool to use would be either postfix or something like the free version of Zimbra, but mail from home isn't always a good idea because many mail servers blacklist residential IP blocks to protect themselves from spam zombies. Additionally, a lot of ISPs block outbound SMTP traffic. And postfix isn't the friendliest thing to set up.

In short, probably not, but there are tutorials out there that aren't awful.

undervillain wrote:

I'll point out that configuration drift will occur regardless of distribution mechanism. If the people want to read the series as an ebook, you should consider doing it. Are you really going to come back and regularly update these "live" articles?

Potentially--the series is proving popular enough that it might be a good use of time to sweep through every 6 months or so and ensure the procedures are valid. I do at least have the option that way. Plus, creating an ebook isn't a zero-work proposition--I can't just snap fingers and it's done. We are indeed considering it, though, as I mentioned. Ultimately, that decision is up to Nate and Ken.

You've been downvoted, but it is true. So, so true. Most people are not aware of the depth of PHP's badness. Probably the best article on PHP's awful nature is PHP: a fractal of bad design. It's hard to believe how much poorly implemented and broken crap they've managed to jam into the language and its libraries.

You've been downvoted, but it is true. So, so true. Most people are not aware of the depth of PHP's badness. Probably the best article on PHP's awful nature is PHP: a fractal of bad design. It's hard to believe how much poorly implemented and broken crap they've managed to jam into the language and its libraries.

The idiot was downvoted because he's a content-free troll.

At any rate, not using something because you don't like the language it's written in is a shitty reason to not use something.

I'd just like to underscore how important it is to implement some kind of user account limitations to your wiki (and I'm glad the article covered it)-- The amount and tenacity of signup/edit bots is truly mind-boggling and will make your life miserable if you let them.

I have an obscure, small-traffic mediawiki and that used to get hit all day, erryday. But since I don't expect to get new users without knowing about it, going President of Madagascar and shutting down signups solved that.

Since we have memcached installed already why wasn't it used as the object cache for the wiki? Just wondering....

That is an excellent question. The answer is two-fold: first, because using APC requires less configuration and is easier to implement. Second, not using memcache opens the possibility of using Varnish cache later, which I'll touch on in the closing piece. The config file spot used to specify memcache is the same one used to specify Varnish or another external cache.

However, absolutely nothing is stopping you from using memcache if you'd like--go for it!

- Restrict editing rights of regular users as you suggest (unless one has a high volume self-policing wiki and really wants as many users as possible).
- For bots that use variations of the same username for the spam accounts (e.g. Podpole*, Puwok*), https://www.mediawiki.org/wiki/Extensio ... _Blacklist can be helpful.
- Install "Kitten auth": http://www.mediawiki.org/wiki/Extension:Asirra. It asks users to pick only pictures of cats out of thumbnails of either cats or dogs. This has been hugely effective and has stopped almost all spam account creation. I did this because I didn't like spam account creation flooding out legitimate entries in the recent changes log.

Second, not using memcache opens the possibility of using Varnish cache later, which I'll touch on in the closing piece. The config file spot used to specify memcache is the same one used to specify Varnish or another external cache.

Nope, memcached takes the place of APC (via $wgMainCacheType) although it doesn't cache opcodes like APC. On the other hand, you can choose Varnish instead of Squid (via $wgUseSquid), as it's more specifically designed as a proxy cache server and easier to set up than Squid.

As far as which configuration to choose, MediaWiki recommends using APC on single-server setups and memcached where you can set it up on a separate, more dedicated server.
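For anyone who wants to try either option, here's a minimal sketch of switching the object cache by appending to LocalSettings.php. The function name set_object_cache is hypothetical, and the memcached address 127.0.0.1:11211 assumes a local instance on the default port.

```shell
#!/bin/sh
# set_object_cache FILE apc|memcached
# Appends an object cache setting to a MediaWiki LocalSettings.php.
# Hypothetical helper; the memcached address is an assumption.
set_object_cache() {
    settings=$1
    backend=$2
    case $backend in
        apc)
            # CACHE_ACCEL uses the PHP accelerator's shared memory (APC here).
            printf '\n$wgMainCacheType = CACHE_ACCEL;\n' >> "$settings"
            ;;
        memcached)
            printf '\n$wgMainCacheType = CACHE_MEMCACHED;\n' >> "$settings"
            printf '$wgMemCachedServers = array( "127.0.0.1:11211" );\n' >> "$settings"
            ;;
        *)
            echo "unknown cache backend: $backend" >&2
            return 1
            ;;
    esac
}

# Example (path is an assumption):
#   set_object_cache /path/to/wiki/LocalSettings.php memcached
```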

Anyway, I'd like to add to this. I run a fairly busy multilanguage MediaWiki site (~6-8GB/day of bandwidth). We've found using nginx instead of Apache to be a huge performance boost, to the point of Apache not even functioning under the same load nginx breezes through without issues. Also we've set up Scribunto to do some pretty crazy Lua-based template programming not otherwise available in MediaWiki markup. Finally, to control spam account creation, we used QuestyCaptcha from ConfirmEdit to do text Q/As at account registration, since image-based Captchas are pretty much useless now. It's completely stopped spam account registrations, for the most part.

Any chance this Web Server Guide will be put into PDF or EPUB format for future reference?

Quote:

We're talking about it. The difficulty is in keeping things up to date. Some (or perhaps even most) of the config files and options discussed will eventually become outdated as all the different referenced applications get updated and updated and updated. It's not a problem now, but in a year or two, there'll be some noticeable drift.

It'd still be great if you guys could have them all in one central location that was available easily! I appreciate the thought towards keeping things up to date though.

Nope, memcached takes the place of APC (via $wgMainCacheType) although it doesn't cache opcodes like APC. On the other hand, you can choose Varnish instead of Squid (via $wgUseSquid), as it's more specifically designed as a proxy cache server and easier to set up than Squid.

And THAT'S what happens when I answer questions from memory in an airport without checking! Thanks for the correction.

Quote:

Finally, to control spam account creation, we used QuestyCaptcha from ConfirmEdit to do text Q/As at account registration, since image-based Captchas are pretty much useless now. It's completely stopped spam account registrations, for the most part.

I'll definitely look into this.

I'll echo that I've seen lots of people have good experiences with QuestyCaptcha. I think it is particularly good if the users of a site would all be expected to know the answer to a specialized question. For example, a Simpsons site might ask: "What's the name of Homer's son?"

I edit my www and wiki.conf files mostly by copy/paste so I would be really surprised if it was a typo, but after some time researching it online, and with my limited time using nginx I'm still at a loss.