User:HighInBC/Integration with S3

The purpose of this article is to describe how to migrate your existing mediawiki setup so that it serves its files from the Amazon Simple Storage Service (called S3 from here on out).

This tutorial assumes that your mediawiki is installed on a Debian Lenny server and that you have root shell access to that server. The ideas put forth here should work fine on any Linux/Apache based mediawiki setup, though the details may differ. If you do not have root shell access to your server then this technique may not be workable for you.

Unless otherwise stated, all of these commands are run as root.

I use the editor "joe" in this tutorial; you can use any text editor you prefer.
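If you don't already have joe and want to use it, installing it on Debian Lenny is one command (assuming the standard apt repositories):

```shell
# install the joe text editor from the Debian repositories
apt-get install joe
```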

Go to the Amazon S3 sign-up page and click "Sign up for Amazon S3", then follow the instructions. S3 is a pay service and does require a credit card; it may take a day or so for Amazon to activate the account. This setup only needs to be done once.

Once your S3 account is set up, log in and under the "Your Account" menu select "Security Credentials". On that page you should see a table showing your current "Access Key ID" and "Secret Access Key". You will need to click the "Show" button to reveal your secret key. Record both of these keys in a text editor to use later, and be sure to keep track of which one is which.

These are the keys that allow you to access your file storage area. Keep them secret, or people can abuse your account and run up charges that you will be responsible for.
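The tool used below to talk to S3 is s3cmd; the configuration step the next paragraph refers to is its interactive setup. On Debian (assuming the package is available in your repositories) that looks like this:

```shell
# install s3cmd and run its interactive configuration
apt-get install s3cmd
s3cmd --configure
```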

Give it your access key and secret key when it asks for them. Just hit Enter at the other questions to use the default settings.

Now that you can access your S3 account with the s3cmd tool, you can create a bucket:

s3cmd mb s3://static.mydomain.com

Where "mydomain.com" is a domain that you have DNS control over.
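To confirm the bucket was created, you can list the buckets on your account (the bucket name shown is the example used above):

```shell
# list all buckets on the account; the new bucket should appear
s3cmd ls
```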

Set the CNAME of static.mydomain.com to:

static.mydomain.com.s3.amazonaws.com.

Notice the "." at the end. If you do not know how to create a subdomain and set its CNAME record then you need to ask your hosting provider or just google for the information as this goes beyond the scope of this article.
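Once the record has propagated, you can verify it from the shell (static.mydomain.com being the example name used above):

```shell
# the CNAME should resolve to the S3 endpoint, trailing dot included
dig +short CNAME static.mydomain.com
# should print: static.mydomain.com.s3.amazonaws.com.
```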

I don't know if this is a mediawiki bug or an s3fs bug, but when I first tried this it worked great except when creating small images. When I rendered a small image it would say "Thumbnail creation error: ". I added the following line to my LocalSettings.php file:

$wgDebugLogFile = "/tmp/wiki.log";

I looked at the debug output and found the line complaining about the failure; it said: "Removing bad 0-byte thumbnail". I ran the same imagemagick command that mediawiki showed in the log file and saw that it worked just fine. I searched the code for the phrase "Removing bad" and found the check responsible.

It appears that after creating the thumbnail, mediawiki looked at it and saw it was 0 bytes long. I am assuming this is caused by the latency that s3fs can experience. I do have the local cache option on, so I don't know why s3fs would show it as 0 bytes. This does not happen on larger files.

My solution was to bypass the subroutine altogether. The result could be that a genuine 0-byte file is not deleted; I can live with that.

I really wanted to implement this S3 integration without making code changes to mediawiki itself, but to make it reliable I had to.

Just find the file "/var/lib/mediawiki/includes/media/Generic.php" and replace that check with a version that skips the 0-byte test (in my copy it starts on line 260).

Look at your wiki, you should be able to see all of the images you had before. You should be able to create thumbnails from those images in different sizes. You should be able to upload new images and render them at different sizes. Test making a small image, about 25px.

Now right-click on an image and click "properties". Look at the image URL; it should be pointing at your mediawiki server. Now put that URL in a new window and load it: you should see the URL switch to your S3 domain and the image should be speedily presented to you. Your mediawiki server's apache logs should record the request.
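That URL switch implies the wiki's Apache is redirecting image requests to the S3 hostname. The exact rule is not shown on this page, but a hypothetical vhost snippet producing this behaviour (names taken from the earlier steps) would look something like:

```apache
# hypothetical: redirect image requests to the S3-backed CNAME
RewriteEngine On
RewriteRule ^/images/(.*)$ http://static.mydomain.com/$1 [R=302,L]
```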

If everything works well and you feel it is safe, you may now delete the "images.bak" folder to recover your disk space.
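For context, the migration that produced that "images.bak" folder would have looked roughly like this. The paths are assumptions based on the Debian mediawiki package, and the mount options mirror the fstab line quoted near the end of this page:

```shell
# hypothetical migration sketch: keep the originals as a safety copy,
# mount the bucket over the images path, then copy everything up to S3
cd /var/lib/mediawiki
mv images images.bak
mkdir images
/usr/bin/s3fs static.mydomain.com /var/lib/mediawiki/images \
    -o allow_other,default_acl=public-read,use_cache=/s3fscache
cp -a images.bak/. images/
```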

If you truly want to store more media than you have available space, then some sort of tool needs to be made to remove files from the /s3fscache path when disk space becomes low. As it is, the cache will hold a copy of every file that is sent to S3 and grow until the disk space is used up. Any file in the local cache can be safely deleted at any time; if s3fs needs the file (for the creation of a new-sized thumbnail, for example) it will download it from S3 again.
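A minimal sketch of such a tool, assuming an age-based policy (the function name and the 7-day threshold are my own choices, not part of s3fs):

```shell
#!/bin/sh
# prune_cache: hypothetical helper (not part of s3fs) that deletes
# cached copies not read recently.  This is safe because s3fs will
# re-download any file it still needs from S3 on the next access.
prune_cache() {
    # -atime +7 matches files last accessed more than 7 days ago
    find "$1" -type f -atime +7 -delete
}

# run from cron against the cache path used in the mount options, e.g.:
#   prune_cache /s3fscache
```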

If you create a good solution for this please let me know.

Direct upload

With this current setup, when one uploads a file to the mediawiki it goes to the mediawiki server, which streams the contents of the file to the S3 bucket. When the upload is complete the file is in S3, but the transfer does use your server's bandwidth.

It is possible to craft a special upload form that points to the S3 bucket directly and contains a signature. This signature authorises a user to send a file with a very specific file name, and under a maximum size, directly to your S3 bucket. The mediawiki server would then see the file in its s3fs folder and can work on creating thumbnails.
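The signature in question is an HMAC-SHA1 over a base64-encoded "policy" document, as used by S3's browser-based (HTML form POST) uploads. A rough sketch, where the policy contents, file name, and size limit are example values of my own:

```shell
# hypothetical sketch: sign an S3 browser-upload policy document.
# SECRET_KEY is the Secret Access Key recorded earlier; the policy
# restricts the upload to one named key and a 10 MB maximum size.
SECRET_KEY='your-secret-access-key'
POLICY_JSON='{"expiration":"2012-12-01T12:00:00Z","conditions":[{"bucket":"static.mydomain.com"},["eq","$key","images/Example.jpg"],["content-length-range",0,10485760]]}'
POLICY=$(printf '%s' "$POLICY_JSON" | base64 | tr -d '\n')
SIGNATURE=$(printf '%s' "$POLICY" | openssl dgst -sha1 -hmac "$SECRET_KEY" -binary | base64 | tr -d '\n')
echo "policy=$POLICY"
echo "signature=$SIGNATURE"
```

The policy and signature would then be embedded as hidden fields in the upload form, alongside the Access Key ID.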

The mediawiki server would still need to download the image through s3fs at least once for the creation of thumbnails, but it would not have to receive the upload and then send it on.

I have not delved into this as it almost certainly involves significantly modifying the mediawiki source code. Specifically, it is essential that the file name is known at the time the form is created, so it may have to be a two-stage process instead of the current one-stage process.

Two changes to the process to use the latest version (s3fs-1.61):
1) Before MAKE, you have to run CONFIGURE (./configure)
2) Both direct MOUNT and /etc/fstab only want the BUCKET NAME, not the CNAME:
(Otherwise you will get an "is not part of the service specified by the credentials" error message.)
$ /usr/bin/s3fs mydomain-static /var/www/test/images
$ umount /var/www/test/images
$
$ cp /etc/fstab /etc/fstab.bak ; vi /etc/fstab
# Mounting S3 on /var/www/test/images
s3fs#mydomain-static /var/www/test/images fuse allow_other,default_acl=public-read,use_cache=/s3fscache,retries=5 0 0
--Standsure Wikimaster 16:56, 19 October 2011 (UTC)