2009-09-30

The second part of my tutorial on optimizing your server for hosting high traffic websites: installing and configuring memcached and the memcache php extension for your server. This is a little easier than the first step (setting up nginx as reverse proxy, see article below) and can be applied to any kind of dynamic website.

What's the point?
On highly dynamic websites such as forums, news sites or any user content based website, the database server load is often very high. The more traffic you get, the more overloaded your database server becomes, sometimes rendering your website completely unavailable to visitors. Using a data caching daemon allows you to keep frequently requested data in memory instead of fetching it from the database every time. You should know that memcached is used by major websites such as Wikipedia, SourceForge, Slashdot... need I say more?

What is memcached?
Memcached is the daemon running on your server. Its usage is extremely simple: there are no configuration files, all you do is start the daemon on a given port, and your websites connect to this daemon to store data in memory. Yes, the data is stored in your RAM, so when starting memcached you'll have to decide how much RAM it is allowed to use. If you start memcached with a 1GB memory space, it will store up to this much data; when the cache is full, the least recently used data is evicted to make room for new entries.

What is memcache?
Memcache, in our case, is the PHP extension that will allow us to connect to and make use of memcached. This PHP extension is not part of the default ones so you'll have to download and install it (see step 2). It provides classes as well as functions that I must admit are very easy to use and understand. In this article, I provide a MySQL + memcache wrapper class for anyone to use.

What is the difference between memcached and memcache?
Well if you've read the two points above, you should already know. In short, memcached is the daemon running on your machine; memcache is the PHP extension allowing you to make use of memcached.

1. Setting up memcached

I haven't found memcached in my repositories (you might as well try # yum install memcached first, just in case) so I'll download the source and compile it. First go get the latest version from the official website.

# wget http://memcached.googlecode.com/files/memcached-1.4.1.tar.gz

# tar zxvf memcached-1.4.1.tar.gz

# cd memcached-1.4.1

# ./configure

If like me you get this message "libevent is missing" or something, you can run this command:

# yum install libevent-devel

And then run configure again:

# ./configure

Install memcached:

# make install

That's it, you're set! That was pretty easy wasn't it? We'll now see the command line arguments for starting memcached:

# memcached -d -m 1024 -l 127.0.0.1 -p 11211 -u nobody

The arguments are:
-d : start as daemon, running in the background
-m 1024 : allow memcached to use up to 1024 MB of RAM (1GB)
-l 127.0.0.1 : listen on local interface
-p 11211 : listen on port 11211
-u nobody : run as user "nobody"

If you're not sure how much memory you should allocate to memcached, try running this command first:

# free

It will tell you how much free RAM you've got left.
Note that upon starting memcached, if all is OK, you will see no output message. To see if memcached is correctly started, run this command:
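
One way to run that check, assuming netcat (nc) is installed on your machine, is to ask the daemon for its statistics over its plain-text protocol: a running memcached answers the "stats" command with a list of "STAT name value" lines. The little stat_value helper below is my own naming, not part of memcached; it just picks one value out of that list.

```shell
# Read memcached "stats" output on stdin and print the value of one stat.
# Protocol lines end with \r\n, so the trailing \r is stripped from the value.
stat_value() {
    awk -v key="$1" '$1 == "STAT" && $2 == key { sub(/\r$/, "", $3); print $3 }'
}
```

Used against the daemon started above, it looks like this; if you get a number of seconds back, memcached is up and answering:

# printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | stat_value uptime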

2. Setting up memcache PHP extension
The memcache PHP extension should be found in the classic repositories, so try this command:

# yum install php-pecl-memcache

If you're lucky (why should you be unlucky anyway?) the install will work fine and you'll be seeing these messages:

Installed: php-pecl-memcache.i386 0:2.2.3-1.el5_2
Dependency Installed: php-pear.noarch 1:1.4.9-4.el5.1
Complete!
Just for reference, here's a link to the official memcache website, if you need to grab the sources.

Let's see if memcache was installed properly. First restart the httpd:

# service httpd restart

Then place a simple php file on your website containing the following code:

<?php phpinfo(); ?>

Open the PHP file in your browser (e.g. http://mydomain.com/phpinfo.php ) and have a look at the output. If you can find a "memcache" section in it, it means memcache was successfully installed.

We will now have a look at the memcache configuration file. First locate your php module configuration files folder, in my case /etc/php.d/ . You should find the newly installed "memcache.ini" configuration file. Open it up to see a list of configuration keys and their meaning.

The default options are just fine, but if you're interested, you should know that memcache offers failover features (transparently switching to another server when one fails) through the "allow_failover" configuration key. I'm not going to make use of this feature so I will not be editing any of the settings.

3. Using memcache in your code
Unfortunately, installing both components isn't enough. You'll have to edit your code in order to make use of the caching features. Be reassured though, it couldn't be easier! There are a couple of functions you'll need to use, nothing complex.
If you want to find out the complete listing of the memcache php functions, visit the official website. Basically we'll be using 5 methods:
- Memcache::connect($host, $port, $timeout): connect to your daemon
- Memcache::get($key) : fetch data from your cache
- Memcache::set($key, $var, $flag, $expire): store data in your cache
- Memcache::delete($key): remove data from your cache
- Memcache::close(): disconnect.

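<?php
$mc = new Memcache;
$mc->connect("127.0.0.1", 11211);

// Memcache::get() returns false when the key is not in the cache
$news_articles = $mc->get("news_articles");

if ($news_articles === false) {

// Not cached yet: fetch the articles with your usual SQL query
// (load_news_articles() is a placeholder for your own database code)
$news_articles = load_news_articles();

$mc->set("news_articles", serialize($news_articles), MEMCACHE_COMPRESSED, 60*60*24*7); // store for 7 days, but don't forget to rebuild the cache when a new article is posted!

} else {

$news_articles = unserialize($news_articles);

}

// Display articles..

$mc->close();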
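As you can see in the example above, I use the "serialize" and "unserialize" php functions. Why is that? Memcached itself only stores strings of bytes, so an array of data (or an object) has to be turned into a string before it goes into the cache, and turned back into an array after having been read from the cache. Strictly speaking you don't have to do this yourself: the memcache extension stores strings and integers as they are and serializes other types on its own, so passing the array directly to Memcache::set() and reading it back with Memcache::get() works too.
If you know a better approach, please feel free to leave a comment.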

4. Wrapper class for memcache & mysql
I have just written a simple wrapper class for MySQL, making use of the powerful caching system offered by memcache. You can download the class here, I included a simple example for testing the class.
The principle is very simple: when executing a query, the script will check if the query result is already in the cache. If the data is in the cache, it is returned immediately (no query executed). If the data is not in the cache, the query is executed, and the results are then placed in the cache with the specified "time to live".

2009-09-29

Following my blog article on optimizing your web server by using nginx and memcached, I'll now detail the first step: setting up nginx as reverse proxy on your server. This is going to be a bit tricky, and you'll be getting your hands dirty, so be warned.

What does this consist of?
Well basically, your website will be served by two daemons: nginx for the static content (images, js, css, html...) and Apache for the dynamic content. Nginx will be listening on port 80, will serve static content to visitors, and forward any request for dynamic content to Apache, running on another port -- in our case we'll be using port 8080.

What is nginx?
Nginx is a lightweight open-source http daemon (http server). It is said to be extremely fast, a lot faster than Apache, and I have to admit that from personal experience this seems to be very true. Using nginx for serving static content dramatically improved the speed of my high traffic website. Actually, some major websites such as WordPress.com, *cough* Youporn.com, use nginx exclusively for serving web content.

Major issues
This configuration is a bit tricky and can be difficult to achieve particularly if you have numerous domains & subdomains. There are several issues with this configuration:
1) I happen to be using Plesk 9 (admin control panel) for easy domain & subdomain management. Unfortunately it doesn't seem to be compatible with nginx at the moment, it only works with Apache. So we'll run into a few problems very soon.
2) We'll have to work out the configuration files manually (including vhosts - virtual hosts configurations) so be careful about what you're doing or you might run into annoying problems.
3) Plesk rebuilds the virtual hosts configuration files every time you make the slightest change in the web configuration.

1. Download and install nginx
nginx requires the PCRE library. If it's not installed on your system, run the following commands:

# yum install pcre

# yum install pcre-devel

Try these libraries as well, just in case they aren't on your system already:

# yum install zlib

# yum install openssl

# yum install openssl-devel

# yum install gcc

Visit the official nginx website (nginx.net) and find the latest version. From your shell, run this command:

# wget http://sysoev.ru/nginx/nginx-0.7.62.tar.gz

Unzip the files with the following command

# tar zxvf ./nginx-0.7.62.tar.gz

Change directory to the nginx folder

# cd nginx-0.7.62

Run the following commands:

# ./configure

If you get no errors, you're all set, go on with the next couple of commands. If you get an error, try to make sure all the libraries are installed.

# make

# make install

2. Nginx base configuration
Congrats if you've made it this far! Now let's have a look at the base configuration of nginx. By default, the main configuration file should be found here: /usr/local/nginx/conf/nginx.conf or /etc/nginx/nginx.conf
Open it up and we'll have a look at some of the settings:
- worker_processes : the number of worker processes that will be run. As a rule of thumb, 1 process = 1 core; so if you want to make full use of your multi-core CPU, you might as well use as many processes as your CPU has cores. In my case my CPU's a quad-core, so I'll be using 4 worker processes.
- worker_connections : how many connections a worker process will accept simultaneously.
For more configuration keys, I suggest having a look here! Excellent article.
If you can't be arsed, here's the configuration I'll use:
- user apache apache; # might as well use the same user and user group as Apache! This will allow nginx to have read permissions on the same files as Apache
- #tcp_nopush on -- leave this commented.

There isn't much more to configure here, so we'll start configuring Apache. But before doing so, there's one little additional configuration directive we'll add to nginx.conf, which will make handling virtual hosts a lot easier. In the nginx configuration folder, create a new folder: sites ( /usr/local/nginx/conf/sites/ ).

In the configuration file, inside the http { } block and below the configuration directives you've put above, insert this new one:

include /usr/local/nginx/conf/sites/*.conf;
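
To show where these directives end up, here is a rough sketch of a complete nginx.conf, written as a small shell function that prints the file. Only the user line, the commented tcp_nopush, the worker settings and the sites include come from this article; the function name print_nginx_conf, the value 1024 for worker_connections and the remaining http settings are assumptions of mine based on the stock configuration, so adjust them to your needs. Note that the include line sits inside the http block, since that is where virtual hosts (server blocks) are allowed.

```shell
# Print a minimal nginx.conf; $1 = number of worker processes (one per CPU core).
print_nginx_conf() {
    workers="$1"
    cat <<EOF
user apache apache;
worker_processes $workers;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    #tcp_nopush on;
    keepalive_timeout 65;

    # one .conf file per domain, see the sites folder created above
    include /usr/local/nginx/conf/sites/*.conf;
}
EOF
}
```

For example, to get one worker per core on Linux (back up your existing file first):

# print_nginx_conf "$(grep -c '^processor' /proc/cpuinfo)" > /usr/local/nginx/conf/nginx.conf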

3. Configuring Apache
This is where it'll get dirty. If like me you run Plesk, you probably already have some vhost configuration files all over. You'll have to edit these configuration files one by one, after having modified the main conf file.

- Open your main apache configuration file, probably located somewhere like: /etc/httpd/conf/httpd.conf. Find the "Listen" directive at the beginning of your configuration file. It's probably already set to listen on port 80, so change it to port 8080, and add the line below.

Listen 8080

NameVirtualHost X.X.X.X:8080

Replace X.X.X.X by your actual server IP address. Save and close the file.

- Open your vhosts configuration files one by one, we're going to make some changes. If like me you're using Plesk, the config files for each domain should be located here: /var/www/vhosts/mydomain.com/conf/httpd.include
Replace all references to port 80 with 8080.
Example:
<VirtualHost 49.32.113.160:80> => <VirtualHost 49.32.113.160:8080>
ServerName mydomain.com:80 => ServerName mydomain.com:8080
Do the same for all domains and subdomains that use port 80. We can't allow Apache to use port 80 as it'll be used by nginx! No need to edit the 443 references though, we'll still use Apache for all our https content.
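
If you have many domains, editing every file by hand gets tedious. As a sketch, the replacement can be done with sed; the function name vhost_ports_to_8080 is mine, and I'm assuming a sed that understands the -E flag (on older GNU sed, -r does the same). It only touches VirtualHost and ServerName lines and leaves :443 alone.

```shell
# Read an Apache vhost config on stdin and print it with ":80" changed to
# ":8080" on VirtualHost and ServerName lines only. Ports such as :443 or
# :8080 are not matched, because ":80" must be followed by a non-digit
# character or by the end of the line.
vhost_ports_to_8080() {
    sed -E '/<VirtualHost|ServerName/ s/:80([^0-9]|$)/:8080\1/g'
}
```

Write the result to a new file and compare it with the original before swapping them:

# vhost_ports_to_8080 < httpd.include > httpd.include.new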

- Once you've edited the configuration files of all your websites, reload your httpd service: service httpd restart.
An error may (or may not) appear upon restarting: [warn] VirtualHost 49.32.113.160:8080 overlaps with VirtualHost 49.32.113.160:8080, the first has precedence, perhaps you need a NameVirtualHost directive.
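No need to worry, the fix is simple. Pick one of the vhosts configuration files. Find a vhost directive section such as <VirtualHost X.X.X.X:8080>, and add the following line next to it, outside of the section itself since NameVirtualHost isn't allowed inside a VirtualHost block: NameVirtualHost X.X.X.X:8080 where X.X.X.X is your server's IP address. Save the conf file and reload httpd.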

If your httpd reloads without warnings or errors, you can proceed to the next section. Otherwise, carefully reread the steps I've described above to see if you missed anything.
You can test your changes by accessing your website on port 8080, for example: http://www.mydomain.com:8080/ . Your website should load, even though there might be some display errors due to the port change.

Major issue: when you make any change to the web configuration in Plesk, Plesk rebuilds the vhosts configuration files, which means you'll have to make these changes every time you modify the configuration! There may be some way to prevent this, if you know any, please let me know by posting a comment, I'd be very grateful.

4. Configuring nginx as reverse proxy

So far, we've only installed nginx, and made Apache listen on port 8080 instead of 80. If you stop here, everything's pretty much broken. So read on.
The next step is to configure nginx to forward dynamic content requests to Apache, and return the responses to the user properly. Start by creating a new file in the nginx configuration folder (same folder as your nginx.conf). Name this file proxy.conf. In this file we'll define the proxy settings. I won't detail each of the settings, this would take ages and you might as well use the settings below as they should be valid for most sites:

Credit: papygeek.com
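
The settings themselves are not reproduced in this copy of the article. As a stand-in, here is a sketch of what such a proxy.conf commonly contains, written as a shell function that prints the file. To be clear, these are not the original papygeek.com settings: the function name print_proxy_conf and every value below are assumptions of mine (the directives are standard nginx ones), so treat them as a starting point only.

```shell
# Print a generic proxy.conf for nginx running in front of Apache.
# The heredoc delimiter is quoted so that nginx variables such as $host
# are written literally instead of being expanded by the shell.
print_proxy_conf() {
    cat <<'EOF'
proxy_redirect off;

# pass the original host name and client address on to Apache
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

# upload size limit and buffering (values are guesses, tune them)
client_max_body_size 10m;
client_body_buffer_size 128k;
proxy_buffers 32 4k;

# timeouts in seconds
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 90;
EOF
}
```

To create the file:

# print_proxy_conf > /usr/local/nginx/conf/proxy.conf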
Paste the above lines in the proxy.conf file you've created. We'll be using this file to configure the proxy options in each of our virtual hosts. That's not all though, there is a problem introduced by the proxification of our architecture: how is Apache going to know the real client's IP address? Since nginx will forward the http requests, Apache will be receiving the nginx IP address, in other words, your local IP (your server's IP address). In order to fix this problem, an apache module was created: mod_rpaf.

Begin by installing said module:

# wget http://stderr.net/apache/rpaf/download/mod_rpaf-0.6.tar.gz

# tar zxvf mod_rpaf-0.6.tar.gz

# cd mod_rpaf-0.6

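# make rpaf-2.0 && make install-2.0

If you run apache2, replace "apxs" by "apxs2" in the command below. If apxs/2 isn't installed on your machine, run this command first: yum install httpd-devel

# apxs -i -c -n mod_rpaf-2.0.so mod_rpaf-2.0.c

On Debian/Ubuntu, enable the module with a2enmod; this tool doesn't exist on CentOS, where the module is loaded from the Apache configuration instead:

# a2enmod rpaf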
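
Installing the module is only half the job: Apache also has to load it and be told which proxy to trust. As far as I know, mod_rpaf 0.6 is configured with the RPAFenable, RPAFsethostname, RPAFproxy_ips and RPAFheader directives; the sketch below prints such a block. The function name print_rpaf_conf, the module path and the idea of putting the result in its own file under /etc/httpd/conf.d/ are assumptions on my side, so check them against your installation.

```shell
# Print an Apache configuration block for mod_rpaf.
# $1 = space-separated IPs of the proxies to trust (where nginx connects from).
print_rpaf_conf() {
    proxy_ips="$1"
    cat <<EOF
LoadModule rpaf_module modules/mod_rpaf-2.0.so

RPAFenable On
RPAFsethostname On
RPAFproxy_ips $proxy_ips
RPAFheader X-Forwarded-For
EOF
}
```

For example, with X.X.X.X being your server's IP address:

# print_rpaf_conf "127.0.0.1 X.X.X.X" > /etc/httpd/conf.d/rpaf.conf

If a LoadModule line for rpaf was already added to your configuration, drop the one above so the module isn't loaded twice.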

# service httpd restart

5. Configuring the virtual hosts

Let's now see the final part of this tutorial: configuring the virtual hosts for nginx. First go to your "sites" folder, which you created in the nginx configuration folder (default should be /usr/local/nginx/conf/sites/ ). We'll do this the clean way: for each domain hosted on your machine, create a new .conf file.

Here is the configuration for the "mydomain.com" domain (and thus the content of your mydomain.conf) :
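
The configuration file itself is not reproduced in this copy of the article, so here is a sketch of what a typical one looks like, written as a shell function that prints it. It is not the original file: the function name print_nginx_vhost, the list of static file extensions, the 30 day expiry and the Plesk document root used in the example call are my assumptions. For a subdomain, generate a second server section the same way with the subdomain's name and folder.

```shell
# Print an nginx virtual host that serves static files itself and proxies
# everything else to Apache.
# $1 = domain, $2 = document root, $3 = Apache address (ip:port)
print_nginx_vhost() {
    domain="$1"; docroot="$2"; backend="$3"
    cat <<EOF
server {
    listen 80;
    server_name $domain www.$domain;

    # static content: served directly by nginx
    location ~* \.(jpg|jpeg|gif|png|ico|css|js|html)\$ {
        root $docroot;
        expires 30d;
    }

    # everything else (PHP pages...) is handed over to Apache
    location / {
        proxy_pass http://$backend;
        include /usr/local/nginx/conf/proxy.conf;
    }
}
EOF
}
```

For example, with X.X.X.X being your server's IP address:

# print_nginx_vhost mydomain.com /var/www/vhosts/mydomain.com/httpdocs X.X.X.X:8080 > /usr/local/nginx/conf/sites/mydomain.conf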

Feel free to copy the subdomain section as many times as you have subdomains.
Try your nginx configuration by running the following command:

# /usr/local/nginx/sbin/nginx -t

You should be receiving this message, provided you've done it correctly:

the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
configuration file /usr/local/nginx/conf/nginx.conf test is successful

6. Setting nginx as service & startup
I've written an init.d script for you to use. Nearly all the credit goes to Slicehost for writing the original Ubuntu one; the one I wrote is for CentOS, although it should work for other systems: download here. Unzip it and place it in your /etc/rc.d/init.d/ folder. If you have placed the nginx binary in a different folder, you'll have to open up the script and change the $DAEMON path. Give it execute permissions ( chmod +x /etc/rc.d/init.d/nginx ), after which you'll be able to use the following commands:
Starting the server: # service nginx start
Stopping the server: # service nginx stop
Reloading the configuration: # service nginx reload
Restarting the server: # service nginx restart

If you wish, you can add the "service nginx start" command to the /etc/rc.local file; this will allow your service to be run on startup.

Well, I guess that's about it!
The next article will deal with memcached, so stay tuned!
Clem

2009-09-26

Been a while! But I'll make up for the huge time gap: this post will probably be one of the most useful I'll ever post.

I happen to be running a high traffic website, and have been running it for about 5 years now. Over the past few years though, my website has seen a major traffic increase which resulted in my servers being regularly overloaded and my website inaccessible. My website profile: an Invision Power Board based website (heavily modded though), running under PHP 5 and MySQL 5. Servers are hosted in France at OVH.com.

At first, my reasoning was quite simple: spend more money on a more powerful server. I ran about 5 or 6 server upgrades over the years. I must say it worked at first, since I was running low-end servers. But for the last couple of months the traffic became way too high, which resulted in my website being completely inaccessible for a part of the world (for visitors in remote countries such as Canada, connections frequently timed out) and just plain slow for everyone else. At that time the traffic was: nearly 60K unique visitors/day, about 10 million visits/month.

My server setup: a quad-core with 8GB of RAM for Apache, and a quad-core with 4GB of RAM for MySQL, both using SATA2 RAID0 HDDs, connected to each other with a 1 Gb link.

Well, I've finally settled for a solution that seems to be working great. The website's fast for everyone, even for me over there in China.

1. Running PHP with Fast-CGI

My website is a community based website, which means the site is strongly dynamic. Every page served is PHP, strictly no HTML. The "default" option for serving PHP is to use PHP as an Apache module. The problem with this solution is that for every page served, a new Apache/httpd process has to be loaded in memory. With a high traffic website this isn't necessarily a good solution, especially if your server doesn't have much RAM.
So the first thing I did was to switch PHP from Apache module to Fast-CGI module.

Apache will be serving dynamic content via PHP as FastCGI module. But on top of Apache, we'll be using another webserver, a very lightweight one, for serving static content. Basically, this means php pages will be served by Apache, but other content (images, javascript, css, static html...) is served by nginx, which is extremely fast and reliable for such things.
How to put such a setup in production? Simple as that: nginx on port 80, Apache on port 8080 (or another) and nginx is configured to forward all dynamic content requests to Apache. It's called "using nginx as reverse-proxy".

I had a bit of trouble figuring that one out as I couldn't really find any article explaining the difference between memcache and memcached. So here's the deal.
- Basically "memcached" (note the trailing "d", which stands for "daemon") is a process that runs on your machine and that allows you to easily cache data in memory--RAM. It's basically a simple and efficient cache manager. It listens on a given port and you can connect to it via...
- memcache: this is the name of the PHP module that allows you to make use of memcached. You're going to need to install this because it doesn't come with PHP! Memcached and memcache can be found in the usual repositories (e.g. rpmforge)

Installing and configuring both isn't the only thing you have to do. You're going to have to make use of the memcache PHP module functions. That's the trick! But I'll guide you through it.

Here are a couple of handy functions ($mc being the connection returned by memcache_connect):
- memcache_connect ($host, $port, $timeout) : connects to the memcached server you've set up on your machine.
- memcache_get ($mc, $key) : gets a string from the cache. Returns false if the key was not found.
- memcache_set ($mc, $key, $data, $flag, $ttl) : saves a string into the cache.
- memcache_delete ($mc, $key) : deletes the string from the cache.

Now how to use these functions: this couldn't be any simpler.
- Begin by connecting to the memcached server using memcache_connect()
- Before running any SQL query, ask yourself: can this query be cached? In theory, most queries can be. In my case, I used the memcache functions to optimize my portal page (index.php) which is basically a simple news article display. In other words, the content almost never changes, so this kind of query can definitely be cached.
Here is a simple code example:

<?php
$mc = memcache_connect("127.0.0.1", 11211);
$articles = memcache_get($mc, "news_articles");
// memcache_get returns false when nothing is cached under that key
if ($articles === false) {
    // Run your SQL query here and load the result into $articles (your own database code)
    // Save the data into the cache.
    // The data is serialized so that it is stored as a plain string.
    // The data is saved for 1 week as defined with the last parameter.
    memcache_set($mc, "news_articles", serialize($articles), MEMCACHE_COMPRESSED, 60*60*24*7);
}
// $articles was found in the cache
else {
    $articles = unserialize($articles); // unserialize the data
}
// You now have a fully loaded $articles array, ready for display!

Conclusion

I managed to reduce the number of SQL queries on my main page from an average of 15 to... 5. Every single page of my website loads nearly instantly even during a high influx of visitors. Lately I had about 3000 users online simultaneously, and I didn't notice any slowdowns.
So I can safely say that the 3 points described above actually solved all my issues.