Ramblings of a techie

Archive for June, 2005

After my clustering experiments which you read about in my last post we were capable of handling more than a 1000 concurrent visitors. We will see if the current growth can be managed by this clustering on the 4 systems currently running the site. It is clear to me now that the bottleneck was actually the MySQL, not the web part. Switching off the second webserver makes no difference, while switching off the second MySQL will completely stop the site from responding. So I removed the one webserver and the site is running on 3 webservers now.

The MySql installation has been tweaked for absolute performance with all the tricks in the books, but it cannot do more than it it doing now on that hardware. I need to figure out a way to see, in the next concurrent userbarrier, which part of the system needs improvement. I will think about it; if anyone has an idea; let’s have it (put a comment please).

My next projects are;

– make a seperate site for chat and use that instead of the current chat (dunno which chat yet; have to find a nice one still)– make a seperate site for the forum and use that instead of the current forum (probably phpbb) – making a MySQL/PHP cache for the database queries on the current site– fixing the master-slave replication so it won’t fail if the master fails

I am running different sites on a lot of different hardware configurations. When a site becomes too slow you have different options:

buy a new,faster server

buy more servers

fix the software (optimize; better/more cache)

Usually we host sites built on Java/J2EE platform; these sites are made for medium size to (very) big companies and are all built to last and the be scalable. Clustering/failover/loadbalancing are options which are built into these systems by default. The main products my employer sells support clustering out-of-the-box.

Now enter my hobby projects, all of them in PHP and written in a few weekends and nights. One of them got a bit out of hand. When there were 75-100 people concurrent, my single CPU, cheapo hardware took 20 secs to render an average page. The entire site was not cached, it was loaded with features (the most feature-rich dating platform in the world, I dare to say) and written badly.

This site is running on very cheap (< Euro 300) hardware (Celeron 2,4 with 512 mb mem and 2 * 40 gb IDE hds); these boxes are homemade with the cheapest material money can buy. It is hardly possible to do it cheaper than that. Why did I choose this? Because it is actually a free (as in beer) site. So I don't want to spend a lot of money on hardware/hosting/anything.
I first fixed the code to be slightly more efficient and started, at the same time, to build in cache. You must know; datingsite have caching ‘challenges’, because users are continuesly doing searches and changing stuff while being logged-on. So I had to make cache-per-person for most of the pages. After these changes the site could hold out, on the same hardware, to 400-500 concurrent users.
After a while, the amount of concurrent users per day began to go over 500 and after a while, even over 600. The pages, at night, took 20 sec again to load 🙁
The site was completely not prepared for loadbalancing, in any way. So the next step I took was seperating the webpart and the database part and put them on different servers. You have to know that the webpart, with all user photos weighs about 4 gigabytes currently (with more than 30.000 members) and the database is 3 gigabytes. I seperated it on 2 servers with the same homemade crappy hardware described above.This trick was great; the site could handle up to 700-800 users with ease suddenly.

But ofcourse, it was still growing and the 800 concurrent level was made 4 weeks ago. More users made the site slower and slower and slower and…….

The problem seemed to be high traffic on both the webfrontend and the database. The webfrontend could easily be fixed; I would add another box (or even more) and DNS them to be called www1, www2, www3 etc. Then I would have the index on the main domain throw a user to one of those boxes randomly:

This is the real lame-brain loadbalancing; better loadbalancing would be, ofcourse, using a hardware loadbalancer or a BIND DNS round-robin loadbalancer. But for hardware I don’t have the money and for software I would need the domain to be managed by my DNS and it isn’t and the provider I have it at doesn’t know what loadbalancing is. So I will move, but for now, this works fine.

This helped a bit, but ofcourse overloaded the MySQL database servicing all these frontends…

Balancing MySQL is a bit harder. It is actually much harder. You need to be capable of replicating the MySQL to different servers and this is difficult; MySQL only support master-slave replication which is quite lame; you cannot write to the slaves. Also, you need to put on Logs for this replication which take I/O power and fill up your harddisk space fast.

So I decided to do this differently; I added another box (running 4 boxes now) and rewrote the software by changing all references to mysql_query to _mysql_query. I tested using:

This way it will do all writes to all slaves and the master, and it will alternate reads between the master and the slave to some given random %.Flaws: I will fix this function to round robin between X slaves and I will fix it to fail a slave (after which you have to re-make it with a DB dump from the master. For this last problem I will also figure something out I hope.

The current system is capable of packing a much larger punch and it is actually quite scalable, cheap and very easy to make and setup, even in environments you don’t have control over the MySQL and Apache installs.

I am webmaster of a free dating site for Dutch people. After passing the 30000 members, we started to experience problems with the session handling. People started to receive the same session ids. Ofcourse this is not very nice for a dating site 🙂After searching a bit on Google, I found people with similar problems. They ‘fixed’ it by adding entropy file = /dev/urandom and entropy length = 64. Problem was, I already had this set for my site… So that didn’t work.To fix it, I simply created the following code;

were gen_rand generates 35 random letters and numbers. Replace all your session_start() with _session_start() and all will be fine.

On a datingsite this works fine; no big companies behind one IP address. You can use more eleborate schemes for creating this session id. For instance, adding a second random cookie on the client will actually make the chance of duplicate ids very very small.