Tag Archives: performance

Just a quick post on why sharding makes large WordPress installs happy with a real life example and how seeing the code makes debugging and lifting a service out of a downtime much easier.

We sharded the db of UBC Blogs about a year into the project in 2010 after reading about potential bottle necks a large WPMU install can hit and generally just makes the db happy. Why this makes a DB happy are situations like what happened last week. We had a plugin on UBC Blogs running happily since May with no problems churning out xAPI statements by the thousand . We needed to move this to another WordPress multi site install with far fewer sites and users as UBC Blogs (still a fair amount 2000+ sites) but not partitioned / sharded. It ran for a week no problems noticed… once the students arrived though bam the CPU was devoured on the database server quickly watching the idle drop to 20% it was clear a crash was going to occur.

Thoughts were initially
1) Did the devs push anything new? confirmed that was a no.
2) Was it just more traffic? Looking at the analytics it was not up near a peak that we handled no problem in the past?

Thinking it might have been more traffic and some bad code we asked for more CPU / RAM from IT which they delivered. That brought the idle up but not out of the woods still around 25% and eventually tanked briefly.

Quickly started digging through the web logs nothing out of the ordinary then onto the db logs no MySQL errors. Then moved onto the db console using the old SHOW PROCESSLIST; It was revealed a nasty SHOW TABLES LIKE query was running often which is absolutely devastating on a multi site install with an un partitioned database, this query had to traverse 50000+ tables just running the query raw took 4 seconds to execute. On UBC Blogs because the db was partitioned and the table is located in a global db this query was pretty much instant zero performance impact. Once realizing this was the issue quickly asked Richard to kill the plugin that was doing this and we were back to normal. The plugin fix was very easy since the plugin only had to run that query once on installation Richard patched and put it back into production quickly.

Take aways?

Partitioning is good.

Having an application performance monitor would be best to locate bad code / issues more efficiently (time for New Relic?) .

Having the ability to see fix and re-deploy code quickly in an open source enviro makes getting out of bad situations manageable in some instances relatively painless (I actually have no idea how we would get out of this if it was a black box closed source app call the vendor???) .

We were throttled on the UBC CMS with kick back traffic from Connect going down. Always impressed with how well our pretty basic setup can handle it. Properly tuned PHP opcode cache with a content cache still goes along way 🙂

The global wp_usermeta table in WordPress which is huge on a site like UBC Blogs is constantly being updated so the Qcache would constantly be invalidated. Wondering if we would be better off with it off? Does invalidation cost more than the savings… deep thoughts for a DBA…

Pretty sure if we committed to implementing Memcache for object cache we would would see gains.

WordPress is great when used as a general CMS/Blog platform it can scale quite easily using some basic caching methods, but when you start using it as an app with a fairly large number of users logged in like in a live classroom backchannel/response system scenario some performance issues start to pop up.

We are going through all the components and trying to see how we can improve performance. One is moving away from Apache and switching to Nginx. I did a very quick test comparing a tuned Apache with PHP and APC enabled vs a Nginx PHP-FPM APC install results are below. Should note I did not do any tuning with Nginx (will be the next test) but as you can see Nginx performs much better under high load.

Most people go with the default Apache config and hope for the best. Generally this can be ok if the traffic is light but once you start having more concurrent connections this may not work out for you. I am not going to go into details on tuning (there are a ton of posts on it) but this is a graph of a before and after of apache under load after merely removing excess apache modules and tweaking the KeepAlive settings.