Caching & ADOdb for Tweekly.fm

The best thing about building a project that gets popular is the same as the worst things about a project that grows rapidly. Tweekly.fm is currently gaining 1,500 users per month most of which decide to publish on a Sunday. This increases the load and run time of the delivery engine that pushes tweets outbound. The other side of an increased user base is that user pages get more popular. Although this is a good thing overall and shows our popularity – it also has its downsides.

February 26, 2010

~

The best thing about building a project that gets popular is the same as the worst things about a project that grows rapidly. Tweekly.fm is currently gaining 1,500 users per month most of which decide to publish on a Sunday. This increases the load and run time of the delivery engine that pushes tweets outbound. The other side of an increased user base is that user pages get more popular. Although this is a good thing overall and shows our popularity – it also has its downsides.

The queries to show simple people and shared by counts are expensive to run. We’re indexing just short of 25,000 user records and running the queries in the original way brought the servers down to a sluggish speed. My response to this was to rewrite the entire user pages and introduce caching on a couple of levels.

The diagram below shows the original connections between our servers and our users. As the diagram shows, there was no caching present through the system. This presented problems under load and ended up with the user pages of the site being temporarily postponed.

After a little research and crunching of numbers I came down to the conclusion that caching points at both the database server and the website (abstraction layer). Implementing the ADOdb layer underneath the existing database links has proven to be a great success. Its boosted response times over 300%. The next diagram shows the new caching points added the the system.

Point A is the abstraction layer at the site level. At peak times, the queries issued here are cached for an hour. This scales back to 30 minutes at low load times. Point B is the connection between the web server and database server. Queries are cached at the database constantly via the MySQL Query Cache. Finally Point C is the connection to the Twitter API. All calls are cached for 30 minutes where possible, but due to the nature of the API and service in general I’m not currently caching too much here. What is not shown on the diagram is the connection to the last.fm API because this data is automatically cached and updated when needed.

Effects of Changes for Users

The effects of these changes will be noticeable by all users because the front-end of the site will be a lot snappier and responses should be quicker too. Hopefully this is just one of many changes that can improve the system for everyone.