Thursday, September 25, 2008

I have added a new JavaScript function to the Radio NZ site called Oggulate.

This function scans pages that have Ogg download links and replaces each link with a Play/Pause button. The button plays (and pauses) the Ogg file using Firefox 3.1's built-in support for Ogg Vorbis.

The Oggulator can be activated by installing a small bookmarklet:

javascript:oggulate();

Clicking on the bookmarklet runs the function and gives you a nice page full of buttons to play Ogg. You will need one of the latest Firefox 3.1 builds for this to work.

The functionality in the script is rudimentary, and 3.1 is still in development, so don't expect it to be perfect. (I notice there is often a delay before playback starts for the first clip you play on a page, and I have had a few crashes and lock-ups.)

This feature of the RNZ site is experimental at the moment, so I'm not able to offer support. It is primarily so people can test Firefox.

If anyone wants to add features or improve the function, send your changes to webmaster at radionz dot co dot nz and I'll upload them (after reviewing them) so everyone can access them.

Wednesday, September 10, 2008

We run a master/slave server pair. Each has a web server and database server. The master is accessed via our intranet, and is where we do all our editing and importing of content. This is replicated to the public slave servers.

The slave server has a Squid reverse proxy running in front of it to cushion the site against large peaks in traffic. These peaks occur when our on-air listeners are invited to go to the site to get some piece of information related to the current programme. The cache time in Matrix (and therefore in Squid) is 20 minutes.

The database is replicated with Slony, while the filesystem is synchronised with a custom tool based on rsync.

If we update content it can take up to 20 minutes for that content to show on the public side of the site. This is a problem when we want to do fast updates, especially for news content.

We've looked at a number of solutions, but none quite did what we wanted.

In a stand-alone (un-replicated) system, clearing the cache is simple. There is a trigger in Matrix called Set Cache Expiry that allows you to expire the Matrix cache early. This works OK on a single-server system, but not if you have a cluster and use Squid. The main issue in that case is that even though the trigger is synchronous, the clearing is not. If Slony has a lot of work to do, there is still a chance that the expiry date will have passed before the asset is actually updated on the slave.

A clearing system needs to be 100% predictable, which led me to devise an alternative solution.

We needed to do three things:

a) Determine when changes made on the Master have been replicated to the Slave.

b) Collect ids of assets that are changed and the pages they appear on.

c) Clear the assets collected in b) when we know a).

This is how we do it.

a) There are two queries that can be run on the Master database to get this information:

If you grab the master sequence number after a content change (a database query), you can tell that the change has reached the slave when the slave's sequence number is the same or greater.
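The comparison itself is small enough to sketch. This is only an illustration in Python, not the queries or scripts we actually run: `has_replicated`, `wait_for_replication` and the `read_slave_seq` callable are hypothetical names, and how the sequence numbers are read from Slony is deliberately left out.

```python
import time
from typing import Callable


def has_replicated(change_seq: int, slave_seq: int) -> bool:
    """True once the slave's sequence number is the same as, or greater
    than, the master sequence number grabbed just after the change."""
    return slave_seq >= change_seq


def wait_for_replication(change_seq: int,
                         read_slave_seq: Callable[[], int],
                         poll_seconds: float = 3.0,
                         max_polls: int = 100) -> bool:
    """Poll the slave until it catches up, giving up after max_polls.

    read_slave_seq is a hypothetical callable standing in for whatever
    database query returns the slave's current sequence number.
    """
    for _ in range(max_polls):
        if has_replicated(change_seq, read_slave_seq()):
            return True
        time.sleep(poll_seconds)
    return False
```

The check uses "same or greater" rather than "equal" because the slave may already have moved on to later changes by the time anyone looks.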

b) We have a script that imports news items to Matrix. One of the attributes in the imported data is a list of asset ids that are affected by the import action. We know in advance what asset lists and pages the content will show on.

When the script runs it collects these for each imported asset, and compiles a list of asset ids (with no duplicates).
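As a rough sketch of that collection step (in Python for illustration; the real import script is not shown here, and the `affected_asset_ids` key is an assumed name for the attribute described above):

```python
from typing import Iterable, List


def collect_asset_ids(imported_items: Iterable[dict]) -> List[int]:
    """Gather the affected asset ids from every imported item,
    dropping duplicates but keeping first-seen order."""
    seen = set()
    ordered = []
    for item in imported_items:
        # "affected_asset_ids" is a hypothetical attribute name.
        for asset_id in item.get("affected_asset_ids", []):
            if asset_id not in seen:
                seen.add(asset_id)
                ordered.append(asset_id)
    return ordered


# Two news items that both appear on the same listing page (asset 205).
items = [
    {"affected_asset_ids": [101, 205]},
    {"affected_asset_ids": [205, 318]},
]
print(collect_asset_ids(items))  # [101, 205, 318]
```

Removing duplicates matters because several stories in one import usually share the same listing pages, and there is no point clearing a page twice.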

c) This is how it is bolted together.

After some assets have been imported, the import script calls a second script which adds the items to a queue:

Another script runs on the machine as a worker process, watching the queue. This uses the loop code I outlined in my last post.

The queue itself is Perl's IPC::DirQueue, a very cool module for managing a filesystem-based queue.
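For anyone who has not used a filesystem-based queue, the idea can be sketched in a few lines. This is a simplified Python stand-in, not IPC::DirQueue and not its API: the function names, the `.job` file suffix and the JSON job layout are all assumptions made for the example.

```python
import itertools
import json
import os
import tempfile
import time
from typing import List, Optional, Tuple

_counter = itertools.count()  # tie-breaker for jobs queued in the same instant


def enqueue(queue_dir: str, asset_ids: List[int], master_seq: int) -> str:
    """Store a job (asset ids plus the master sequence number) as a file."""
    os.makedirs(queue_dir, exist_ok=True)
    job = {"asset_ids": asset_ids, "master_seq": master_seq}
    # Write to a temporary file, then rename, so the worker never sees
    # a half-written job.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        json.dump(job, handle)
    name = "%020d-%d-%06d.job" % (time.time_ns(), os.getpid(), next(_counter))
    final_path = os.path.join(queue_dir, name)
    os.rename(tmp_path, final_path)
    return final_path


def peek_oldest(queue_dir: str) -> Optional[Tuple[str, dict]]:
    """Return (path, job) for the oldest queued job, or None if empty."""
    if not os.path.isdir(queue_dir):
        return None
    names = sorted(n for n in os.listdir(queue_dir) if n.endswith(".job"))
    if not names:
        return None
    path = os.path.join(queue_dir, names[0])
    with open(path) as handle:
        return path, json.load(handle)


def finish(path: str) -> None:
    """Remove a job once it has been processed."""
    os.remove(path)
```

Saving the sequence number inside the job is the important part: the worker can leave a job on the queue until the slave has caught up.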

When an item is found on the queue, the worker checks the Slony sequence number that was saved with the data. If the slave has reached or passed that number, then yet another script is run, but this time on the slave (public) server.

This last script resolves the asset id numbers in Matrix to a list of file system cache buckets and URLs. The cache buckets are removed, and the URLs are also cleared from the Squid cache. The cache is then primed with the new page. The script was written by Colin Macdonald.

The script loops, checking every three seconds for a new job (queued asset ids to clear). In its output, a * means it checked for a queued job and none was found. A ? means that a job was found but that the slave had a lower sequence number than the one stored with the job.
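The shape of that loop, including the * and ? markers, looks roughly like this. It is a Python sketch under assumptions, not the worker itself: the four callables passed in (`next_job`, `read_slave_seq`, `clear_on_slave`, `finish_job`) are hypothetical placeholders for the queue, the Slony query and the script that runs on the slave.

```python
import sys
import time


def run_worker(next_job, read_slave_seq, clear_on_slave, finish_job,
               poll_seconds=3.0, max_polls=None, out=sys.stdout):
    """Check the queue every poll_seconds and clear caches when it is safe.

    Writes "*" when no job is queued and "?" when a job is waiting
    for the slave to catch up. max_polls=None means loop forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = next_job()
        if job is None:
            out.write("*")  # nothing queued
        elif read_slave_seq() < job["master_seq"]:
            out.write("?")  # slave is behind; leave the job queued
        else:
            clear_on_slave(job["asset_ids"])
            finish_job(job)
        out.flush()
        time.sleep(poll_seconds)
```

A job that gets a ? is not removed from the queue, so it is simply picked up again on the next pass.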

The top 5 stories (the ones on the home page) are also cleared and refreshed.

The cache flush script has a URL filter so you can exclude certain URLs from being flushed - an example is the query-ridden script-kiddie hacks that people try to run against sites. There is no point in re-caching those.

Another is URLs ending in /. In our case this mostly means that someone has deleted the story off the end of the URL to see what they get, so there is no reason to refresh these either.
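A filter along those lines could be as simple as the following. This is a Python sketch, not the filter in the flush script: `should_refresh` is a made-up name, the example URLs are invented, and treating every URL with a query string as not worth re-caching is an assumption that is broader than the rule described above. A real filter would presumably also need exceptions, since the home page itself ends in /.

```python
from urllib.parse import urlsplit


def should_refresh(url: str) -> bool:
    """Decide whether a cleared URL is worth re-priming in the cache."""
    parts = urlsplit(url)
    if parts.query:
        # Assumption: anything carrying a query string is treated as a
        # probe rather than a real page.
        return False
    if parts.path.endswith("/"):
        # Usually someone trimming the story off the end of the URL.
        return False
    return True


print(should_refresh("http://www.example.org/news/stories/12345"))  # True
print(should_refresh("http://www.example.org/news/stories/"))       # False
```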

A feature that I'm working on will clear just the Matrix and Squid caches for the non-front page stories. These all have a 2 minute expiry time, and if we expire all the caches the end user will re-prime the cache. There is no performance hit in doing this, as the browser and Squid come back for these pages every two minutes anyway.

The system I have just outlined allows us to remotely add and update items in Matrix and for those changes to appear on the site within 5 minutes. I hope someone finds this useful.