Sunday, October 3, 2010

Postgres is a database software that powers a large number of production databases around the world including Indian Kanoon. Postgres 9.0 was recently released after more than a year and half of development. The version number was jumped from 8.4 (the previous release) to 9.0 considering the inclusion of two important features namely Streaming Replication and Hot Standby.

Streaming replication, as the name implies, allows a database administer to replicate a primary database on multiple secondary nodes. So when the primary database dies due to hardware failure or software error, one of the secondary nodes can become the primary database to serve production queries. Previously, to replicate a postgres database, one has to copy the base directoy and then periodically copy the Write-Ahead-Log (WAL) of the database. Then the secondary node would run in a continuous recovery process reading the WAL. As the secondary database was still recovering, it could not be used for querying. As such previous standby nodes were called "Warm-Standby".

The WAL shipping happened at time outs or when a WAL segment was full. And so the secondary database could be significantly behind the master in seeing the WAL. Streaming replication fills that gap now. It does not wait anymore for the filling of WAL segments as it "streams them directly" to the secondary server on a TCP connection. Note that currently streaming replication only allows asynchronous transaction, which means that a database transaction need not wait for the replication to finish. So there is a small window of time in which a transaction committed on primary database is not replicated on the secondary and the primary crashes.

The Hot-Standby feature allows read-only queries on a standby node. Previously, a standby was not good for anything other than waiting for the primary to crash. Now the standby can serve production queries as well while keeping the secondary database up to date with the primary. This is tremendously useful as read-only database queries can be load balanced on primary as well as secondary.

I had lot of fun trying to play around with making these two features work for Indian Kanoon databases. I followed the wiki doggedly for setting up streaming replication. However, unable to replicate a dog, I would miss something or the other and the thing will not turn up. In the end the Indian Kanoon database was successfully replicated in a hot-standby mode on an another server in the same data center. It has to be noted that data transfer with in the data-center was clocking close to half a gigabit per second that reminds me of the old 100Mbps networking cards that are obsolete now.