<h2><a href='http://xing.github.com/beetle/2010/07/24/redis-failover'>Automated Redis Failover</a></h2>
<p><em>Posted 2010-07-24 by the Xing Engineering Dudes (opensource@lists.xing.com) on the <a href='http://xing.github.com/beetle/'>Beetle</a> blog</em></p>
<p>When we released Beetle back in April, we already knew that the <a href='http://code.google.com/p/redis/'>Redis</a> server, which is the key component for deduplicating redundant messages, was a single point of failure in our system. However, we also trusted it to run for a few months without crashing, giving us enough time to develop a system for switching Redis servers automatically. And we were already able to switch manually, by just shutting down the master server and promoting the slave to a master role. We made use of this mechanism to upgrade our Redis servers without any effect on system availability.</p>
<p>We&#8217;re now happy to announce a new release of Beetle (0.2.1). This release lets you configure the system to perform a deduplication store master switch automatically, should the currently active Redis master become unavailable due to a hardware or network failure. When the previous master server becomes available again, it is automatically reconfigured as a slave of the new master. This effectively turns Beetle into a 24/7 messaging solution, which can be run without operator intervention (except for replacing failed nodes, upgrading software components, or repairing a partitioned network).</p>
<h3 id='motivation'>Motivation</h3>
<p>Because the deduplication store is such a critical piece of our messaging infrastructure, it is essential that a failure of this service is as unlikely as possible. As our AMQP workers operate in a highly distributed manner, all accessing the same Redis server, automatic failover to another deduplication server has to be very defensive and ensure that either every worker in the system switches to the new server, or none does. If the new server isn&#8217;t accepted by every worker, no switch should be performed. This ensures that even in the case of a partitioned network it is impossible for two different workers to use two different Redis servers for message deduplication, thereby avoiding data inconsistency in the deduplication store.</p>
<p>Note that this places our solution into the CA space of the <a href='http://www.julianbrowne.com/article/viewer/brewers-cap-theorem'>CAP theorem</a>.</p>
<h3 id='feature_summary'>Feature Summary</h3>
<ul>
<li>automatic switch in case of redis master failure (duh)</li>
<li>tolerate single machine failures without problems</li>
<li>switch doesn&#8217;t cause inconsistent data on the redis servers</li>
<li>switch is only performed if all worker machines agree</li>
<li>opt-in, only use the redis failover solution if you need it</li>
</ul>
<h3 id='how_it_works'>How it works</h3>
<p>On each machine which runs message processors for redundant messages (and therefore needs access to the deduplication store), we store the currently configured redis master in a non-volatile memory location. Currently this is just a file on disk (the redis master file), but this could be changed if we experience performance problems with this approach. The Beetle client library uses the information stored in the file to connect to the configured redis master and checks the modification time of this file for each redis operation to reconfigure itself if a master switch has occurred. This is done in such a way that no redis operation fails, provided the master switch happens within a configurable time interval.</p>
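<p>The worker-side lookup can be sketched roughly like this; the class name, file format, and error handling are our own illustration, not Beetle&#8217;s actual implementation:</p>

```ruby
# Illustrative sketch of the redis master file lookup on a worker
# machine. All names here are hypothetical, not Beetle's API.
class RedisMasterFile
  def initialize(path)
    @path = path
    @mtime = nil
    @master = nil
  end

  # Return the currently configured master ("host:port"), re-reading
  # the file only when its modification time has changed.
  def current_master
    mtime = File.mtime(@path)
    if mtime != @mtime
      @mtime = mtime
      @master = File.read(@path).strip
    end
    # An empty file means a master switch is in progress and the
    # worker should wait before retrying its redis operation.
    raise "redis master switch in progress" if @master.empty?
    @master
  end
end
```

Checking only the mtime keeps the per-operation overhead down to a single <code>stat</code> call in the common case where no switch has happened.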
<p>The contents of the file are maintained by a small daemon process (RCC, the Redis Configuration Client), which communicates with a master configuration process (RCS, the Redis Configuration Server) running on an arbitrary machine in the network. The RCS and the RCCs communicate via non-redundant Beetle messages, so they don&#8217;t need to know each other&#8217;s addresses. However, since we only want to switch the redis master if all worker machines are willing to do so, the RCS needs to know how many RCCs exist and the RCCs need to be distinguishable. This is achieved by providing the RCS with a list of known RCC names (they default to the hostname the RCCs are running on).</p>
<p>The Redis Configuration Server constantly checks the availability and configuration of the currently configured Redis master server. If it detects that the Redis master is no longer available, it selects one of the configured (and available) slaves to become the new Redis master and initiates a master reconfiguration process.</p>
<p>The reconfiguration process is started by sending a &#8220;ping&#8221; message to see if all known RCCs are still online. As soon as all RCCs have answered the ping message, the RCS sends out an &#8220;invalidate&#8221; message, asking the RCCs to invalidate the current master. Upon receipt of the &#8220;invalidate&#8221; message, the RCCs delete the contents of the redis master file on the machine they&#8217;re running on, which in turn will signal the workers running on that machine that a redis master reconfiguration is in progress and that they should wait for the result.</p>
<p>Immediately after the file contents have been cleared, the RCCs respond to the invalidate message by sending an &#8220;invalidated&#8221; message back to the RCS. When all RCCs have responded, the RCS knows for sure that it is safe to switch the Redis master. It then performs the master switch by sending a &#8220;SLAVEOF no one&#8221; command to the selected slave and a &#8220;reconfigure&#8221; message with the new Redis master to the RCCs, which then update the redis master file on their machines, enabling the messaging workers to proceed with any pending redis operations.</p>
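<p>The bookkeeping the RCS needs for one such reconfiguration round can be sketched as follows; all class and method names are hypothetical, chosen purely for illustration:</p>

```ruby
# Illustrative sketch of the RCS-side state for one reconfiguration
# round: every known RCC must answer the "ping" before "invalidate"
# is sent, and every RCC must confirm the invalidation before the
# master switch may be performed.
class ReconfigurationRound
  def initialize(known_clients)
    @known = known_clients.to_a
    @pinged = {}
    @invalidated = {}
  end

  def record_pong(client)
    @pinged[client] = true
  end

  def record_invalidated(client)
    @invalidated[client] = true
  end

  # The RCS may only send "invalidate" once all known RCCs
  # have answered the ping.
  def all_pinged?
    (@known - @pinged.keys).empty?
  end

  # The switch (SLAVEOF no one + "reconfigure") is only safe once
  # every RCC has confirmed that its master file is cleared.
  def safe_to_switch?
    (@known - @invalidated.keys).empty?
  end
end
```

If either set of confirmations stays incomplete, the round times out and a fresh one is started, as described below.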
<p>If any of the above-mentioned steps fails, the RCS proceeds by starting a new reconfiguration round and sends out a system failure notification message using Beetle. You can subscribe to these system failure notification messages with a custom worker, e.g. to send out emails to your operators.</p>
<p>Additional information on the failover mechanism can be found in the <a href='/beetle/rdoc/index.html'>ruby documentation</a>.</p>
<h3 id='implementation_status'>Implementation Status</h3>
<p>The new Beetle version has been running in production for a few weeks now. A lot of effort has been put into testing the failover mechanism. We have built cucumber tests for various failure scenarios and we also have C0 test coverage. Failover has been tested using normal Redis server shutdowns and also by disabling network cards on the Redis master in our production system.</p>
<p>So far the failover system has worked flawlessly and we are confident it will continue to do so. We are very happy with it. If you have used Beetle to build your messaging system, we strongly suggest you upgrade to the new version.</p>
<p>By the way, we would love to hear from you if you&#8217;re using Beetle in a production system.</p>
<h2><a href='http://xing.github.com/beetle/2010/04/14/introducing-beetle'>Introducing Beetle</a></h2>
<p><em>Posted 2010-04-14</em></p>
<p>At the end of 2009 it became clear that the messaging system we&#8217;d been using had to be replaced with something better. We had problems with messages sometimes getting stuck inside the message broker for a week, and on one occasion we even needed to repair it by deleting the message store completely. This was unacceptable, and we started looking around for alternatives.</p>
<p>In the end, we decided that our new system should be based on a broker implementing the AMQP protocol, for a number of reasons:</p>
<ul>
<li>Several broker implementations available</li>
<li>Messaging system configuration is done using the protocol itself</li>
<li>Excellent Ruby client library support</li>
</ul>
<p>We chose the <a href='http://www.rabbitmq.com/'>RabbitMQ</a> broker because of its good reputation in the Ruby community and the fact that all Ruby client libraries had been developed and tested against it. And we had already used it to build centralized Rails application logging, which needs to handle a much higher load than what we needed for our application messaging.</p>
<p>For our new messaging system, we had the following design goals:</p>
<ul>
<li>Important messages should not be lost if one of the message brokers dies due to a hardware crash</li>
<li>It should be possible to upgrade broker software/hardware without system downtime</li>
<li>It should be scalable</li>
<li>Using the system should not require our application developers to be AMQP experts</li>
</ul>
<p>RabbitMQ provides high availability through clustering of several RabbitMQ nodes. Each of the nodes has a replica of the complete messaging configuration and the cluster keeps running even if one of the nodes goes down.</p>
<p>However, in the current implementation of RabbitMQ, message queues are not replicated between the nodes of the cluster: only one node holds any given message queue. This means that if a node dies irrecoverably, due to a hardware crash for example, all the messages stored in the queues on this node will be lost forever. For some of the messages we send around in our application, this is unacceptable.</p>
<p>The obvious solution to this problem is to store a copy of each queue on two broker instances and have the message consumer subscribe to both queues and discard duplicate messages.</p>
<p>This sounds relatively straightforward, but an actual implementation of the deduplication logic needs to address some nontrivial issues:</p>
<ul>
<li>it should work reliably across process/machine boundaries</li>
<li>it should guarantee atomicity for message handler execution</li>
<li>it should provide protection against misbehaving message handlers</li>
<li>it should provide a way to restart failed handlers without entering an endless loop</li>
</ul>
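<p>The core idea of consuming from two queues and discarding duplicates can be illustrated with a minimal sketch, using an in-memory Hash in place of the persistent store (the class and method names are ours, not Beetle&#8217;s):</p>

```ruby
# Illustrative sketch of message deduplication: the same message
# arrives on queues hosted by two brokers, but the handler must run
# exactly once. A Hash stands in for the persistent store a real
# implementation needs to work across process/machine boundaries.
class Deduplicator
  def initialize
    @status = {}   # message id => :completed
  end

  # Run the handler unless this message id has already been completed.
  # Returns true if the handler ran, false for a discarded duplicate.
  def process(message_id)
    return false if @status[message_id] == :completed
    yield
    @status[message_id] = :completed
    true
  end
end
```

A real implementation must additionally handle handler failures, retries, and concurrent delivery, which is exactly what the requirements listed above describe.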
<p>These requirements led us to try out a persistent key value store to hold the information we need on the consumer side: how often a message has been seen, how often we have tried running a message handler but failed, how long we should wait before retrying and whether some other process has already started a handler for the message (execution mutex).</p>
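<p>A rough sketch of the per-message bookkeeping described above, with invented names and defaults; Beetle&#8217;s actual keys and semantics may differ:</p>

```ruby
# Hypothetical per-message record, mirroring the data listed above:
# how often the message has been seen, how often the handler failed,
# when a retry may happen, and an execution mutex. All names and
# defaults are illustrative.
class MessageRecord
  attr_reader :seen, :failures

  def initialize(max_failures: 3, retry_delay: 10)
    @seen = 0
    @failures = 0
    @max_failures = max_failures
    @retry_delay = retry_delay        # seconds to wait before retrying
    @next_attempt_at = Time.at(0)
    @mutex_holder = nil
  end

  def record_delivery
    @seen += 1
  end

  def record_failure(now = Time.now)
    @failures += 1
    @next_attempt_at = now + @retry_delay
  end

  # A handler may be retried only while the failure limit has not
  # been reached and the retry delay has elapsed.
  def retryable?(now = Time.now)
    @failures < @max_failures && now >= @next_attempt_at
  end

  # Execution mutex: only one process may run the handler at a time.
  def acquire_mutex(process_id)
    return false if @mutex_holder
    @mutex_holder = process_id
    true
  end
end
```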
<p>Our current implementation uses <a href='http://code.google.com/p/redis/'>Redis</a>, as it seemed to be the simplest key value store out there, requiring the least amount of configuration with no additional software components necessary to get it up and running. However, we use only a small subset of Redis&#8217; functionality, so we&#8217;re by no means bound to it.</p>
<p>Now, after several weeks of design and implementation work, we have ported all our old message processing code to the new infrastructure. So far everything has gone really smoothly and we haven&#8217;t had a single problem (which is what we expected :-). This means we have finally reached a state where we can publish our work as a ruby gem.</p>
<p>So &#8230; if you&#8217;re looking for a messaging system with high availability and reliability, we think you should give Beetle a try.</p>
<p>Have fun!</p>