Wednesday, June 25, 2014

This is an excellent book for anyone thinking about integrating RabbitMQ into their software architecture. It uses an example application, the Application Inbox, to walk you step by step through the technical decisions involved in designing and building an inbox on top of RabbitMQ. Personally, I wouldn't build an inbox application on RabbitMQ, because RabbitMQ is not a database: messages are supposed to be consumed as fast as possible and shouldn't be left to queue up. Another concern is that it requires one queue per user, which is not ideal to me. The book says we can create thousands of queues without any problem. I can't confirm or dispute that claim because I've never tried it, but I wouldn't do that in my solution if I had another approach. I've experienced enough problems with RabbitMQ cluster network partitions, which forced me to rebuild the cluster from scratch. Having thousands of queues to back up and restore when disaster strikes could be a pain. However, I think the example application in this book is fine, and a perfect way to introduce someone who has never used RabbitMQ to the messaging world. I strongly believe you will learn all the skills you need to work with RabbitMQ, from basic concepts such as exchanges, queues and bindings to setting up an HA cluster or using federation.

In the early chapters, it was very interesting to learn why RabbitMQ doesn't support the latest AMQP standard, and I completely agree with the reasoning. I expected to read more about RabbitMQ ops in the later chapters, such as troubleshooting network partitions or the ideal cluster set-up for different hosting environments, but that's not in the book. Anyway, here is a brief table of contents:

Chapter 1. A Rabbit Springs to Life

Chapter 2. Creating an Application Inbox

Chapter 3. Switching to Server-push

Chapter 4. Handling Application Logs

Chapter 5. Tweaking Message Delivery

Chapter 6. Smart Message Routing

Chapter 7. Taking RabbitMQ to Production

Chapter 8. Testing and Tracing Applications

In summary, if you are about to learn RabbitMQ, you should read it. It's short enough to read on the train but still covers all the important aspects. If you already use RabbitMQ but want to reinforce your knowledge, you should also read it. The only thing I think the book lacks is more real-life production experience from people. But I'm sure we can always find that in the online community.

We have a RabbitMQ cluster of 3 nodes hosted with Storm. Everything had been running smoothly for years until Storm upgraded their switches recently, twice actually. Unfortunately, both times the upgrade caused a network partition in our cluster, even though Storm said it wouldn't affect their customers apart from a little bit of latency between servers :).

Anyway, our partition handling mode was pause_minority, which I had chosen based on an incorrect assumption. It works great when a single node crashes for some reason, but it's absolutely a bad choice for a network partition like this one. In this case, none of the 3 servers could communicate with either of the other 2, so each one thought it was in the minority and paused. As a result, the whole cluster stopped working, and it stayed that way even after the servers could ping each other again.
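To make that failure mode concrete, here is a minimal Python sketch of the majority check. It is a toy model, not RabbitMQ code: the function names and node names are made up for illustration, and the rule it assumes is that a node pauses when the group it can see (itself included) is half the cluster or less.

```python
def should_pause(cluster_size, reachable_peers):
    """Return True if a node should pause under a pause_minority-style rule.

    Assumed rule: the node counts itself plus the peers it can reach and
    pauses when that group is half the cluster or less.
    """
    visible = 1 + reachable_peers
    return visible * 2 <= cluster_size


def paused_nodes(partitions):
    """Given partitions (lists of node names), return the nodes that pause."""
    cluster_size = sum(len(p) for p in partitions)
    paused = []
    for partition in partitions:
        for node in partition:
            if should_pause(cluster_size, len(partition) - 1):
                paused.append(node)
    return sorted(paused)


# One node is cut off: the other two still form a majority and keep running.
single_failure = paused_nodes([["rabbit1", "rabbit2"], ["rabbit3"]])

# Every node is isolated from the other two: each one sees only a minority.
full_partition = paused_nodes([["rabbit1"], ["rabbit2"], ["rabbit3"]])

print(single_failure)
print(full_partition)
```

In the second case every node pauses, which matches what happened to our cluster.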

So what did I do? I panicked a little, actually, because for every second we don't have the queue, we lose the real-time data for that moment. I tried rabbitmqctl stop_app, but none of the servers responded immediately. So I pressed CTRL+C and tried rabbitmqctl start_app: same thing, it didn't seem to respond. I thought the app might need time to start because of the backlog on the queues and disk. Fortunately, I had a JSON backup file of the whole cluster, so I decided to reset everything and rebuild the cluster from scratch, accepting that we would lose all the backlog in those queues.

With this approach, I had to kill the beam.smp process, delete all files and folders in the mnesia directory, reboot the nodes and restore the backup file. That worked: the whole cluster configuration loaded successfully and we had the cluster back. It took me a few hours to make this happen. We lost the real-time data and the backlog, but we had a way to get that data back from other sources, so it was just a few more hours until the system was back to normal.

One thing to note about the second network partition: the servers recovered quickly after the switch upgrade, but they seemed to have lost all the backlog on their queues. They were all in pause_minority mode, but I would not have expected them to lose data on the queues. I was pretty sure that data was published to the queue with persistence mode 1, which should make the servers write it to disk. I guess RabbitMQ was right when saying that "Other undefined and weird behaviour may occur" during a network partition.

After this disaster, I changed our cluster mode to autoheal and will wait to see how it copes with the next network partition. With this setting we can run a 2-node cluster, and we might end up with 2 nodes running separately after the next network partition, but at least we wouldn't lose the backlog. I've also been thinking about another approach using federation, which is possibly a better fit for us on such a not-so-reliable network: the second node would receive everything published to the first node via an upstream. Anyway, we'll wait and see.

Monday, June 23, 2014

In RabbitMQ, messages are published to an exchange, but subscriptions are made against queues. Bindings are the key to making the right messages go to the expected queue.
In this post, we will set up an exchange so that messages with ip = "10.10.0.1" in the header go to an expected queue.
This involves the following steps:

- Create an exchange of type headers

- Create a queue

- Bind the queue to the exchange with the following arguments:

ip = "10.10.0.1"
x-match = "all"

- Publish messages to the exchange with the ip in the header

- Subscribe to the queue and process such messages

A RabbitMQ headers exchange can filter on multiple header values by letting us set x-match = all or any, and combining several bindings makes fairly complex routing logic possible.
In this post we filter only on the ip header, so it doesn't matter whether x-match is all or any.
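To show what all and any mean when a binding has more than one header, here is a small Python model of how one binding is evaluated. It is only a sketch of the matching rule, not RabbitMQ or Burrow.NET code: the function name `binding_matches` and the extra `env` header are made up, and I'm assuming that a missing x-match is treated as all and that arguments starting with "x-" are not used as match criteria.

```python
def binding_matches(binding_args, message_headers):
    """Toy model of how a headers exchange evaluates a single binding."""
    mode = binding_args.get("x-match", "all")  # assumed default: "all"
    # Arguments starting with "x-" are settings, not match criteria.
    criteria = {k: v for k, v in binding_args.items() if not k.startswith("x-")}
    results = [
        key in message_headers and message_headers[key] == value
        for key, value in criteria.items()
    ]
    return all(results) if mode == "all" else any(results)


# The binding from this post: one criterion, so "all" and "any" behave the same.
ip_binding = {"ip": "10.10.0.1", "x-match": "all"}
print(binding_matches(ip_binding, {"ip": "10.10.0.1"}))
print(binding_matches(ip_binding, {"ip": "10.10.0.2"}))

# A hypothetical second header shows where the two modes differ.
headers = {"ip": "10.10.0.1", "env": "dev"}
match_all = binding_matches({"ip": "10.10.0.1", "env": "prod", "x-match": "all"}, headers)
match_any = binding_matches({"ip": "10.10.0.1", "env": "prod", "x-match": "any"}, headers)
```

With two criteria, the message above matches the any binding but not the all binding, because only the ip header agrees.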
The exchange, queue and binding can be created manually. However, below is the code to do that using the Burrow.NET library:

Assume that the exchange name is: Burrow.Demo.E.Headers
the queue name is: Burrow.Demo.Q.Message
and the message to use is: IpMessage

Normally, an application should have its own implementation of IRouteFinder; I use ConstantRouteFinder in this example to keep the demonstration simple. After executing the above code, if you check the created queue or exchange, you should see the expected binding.
Note: I set MessageTimeToLive = 100 seconds, but that's optional.

In summary, I normally write a wrapper class on top of the publishing code that copies the required values from the message fields into the header dictionary. We should keep in mind which header fields we're going to use so that we create the correct binding. However, if a new field is added later in your project and it becomes a criterion for routing messages, just add a new binding that includes that field and delete the old binding.

Thursday, July 18, 2013

- I used to do it the hard way: create another queue bound to the same exchange with the same parameters and routing keys, then consume from that queue and count the messages over 5 minutes, 15 minutes, 1 hour, etc.

- I've just had another idea which is simpler. The solution is similar to the first approach, but you apply the "Message TTL" parameter when creating the queue. So you create 3 queues with Message TTL set to 5 minutes, 15 minutes and 1 hour. At any point, the total number of messages in each queue is the number you want. And with the latest Burrow.NET (from 1.0.17) you can actually count the messages in the queue this way:
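The Burrow.NET call itself is not shown here. As a separate illustration of why the TTL trick works, here is a small in-memory Python simulation. It is not RabbitMQ client code: the `TtlQueue` class and the timestamps are made up, and it assumes nothing consumes from these counting queues, so the only way a message leaves is by expiring.

```python
from collections import deque


class TtlQueue:
    """In-memory stand-in for a queue declared with a message TTL (seconds)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._arrivals = deque()  # arrival timestamps, oldest first

    def publish(self, now):
        self._arrivals.append(now)

    def message_count(self, now):
        # Drop messages whose TTL has elapsed, as the broker would expire them.
        while self._arrivals and now - self._arrivals[0] >= self.ttl_seconds:
            self._arrivals.popleft()
        return len(self._arrivals)


# Three queues bound the same way, differing only in TTL.
windows = {
    "5m": TtlQueue(5 * 60),
    "15m": TtlQueue(15 * 60),
    "1h": TtlQueue(60 * 60),
}

# Messages arrive at these times (seconds) and are routed to all three queues.
for arrival in (0, 400, 2800, 3500):
    for q in windows.values():
        q.publish(arrival)

# One hour in, each queue length is the message count for its window.
counts_at_3600 = {name: q.message_count(3600) for name, q in windows.items()}
print(counts_at_3600)
```

Each queue holds only the messages younger than its TTL, so reading the queue length gives the count for that window without consuming anything.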

Monday, October 15, 2012

- This post is an update of my previous post. Basically, the approach is about 90% the same, except that I won't use Remote PowerShell in this post. Configuring Remote PowerShell is such a pain in the ass, I reckon. I had to set up a deployment on another box which actually has IIS, but someone in my team had already removed the Default Web Site, so I couldn't find any quick and easy way to make Remote PowerShell work. The error message PowerShell prints out is useless and does not even tell me what it wants.

“Set-WSManQuickConfig : The client cannot connect to the destination specified in the request. Verify that the service on the destination is running and is accepting requests. Consult the logs and documentation for the WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM service, run the following command on the destination to analyze and configure the WinRM service: “winrm quickconfig”. At line:50 char:33 + Set-WSManQuickConfig <<<< -force + CategoryInfo : InvalidOperation: (:) [Set-WSManQuickConfig], InvalidOperationException + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.SetWSManQuickConfigCommand“

- OK, there is another reason to hate Microsoft. It's time to abandon PS. I tried pstools before Remote PowerShell and got other problems, so I won't waste time going back to that very old tool, as PowerShell is much more powerful. So my choice is to write a simple console WCF application to communicate between TeamCity and the remote server.

- The tool's name is DD, which is short for "Distributed Deployment". In this post, I'll describe in detail how to set up deployment of a Windows service from TeamCity.

- Unlike a web application, a Windows service is often a long-running background process. A .NET Windows service has an OnStop method for you to clean up resources before stopping, which is cool. HOWEVER, when you try to stop the service using "net stop servicename", it does stop the service, but the process will not end as fast as it could. I reckon that since a single .NET process can host multiple Windows services, which are classes that inherit from ServiceBase, the service manager may wait a little while for all potential services within the process to stop before ending the main process.

- In some cases like mine, I want the service to stop immediately when it can, so I have to somehow call Environment.Exit to make the process stop ASAP. I cannot use taskkill like in the previous post, as that was such a hacky way and it could corrupt my data. So my approach is to let the program listen for a command: when it receives an exit signal, the app cleans up its resources and invokes Environment.Exit. So if you need something like this, read on.
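The pattern is easy to sketch outside .NET. The Python example below is only an illustration of the idea, not the actual service code: the `Worker` class and the "exit" command string are made up, an in-process queue stands in for whatever channel delivers the command, and the injected `exit_func` plays the role of Environment.Exit so the example can run without killing the interpreter.

```python
import queue
import sys
import threading


class Worker:
    """Toy long-running service that stops promptly on an "exit" command."""

    def __init__(self, exit_func=sys.exit):
        self.commands = queue.Queue()
        self.cleaned_up = False
        self._exit = exit_func  # stands in for Environment.Exit

    def cleanup(self):
        # Placeholder for releasing connections, flushing buffers, etc.
        self.cleaned_up = True

    def run(self):
        while True:
            command = self.commands.get()  # blocks until a command arrives
            if command == "exit":
                self.cleanup()   # clean up first...
                self._exit(0)    # ...then end the process right away
                return
            # Other commands would be handled here.


# Record the exit code instead of really exiting, so the flow is observable.
exit_codes = []
worker = Worker(exit_func=exit_codes.append)
thread = threading.Thread(target=worker.run)
thread.start()

worker.commands.put("exit")
thread.join(timeout=5)
print(worker.cleaned_up, exit_codes)
```

The important part is the order: cleanup runs before the exit call, which is what a plain process kill cannot guarantee.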

I/ Things you need to prepare:

Remote server: aka target server, a server that we'll install the service on

1 project slot in TeamCity: I want to deploy whenever I click "Run" on a TeamCity deployment project instead of deploying automatically after each code build, so it will cost you 1 of the 20 projects available in the free TeamCity edition

3/ Prepare the deploy.bat script. I like to control the deployment process in a batch file instead of implementing the steps in a program, as I think people will have their own steps and a batch file is simple enough to manage. Again, this batch file will basically do these things:

Use wget to download the latest artifact from TeamCity.

Use 7zip to extract the artifact, copy it to a temporary folder.

Save the artifact to Artifacts folder

Backup the whole current service folder

Stop the target Windows service

Copy over new files and old configuration files

Start the target Windows service

Clean up
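To make the order of those steps explicit, here is a Python sketch that only builds the list of commands and prints it as a dry run. It is not the deploy.bat from this post: the function name, the service folder, the Temp and Backup folder names and the *.config pattern for "old configuration files" are all assumptions, and nothing is executed.

```python
import ntpath


def deployment_plan(artifact_url, zip_name, service_name,
                    root=r"C:\Deployment",
                    service_dir=r"C:\Services\AwesomeService"):
    """Return the deployment steps as ordered (description, command) pairs.

    Nothing is executed. service_dir, the Temp/Backup folder names and the
    *.config pattern are hypothetical examples.
    """
    temp_root = ntpath.join(root, "Temp")
    zip_path = ntpath.join(temp_root, zip_name)
    extract_dir = ntpath.join(temp_root, "extracted")
    artifacts_dir = ntpath.join(root, "Artifacts")
    backup_dir = ntpath.join(root, "Backup")
    old_configs = ntpath.join(backup_dir, "*.config")
    return [
        ("download", f'wget -O "{zip_path}" "{artifact_url}"'),
        ("extract", f'7z x "{zip_path}" -o"{extract_dir}" -y'),
        ("save artifact", f'copy /Y "{zip_path}" "{artifacts_dir}"'),
        ("backup", f'xcopy "{service_dir}" "{backup_dir}" /E /I /Y'),
        ("stop service", f'net stop "{service_name}"'),
        ("copy new files", f'xcopy "{extract_dir}" "{service_dir}" /E /I /Y'),
        ("restore config", f'copy /Y "{old_configs}" "{service_dir}"'),
        ("start service", f'net start "{service_name}"'),
        ("clean up", f'rmdir /S /Q "{temp_root}"'),
    ]


plan = deployment_plan(
    "http://teamcity-ip-address:80/repository/download/bt2/3907:id/service.name.1.12.1234.zip",
    "service.name.1.12.1234.zip",
    "AwesomeService",
)
for description, command in plan:
    print(f"{description}: {command}")
```

The backup happens before the service is stopped, and the old configuration files are copied back only after the new files are in place, so a failed copy still leaves a full backup to restore from.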

Here is the basic code that anyone can use. Just keep it somewhere; we'll need to copy the batch file to the remote server.

- The URL to the artifact in TeamCity will look like:
http://teamcity-ip-address:80/repository/download/bt2/3907:id/service.name.1.12.1234.zip
- So depending on the build type id of your artifact, change it in the above deploy.bat
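If it helps to see which parts of that URL change, here is a tiny Python helper that rebuilds it from its pieces. The function and parameter names are mine, and the pattern is simply read off the example URL above (I'm assuming bt2 is the build type id and 3907 the build id), so check it against your own TeamCity before relying on it.

```python
def artifact_url(host, build_type_id, build_id, file_name, port=80):
    """Build an artifact download URL following the pattern of the example above."""
    return (
        f"http://{host}:{port}/repository/download/"
        f"{build_type_id}/{build_id}:id/{file_name}"
    )


example = artifact_url(
    "teamcity-ip-address", "bt2", 3907, "service.name.1.12.1234.zip"
)
print(example)
```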

III/ Setup steps:

* On Remote Server

- Download and install 7zip
- Assume that you put all custom tools and Deploy.bat in C:\Deployment. Create the folder C:\Deployment\Artifacs to store your TeamCity artifacts.
The Deployment folder should look like this:

That's it. Your AwesomeService should be deployed whenever you click Run on the deployment project in TeamCity.
Obviously you can adjust some things in deploy.bat to suit your needs. Let me know if you have any problems.