Strongly Typed SignalR Hubs with TypeScript

SignalR is a great piece of kit for rapidly developing a push communication layer for a user interface. However, there are a couple of things we would like it to do better:

The JavaScript hub proxies are generated dynamically by the service at runtime.

The hub proxies and their client methods are loosely typed.

The major problem with dynamic hub proxy generation is that if the service is down and can't respond with the JavaScript, your page will break on load: the JavaScript you have written to hook up the hub client methods will be full of null reference exceptions, unless of course you litter your code with horrible null checks. Because of this you don't have the opportunity to show a good connection error message to the user, as your app can't even start. A much better solution is to generate the proxies at compile time and distribute them via internal package feeds to include during the front-end build. You can then rely on the hub definitions always being there.

The loose typing issue can easily be solved with the use of TypeScript. We have started using TypeScript to produce all of our front ends, and it is proving incredibly successful at reducing bugs. SignalR doesn't provide any support for producing TypeScript definitions at the moment, and the only tools I found on GitHub were not quite right for us, so I forked the best one I found and have made some changes to support our requirements.

SignalR.ProxyGenerator

This project is quite straightforward. SignalR provides a default HubManager that generates the JavaScript proxies for you, but I have wrapped it to provide more configurable build-time support. The console application loads the assembly you are generating proxies for and all of its dependencies, then generates the proxies and strips out all of the generated comment information, because the reference paths to jQuery may not be correct in your environment and could break your build. There are also arguments to specify the default connection URL and additional metadata comments you would like included at the top of the file.

Signalr.Hubs.TypeScriptGenerator

The tool I forked, by yetanotherchris, did the job that I was looking for; however, he has only generated a pre-release NuGet package. I also wanted the TypeScript generator to respect DataMember attributes with the member Name overridden. We serialize using camelCase, and instead of relying on a serializer behaviour I wanted to do this using the serialization attributes:
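For example, given a hypothetical C# contract like the one sketched in the comments below, the generated definitions respect the overridden names (illustrative output, not the generator's exact formatting):

// Hypothetical C# source:
//
//   [DataContract]
//   public class FixtureUpdate
//   {
//       [DataMember(Name = "fixtureId")]
//       public int FixtureId { get; set; }
//
//       [DataMember(Name = "homeScore")]
//       public int HomeScore { get; set; }
//   }

// Generated TypeScript definition - camelCase names taken from DataMember:
interface FixtureUpdate {
    fixtureId: number;
    homeScore: number;
}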

Putting It All Together

When our C# back-end SignalR hubs are compiled, we now generate the hub proxies and TypeScript definitions. These are published to an internal Nexus NPM repository, from which our front-end team can easily consume the packages in their build pipeline. The packages are also versioned using the semver convention, allowing much better tracking of which versions of our APIs issues can be attributed to. We now have a full continuous integration cycle protecting us from accidental breaking changes and a self-documenting API for our front-end team to use.

Both GitHub projects have well-documented readme files with full details.

Operability.IO 2016

Operability.IO is a yearly two-day event focused on "DevOps from the Ops point of view". This was only its second year.
It's organised by Marco Abis of Highops.com. It was a great event with a relaxed atmosphere in a lovely venue - I'd recommend it. Thanks to Marco, the speakers and all the sponsors.

The following is what I took from the event and is not meant to be complete; it's just what appealed to me and the context I'm working in. I've tried to attribute points to the speakers where possible - any mistakes are in my recollection. I've also mixed in some other bits where I've dug deeper on a topic and found other sources.

Favourite talk: Sarah Wells' "Why would anyone do out-of-hours support for free?"[1]. The alternative title is "What I learned about DevOps at The Financial Times". An experience report of a DevOps transformation littered with wisdom. The slides alone are not enough to appreciate this talk - it needs to be heard.

On culture:

Culture has a disproportionately bigger impact than anything else on your
success - and it is the hardest thing to get right (Casey
West[2])

Trust is crucial: systems run on trust
(Daniel Otte and Tom Shacham[4])

Teams can have overlapping concerns instead of hard edges. At Google they visualise a team's responsibilities as a normal distribution centred at a point on the tech stack spectrum (running from hardware to UI). Typical responsibilities would lie within one standard deviation, but they are not limited by it (Niall Murphy[5])

A certain level of cultural maturity should be achieved before undertaking microservices - it would be irresponsible not to! (Rebecca Parsons[8], referencing "You must be this tall to use microservices" by Martin Fowler[7].) Maybe this concept of a fitness test should apply to other initiatives too?

"GrownOps" by Daniel Otte and Tom Shacham[4]: a
code of conduct driven by communication problems and friction:

Favour working together over delivering fast

Contribute instead of accuse

You're always in a state of partial knowledge

Ask for information, don't state judgement

Systems are based on (human) trust

Align team responsibilities with business intentions - "incentives
matter"[9]

The Financial Times embedded their "TechOps" people into
their product teams, inspired by Werner Vogels' quote "you build it, you run it"[10].
"If you're not doing this, you're not doing DevOps": Sarah
Wells[1]

Google pushed out responsibility for the correct configuration
of MySQL to every team involved in the stack. For example this
meant the DNS Team were also on the hook if their MySQL configuration
test suite was failing (Niall Murphy[5])

USwitch centralised their teams and gave them horizontal responsibilities. This led to friction, so they decentralised into vertical product teams. However this meant the same problem being solved multiple times, so they recentralised some of their horizontal functions again. This time, though, the central teams were charged with being "caring but not responsible": they could recommend an approach and develop a toolset, but product teams were not bound to use them (Tom Booth[11])

Embracing change:

Pick one area to change at a time. If you're lucky you might have one team/division/acquisition already heading in the direction you want to go. They can serve as an example to the rest (Sarah Wells[1] and Tom Booth[11])

Many companies consider all decisions to be final (irreversible, or type 1). In reality only a fraction are: most decisions are changeable (reversible, or type 2). Be brave with your type 2 decisions! (Jeff Bezos[13], cited by Sarah Wells[1])

Your emergency patching process may be the most agile/DevOps process in your company. It is much faster than the full release process, having trimmed all the fat. What if you just released all your changes this way? Subversive advice from Casey West[2]

On complex systems:

All systems of sufficient complexity operate in a constant state of partial failure - the key is to remain operable. Advice from the paper "How Complex Systems Fail"[12], cited by Adrian Colyer[6]; the same thoughts were echoed by Niall Murphy[5].

The behaviour of complex systems cannot be understood by looking at their
constituent parts. We must observe the activities of the system as a whole.

I have reconstructed this from memory as I cannot find it in my notes nor
remember which talk it came from - but it stuck in my mind.

There is automation and then there is autonomy. Google reached a limit of efficiency by automating processes for humans: MySQL failovers could not be done in less than 30 minutes. To reach the next level they needed to remove humans altogether and build autonomous, self-healing systems (Niall Murphy[5])

On technology:

Distributed tracing is the future - Steven Acreman[14]. Also correlating anomalies across your metrics! The following are all from Adrian Colyer[6]'s excellent talk, and links are to his blog:

"Even your best engineers often get it wrong when
they're working from guesses and intuition" - Adrian
Colyer[6]

2016 Puppet State of DevOps Report

Each year Puppet publishes the "State of DevOps Report", which has become authoritative in predicting trends and directing DevOps effort. It's an analytical report based on a worldwide survey; this is its fifth year. Authors include Jez Humble (Continuous Delivery book) and Gene Kim (Phoenix Project/Visible Ops books).

Summary

Their analysis over the last few years has found three clusters of organisations, which they have labelled high, medium and low performing. These clusters differ on deployment frequency, lead time for changes, mean time to recover (MTTR) and change failure rate (see the table on page 15).

High performers are not only better on all these measures but are improving at an accelerating rate. A few years ago Amazon and Netflix stunned the world by revealing they deploy at least once a day. They are both now deploying thousands of times a day. If an organisation is not high-performing, it is being left behind.

Employees at high-performing companies are more than twice as likely to recommend their workplace to friends. Employee satisfaction is correlated to performance. High engagement is known to drive revenue, see The Chemistry of Enthusiasm(bain.com): “Companies with highly engaged workers grew revenues at 2.5x those with low engagement” (page 20).

They attempt to put an ROI on undertaking a technology transformation and provide a formula you can apply to your own company.

Those returns should be ploughed back into activities to help you become a high-performing organisation. The benefits begin to accrue as more time can be spent on improving existing processes and the gains accelerate.

What activities lead to a high-performing organisation?

Lean Management

Limiting work in progress (WIP) and using information radiators increase performance and reduce burnout. They also predict a generative, performance-orientated culture.

Lean Product Management

Predicts higher performance and lower deployment pain. Defined as:

Small batches, completed in less than a week, released frequently, and use of the minimum viable product concept.

Understanding and visibility of flow from business to customers including status of products and features.

Actively and regularly seeking customer feedback. Applying that feedback to the design of the product.

NEW THIS YEAR: use of an experimental approach to product development.

Continuous Delivery Practices

Predicts less time spent on unplanned work, maintenance and meetings. Predicts more time spent on new work.

Maintain only the minimal amount of data needed to run tests, as test data maintenance is costly

Graylog Has Taken Over Our Centralized Application Logging

As all developers will know, centralized logging is the key to a happy life. Without it you spend your life trawling through servers to find the logs you need. Up until early 2015 we were achieving this with a SQL database to which all of our applications would directly write their logs.

There are many great solutions out there to support centralized logging, such as Splunk, Logstash and SQL (love it or hate it, it's a stable system), but most of them are expensive. I am very happy that we happened to stumble across Graylog, which I have to say is by far the best centralized log collector I have used. We have now been using it in Production for over a year.

Graylog sits on top of Elasticsearch so it scales very well. The front end to query your logs not only looks great but is blisteringly fast, with queries across millions of messages taking less than 100 milliseconds. The query syntax takes a little while to get used to but allows you to do some fairly complex querying on any number of user-defined fields. It also allows you to set up log streams based on complex pattern-matching rules, which we use to categorize products and teams, and from these you can configure email alerts, again based on your defined criteria. On any given busy sporting Saturday we store 20 million messages a day and hit message rates of up to 30k a minute, which is very impressive for the hardware it's running on. We currently run 3 separate instances to support our Dev, UAT and Production environments. As I write this our Production instance is holding 400 million messages, which we plan to scale up to 1 billion by the end of the year.

A couple of months ago we removed all of our SQL appenders and are now solely running with GELF (Graylog Extended Log Format) UDP appenders. The cost of a UDP publish is much lower than that of a SQL request, so our logging is much more efficient; we also use Log4Net.Async to put all of our logging onto background threads to improve application efficiency. It's great to see such an impressive open source product being actively developed and made available for free.
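For illustration, a minimal log4net configuration along these lines might look like the following (a sketch: the appender type names assume the Gelf4Net and Log4Net.Async NuGet packages, which may not match the libraries you use; the host and facility are placeholders):

<log4net>
  <!-- GELF UDP appender pointing at the Graylog server -->
  <appender name="GelfUdpAppender" type="Gelf4Net.Appender.GelfUdpAppender, Gelf4Net">
    <remoteAddress value="graylog.example.internal" />
    <remotePort value="12201" />
    <layout type="Gelf4Net.Layout.GelfLayout, Gelf4Net">
      <param name="Facility" value="MyApplication" />
    </layout>
  </appender>
  <!-- Publish from a background thread so logging never blocks the application -->
  <appender name="AsyncGelfAppender" type="Log4Net.Async.AsyncForwardingAppender, Log4Net.Async">
    <appender-ref ref="GelfUdpAppender" />
  </appender>
  <root>
    <level value="INFO" />
    <appender-ref ref="AsyncGelfAppender" />
  </root>
</log4net>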

If you are not using Graylog, you should be. Head over to their website and check it out.

Simple Gitlab CE HA Solution

Have you ever wondered if it's possible to run Gitlab CE in a HA environment? The simple answer is yes and, most importantly, you don't have to pay a dime for it.

Gitlab CE HA explained

Long story short: for a basic front-end HA setup you're going to need at least 3 servers - two front-end nodes and a database node. This article won't get into the details of setting up database replication or load balancing it.

Installation steps

PostgreSQL Database

First of all, begin with the DB node. Install PostgreSQL 9.2 on the server, create 2 database instances - one for Gitlab, one for Gitlab CI - and assign users to them. For the purpose of this tutorial I'm using gitlab-prod and gitlab-prod-ci. Gitlab requires a Redis instance as well, so install that too. With this, our initial work on that node is finished; we're going to come back to it later in this article.
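The database setup amounts to something like the following (a sketch - the user names and passwords are placeholders):

sudo -u postgres psql -c "CREATE USER gitlab WITH PASSWORD 'changeme';"
sudo -u postgres psql -c "CREATE USER gitlab_ci WITH PASSWORD 'changeme';"
sudo -u postgres psql -c "CREATE DATABASE \"gitlab-prod\" OWNER gitlab;"
sudo -u postgres psql -c "CREATE DATABASE \"gitlab-prod-ci\" OWNER gitlab_ci;"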

First front-end node

Download and install Gitlab CE's RPM from their repository. NOTE: Don't run gitlab-ctl reconfigure yet, as it installs PostgreSQL and Redis on the same node by default. Instead we want to point Gitlab at our already installed PostgreSQL and Redis server. The following example is the bare minimum you need to proceed. Don't forget to replace db_username and db_password with the ones you selected while creating the DB server.
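A sketch of that /etc/gitlab/gitlab.rb, assuming the standard omnibus package keys (adjust the host names to your environment):

# /etc/gitlab/gitlab.rb - bare minimum sketch
external_url 'http://gitlab-ha.invalid'

# Use the external PostgreSQL and Redis instead of the bundled ones
postgresql['enable'] = false
redis['enable'] = false

gitlab_rails['db_adapter'] = 'postgresql'
gitlab_rails['db_host'] = 'node-pgsql01.invalid'
gitlab_rails['db_database'] = 'gitlab-prod'
gitlab_rails['db_username'] = 'db_username'
gitlab_rails['db_password'] = 'db_password'

gitlab_rails['redis_host'] = 'node-pgsql01.invalid'
gitlab_rails['redis_port'] = 6379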

Now run gitlab-ctl reconfigure: Gitlab will do the magic and install and configure all necessary components, skipping the local PostgreSQL and Redis instances. At this point your DB is initialized. You can test proper functionality by accessing http://node-gitlab01.invalid. We can now move on to the next step.

Second front-end server

Again, install Gitlab CE from their official repository, skipping the gitlab-ctl reconfigure step. Then copy gitlab.rb and gitlab-secrets.json from your first server. The important bit is the gitlab-secrets.json file, which holds all the encryption keys used to access DB data.
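In front of the two front-end nodes sits HAproxy on node-pgsql01, bound to the gitlab-ha alias IP and balancing both HTTP and SSH traffic. A minimal sketch (the second front-end's host name is illustrative):

listen gitlab-http
    bind gitlab-ha.invalid:80
    mode http
    balance roundrobin
    server node-gitlab01 node-gitlab01.invalid:80 check
    server node-gitlab02 node-gitlab02.invalid:80 check

listen gitlab-ssh
    bind gitlab-ha.invalid:22
    mode tcp
    balance roundrobin
    server node-gitlab01 node-gitlab01.invalid:22 check
    server node-gitlab02 node-gitlab02.invalid:22 check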

Please note that I'm using the alias IP for accessing Gitlab; you've probably already realised that gitlab-ha points to it, as it's configured as the external URL for my Gitlab servers. One final change is required before you attempt to start HAproxy: edit /etc/ssh/sshd_config and set the listen address to the primary IP of node-pgsql01, or else the port will collide with the one HAproxy listens on. Start HAproxy and try to log in to Gitlab CE at http://gitlab-ha.invalid

Epilogue - The Maven Release Plugin

We've been spending a little more time recently at work moving our code repo away from Subversion and over to Git. Whilst we were doing this, I came across this cool tip to optimize the Maven release process, which allows you to skip using the heavy Release Plugin. So, if there's anyone left fighting in the eternal war against Maven, this article is the roundup.

The inspiration for our solution is derived heavily from Axel Fontaine's insightful series on the Maven release process, which you can read the conclusion of here.

In summary, it suggests swapping out the release plugin for the scm and versions plugins, and then getting Jenkins to do some smart work during the release build to orchestrate tagging and versioning. It works pretty well, and the core advantage of not having to wait for the same build to complete over and over is incredibly satisfying.

There were, however, some snagging issues that initially stopped us adopting this approach. These essentially boil down to the following :-

Versioning

No matter how many times I see

<project ...>
...
<version>0-SNAPSHOT</version>
...
</project>

I can't get used to it. I try to avoid unusual workarounds as much as possible when dealing with Maven, so if it was possible to keep the snapshot version intact, I was in favour of doing so. I also find it beneficial to keep the last stable release visible for developers working on the current development iteration.

SCM Tagging

The scm plugin's tag doesn't actually commit the version change from Jenkins to Git, so your tags will all carry the weird snapshot version 0-SNAPSHOT. There's also no commit to start the next development iteration, so you may end up tracking release versions via some other system, which is just another maintenance headache in my way.

Better Housekeeping with Jenkins

Turns out it's actually pretty cheap to keep the snapshot version if you still have some use for it. Here's a breakdown of what's involved :-

Set up a Jenkins job to perform the release.

Add some parameters using the Parameterized Build Plugin. You'll need a parameter for the release version and one for the next snapshot iteration. I creatively called mine Release_Version and Next_Snapshot_Version. We'll need these environment variables to orchestrate the scm plugin.

You need 4 steps in the build itself. The first of these is to set the release version:

versions:set -DnewVersion=$Release_Version

Now we need to check in the release version so that, when we tag, the tag has the right version in Git:

scm:checkin scm:tag -Dmessage="[Jenkins] Tag commit $Release_Version"

If you've had your Jenkins user set up using a deploy key until now, you'll need to migrate this to a real user with access to push to your repo.

Finally we set up for the next development iteration:

versions:set -DnewVersion=$Next_Snapshot_Version scm:checkin -Dmessage="[Jenkins] Next development iteration $Next_Snapshot_Version"

And that's it! Another nice touch, if you have the Groovy Postbuild Plugin, is to execute the following script after the build:
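A sketch of such a script (the manager object is provided by the plugin, whose badge API varies between versions; the repository URL is a placeholder):

// Read the release version from the build's parameters
def version = manager.build.buildVariables.get("Release_Version")
// Badge text on the build history entry
manager.addShortText("Released ${version}")
// Summary entry linking back to the released tag
manager.createSummary("package.gif").appendText("<a href=\"https://git.example/myrepo/tags/${version}\">Tag ${version}</a>", false)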

This will add a nice tooltip badge on your build which will tell you the released version and link back to the released tag for convenience.

Configuring Tomcat's Rewrite Valve to Work with Different Contexts

Imagine you have a Tomcat 8 server running two different contexts with URLs like this:

someurl.example/context1/....
someurl.example/context2/....

and you need to capture requests to:

someurl.example/clientname/context1/....

where clientname is dynamic, and you'd like to redirect those requests to:

someurl.example/context1/clientname/....

Normally you can use Tomcat's RewriteValve to rewrite URLs, but the RewriteValve is configured in the Context element of Tomcat's configuration XML file. This means it will only work for URLs under that context (e.g. someurl.example/context1/thingsaffectedbyrewrite).

The solution to this is surprisingly simple. All that needs to be done is to have an empty ROOT context in Tomcat (without any deployed applications in it) with the following configuration:
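A minimal sketch of that configuration:

<?xml version="1.0" encoding="UTF-8"?>
<Context>
    <!-- Enable URL rewriting server-wide via the empty ROOT context -->
    <Valve className="org.apache.catalina.valves.rewrite.RewriteValve" />
</Context>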

This is the content of the context.xml file that needs to be in your Tomcat webapps directory, under ROOT/META-INF/. The only other thing necessary is a rewrite.config file in the ROOT/WEB-INF directory of the Tomcat webapps directory, with the following contents:

RewriteRule ^/([^/]+)/context1/(.*)$ /context1/$1/$2 [L]

This is the configuration for the RewriteValve and describes the actual URL rewriting via regex, as explained in the documentation: the first capture group grabs the clientname and the rule re-inserts it after the context.

32-bit Flag in Visual Studio 2013

In Visual Studio 2013 Microsoft introduced a new setting, Prefer 32-bit, for projects that compile as executables, such as Console applications. This setting causes the compiler to set a flag in the header of the executable instructing it to be run in 32-bit mode if possible. This is switched on by default for new projects.

It is possible to see if an application is running in 32-bit or 64-bit mode through the Task Manager. On Windows Server 2012 there is a column "Platform" with this information. On Windows Server 2008 the name of the application is appended with "*32".

This seems a strange choice of default from Microsoft, as it is inconsistent with the behaviour of previous versions of Visual Studio, which would implicitly have run as 64-bit, and all modern machines are likely to have a 64-bit processor architecture. It would seem likely that Microsoft's rationale for this decision is simply that a 32-bit application may consume fewer resources.

An application running in 32-bit mode will only be able to access 2 GB of memory by default (there is a setting to increase this to 3 GB). Many of our applications may need more memory than this at peak times, and once the limit has been reached any further requests for memory will result in an OutOfMemoryException being thrown. As a consequence we recommend unticking this setting so our applications run in 64-bit mode by default.

Unfortunately, when switched on this setting doesn't initially appear in the Visual Studio .csproj file, although it does appear if switched off and subsequently switched back on again. The following lines of PowerShell search recursively from the current directory for C# projects that appear to have this switch turned on:
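A sketch of the idea (the original snippet is not reproduced here): because the element is absent when the default applies, we look for executable projects that do not explicitly set it to false.

# Executable projects that do not explicitly opt out of Prefer 32-bit
Get-ChildItem -Recurse -Filter *.csproj | Where-Object {
    $content = Get-Content $_.FullName -Raw
    ($content -match '<OutputType>(Win)?Exe</OutputType>') -and
    ($content -notmatch '<Prefer32Bit>\s*false\s*</Prefer32Bit>')
} | Select-Object -ExpandProperty FullName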

It is possible to change an executable to remove the flag to prefer 32-bit using the command:

CorFlags.exe MyApplication.exe /32BITPREF-

Running this command without the /32BITPREF- lists the current flag settings. The CorFlags utility is installed with the Windows SDK which can be downloaded free from Microsoft.

Overview

Blue / Green Deployment is a common pattern for managing deployment of a sizeable estate to allow teams to release frequently. It gives you a simple way of smoke testing deployments in the environment in which they will run before cutting over to the new servers. It also gives you intrinsic rollback support, as you can cut back to the old slice in case of problems.

This article will discuss some of the more practical aspects of bringing this pattern to life in a JVM based stack. It will mainly focus on the cutover, as I think there's plenty of literature already out there which discusses the deployment of packages themselves.

Stack

Let's start by discussing the technology that we use to deliver our software to production.

httpd/apache2 - This will be used as a health check to determine whether the node should be serving traffic. It will also be used to report on what colour a machine is, for the purposes of automation scripts.

Tomcat - This will be the application server on which our software is deployed. Obviously this is pretty much interchangeable with your favourite application server.

Jenkins - This will be used to orchestrate the cutover, with a build step that plugs in after your deploy.

Ant - Simple and lightweight, this will be the tool which actually performs the cutover. It's interchangeable with gradle or bash directly.

Loadbalancer Healthchecks

Apache / Httpd Setup

First let's configure httpd to serve a simple response indicating whether a node is staging or active. This assumes you have no existing httpd installation; if you do, you can probably figure out enough to expose a virtual host on port 81 to do the same thing.

Set Listen Address To :81

This is important because, in all likelihood, you're going to be serving your main traffic on 80 / 443 and we don't want the health check to interfere with that. In your /etc/httpd/conf/httpd.conf set the following contents:
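A sketch of the relevant lines:

Listen 81

<VirtualHost *:81>
    DocumentRoot /var/www/html
</VirtualHost>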

Next, in your /var/www/html directory create two files, lb_check and colour. colour should contain either Green or Blue, and lb_check should contain LIVE or STAGING. I suggest using some simple scheme to separate the servers, like odds and evens, but it's actually pretty flexible. Obviously one colour should be live and the other staging.
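For example, on a node that should currently be live and blue:

echo "Blue" > /var/www/html/colour
echo "LIVE" > /var/www/html/lb_check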

Configuring Loadbalancer

The next step is to configure our load balancer to test the lb_check response to determine whether a server is supposed to be active or staging.

At Betgenius, we use F5s to load balance web traffic to our servers. They support some quite simple health checks which can be configured to check a particular URL on a particular port and expect a particular response. We configure two pools, one to serve live traffic and one to serve staging traffic (i.e. for performance tests). Each pool has both the live and staging nodes assigned, and 'activeness' is determined by the health check: it should look for LIVE for the live pool and STAGING for the staging pool on port 81, which is what we set up earlier in httpd.

You should now be able to see two pools where the active nodes are flipped. Here's how one of ours looks

Scripting The Cutover

We're currently using ant to do our deployments on the Connextra team. I know we have other teams with slightly nicer deployment tooling, and we'll probably invest some more time in this in the near future, but it works for now, so I'll discuss how we use it.

Firstly, we actually have only minimal coupling between the blue/green paradigm and our deployment scripts. Rather, there is a common deployment script which we use to deploy a release version of the software, and the environment is configurable. With that in mind, we simply configure separate environments for green and blue.

Jenkins

Your Jenkins job should enable one slice and disable the other. I would suggest hooking it up as a manual step after your deployment.

Here's how the build section of our job looks.

This particular job assumes you're trying to activate the blue slice and deactivate the green.
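In essence the step just rewrites the lb_check files on each slice. A sketch in shell, with illustrative hostnames (our real implementation is an ant target doing the same thing):

# Activate blue, deactivate green
for host in blue01 blue02; do
    ssh deploy@$host "echo LIVE > /var/www/html/lb_check"
done
for host in green01 green02; do
    ssh deploy@$host "echo STAGING > /var/www/html/lb_check"
done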

And That's It!

Hopefully this can serve as a simple guide to plugging blue/green deployments into your stack, in a way that's hassle-free and quick!

Silicon Milkroundabout November 2014

We are excited to announce that we will once again be returning to Silicon Milkroundabout on the 16th November. We are looking for .NET developers at all levels and a DevOps engineer. Check out our careers page for the job specs and come say hi on the day!

Performance Tuning Our RabbitMQ Routing Strategy

A couple of years ago we made the decision to move our software to use headers exchanges. We did this because headers provide extensible routing information: new headers can be added without breaking any existing bindings. For example, most of our messages contain a FixtureId and a BookmakerId, so when we bind to these we can easily pick up everything for a single sporting fixture and/or customer. If I now wanted to get every message for a single sport, I could add SportId into the message headers and easily bind to it without breaking any other binding implementations. If I were to add that information into a routing key instead, any queues bound to it would need to be changed to * or # out the new part of the routing key. Another huge benefit is that by using well-defined names for the header keys we can further decouple components. When using topic routing keys you are more likely to end up in a situation where producers and consumers share knowledge of how to construct the relevant routing keys to speak to each other, adding a level of coupling to your supposedly decoupled messaging systems.
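To make the contrast concrete, here is a sketch of the two binding styles using the RabbitMQ Java client (our own stack is .NET, so this is illustrative only; exchange and queue names are invented):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.util.HashMap;
import java.util.Map;

public class BindingStyles {
    public static void main(String[] args) throws Exception {
        try (Connection conn = new ConnectionFactory().newConnection()) {
            Channel ch = conn.createChannel();
            ch.queueDeclare("fixture-updates", true, false, false, null);

            // Headers exchange: the binding names its keys explicitly, so a
            // producer can later add SportId without breaking this binding.
            ch.exchangeDeclare("updates.headers", "headers");
            Map<String, Object> binding = new HashMap<>();
            binding.put("x-match", "all"); // every listed header must match
            binding.put("FixtureId", 12345);
            binding.put("BookmakerId", 678);
            ch.queueBind("fixture-updates", "updates.headers", "", binding);

            // Topic exchange: the binding key is positional, so inserting a
            // new segment into the routing key means rewriting every binding.
            ch.exchangeDeclare("updates.topic", "topic");
            ch.queueBind("fixture-updates", "updates.topic", "12345.678.#");
        }
    }
}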

All of this sounds great, but we have recently started to see strange load profiles on our RabbitMQ servers during busy periods. The servers are not scaling in the linear fashion we would expect, so I started thinking about our use of RabbitMQ and our routing strategies. I know there are other users out there who have achieved much higher throughput than us, so we must be doing something wrong.

I recently posted this question to the RabbitMQ team:

@RabbitMQ do you have any comparative metrics on the computational cost of headers vs topic routing?

And they replied with:

@cjbhaines we don't have any numbers but the headers exchange is rarely used not optimised

@cjbhaines on the other hand, the topic one is very often used and is optimised

The optimizations they made to the topic routing algorithms in 2010 (2.4.0) are documented in two blog posts titled "Very fast and scalable topic routing": Part 1 and Part 2. This is an excerpt from the end of the second blog post: "the performance improvement in this graph varies from 25 to 145 times faster" (than the 2.3.1 version of the topic routing algorithm). I have to commend them on that piece of work - those are some impressive stats.

To performance test the different exchange types I wrote a simple test harness.

1000 queues

10 message types – All have 4 headers/routing key parts. Each message published only hits 1 of the 1000 queues

50 publisher threads – Each thread publishes a random message type with a random message number every 100-200ms, resulting in ~330 msg/s

50KB message size

5 second message TTL – There are no consumers running so we need a small TTL

5 minute test run time

64 core server running RabbitMQ 3.3.5

Results:

It is a shame that they have assumed users of RabbitMQ do not use the headers exchange, as it's difficult to gauge uptake without surveying users regularly. Headers exchanges give the implementer a lot of power to keep code clean and simple; however, it is their product and we have to respect their choices. Let's hope they do some optimization on this in the future.

Back to topic exchanges we go!

Welcome to the Betgenius Tech Blog

Here at Betgenius we have a very strong technical team with a passion for open source software and an open technology stack. Our team consists of specialists using C#, F#, Java, Scala, JavaScript, Google Closure, R and many more technologies. We are in the process of getting our GitHub and open source initiative off the ground. We aim to contribute back to the open software community that we so heavily depend on, and we hope that you enjoy reading and learning from what we have to offer.