Ceryx - A dynamic NGINX

Reverse proxying hundreds, or even thousands, of containerized microservices is an interesting problem, and one we face daily at SourceLair. That's why today we're glad to announce Ceryx, a dynamic reverse proxy built on OpenResty, Lua and Flask that can proxy hosts to any number of services, with configuration changes applied instantly. Ceryx is a project we've been working on for the last couple of months, and we're open sourcing it now.

Abstract

At SourceLair, we provision development environments quickly and try to make web development frictionless by leveraging the power of the cloud. One of the nice goodies we provide is a public URL for each project, which is always available and auto-refreshes with your newest code. As a result, we start and stop multiple user containers per hour and need to route each user's public URL to the correct container without downtime.

Previous solutions

Ceryx is an internal project that we've been developing over the past year. We have experimented with different technologies and stacks in order to find the best solution for our case. Along the way, we've kept the same API, making these changes seamless - that's a nice topic we won't cover here, though you can find my slides from the API Meetup Athens.

Twisted, MongoDB and Redis

At first, we started with a custom reverse proxy based on Twisted, a very nice event-driven networking engine written in Python. The service queried MongoDB for routes whenever it did not find one in its Redis cache. The cache was populated either on a database query hit or by an API call adding, updating or invalidating a route. This worked nicely for us: it was fast, and Twisted had a pleasant reverse proxy API to work with. What we missed in this implementation was that, by default, Twisted did not set every reverse proxy header you would expect, which led to invalid redirects in several cases. It was great fun working with it though!

tproxy and Redis

After Twisted, we looked towards tproxy. tproxy is a layer 7 TCP routing proxy built on Gevent by the creator of the famous Gunicorn server and heavily inspired by Ruby's ProxyMachine. We created a lookup layer that, instead of using static routes or a file, looked up each route in a Redis backend. We had completely decoupled the service from MongoDB at that point, since the routes are ephemeral and can easily be recreated. We also split the API out into a separate service using Flask. The main issues were that tproxy development had been more or less abandoned - the last commit was over a year old - and that we would have needed to rework several bits and pieces for optimal performance. There was also an interesting bug with responses that did not declare a response length, which would be kept open until they timed out.

NGINX and etcd

We decided to leave the proxying to a server that did it well out of the box, so we looked at NGINX and HAProxy. Since we were more familiar with the former and happy with its performance - all of our front-facing servers run NGINX - we went with that. We created a watch script that watched for key changes in etcd and reloaded the NGINX configuration, and we changed our API to use etcd as its backend. This dramatically improved performance, but after several weeks we noticed that configuration changes were not applied as fast as we thought. An NGINX reload returns almost instantly, but the new configuration is rolled out to accepting workers over time, which resulted in slow route updates. The most important issue for us was that this reload sometimes took over 10 seconds, which led to our users seeing the "Server is hibernated" page multiple times until the new configuration was applied.

OpenResty and Lua to the rescue

Recently, GitHub blogged about using Lua scripting in NGINX to reload the configuration of GitHub Pages more often - previously they did so only every 30 minutes or so. As we were looking for an alternative and were intrigued by the post, we started looking deeper into OpenResty - an NGINX flavor compiled with LuaJIT and other third-party NGINX modules.

We decided to go back to Redis as the backend, since we already had the API ready and Redis is in-memory, thus pretty fast to query. We also use Redis for other services and caching, so we would not need another cluster to take care of.

The result was Ceryx, which is now open source and available to everyone. The repository contains both the NGINX Lua scripts and the API, and it can be easily deployed using Docker Compose.
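A Docker Compose setup for this kind of stack might look like the following sketch. This is an illustration, not the repository's actual compose file: the service names, build paths and the CERYX_REDIS_HOST variable are assumptions.

```yaml
# Illustrative sketch - check the Ceryx repository for the real file
version: "2"

services:
  proxy:
    build: ./ceryx          # OpenResty image with the Lua router
    ports:
      - "80:80"
    environment:
      - CERYX_REDIS_HOST=redis

  api:
    build: ./api            # Flask CRUD service for routes
    ports:
      - "5555:5555"
    environment:
      - CERYX_REDIS_HOST=redis

  redis:
    image: redis
```

Both the proxy and the API point at the same Redis service, which is what makes route changes visible to the proxy immediately.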

Stitching everything together

NGINX provides several hooks for executing Lua code at different stages of a request. Ceryx takes action just before the proxying stage - in the "access_by_lua_file" phase, where it calls a Lua router.
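In OpenResty configuration terms, the hook looks roughly like the sketch below. The file path and the $container_url variable name are assumptions for illustration; only the access_by_lua_file directive itself is the mechanism described above.

```nginx
server {
    listen 80;
    server_name _;   # catch-all: routing happens on the Host header

    location / {
        # Run the Lua router in the access phase, just before proxying
        access_by_lua_file /opt/ceryx/lua/router.lua;

        # The router decides the upstream and exposes it via a variable
        proxy_pass http://$container_url;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```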

The router first queries NGINX's local in-memory cache and then the Redis backend to determine the target host and port for the incoming host, or returns a wildcard target if no route is found. If a Redis query returns a result, that result is cached for 5 seconds, so that subsequent requests do not hit Redis - a nice improvement for pages that request many static files like CSS, JavaScript and images at the same time. The cache timeout can be increased to tailor Ceryx to each application's needs.
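The lookup logic above can be sketched as follows - in Python for brevity, although the real router is written in Lua. The function names, the key prefix and the wildcard value are illustrative assumptions; a dict stands in for NGINX's shared-memory cache.

```python
import time

WILDCARD_TARGET = "hibernation-page.local"  # illustrative fallback target
CACHE_TTL = 5  # seconds, as described above

_cache = {}  # stands in for NGINX's local in-memory cache


def get_target(host, redis_get):
    """Resolve a request's Host header to a backend target.

    `redis_get` stands in for a Redis GET against the routes keyspace.
    """
    # 1. Local cache first, to absorb bursts of requests for one host
    entry = _cache.get(host)
    if entry and entry[1] > time.time():
        return entry[0]

    # 2. Fall through to the Redis backend
    target = redis_get("ceryx:routes:%s" % host)
    if target is None:
        # 3. No route found: send the request to the wildcard target
        return WILDCARD_TARGET

    # Cache the hit for CACHE_TTL seconds
    _cache[host] = (target, time.time() + CACHE_TTL)
    return target
```

Only positive lookups are cached here, so a host that gains a route becomes reachable on its very next request.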

Alongside the proxy, a simple Flask service provides a CRUD API for routes and writes new routes to the Redis backend. For consistency, both the proxy and the API service share the same environment variables for their Redis configuration.
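Under the hood, the API's operations boil down to simple key operations on Redis. A minimal Python sketch of those semantics - the key prefix and function names are assumptions, and a dict stands in for Redis:

```python
ROUTE_PREFIX = "ceryx:routes:"  # illustrative key prefix


def set_route(store, source, target):
    """Create or update a route (a Redis SET)."""
    store[ROUTE_PREFIX + source] = target


def get_route(store, source):
    """Look up a route (a Redis GET); None if absent."""
    return store.get(ROUTE_PREFIX + source)


def delete_route(store, source):
    """Remove a route (a Redis DEL)."""
    store.pop(ROUTE_PREFIX + source, None)
```

Because the proxy reads the same keys on every cache miss, a route created through the API takes effect without any reload.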

First impressions

After the first weeks with the latest Ceryx flavor, we've seen a great improvement in the speed of our reverse proxy and a big reduction in how often a user sees the hibernated server page. In more detail, before the upgrade of Ceryx we averaged 10 hibernation page views per development session, while now we average only 2.5.

Next steps

Ceryx is open source under the MIT license, so we'll be glad to see contributions, bug reports and feature requests. Next, we're planning to add StatsD metrics to Ceryx, so that we can further measure and optimize several bits and pieces accordingly.