Tag Archives: python

It’s been quite some time since I first wanted to use a consistent-hashing-based distribution of my workload in a fairly large cluster at work. So when we finally reached the point where this became critical for some job processing, I rushed to find a python library I could use to implement it easily and efficiently.

I was surprised not to find a clear “winner” for such a library. The most popular one, hash_ring, has rather unoptimized source code and is dead (old issues, no answers). Some alternatives have stepped up since, but with little interest in contributions and almost no added features for real world applications.

So I packaged the ketama C library and its python binding on my overlay to get intimate with its algorithm. Then I started working on my own pure python library and released uhashring on PyPI and in Gentoo portage!

When working on distributed systems, we often have to distribute some kind of workload across the different machines (nodes) of a cluster, so we have to rely on a predictable and reliable key/value mapping algorithm.

If you’re not sure about what that means, just consider the following questions when working on a cluster with a lot of machines (nodes):

how can I make sure that all the data for a given user is always delivered to and processed on the same machine?

how can I make sure that I store and query the same cache server for a given key?

how do I split and distribute chunks of a large file across multiple storage machines and still access them through all those machines at once?

A lot of technologies answer those kinds of mapping questions by implementing a hashing-based distribution of their workload (be it distributed processing, files or caches).

Consistent hashing is one of those implementations and I felt like taking some time to explain it further.

Example use case

Here is my attempt to explain what consistent hashing is and why it is needed in distributed systems. To make things fun, we’ll take this simple use case:

I have 5 broken cars

There are 4 repair factories nearby

I have to implement a way to figure out where to send each car to get it fixed

I want to ensure that the cars are evenly distributed across the factories

This comes down to two major questions to solve:

what is my selection criterion? This will be my key.

what is the expected answer? This is my value.

Static mapping

The first approach we could implement is to manually distribute the cars based on their colour.

key = car’s colour

value = factory number

To implement this, you use what various languages usually call dictionaries: static data structures where you assign a value to a key.

We would then write a mapping of “car color” -> “factory n” and apply this simple rule to decide where to ship a broken car.
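In python, such a static mapping could be sketched like this (the colour/factory assignments are made up for the example):

```python
# Static mapping: each car colour is hard-coded to a factory number.
FACTORY_FOR_COLOUR = {
    "red": 1,
    "blue": 2,
    "green": 3,
    "black": 4,
    "white": 1,  # 5 cars, 4 factories: one factory gets two colours
}

def factory_for(car_colour):
    # An unknown colour raises KeyError: the first sign that a
    # static mapping does not scale.
    return FACTORY_FOR_COLOUR[car_colour]
```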

A smarter approach is to hash a unique attribute of each car (its plate number) and use that hash modulo the number of factories to pick the destination. Now we have a way better car repair distribution!… until something bad happens:

What if a factory burns down?

Our algorithm depends on the number of available factories, so removing a factory from our array means that we will redistribute the vast majority of the key mappings in our hash table!

Keep in mind that the more values (factories) you have in your array, the worse this problem gets. In our case, given a car’s plate number, we could no longer figure out where the vast majority of the cars were sent.

Even worse, when factory 3 is repaired and comes back into my hash table, I will once again lose track of all my dispatched cars… What we need is a more consistent way of sorting this out.
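The fragility described above can be demonstrated with a naive hash-then-modulo mapping of plate numbers to factories (a sketch, with made-up plate numbers):

```python
# Naive hashing-based mapping: hash the plate number and take it
# modulo the number of factories. md5 is used because the builtin
# hash() is randomized between python processes.
import hashlib

def factory_for(plate, factories):
    digest = int(hashlib.md5(plate.encode()).hexdigest(), 16)
    return factories[digest % len(factories)]

plates = ["CAR-%03d" % i for i in range(100)]
factories = [1, 2, 3, 4]

before = {p: factory_for(p, factories) for p in plates}

# Factory 3 burns down...
factories.remove(3)
after = {p: factory_for(p, factories) for p in plates}

remapped = [p for p in plates if before[p] != after[p]]
# Roughly three cars out of four now map to a different factory,
# not just the ones that were assigned to factory 3.
print("%d/100 cars remapped" % len(remapped))
```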

Consistent hashing

The response to this kind of problem is to implement a consistent hashing algorithm. The goal of this technique is to limit the number of remapped keys when the hash table is resized.

This is possible by imagining our factories as a circle (ring) and the hash of our keys as points on that same circle. We then select the next factory (value) by walking the circle, always in the same direction, until we find a factory.

Red hashed plate number would go to factory 1

Blue hashed plate number would go to factory 3

This way, when a factory is added to or removed from the ring, we lose only a limited portion of the key/value mappings!

Of course, in a real world implementation we would use a ring with many more slots by placing each factory on the ring multiple times. This way the affected range of mappings is smaller and the impact is even more evenly balanced!
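To make this concrete, here is a minimal pure python ring sketch with virtual nodes; it is illustrative only and not uhashring’s actual implementation:

```python
# A minimal consistent hash ring with virtual nodes (vnodes).
# Each node is placed on the circle several times; a key is served
# by the first node found clockwise from the key's hash.
import bisect
import hashlib

def _hash(key):
    # Stable hash: map any string to a point on the circle.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=(), vnodes=40):
        self.vnodes = vnodes
        self._ring = {}     # circle point -> node name
        self._points = []   # sorted circle points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash("%s-%d" % (node, i))
            self._ring[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self.vnodes):
            point = _hash("%s-%d" % (node, i))
            del self._ring[point]
            self._points.remove(point)

    def get_node(self, key):
        if not self._points:
            return None
        # First ring point clockwise from the key, wrapping around.
        idx = bisect.bisect(self._points, _hash(key))
        if idx == len(self._points):
            idx = 0
        return self._ring[self._points[idx]]
```

With four factories on such a ring, removing one only remaps the keys that were held by the removed factory; every other key keeps its mapping.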

For instance, uhashring, being fully compatible with and defaulting to ketama’s ring distribution, gives you 160 points per node on the ring!

I hope I got this little example right and gave you some insight into this very interesting topic!

This is a quick heads-up post about a behaviour change when running a gevent-based application using the new pymongo 3 driver under uWSGI and its gevent loop.

I was naturally curious about testing this brand new and major update of the python driver for mongoDB, so I just played it dumb: upgrade and give it a try on our existing code base.

The first thing I noticed instantly was that the vast majority of our applications were suddenly unable to reload gracefully and were force-killed by uWSGI after some time!

worker 1 (pid: 9839) is taking too much time to die...NO MERCY !!!

uWSGI’s gevent-wait-for-hub

All our applications must be able to be gracefully reloaded at any time. Some of them spawn quite a few greenlets of their own, so as an added measure to make sure we never lose any running greenlet, we use the gevent-wait-for-hub option, which is described as follows:

wait for gevent hub's death instead of the control greenlet

… which does not mean a lot, but is explained in a previous uWSGI changelog:

During shutdown only the greenlets spawned by uWSGI are taken in account,
and after all of them are destroyed the process will exit.
This is different from the old approach where the process wait for
ALL the currently available greenlets (and monkeypatched threads).
If you prefer the old behaviour just specify the option gevent-wait-for-hub

pymongo 3

Compared to its previous 2.x versions, one of the key aspects of the new pymongo 3 driver is its intensive use of threads to handle server discovery and connection pools.

Now we can relate this very fact to the gevent-wait-for-hub behaviour explained above:

the process wait for ALL the currently available greenlets
(and monkeypatched threads)

This explained why our applications were hanging until the reload-mercy (force kill) timeout option of uWSGI hit the fan!
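For reference, the options involved look like this in a uWSGI ini sketch (gevent, gevent-wait-for-hub and reload-mercy are real uWSGI options; the module name is hypothetical):

```ini
[uwsgi]
module = myapp:application   ; hypothetical gevent-based WSGI app
gevent = 100                 ; enable the gevent loop engine
gevent-wait-for-hub = true   ; wait for ALL greenlets and monkeypatched threads
reload-mercy = 8             ; seconds to wait before the "NO MERCY" force kill
```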

conclusion

When using pymongo 3 with the gevent-wait-for-hub option, you have to keep in mind that all of pymongo’s threads (monkey-patched into greenlets) are considered active greenlets and will thus be waited on before uWSGI recycles the worker!

Two options come to mind to handle this properly:

stop using the gevent-wait-for-hub option and change your code to use a gevent pool group to make sure that all of your important greenlets are taken care of when a graceful reload happens (this is how we do it today; the gevent-wait-for-hub option was just overprotective for us).

modify your code to properly close all your pymongo connections on graceful reloads.

Foreword

Let’s say we have to design an application that should span across multiple datacenters while being able to scale as easily as firing up a new vm/container without the need to update any kind of configuration.

Facing this kind of challenge is exciting and requires us to address a few key scaffolding points before actually starting to code anything:

having a robust and yet versatile application container to run our application

having a datacenter-aware service discovery service with failure detection

Given the title of this article, the two components I’ll demonstrate are obviously uWSGI and Consul, which can now work together thanks to the uwsgi-consul plugin.

While this article’s example is written in python, you can benefit from the same features in all the languages supported by uWSGI, which include go, ruby, perl and php!

Our first service discovering application

The application will demonstrate how simple it is for a client to discover all the available servers running a specific service on a given port. The best part is that the services will be registered and deregistered automatically by uWSGI as they’re loaded and unloaded.

The demo application logic is as follows:

uWSGI will load two server applications which are each responsible for providing the specified service on the given port

uWSGI will automatically register the configured service into Consul

uWSGI will also automatically register a health check for the configured service into Consul so that Consul will also be able to detect any failure of the service

Consul will then respond to any client requesting the list of the available servers (nodes) providing the specified service

The client will query Consul for the service and get either an empty response (no server available / loaded) or the list of the available servers

Et voilà, the client can dynamically detect new/obsolete servers and start working !
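A client along these lines can be sketched with nothing but the standard library; the Consul catalog endpoint is real, but the agent URL and service name are assumptions taken from this demo:

```python
# Sketch of a discovery client polling Consul's HTTP catalog API.
import json
from urllib.request import urlopen

CONSUL_URL = "http://localhost:8500"   # assumed local Consul agent
SERVICE = "consul-demo-server"         # service name used in this demo

def parse_service_nodes(payload):
    """Extract (node, address, port) tuples from a
    /v1/catalog/service/<name> JSON response."""
    return [(e["Node"], e["Address"], e["ServicePort"]) for e in payload]

def discover(service=SERVICE):
    url = "%s/v1/catalog/service/%s" % (CONSUL_URL, service)
    with urlopen(url) as response:
        return parse_service_nodes(json.load(response))

# Example loop (requires a running Consul agent):
# nodes = discover()
# if not nodes:
#     print("no %s available" % SERVICE)
# for node, address, port in nodes:
#     print("%s found on node %s (%s) using port %d"
#           % (SERVICE, node, address, port))
```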

Setting up uWSGI and its Consul plugin

On Gentoo Linux, installing uWSGI is just an emerge away (other users should refer to the uWSGI documentation or their distro’s package manager). The plugin will be built by hand as I’m still not sure how I’ll package the uWSGI external plugins…

Running the Consul server and its UI is also quite straightforward. For this example, we will run it directly from a dedicated terminal so you can also enjoy the logs and see what’s going on (Gentoo users have an init script and conf.d ready for them should they wish to go further).

First, we’ll run the client, which should report that no server is available yet. We will keep this terminal open to watch the client detect in real time the appearance and disappearance of the servers as we start and stop uWSGI:

$ python client.py
no consul-demo-server available
[...]
no consul-demo-server available

Open a new terminal and get inside the project’s directory. Let’s have uWSGI load the two servers and register them in Consul:
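The uWSGI side could look roughly like this ini sketch; the consul-register keyval syntax is how I recall the uwsgi-consul plugin working, so treat the option details as assumptions and check the plugin’s README (one such config per server instance, ports 2001 and 2002 in the demo):

```ini
[uwsgi]
plugins = consul
master = true
; first demo server application
http-socket = :2001
; register this instance into the local Consul agent
; (keyval syntax assumed from the uwsgi-consul plugin)
consul-register = url=http://localhost:8500,name=consul-demo-server,port=2001
```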

And check the miracle on your client terminal, still running on your local box: the new servers have appeared, and they will disappear if you stop uWSGI on the cheetah node:

consul-demo-server found on node drakar (xx.xx.xx.xx) using port 2001
consul-demo-server found on node drakar (xx.xx.xx.xx) using port 2002
consul-demo-server found on node cheetah (yy.yy.yy.yy) using port 2001
consul-demo-server found on node cheetah (yy.yy.yy.yy) using port 2002

Go mad

Check the source code, it’s so simple and efficient you’ll cry 😉

I hope this example has given you some insights and ideas for your current or future application designs !

I had the chance to participate in EuroPython 2014 as my company was sponsoring the event.

This was a great week where I got to meet some very interesting people and hear about some neat python use cases, libraries and new technologies so I thought I’d write a quick summary of my biased point of view.

ZeroMQ

I had the chance to meet Pieter Hintjens and participate in a 3-hour workshop on ZeroMQ. It was very interesting and refreshing to go into more depth on this technology, which I’ve been using in production for several years now.

Pieter is also quite a philosophical person and I strongly encourage you to listen to his keynote. I ended up pinging him in real life about a libzmq issue I had been waiting on for a bug fix, and it got answered nicely.

uWSGI

Another big thing in our python stack is the uWSGI application container, which I love and follow closely even if my lack of knowledge in C++ prevents me from going too deep into the source code… I got the chance to speak with Roberto De Ioris about the next 2.1 release and to propose two new features to him.

I came across a recurring problem at work: converting special characters such as the French-Latin accented letter “é” to the ASCII “e” (this is called transliteration).

I wanted to avoid using an external library such as Unidecode (which is great, obviously), so I ended up wandering around the unicodedata built-in library. Before getting too deep into the matter, I found this StackOverflow topic, which describes an interesting method that works fine for me.
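The method in question boils down to a few lines with the built-in unicodedata module; here is a sketch of it:

```python
# Transliteration with the unicodedata built-in: NFKD-decompose the
# string so accents become separate combining marks, then drop
# everything that does not fit into ASCII.
import unicodedata

def transliterate(text):
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Note that characters without a compatibility decomposition (ligatures like “œ”, for instance) are simply dropped, which is exactly where a dedicated library such as Unidecode shines.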

This version features a lot of stuff that I had wanted to add to py3status for a long time and kept in a small todo floating around. After the recent i3wm 4.6 release, I found it the perfect time to implement them while benefiting from the new click event support in i3bar.

The idea of allowing py3status users to have their modules respond to clicks got my head buzzing, and I slowly started to implement my whole todo while adding some great new features thanks to the enhanced i3bar protocol.

I ended up completely (and slowly) rewriting py3status again 🙂 But it’s for its own good, and hopefully yours. I strongly encourage you to have a look at the empty_class.py and pomodoro.py examples as they showcase the new on_click system.

You can already benefit from this bump without modifying your modules thanks to the default middle-click event, which forces a refresh of the module’s method you click on (handy, isn’t it?).
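To give an idea of the flavour of it, here is a schematic module; the method signatures are from my memory of that era’s py3status API, so treat them as assumptions and refer to the bundled examples:

```python
# Schematic py3status module sketch: displays a click counter and
# updates it through the new on_click dispatching.
from time import time

class Py3status:

    def __init__(self):
        self.clicks = 0

    def on_click(self, i3status_output_json, i3status_config, event):
        # Called by py3status when i3bar reports a click event whose
        # name/instance matches this module's output.
        self.clicks += 1

    def clicker(self, i3status_output_json, i3status_config):
        response = {
            'full_text': 'clicks: %d' % self.clicks,
            'name': 'clicker',
            'cached_until': time(),  # expire now: refresh on every pass
        }
        # (position on the bar, response dict)
        return (0, response)
```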

changelog

support for i3bar click_events, they’re dispatched to user-written py3status classes based on their name/instance

add support for on_click methods in user-written modules to handle i3bar click_events (see the pomodoro example)

default is to clear the method’s cache (force a refresh) if you middle click (button 2) on a method’s output and the module does not support click_events

rewrite pomodoro example to showcase the on_click usage

use i3-nagbar to display warnings/errors to the user and also log them to syslog

new user-written module output ordering mechanism is more intuitive as it uses strictly numeric then alphabetical sorting

use select/poll() to implement a non-blocking I/O reading mechanism on threads

new Events thread is responsible for reading i3bar JSONs and dispatching them to the correct user module (click_events)

each user-written module is started and executed in its own thread

remove the pointless -d option

add a --debug option to be verbose in syslog (useful for debugging your modules)

add a real CHANGELOG

add a proper LICENSE file

make sure all examples are PEP8 compatible

update the empty_class example to explain on_click and kill usage

Note for Gentoo users : starting with this release, py3status is now available in portage so you don’t need my overlay anymore.

Yep, a quick bump of py3status to fix a bug reported by @lathan when using python3. The private and special methods detection didn’t work on python3 because class methods are reported differently than on python2.

A special thanks to @bloodred and @drahier too for debugging, testing and proposing solutions to this problem. This is the first time I’ve seen multiple members of what I could humbly call the py3status community working together; it’s very nice of you guys!