Python Pieces – Working with etcd

Ah ha! Surprise – I’ve decided that in addition to the blog posts on MPLS and ExaBGP I might as well start up a third series. Well – that’s not entirely true – but instead of trying to mix all sorts of details about Python into those posts, I thought I might split out some of the larger pieces. So I’m starting a new series called “Python Pieces” where I’m going to pick one module, concept, or whatever else I decide warrants a post and talk about how to use it. Then – if/when I use that in one of my other posts – you’ve got a handy reference and starting point. I hope this makes the other posts less “all over the place” but we’ll see.

So – in my first edition of Python Pieces we’re going to talk about working with etcd from Python. For those of you that don’t know what etcd is – it’s a pretty popular key value store that’s used with lots of the more recent projects (Kubernetes comes to mind). What’s likely more important about etcd, though, is that it’s capable of being a distributed key value store, which makes it a popular way to store data across a cluster of machines. This also means that it’s designed to handle all sorts of failures and bad things. So to summarize – it’s a pretty darn cool key value store. Luckily for us – working with it from Python is pretty easy to do. The first thing we need to do, though, is install etcd on a test server. To do that, we simply do a…

Cool! So it’s running and we know it’s responding to commands. So now let’s talk about how to interact with it from Python. To do that, we’ll use the etcd3 Python module which we can also easily install using pip3…

Note: If you don’t have pip3 installed you can install it with sudo apt-get install python3-pip. Also note that I’m not doing this work in a virtualenv. I know, I know – but I’m trying to keep things simple for now.

pip3 install etcd3

Alright – so once we have all the bits, we can start writing some code! Let’s start with the basics: writing and reading keys…

Alright – so this is pretty easy. We import the module we just installed, then use it to create a new client. Since the etcd instance is running locally, we don’t need to tell it where to look for etcd (the client defaults to localhost on port 2379). Next we do a put operation to write key/value pairs. And lastly, we retrieve a key/value pair by using the get function and passing it the key we’re looking for. Note that get returns two values, which is why we unpack its return into both value and metadata. If we run this code, we should see output like this…
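To make the put/get shape concrete without needing a running server, here’s a minimal sketch. The FakeEtcdClient class is hypothetical, an in-memory stand-in I made up that mimics the pattern described above: keys and values stored as bytes, and get handing back a (value, metadata) pair. With the real module you’d create the client with etcd3.client() instead, and the sample key and value are made up for illustration.

```python
from types import SimpleNamespace


class FakeEtcdClient:
    """Hypothetical stand-in for an etcd3 client; nothing here talks to etcd."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        # etcd deals in bytes, so store the encoded key and value
        self._store[key.encode()] = value.encode()

    def get(self, key):
        k = key.encode()
        if k not in self._store:
            # This stand-in returns (None, None) for a missing key
            return None, None
        return self._store[k], SimpleNamespace(key=k)


client = FakeEtcdClient()  # with the real module: client = etcd3.client()
client.put("jon", "Langemak")

# get() returns two things, so unpack both of them
value, metadata = client.get("jon")
print(metadata.key.decode(), value.decode())
```

Note the .decode() calls: values come back as bytes, so you’ll want to decode them before printing or comparing them to regular strings.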

So above we changed our get function to a get_all function. Notice that since we’re asking for what I like to summarize as “all the things” we don’t need to specify a key to look for. Also – since get_all returns more than one key/value pair, we need to use it in a for loop so we can iterate through each set of items. The output is just what we expected though – all of the keys and values we were looking for.
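Here’s a hedged sketch of that get_all loop against a hypothetical in-memory stand-in (FakeEtcdClient is my own invention, not part of the etcd3 module). Each item the generator yields is a (value, metadata) tuple, so we unpack both in the for statement and pull the key out of metadata.key. The stand-in hands keys back in sorted byte order, which is the ordering etcd uses; the names and values are made up.

```python
from types import SimpleNamespace


class FakeEtcdClient:
    """Hypothetical stand-in: get_all() yields (value, metadata) tuples in key order."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key.encode()] = value.encode()

    def get_all(self):
        # Hand keys back sorted by their byte values
        for k in sorted(self._store):
            yield self._store[k], SimpleNamespace(key=k)


client = FakeEtcdClient()
for first, last in [("jon", "Langemak"), ("bob", "Smith"), ("mary", "Jones")]:
    client.put(first, last)

pairs = []
# Each item is a (value, metadata) tuple, so unpack both in the loop
for value, metadata in client.get_all():
    pairs.append((metadata.key.decode(), value.decode()))
    print(metadata.key.decode(), value.decode())
```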

So while you can certainly keep adding unique key/value pairs at this base level – it often makes more sense to use what looks like a directory structure. So let’s start over by deleting all of the keys currently in etcd using this handy dandy command…

Note: For some reason, to get the get_range function I had to install the latest code from GitHub and then use the setup.py command to install the module. Best I can tell, the get_range function is only in the ‘latest’ build and for some reason not in the ‘stable’ build. Anyhow – you can grab the code with this command…

git clone https://github.com/kragniz/python-etcd3

Or just stay tuned until later on and use one of the other functions we talk about (cough get_prefix cough).

In the above program, we create 2 sets of values inside 2 different prefixes. The first prefix is /names and includes a series of keys where the values are last names. The second prefix, /addresses, includes a set of keys where the key is the first name and the value is the address. Notice that we changed the call once again and are now using the function get_range. Also notice that we’ve indicated that we want a range starting at /names/jon and ending at /names/mary. Let’s see what happens when we run it…

Huh. Not what I expected. So what’s going on here? Why did I only get the key related to Jon back? To understand this – and future concepts around etcd – it’s important to understand how etcd stores data. Allow me to digress (hopefully just momentarily).

The etcd data model is described as being flat. That is – all of these keys are just sitting in the same place. There is no hierarchy, despite the fact that we are making what looks like one by using / to separate the parts of each key. So when you want to select a range of keys, you need some means of selecting just the ones you want. Setting things up with slashes the way we’re doing now helps – but you need to understand a bit about how the search works to select what you want. This “what you want” is typically a set of keys with a given prefix. For instance – above we had two distinct prefixes: /names and /addresses. So since we have all these keys in one big flat space, how does selecting a range of them help? Well, when we ask etcd for a range, it sorts the keys for us in ascending order by default. This ensures that similar keys end up next to each other. This is the reason for the odd return we just saw above: etcd is sorting the keys and then applying our range request to them. So while we entered the keys in a particular order…
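To see the ordering for yourself, here’s a small sketch of my own that encodes each key as UTF-8, prints its byte values, and sorts the byte strings. The three keys are hypothetical examples matching the names mentioned in this post, listed in a deliberately unsorted order.

```python
# Hypothetical keys, listed in an arbitrary (unsorted) order
keys = ["/names/jon", "/names/bob", "/names/mary"]

for key in keys:
    # etcd compares keys as raw bytes, so look at each key's UTF-8 byte values
    print(key, list(key.encode("utf-8")))

# Sorting the byte strings gives the ascending order etcd applies to ranges
sorted_keys = sorted(k.encode("utf-8") for k in keys)
print(sorted_keys)
```

Every key starts with the same seven bytes (47 for /, then n-a-m-e-s, then 47 again), so the first difference shows up at the eighth byte: 98 for b, 106 for j, and 109 for m.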

So what all that fancy conversion did was encode the string in UTF-8, which turns it into a byte string. If we referenced an ASCII chart we’d see that character 47 is / and 115 is s, etc. Looking at this, it should be obvious that at character 8 the keys diverge, and that the Bob and Jon keys were entered out of sorted order. Once etcd sorts them, our get_range command makes a heck of a lot more sense. If we start at Jon and go until Mary, then the only item to return is Jon.
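Here’s a hedged, in-memory model of that range request: sort the keys by bytes, then keep everything from the start key up to, but not including, the end key. The get_range function below is my own illustration of etcd’s half-open range semantics, not the etcd3 module’s implementation, and the stored data is hypothetical.

```python
def get_range(sorted_items, range_start, range_end):
    """Return (key, value) pairs with range_start <= key < range_end (end is exclusive)."""
    return [(k, v) for k, v in sorted_items if range_start <= k < range_end]


# Hypothetical data under the two prefixes used in this post
store = {
    b"/names/jon": b"Langemak",
    b"/names/bob": b"Smith",
    b"/names/mary": b"Jones",
    b"/addresses/jon": b"123 Main St",
}
sorted_items = sorted(store.items())  # etcd keeps keys in byte order

result = get_range(sorted_items, b"/names/jon", b"/names/mary")
print(result)
```

Because /names/bob sorts before the start and /names/mary is the (exclusive) end, only /names/jon comes back.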

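Some of you might be asking about Mary and why she isn’t included in our return. That’s because the range end is exclusive: etcd returns keys that are greater than or equal to the start and strictly less than the end. To include Mary, we’d need to step back even further. Let’s change the code once more to look like this…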

Now our range start has been defined as /names/ and our range end is /names0. So while I can see wanting to start at the root of the prefix we’re interested in – what’s going on with that end range? Let’s look once more at the output we saw from our converted byte string…

So if we want to capture all of these keys – and any other keys that might land in /names/ – we need to look at the entire range of possibilities for that prefix. To do that, we care about character 7, the second instance of the number 47, which in string form is the second / character. If we look at our ASCII chart again, we’ll see that the character following 47 (/), number 48, is the digit 0. Do you see where this is going now? By asking for a range from /names/ to /names0, we’re asking for everything from /names/ up to (but not including) /names0 in our huge flat (sorted) namespace. So we could have a million records with the prefix /names/ – but the instant that prefix changes to something like /names2/, those entries no longer fall within the range. The key thing to remember here is that it’s a sorted list.
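The “bump the last byte” trick can be written as a tiny function. This is a sketch of my own (prefix_range_end is a hypothetical helper name). It also handles the corner case of trailing 0xFF bytes by dropping them before incrementing, and falls back to b"\x00", which etcd treats as “no upper bound”, when every byte is 0xFF.

```python
def prefix_range_end(prefix: bytes) -> bytes:
    """Smallest key that sorts after every key starting with `prefix`."""
    end = bytearray(prefix)
    # Walk backwards and increment the last byte that can still be incremented
    for i in range(len(end) - 1, -1, -1):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    # Every byte was 0xFF: b"\x00" as a range end means "all keys from the start on"
    return b"\x00"


start = b"/names/"
end = prefix_range_end(start)
print(end)  # b'/names0'

# Hypothetical keys: only the ones under /names/ fall inside [start, end)
keys = [b"/addresses/jon", b"/names/bob", b"/names/jon", b"/names/mary", b"/names2/x"]
in_range = [k for k in keys if start <= k < end]
print(in_range)
```

/names2/x falls outside because its seventh byte is 50 (the digit 2), which sorts after 48 (the digit 0) in /names0.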

Luckily for us – the Python module we’re using has another function simply called get_prefix which does the range work for you. Let’s use it below as part of our second for loop as we extend the example further…

Above we tied the two lists together by creating a 2nd loop. In the first loop we loop through all of the names. On each name, we loop through the second lookup which looks at all of the addresses and tries to match the keys up. If we find a match, we use the information from that lookup in conjunction with the first lookup to print out a string. We also use the split function to split off the prefix part of the key so we’re only dealing with the key we actually want. Notice that we’re achieving a similar return in each loop despite using the get_range function in the top loop and the get_prefix function in the bottom loop. Your output should look like this…
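Here’s a hedged sketch of that nested-loop join, using hypothetical in-memory results in place of the two etcd lookups. Each lookup result is a list of (value, key) byte pairs; with the etcd3 module the key would come from metadata.key instead. split("/") strips the prefix so we can compare just the first names.

```python
# Hypothetical results of the two prefix lookups, as (value, key) byte pairs
names = [(b"Smith", b"/names/bob"), (b"Langemak", b"/names/jon")]
addresses = [(b"456 Oak Ave", b"/addresses/bob"), (b"123 Main St", b"/addresses/jon")]

sentences = []
for last_name, name_key in names:
    # Strip the prefix so only the first name is left
    first_name = name_key.decode().split("/")[-1]
    for address, address_key in addresses:
        # Match the two lookups on the bare first name
        if address_key.decode().split("/")[-1] == first_name:
            sentence = f"{first_name.capitalize()} {last_name.decode()} lives at {address.decode()}"
            sentences.append(sentence)
            print(sentence)
```

The inner loop scans every address for each name, which is exactly the inefficiency the next section tries to get around.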

Again – notice that etcd has sorted the returns for you (Bob comes before Jon). So this is sort of handy – but not terribly efficient. What if we could put more data inside of a value? Is there a way we could combine these two prefixes into one key/value pair? You sure can. For those of you familiar with Kubernetes – you might already know that you can store chunks of JSON as the value for an etcd key. Here’s an example I shamelessly stole from this site that shows this…

Whoa! Ok so that’s a lot – but not really that complicated. What I’ve done here is create one big Python list called list_of_people. Inside that list are 3 entries, each one a dictionary with a first name and a last name item, plus a third item called address which has a list of dictionaries as its value. Each of those dictionaries gives more information about the address. We then iterate through the list and turn each of the 3 base dictionaries into a JSON object. We then push each JSON object as a value into etcd, using the address as the key. Since we assume two or more people won’t live at the same address, we use it as the key in the hope that it will be unique. If you run the above code, you won’t get any output, but we can then use etcdctl to query the key/value pairs…
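Here’s a small sketch of the serialize-and-store step, under some assumptions: the people, the field names (first, last, street, city), and the /people/ key prefix are all hypothetical, and a plain dict stands in for etcd (with the etcd3 module the store line would be a client.put call). json.dumps turns each dictionary into a JSON string, and json.loads turns it back.

```python
import json

# Hypothetical data shaped like the description: names plus a list of address dicts
list_of_people = [
    {"first": "jon", "last": "Langemak",
     "address": [{"street": "123 Main St", "city": "Minneapolis"}]},
    {"first": "bob", "last": "Smith",
     "address": [{"street": "456 Oak Ave", "city": "Chicago"}]},
]

store = {}  # stand-in for etcd; with etcd3 this would be client.put(key, value)
for person in list_of_people:
    value = json.dumps(person)  # dict -> JSON string
    key = "/people/" + person["address"][0]["street"]  # hypothetical prefix + address
    store[key] = value

# Reading it back: json.loads rebuilds the original dicts and lists
restored = json.loads(store["/people/123 Main St"])
print(restored["last"], restored["address"][0]["city"])
```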

So. We started with a Python data structure (a list) and then used the json module to turn that set of lists and dicts into a JSON object. That JSON object can then be stored as a value in etcd. Pretty cool right?!?! Let’s now do our same loop to print out fancy sentences…

So the only real change here is that I added that last if block to check whether the key we found in etcd matches the one we pulled from the value. This darn well better match, and when it does the script prints a statement telling you it’s deleting the key and then deletes the key from etcd. Not a terribly useful example – but it at least shows you how to delete keys from etcd. Note that I had to add the rest of the key prefix back into the delete statement since the value we set to key had it stripped off. If we run this, we should see the message that the key is being deleted, and if we then run the command to see what’s in etcd, we see that it’s empty…
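A hedged sketch of that read-compare-delete loop follows, again with a plain dict standing in for etcd and a hypothetical /people/ prefix. The comments note which etcd3 calls would take each step’s place; the record itself is made up. The important detail is adding the prefix back before deleting, since the loop works with the stripped key.

```python
import json

PREFIX = "/people/"  # hypothetical key prefix

store = {  # stand-in for etcd
    PREFIX + "123 Main St": json.dumps(
        {"first": "jon", "last": "Langemak", "address": [{"street": "123 Main St"}]}),
}

deleted = []
for full_key in sorted(store):  # like iterating client.get_prefix(PREFIX)
    person = json.loads(store[full_key])
    key = full_key.split(PREFIX, 1)[1]  # strip the prefix off the key
    print(f"{person['first']} {person['last']} lives at {key}")
    if key == person["address"][0]["street"]:
        print("Deleting key", PREFIX + key)
        # The prefix was stripped above, so add it back before deleting
        del store[PREFIX + key]  # with etcd3: client.delete(PREFIX + key)
        deleted.append(PREFIX + key)
```

Iterating over sorted(store) walks a copy of the keys, so deleting from the dict inside the loop is safe here.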

Alright – cool. But that wasn’t super interesting. Let’s explore one of the other features etcd has beyond simple read/write/delete actions that I think is particularly appealing. Etcd has the ability to “watch” a certain key. Let’s make a new Python script called etcd_watch.py …

Ok – so the above is pretty straightforward. We first define a function called etcd_watch_callback that simply prints out a statement. Then we tell the etcd client to add a “watch callback” for the key /list/jon that references the function etcd_watch_callback. Lastly, we start an infinite loop so the Python script keeps running. The idea here is that etcd will watch the key /list/jon and, when it sees it created, run the function registered as the callback for that key. So let’s run it…
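To show the watch-and-callback idea without a server, here’s a simulation of my own. TinyWatchStore and PutEvent are hypothetical classes: the store remembers which callback goes with which key and calls it whenever that exact key is written. My understanding is that the real etcd3 client delivers watch events on a background thread, which is why the real script needs the infinite loop to stay alive; this simulation fires callbacks synchronously, so it doesn’t need one.

```python
class PutEvent:
    """Hypothetical stand-in for the event object a watch callback receives."""

    def __init__(self, key, value):
        self.key, self.value = key, value


class TinyWatchStore:
    """Hypothetical in-memory store that fires callbacks on writes (no etcd involved)."""

    def __init__(self):
        self._data = {}
        self._watches = []  # list of (watched_key, callback)

    def add_watch_callback(self, key, callback):
        self._watches.append((key.encode(), callback))

    def put(self, key, value):
        k, v = key.encode(), value.encode()
        self._data[k] = v
        # Fire every callback registered for exactly this key
        for watched, callback in self._watches:
            if watched == k:
                callback(PutEvent(k, v))


fired = []


def etcd_watch_callback(event):
    fired.append(event.key)
    print("You created the key I was looking for!")


store = TinyWatchStore()
store.add_watch_callback("/list/jon", etcd_watch_callback)
store.put("/list/bob", "Smith")     # not watched, so nothing happens
store.put("/list/jon", "Langemak")  # watched, so the callback runs
```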

Nice! But this by itself isn’t super handy because we aren’t using any information about the key/value pair in our function. Luckily, we can pull the key and the value out of the event variable. Let’s change our script as follows…
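A sketch of what such a callback might look like: the event carries the key and value as bytes, so we decode both before building the message. The fake_event below is a hypothetical stand-in built with SimpleNamespace so the function can be exercised without etcd. One hedged caveat: I believe some newer etcd3 releases pass the callback a response object whose events attribute holds a list of events rather than a single event, so check which shape your version delivers.

```python
from types import SimpleNamespace


def etcd_watch_callback(event):
    # event.key and event.value arrive as bytes, so decode them for printing
    message = (f"You created the key {event.key.decode()} "
               f"I was looking for with value {event.value.decode()}")
    print(message)
    return message


# Hypothetical stand-in for a real watch event
fake_event = SimpleNamespace(key=b"/list/jon", value=b"Langemak")
etcd_watch_callback(fake_event)
```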

Now if we run it and push the following key/value in our second window…

ETCDCTL_API=3 etcdctl put "/list/jon" "Langemak"

We should see this in the first window…

$ python3 etcd_watch.py
You created the key /list/jon I was looking for with value Langemak

Cool! So that can be pretty handy if we want to keep an eye on certain keys that are created. But now go ahead and try inserting that same key again through etcdctl and then try deleting it. Your Python program output will show this…

You created the key /list/jon I was looking for with value Langemak
You created the key /list/jon I was looking for with value Langemak
You created the key /list/jon I was looking for with value Langemak
You created the key /list/jon I was looking for with value

Two more puts and then the last one was a del to delete the key/value. So clearly the callback is being triggered on any action related to the key. So how do we distinguish? Let’s try this…

The real difference here is that we’re checking what ‘type’ of event the callback is receiving. Now if we do a couple of creates and deletes we should see different messages…

You created the key /list/jon I was looking for with value Langemak
You deleted the key /list/jon I was looking for with value
You created the key /list/jon I was looking for with value Langemak
You deleted the key /list/jon I was looking for with value
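One way to do that kind of type check is isinstance. In the sketch below, Event, PutEvent, and DeleteEvent are hypothetical stand-ins I defined so the code runs on its own; as far as I know the etcd3 module has its own PutEvent and DeleteEvent classes in etcd3.events, which is what you’d check against in a real script. A delete carries no value, so the message ends with an empty value.

```python
class Event:
    """Hypothetical base stand-in for a watch event."""

    def __init__(self, key, value=b""):
        self.key, self.value = key, value


class PutEvent(Event):
    pass


class DeleteEvent(Event):
    pass


def etcd_watch_callback(event):
    key, value = event.key.decode(), event.value.decode()
    # Pick the wording based on what kind of event arrived
    if isinstance(event, PutEvent):
        action = "created"
    elif isinstance(event, DeleteEvent):
        action = "deleted"
    else:
        action = "saw an unknown event for"
    message = f"You {action} the key {key} I was looking for with value {value}"
    print(message)
    return message


etcd_watch_callback(PutEvent(b"/list/jon", b"Langemak"))
etcd_watch_callback(DeleteEvent(b"/list/jon"))
```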

Nice! But again – here we’re only looking for a single key, and that by itself isn’t incredibly useful since it implies you know exactly which key to look for. What if you want to watch for keys you aren’t aware of? The problem is there isn’t an add_watch_prefix_callback function for us to lean on here. However – the add_watch_callback function does support ranges. And if you read this whole post – you now know how to work with them too! So let’s try it out…

You created the key /list/jon I was looking for with value Langemak
You created the key /list/marty I was looking for with value Smith
You created the key /list/sheila I was looking for with value Oryan
You created the key /list/ryan I was looking for with value Timmbi
You created the key /list/taylor I was looking for with value Bartholemu
You created the key /list/billy I was looking for with value Biscuit
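Here’s a hedged simulation of watching a whole prefix by passing a range end, combining the range trick from earlier with the callback idea. TinyWatchStore and prefix_range_end are hypothetical helpers of my own, and prefix_range_end is simplified (it assumes the last byte is below 0xFF, which holds for /). With the real module, I believe the equivalent call would be along the lines of client.add_watch_callback("/list/", callback, range_end="/list0"), but treat that as an assumption to check against your version.

```python
from types import SimpleNamespace


def prefix_range_end(prefix: bytes) -> bytes:
    # Simplified: bump the last byte (assumes it is below 0xFF, true for "/")
    return prefix[:-1] + bytes([prefix[-1] + 1])


class TinyWatchStore:
    """Hypothetical in-memory store whose watches can cover a key range."""

    def __init__(self):
        self._watches = []  # list of (start, end, callback)

    def add_watch_callback(self, key, callback, range_end=None):
        self._watches.append((key, range_end, callback))

    def put(self, key, value):
        for start, end, callback in self._watches:
            # Half-open range [start, end); with no end, only an exact match fires
            hit = start <= key < end if end is not None else key == start
            if hit:
                callback(SimpleNamespace(key=key, value=value))


seen = []


def etcd_watch_callback(event):
    seen.append(event.key)
    print(f"You created the key {event.key.decode()} "
          f"I was looking for with value {event.value.decode()}")


store = TinyWatchStore()
start = b"/list/"
store.add_watch_callback(start, etcd_watch_callback, range_end=prefix_range_end(start))

store.put(b"/list/jon", b"Langemak")   # inside /list/, so the callback fires
store.put(b"/list/marty", b"Smith")    # inside /list/, so the callback fires
store.put(b"/names/jon", b"Langemak")  # outside the range, so it's ignored
```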

So it works! Yeah! Well – there’s actually a lot more we could cover, like locks and transactions – but I’m hopeful this is enough to get your feet wet and get you excited about working with etcd. Maybe if there’s interest I can come back and cover further items later on.