Last active September 25, 2013

An IPython Notebook walk-through of how to make an interesting key function for comparing (sort/min/max) in Python. View it at the notebook viewer: http://nbviewer.ipython.org/6696813
Corrections welcome

At the moment I am rather enamoured with key functions in Python. The functions [`sorted`](http://docs.python.org/3/library/functions.html#sorted), `min` and `max` take an optional `key` argument. If specified, it should be a function of one argument, and it is applied to each item in the iterable. The results of this key function, rather than the items themselves, are what get compared when ordering the items.

Here is an example simplified from my work. I have a list of 'sentences'. Each sentence is essentially one sentence of a forecast, describing a weather element or type, such as precipitation (precip), thunderstorms (TS), sky, fog (FG), large hail (H) or gusty winds (w). Each sentence has a time range associated with it (TS may only be present in the late evening, for example). So what order should things be reported in?

 * In general, things should be reported in the order that they occur, so based on the start time. If the start times are equal, then compare the end times.
 * If the start and end times are equal, report them in the same order as a reference order (e.g. [sky, FG, precip, TS, w, H]).
 * But also, make sure the TS are reported after the precip, if there is precip. (This is a bit of business logic: because TS and precip are related phenomena and generally occur together, it could be jarring or absurd to those who know about such things to see TS mentioned before precip, even if the TS is being reported as starting earlier.)
 * Also also, some items like H only occur in the presence of other elements like TS. In this case their sentence will actually be something like _Thunderstorms possibly severe in the afternoon with large hail_, which emphasises their relationship to the TS. So we need to mention the H directly after the TS, regardless of the time ranges.

Complicated rules like this make it tempting to give up on the built-in functions altogether and write a hand-crafted method that picks apart the elements and painstakingly pieces them back together according to the requirements. However, I would argue that such code is more error-prone, harder to debug and harder to modify. With a couple of tricks we can keep using the built-in functions and still have a clue what is going on.

I will also add: don't overlook the benefit of using key functions with `min` and `max`, where you may want to find the "best" result amongst a set according to cascading criteria. I use this frequently at work in combination with exhaustive search (aka [brute-force search](https://en.wikipedia.org/wiki/Brute-force_search) or "generate and test"). If the combinatorics are not going to explode on you too badly, I think it's a good way of knowing you have arrived at the "best" result.

OK, so our data looks something like this:
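The data cell itself is not included in this copy of the notebook. As a stand-in, here is a minimal sketch that assumes each sentence is a `namedtuple` with hypothetical fields `kind`, `start` and `end` (hours). It uses the time ranges quoted later in the post and adds a first key function that sorts on `(start, end)`:

```python
from collections import namedtuple

# Hypothetical stand-in for the notebook's Sentence type:
# the element kind plus its start and end hour.
Sentence = namedtuple("Sentence", ["kind", "start", "end"])

sentences = [
    Sentence("precip", 12, 24),
    Sentence("TS", 9, 24),
    Sentence("sky", 0, 0),
    Sentence("w", 12, 24),
    Sentence("H", 15, 21),
]

def time_key(s):
    # Order by start time, then by end time.
    return (s.start, s.end)

print([s.kind for s in sorted(sentences, key=time_key)])
# ['sky', 'TS', 'precip', 'w', 'H']

# The same key function works unchanged with min and max.
print(min(sentences, key=time_key))
# Sentence(kind='sky', start=0, end=0)
```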

Already we are using the first feature of Python sorting: tuple sorting. Python compares tuples element by element. It compares the first elements, then the second elements if the first are equal, and so on. Hence our winds (w) sentence comes before the H, because 12 < 15. It seems pretty obvious, but this is the basis for building much more complicated key functions.

However, we haven't yet incorporated the requirement to sort by priority (the reference order) when the time ranges are equal. It happens that 'w' sorts after 'precip' in the above example, but we haven't enforced it, which we can confirm directly:
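Continuing with the hypothetical `Sentence` data from above: Python's sort is stable, so sentences whose `(start, end)` keys tie simply keep their input order. Reversing the input is enough to flip precip and w. Adding each sentence's position in the reference order as a third tuple element enforces the tie-break (`PRIORITY` is an assumed name for that list):

```python
from collections import namedtuple

Sentence = namedtuple("Sentence", ["kind", "start", "end"])

# Reference order from the requirements above.
PRIORITY = ["sky", "FG", "precip", "TS", "w", "H"]

sentences = [
    Sentence("precip", 12, 24),
    Sentence("TS", 9, 24),
    Sentence("sky", 0, 0),
    Sentence("w", 12, 24),
    Sentence("H", 15, 21),
]

def kinds(items):
    return [s.kind for s in items]

def time_key(s):
    return (s.start, s.end)

def priority_key(s):
    # Third element breaks (start, end) ties using the reference order.
    return (s.start, s.end, PRIORITY.index(s.kind))

# Tuples compare element by element, left to right.
print((12, 24) < (15, 21), (12, 24, 2) < (12, 24, 4))
# True True

# With only (start, end), the precip/w tie just follows input order.
print(kinds(sorted(sentences, key=time_key)))
# ['sky', 'TS', 'precip', 'w', 'H']
print(kinds(sorted(reversed(sentences), key=time_key)))
# ['sky', 'TS', 'w', 'precip', 'H']

# With the reference order as a tie-breaker, input order no longer matters.
print(kinds(sorted(reversed(sentences), key=priority_key)))
# ['sky', 'TS', 'precip', 'w', 'H']
```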

Also, to confirm that what is happening is what we expect, we can print out the value of calling the key function on each item in the iterable. This is super useful for debugging your sort. It is also a major benefit over [the old method](http://docs.python.org/3/howto/sorting.html#the-old-way-using-the-cmp-parameter) of influencing the sort by writing a `cmp` function: one that takes two items from the iterable and returns a negative number, zero or a positive number (conventionally -1/0/1) according to which should be ordered first. I can't praise this highly enough. My workmate recently rewrote a `cmp` function that was "mostly" right (read: I couldn't figure out how to fix it) into a key function that I have the highest confidence in, precisely because of this.
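Here is a sketch of that debugging habit, again on the hypothetical `Sentence` data: print each item next to its key, in sorted order. For contrast, the same ordering is also written old-style as a `cmp` function and wrapped with the standard library's `functools.cmp_to_key`. It sorts identically, but there is no per-item value to print and inspect.

```python
from collections import namedtuple
from functools import cmp_to_key

Sentence = namedtuple("Sentence", ["kind", "start", "end"])
PRIORITY = ["sky", "FG", "precip", "TS", "w", "H"]

sentences = [
    Sentence("precip", 12, 24),
    Sentence("TS", 9, 24),
    Sentence("sky", 0, 0),
    Sentence("w", 12, 24),
    Sentence("H", 15, 21),
]

def priority_key(s):
    return (s.start, s.end, PRIORITY.index(s.kind))

# Show the key next to each item: this is exactly what sorted() compares.
for s in sorted(sentences, key=priority_key):
    print(priority_key(s), s)
# (0, 0, 0) Sentence(kind='sky', start=0, end=0)
# (9, 24, 3) Sentence(kind='TS', start=9, end=24)
# (12, 24, 2) Sentence(kind='precip', start=12, end=24)
# (12, 24, 4) Sentence(kind='w', start=12, end=24)
# (15, 21, 5) Sentence(kind='H', start=15, end=21)

def cmp_sentences(a, b):
    # Old style: compare two items and return -1, 0 or 1.
    pairs = [(a.start, b.start), (a.end, b.end),
             (PRIORITY.index(a.kind), PRIORITY.index(b.kind))]
    for x, y in pairs:
        if x != y:
            return -1 if x < y else 1
    return 0

# Same order, but only the key version can be printed item by item.
print(sorted(sentences, key=cmp_to_key(cmp_sentences)) ==
      sorted(sentences, key=priority_key))
# True
```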

Now I have confidence in our sorting!

To return to the trickier requirements: the TS is being listed ahead of the precip. What can we do about that?

Well, maybe a useful thing to do is to write down the key function results that would give us what we want, and work backwards.

<pre>
(0, 0, 0, ...)   Sentence(sky, 0-0)
(12, 24, 2, ...) Sentence(precip, 12-24)
(?, ?, ?, ...)   Sentence(TS, 9-24)
(?, ?, ?, ...)   Sentence(H, 15-21)
(12, 24, 4, ...) Sentence(w, 12-24)
</pre>

What jumps out here is that the TS and H need to have the same values as the precip, so that they are sorted next to each other. But a key function only sees one element of the iterable at a time. When the TS sentence is passed to it, it generally has no access to information about the other elements. So what can we do? Well, a closure can help us out...
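Here is one way such a closure could look. It is a sketch in the spirit of the notebook's `generateKeyFn`, not the notebook's actual code, and it reuses the hypothetical `Sentence` fields. The outer function sees the whole list once and remembers the precip sentence. The inner key function then lets TS and H borrow precip's time range and rank, and adds a fourth Boolean field to place them after it:

```python
from collections import namedtuple

Sentence = namedtuple("Sentence", ["kind", "start", "end"])
PRIORITY = ["sky", "FG", "precip", "TS", "w", "H"]

def generateKeyFn(sentences):
    # Runs once per list: find the precip sentence, if there is one.
    precip = next((s for s in sentences if s.kind == "precip"), None)

    def key(s):
        # The inner function closes over `precip`, so it can use
        # information about another element of the list.
        follows_precip = s.kind in ("TS", "H")
        anchor = precip if (follows_precip and precip is not None) else s
        return (anchor.start, anchor.end, PRIORITY.index(anchor.kind),
                follows_precip)

    return key

sentences = [
    Sentence("precip", 12, 24),
    Sentence("TS", 9, 24),
    Sentence("sky", 0, 0),
    Sentence("w", 12, 24),
    Sentence("H", 15, 21),
]

key = generateKeyFn(sentences)
for s in sorted(sentences, key=key):
    print(key(s), s.kind)
# (0, 0, 0, False) sky
# (12, 24, 2, False) precip
# (12, 24, 2, True) TS
# (12, 24, 2, True) H
# (12, 24, 4, False) w
```

TS and H still tie on all four fields, so in this sketch their relative order comes only from the input order.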

Here we are relying on Boolean sorting: False < True. The value of this fourth field is irrelevant to the sentences other than precip/TS/H, as those are already separated by the first three fields.

Now TS is following precip, but we also need the H to follow the TS.
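The same hypothetical sketch can be extended with a fifth Boolean field, `s.kind == "H"`. This breaks the remaining TS/H tie regardless of input order. The input below deliberately lists H before TS:

```python
from collections import namedtuple

Sentence = namedtuple("Sentence", ["kind", "start", "end"])
PRIORITY = ["sky", "FG", "precip", "TS", "w", "H"]

def generateKeyFn(sentences):
    precip = next((s for s in sentences if s.kind == "precip"), None)

    def key(s):
        follows_precip = s.kind in ("TS", "H")
        anchor = precip if (follows_precip and precip is not None) else s
        return (anchor.start, anchor.end, PRIORITY.index(anchor.kind),
                follows_precip,
                s.kind == "H")  # False < True, so H goes after TS

    return key

sentences = [
    Sentence("H", 15, 21),
    Sentence("w", 12, 24),
    Sentence("sky", 0, 0),
    Sentence("TS", 9, 24),
    Sentence("precip", 12, 24),
]

key = generateKeyFn(sentences)
print([s.kind for s in sorted(sentences, key=key)])
# ['sky', 'precip', 'TS', 'H', 'w']
```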

Success! The fifth field is forcing the H after the TS.

Now, this key function is not totally correct. There is one immediate error: if we have TS and H but no precip, we haven't done anything to ensure that the H will still follow the TS. Witness:
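The five-field sketch shows the problem on hypothetical data: TS from 9 to 24 and H from 9 to 21, with no precip. There is nothing to borrow from, so both sentences keep their own time ranges, and H's earlier end time puts it first:

```python
from collections import namedtuple

Sentence = namedtuple("Sentence", ["kind", "start", "end"])
PRIORITY = ["sky", "FG", "precip", "TS", "w", "H"]

def generateKeyFn(sentences):
    precip = next((s for s in sentences if s.kind == "precip"), None)

    def key(s):
        follows_precip = s.kind in ("TS", "H")
        anchor = precip if (follows_precip and precip is not None) else s
        return (anchor.start, anchor.end, PRIORITY.index(anchor.kind),
                follows_precip, s.kind == "H")

    return key

# Hypothetical data: thunderstorms with hail, but no precip sentence.
no_precip = [Sentence("TS", 9, 24), Sentence("H", 9, 21)]

# A fresh key function, because the input data is different.
key = generateKeyFn(no_precip)
for s in sorted(no_precip, key=key):
    print(key(s), s.kind)
# (9, 21, 5, True, True) H
# (9, 24, 3, True, False) TS
```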

Note that we had to generate a fresh key function because our input data was different.

Correcting `generateKeyFn` is left as an exercise for the reader. :) Here are some test cases, though (written for [py.test](http://pytest.org); likely to work with nose, although untested).
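The notebook's test cell is not reproduced in this copy either. Below is a hypothetical set of py.test-style tests: plain `test_*` functions with bare `assert`s, which py.test collects from a test file. They reuse the hypothetical `Sentence` fields. So that the file runs, it includes one possible correction to the sketch. This is not the author's solution, so skip it if you want to try the exercise yourself. The idea is that H borrows whatever TS borrows: precip's values if there is precip, otherwise the TS sentence's own.

```python
from collections import namedtuple

Sentence = namedtuple("Sentence", ["kind", "start", "end"])
PRIORITY = ["sky", "FG", "precip", "TS", "w", "H"]

def generateKeyFn(sentences):
    by_kind = {s.kind: s for s in sentences}
    # One possible fix (an assumption, not the original code): TS anchors
    # to precip if present, else to itself, and H anchors wherever TS does.
    anchor_for_ts = by_kind.get("precip", by_kind.get("TS"))

    def key(s):
        grouped = s.kind in ("TS", "H")
        anchor = anchor_for_ts if (grouped and anchor_for_ts is not None) else s
        return (anchor.start, anchor.end, PRIORITY.index(anchor.kind),
                grouped, s.kind == "H")

    return key

def kinds_in_order(sentences):
    return [s.kind for s in sorted(sentences, key=generateKeyFn(sentences))]

def test_reference_order_breaks_ties():
    data = [Sentence("w", 12, 24), Sentence("precip", 12, 24)]
    assert kinds_in_order(data) == ["precip", "w"]

def test_ts_follows_precip():
    data = [Sentence("TS", 9, 24), Sentence("precip", 12, 24)]
    assert kinds_in_order(data) == ["precip", "TS"]

def test_h_follows_ts():
    data = [Sentence("H", 15, 21), Sentence("TS", 9, 24),
            Sentence("precip", 12, 24)]
    assert kinds_in_order(data) == ["precip", "TS", "H"]

def test_h_follows_ts_without_precip():
    data = [Sentence("H", 9, 21), Sentence("TS", 9, 24)]
    assert kinds_in_order(data) == ["TS", "H"]
```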

----
Written with thanks to [this post](http://www.garann.com/dev/2013/how-to-blog-about-code-and-give-zero-fucks/) for a reminder to blog every now and then.

Reference reading:

 * [Official sorting HOWTO](http://docs.python.org/3/howto/sorting.html)
 * Google Developers [Python sorting](https://developers.google.com/edu/python/sorting) (good diagram for explaining use of the key function)