Monthly archive for December, 2013

Every computer program expands until it can read e-mail – or so they say. But many applications need not to read, but to send e-mails; web apps or web services are probably the most prominent examples. If you happen to develop them, you may sometimes want a local, dummy SMTP server just for testing this functionality. It doesn’t even have to send anything (it must not, actually), but it should allow you to see what would be sent if the app worked in a production environment.

By far the easiest way to setup such a server involves, quite surprisingly, Python. There is a standard library module called smtpd, which is built exactly for this purpose. Amusingly, you don’t even have to write any code that uses it; you can invoke it straight from the command line:

$ python -m smtpd -n-c DebuggingServer

This will start a server that listens on port 8025 and dumps every message “sent” through it to the standard output. A custom port is chosen because on *nix systems, only the ports above 1024 are accessible to an ordinary user. For the standard SMTP port 25, you need to start the server as root:

$ sudo python -m stmpd -c DebuggingServer localhost:25

While it’s more typing, it frees you from having to change the SMTP port number inside your application’s code.

If you plan to use smtpd more extensively, though, you may want to look at the small runner script I’ve prepared. By default, it tries to listen on port 25, but you can supply a port number as its sole argument.

There is a particular way many programmers tend to think about “algorithms”. I put the word in quotes because it’s a really just a small subset of actual algorithms that they refer to by this term.

In short, “algorithms” are perceived as self-contained boxes, classic procedures that take data in and spit results out. They are blocks, or packages, stowed in a corner of some bigger program, coming out only to perform they designated function. Clearly separated from the “usual” code, they may even occur only as libraries, both standard language libraries or third party extensions.
What’s important is being out of sight, out of mind. You are not supposed to delve into them. You can use them, or add new ones – sometimes even implementing them yourself! – but once you’re done, you get back to “real” code. Normal code, typical code, usual code; code that doesn’t have those “algorithms”.

Amusing as it is, this mode of thinking may actually be spurred by reading works such as the classic Introduction to Algorithms by Cormen et al. Since the subject is intrinsically quite heavy and full of intricate theory and math, it’s very tempting to skirt its surface. Sure, you need to know about most of the important algorithms, but you don’t really have to know them. It’s perfectly acceptable to skip the details of their implementation.
And why not? You can always go back and look it all up! Most of the important stuff is already implemented anyway. When was the last time you had to roll out your own sort() function?… Great programmers reuse, haven’t you heard? We should all stand on the shoulders of giants!

Well, that’s lovely. But I’ve got some bad news. There are no “algorithms”. You cannot compartmentalize programs to have demarcated “difficult” parts. You can try, of course, but it’s futile. They won’t comply, because the pesky reality itself will inevitably tear down any walls you choose to surround the “algorithms” with.

Let’s say you have your isolated, tricky procedure. Over the lifetime of a program that contains it, any of the following is very likely to happen:

The input data becomes too large to be stored completely in memory. At least some operations will need to be performed at a smaller portion, and then results will have to be merged.

Waiting for complete input becomes unfeasible. You are now required to compute the result incrementally (or online, to use the traditional term), adjusting it every time a new chunk of information comes in.

While computational complexity of the algorithm is still acceptable, practical factors mandate limiting of e.g. disk reads or network requests. The procedure needs to be reworked to incorporate additional caching, or pre-computation, or preliminary analysis of input, or obtaining more data outside of main logic, or…

User studies indicate a growing level of discontent about the lack of UI feedback when performing the Complex and Time-consuming Task™. To alleviate this, the code must now report at least approximate completion progress, so that the interface can be made more responsive by adding a certain widget.

In an attempt to improve throughput, the application has switched to using asynchronous I/O. As a result, the algorithm must be split into several callbacks and state has to be somehow passed between them. (Extra credit if the programming language has no support, or poor support, for closures).

Many, many other situations could be added to this list, but the point should nevertheless be clear. It doesn’t matter how tightly you’re going to shut the tricky procedure off the rest of the code, the ongoing evolution of the latter will eventually effect changes on it. After a while, you won’t be dealing with a single algorithm, but a whole subsystem with multiple tendrils touching other parts of your program.

At this point, you can’t pretend it can be confined to a dark, forgotten place anymore. Now it has to be taken into account when dealing with the “normal” code on daily basis.
And it’s nothing to fear of, because it simply is just a normal code.

What is your first association evoked by a mention of the Java programming language? If you said “slow”, I hope all is well back there in the 90s! But if you said “verbose” instead, you would be very much in the right. Its wordiness is almost proverbial by now, with only some small improvements – like lambdas – emerging at the distant horizon.

Thankfully, more modern and higher level languages are much better in this regard. The likes of Python, Ruby or JavaScript, that is. They are more expressive and less bureaucratic, therefore requiring much less code to accomplish the same thing that takes pages upon pages in Java, C# or C++. The amount of “boilerplate” – tedious, repetitive code – is also said to be significantly lower, almost to the point of disappearing completely.

Sounds good? Sure it does. The only problem here is that most of those optimistic claims are patently false. Good real-world code written in high level, dynamic language does not necessarily end up being much shorter. Quite often, it is either on par or even longer than the equivalent source in Java et al.
It happens for several reasons, most of which are not immediately obvious looking only at a Hello World-like samples.

You need copious amount of documentation

It is said Python is pretty much pseudocode that happens to be syntactically strict enough to parse and run. While usually meant as a compliment, this observation has an uglier counterbalance. Being a pseudocode, the raw Python source is woefully incomplete when it comes to providing necessary information for future maintainers. As a result, it gets out of hand rather quickly.

To rein it, we need to put that information back somehow. Here’s where the documentation and commenting part steps in. In case of Python, there is even a syntactical distinction on the language level between both. The former takes the form of docstrings attached to function and class definitions. They are meant to fill in the gaps those definitions always leave open, including such “trivialities” as what kind of arguments the function actually takes, what it returns, and what exceptions it may raise.

Even when you don’t write expansive prose with frequent usage examples and such, an adequate amount of docstrings will still take noticeable space. The reward you’ll get for your efforts won’t be impressive, too: you’ll just add the information you’d include anyway if you were writing the code in a static language.
The catch here is that you won’t even get the automatic assurance that you’ve added the information correctly… which brings us immediately to the next point.

You need a really good test suite

There are two types of programmers: those who write tests, and those who will be writing tests. Undoubtedly, the best way to have someone join the first group is to make them write serious code in any interpreted, dynamic language.

In those languages, a comprehensive set of tests is probably the only automated and unambiguous way to ensure your code is satisfying even some basic invariants. Whole classes of errors that elsewhere would be eliminated by the compiler can slip completely undetected if the particular execution path is not exercised regularly. This includes trying to invoke a non-existent method; to pass an unexpected argument to a function; call a “function” which is not callable (or conversely, not call a function when it should have been); and many others. To my knowledge, none of these are normally detected by the various tools for static analysis (linting).

So, you write tests. You write so many tests, in fact, that they easily outmatch the very code they’re testing – if not necessarily in effort, then very likely in quantity. There’s no middle ground here: either you blanket your code with good tests coverage, or you can expect it to break in all kinds of silly ways.

You need to keep abstractions in check

Documentation, comments and tests take time and space. But at least the code in high level language can be short and sweet. Beautiful, elegant, crisp and concise, without any kind of unnecessary cruft!
Just why are you staring at me like that when I say I used a metaclass here?…

See, the problem with powerful and highly expressive languages is that they are dangerous. The old adage of power and responsibility does very much apply here. Without a proper harness, you will one day find out that:

the clever trick you’ve used half a year ago is not nearly as obvious as it was back then

the programing style which feels natural for one member of your team is nearly incomprehensible to some of the others

new people have serious trouble grasping all the neat abstractions you’ve used in your project so liberally

Unless you can guarantee you’ll always have only real Perl/Python/Ruby rockstars on board, it’s necessary to tame that wild creativity at least a little bit. The inevitable side effect is that your code will sometimes have to be longer, at least compared to what that smart but mystifying technique would yield.

I don’t believe, of course, that this is the first time you may have encountered them. Nor that they are even half as complicated as the definitions above would suggest.

But although they appear rather awkward, what I wanted for those formulations to highlight is one particular way of interpreting the min and max functions as they are used in code. You may think it’s easy enough to read them quite literally (“get me the smallest/largest value”), and in many cases you are absolutely correct:

scores ={

'Alice': 10,

'Bob': 12,

'Charlie': 11,

}

winning_score =max(scores.values())

Often though, we don’t want to get the extreme value from a set, list, vector or other collection of unspecified size. Instead, we call the min/max functions on a few arguments that are known and “hard-coded” upfront. In many cases, this is mostly done to spare us from introducing a verbose if statement, or a more cryptic ternary operator (?:).

Or is it? I was recently surprised when discussing some very simple programming exercise on one of the IRC channels I frequent. Someone pointed out how my proposed solution is quite unusual with its (over)use of the max function. The task goes somewhere along these lines:

You are presented with a device that has n counters (c0, …, cn-1) and n+1 buttons (b0, …, bn). Every counter ci is incremented whenever a corresponding button bi is pressed. Additionally, pressing the extra button bn sets all the counters to the maximum value displayed on any of them (e.g. [1, 4, 3] → [4, 4, 4]).
Find a way to compute the final state of all counters given a sequence of k buttons presses.

Just by reading the above description and implementing it in the most straightforward way, we can trivially arrive at an algorithm which solves the problem in O(nk) time. And since we need to go through the sequence of button presses at least once, the lower bound for complexity of any other solution is therefore O(k).

My version was a simple improvement over the obvious one, scoring O(n+k) at the cost of maintaining a negligible amount of extra data:

min_state = max_state =0

The noteworthy application of max function was responsible for updating one of those values:

max_state =max(max_state, button_states[press])

If you look closely, you will notice how the first argument serves as a reference point, while only the second one is the actual ‘input’. What this invocation is saying is essentially “make max_state at least as big as it was before – and possibly bigger”.

Hardly a groundbreaking insight, eh? This approach, however, allows to rapidly parse even complicated applications of min and max. If you encounter, for example, an inlined version of a “clamping” function that is written in the following manner:

I’m pretty sure it will take you at least a few moments to grok what it’s doing. The max–min compound is puzzling, and the meaningful arguments (a and b) are disconnected from the function names. Had it had more nesting, you might even have needed to *gasp* count the parentheses.
But we know it’s all about trimming a number to the <a; b> interval. Why not express it in a way that readily highlights this fact?