thraxil.org:

A Simple Programming Puzzle Seen Through Three Different Lenses

The other day, I stumbled across Mark Nelson's blog post describing a fairly simple NPR word puzzle: "Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?"

Mark Nelson is a programmer, so his first instinct of course was to write a small program to solve it. Mine too. I immediately stopped reading his post, opened a new emacs buffer and spent five minutes coming up with this:

Sure enough, it gave me the answer (and like Mark Nelson, seeing it made me realize that it probably would have been even more productive to just think about it for a minute instead of writing the code).

To me, that Python code is about as simple and direct as it gets. There's nothing clever going on there. Just loop over the cartesian product of states x states, skip a couple of obvious cases, normalize the names of the states by sorting their letters, and put them into a dictionary. As soon as you hit a combination that's already in the dictionary, you have your answer.
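The original listing didn't survive here, but a minimal sketch of the approach just described might look like this (the lowercase, no-spaces state spellings and the `solve` helper name are my own choices, not the original code):

```python
from itertools import product

states = [
    "alabama", "alaska", "arizona", "arkansas", "california", "colorado",
    "connecticut", "delaware", "florida", "georgia", "hawaii", "idaho",
    "illinois", "indiana", "iowa", "kansas", "kentucky", "louisiana",
    "maine", "maryland", "massachusetts", "michigan", "minnesota",
    "mississippi", "missouri", "montana", "nebraska", "nevada",
    "newhampshire", "newjersey", "newmexico", "newyork", "northcarolina",
    "northdakota", "ohio", "oklahoma", "oregon", "pennsylvania",
    "rhodeisland", "southcarolina", "southdakota", "tennessee", "texas",
    "utah", "vermont", "virginia", "washington", "westvirginia",
    "wisconsin", "wyoming",
]

def solve(states):
    seen = {}
    for s1, s2 in product(states, states):
        if s1 >= s2:  # skip same-state pairs and mirror-image duplicates
            continue
        key = "".join(sorted(s1 + s2))  # normalize: all letters, sorted
        if key in seen:
            # collision: two different pairs with the same letters
            return seen[key], (s1, s2)
        seen[key] = (s1, s2)
    return None

print(solve(states))
```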

That approach to me would qualify as a "naive" or "brute force" approach. I figured there might be a more optimized approach, but with only 50 states it wasn't really worth optimizing any more. Running it took all of 18 milliseconds.

When I went back and read the rest of the blog post, my jaw dropped. His first pass, in C++, involved a quadruply nested loop and the STL, and took upwards of an hour to run. Looking at his code, he really took the notion of a "brute force" approach to a level I hadn't even considered. His second and third passes improved the runtime and even simplified the code, but they still run on the order of a few seconds. That's for a compiled, low-level language. My Python code, written without any particular thought toward efficiency and running in an interpreter (so the time of starting up the Python process, parsing the code, and interpreting it directly, line by line, without compiler optimizations, is all included in the running time), was beating it by a couple of orders of magnitude.

The key, of course, is the algorithm. Python, like Ruby, Perl, and other relatively high-level languages, supports a dictionary (or hashtable, if you prefer) data type at the syntax level. As a result, anyone programming in one of those languages quickly learns how to make the most of them and becomes familiar with a number of idioms, including the one I used of testing for uniqueness by keeping a 'seen' dictionary, inserting keys one by one and looking for a collision. It's dead simple, commonly used in scripting languages, and extremely efficient, since inserting into a hashtable is O(1) and tends to be one of the most finely tuned parts of a scripting language's interpreter/VM.
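Stripped to its essence, the 'seen' dictionary idiom looks something like this (a generic sketch of the idiom, not code from the post):

```python
def first_collision(items, signature):
    """Return the first two items that share a signature, or None."""
    seen = {}
    for item in items:
        sig = signature(item)
        if sig in seen:
            return seen[sig], item  # collision found
        seen[sig] = item  # O(1) insert into the hash table
    return None
```

For example, `first_collision(words, lambda w: "".join(sorted(w)))` finds the first pair of anagrams in a word list.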

There's nothing Python specific about the algorithm. In fact, in the comments on the post, Vince Huston submits a C++ solution that's basically identical to my approach (and probably outperforms everyone else handily). If I were forced to solve the same problem in C++, I would probably have come up with something similar. I would not be at all surprised if Vince Huston has had some experience coding in scripting languages and Mark Nelson hasn't.

The point isn't that Mark Nelson is a bad programmer. Far from it. Looking around at the rest of his site, at articles like his explanation of The Byzantine Generals Problem (which was how I ended up on his site in the first place), and at the books he's written, I'd guess that overall, he has more breadth and depth to his understanding of algorithms and programming than I do.

My point is really just to repeat the tired, broken record cry of advocates of higher level languages that 1) using a faster, more appropriate algorithm will usually go much further in optimization than low level optimizations (using a compiled language, static types, fretting over clock cycles in an inner loop, etc) and 2) learning different programming paradigms, languages, and idioms will improve your programming even if you end up going back to a lower level language. In this case, some familiarity with dictionary idioms common in scripting languages helps immensely in producing a fast, simple solution.

Another comment on his post goes even further. Tom Moertel submits a solution implemented in Haskell. From a performance standpoint, it's pretty much the same approach as mine, using a dictionary (in this case, Haskell's Data.Map library) to do the comparisons on normalized strings. What makes it a really nice solution, though, is that he approaches it by starting with a "clusterBy" higher-order function that takes a signature function and a list and returns a list of groups clustered by the signature (my explanation is bad; his examples make it much clearer). Essentially, instead of directly solving the problem, he creates a general-purpose tool which can then trivially be applied to the specific problem at hand. clusterBy is the kind of function that I could see being useful for other problems. So not only does he end up with the solution to this problem, he also ends up with a useful tool to make future programming tasks simpler. Haskell's smooth support for higher-order functions makes this a natural approach, and it seems to be the way that proficient Haskell programmers end up taking on most problems.

Python had the good taste to steal a few items from Haskell's bag of tricks though, so I was inspired to try a Python version. Here's a rough equivalent of clusterBy:
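The original listing was lost in extraction; the following sketch is consistent with how the functions are used below (the names cluster_by and normalize come from the surrounding text, but the bodies are my reconstruction):

```python
def cluster_by(f, lst):
    # group the elements of lst into clusters sharing the same
    # signature f(x); returns a list of clusters (lists)
    clusters = {}
    for item in lst:
        clusters.setdefault(f(item), []).append(item)
    return list(clusters.values())

def normalize(pair):
    # signature function: all the letters of both state names, sorted
    (s1, s2) = pair
    return "".join(sorted(s1 + s2))
```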

The normalize function takes the tuple of two state names, sorts all the letters in them, and returns that as a string. The final piece is just to apply cluster_by with the normalize function to the cartesian product of states x states and find the resulting clusters with multiple entries (and since I was going Haskell-style, I decided to use list comprehensions instead of for loops as well):

clustered = cluster_by(normalize,[(x,y) for x in states for y in states])
print [x for x in clustered if len(x) > 2]

It runs in 25 milliseconds on my machine, so slightly more overhead than the procedural approach, but close enough to not matter and, IMO, it's cleaner and would lend itself more to further reuse of the cluster_by or normalize functions.

So first, knowing an idiom from a scripting language can produce better C++ code (see Vince Huston's solution); then, taking a lesson from Haskell, I'm able to come up with better Python code.

I'm waiting for a Prolog programmer to come along and smoke everyone next.

You can also do this one-liner which works without adding any new functions (though you need to import itertools):

print [zip(*v)[1] for v in [list(v) for k, v in itertools.groupby(sorted([(''.join(sorted(s1 + s2)), (s1, s2)) for s1 in states for s2 in states if s1 < s2], key=lambda v: v[0]), lambda v: v[0])] if len(v) > 1]

Of course, that's clearly crossing the line from "clever" into "obfuscated", but oh well. It was fun to write. :)

Well, the problem of course is that at this point, JIT-compiled dynamic languages can blow away statically compiled languages in performance. A proper JIT can infer most data types and kill the overhead of dynamic typing except where you actually use it. Ditto for most other high-level features. At that point, you're at the same speed. In addition, there are a number of JIT-specific optimizations that, at this point, give you about a 10 percent boost over statically compiled code. On top of that, the GC can optimize the cache (in typical cases, cache-optimized code is 10x faster than unoptimized, although the GC will likely gain a bit less than the 10x you'd get from hand-optimizing memory layout).

In the long term, you can expect JITs to pick data types. I say "I want a list", and the JIT picks whether I have an array, a linked list, or some kind of tree structure, based on statistics of past use.

At the company where I started programming, hash tables were the constant metaphor. We built routers. Small, home routers for AppleTalk and IP. AppleTalk requires the knowledge of all nets by all routers (non-scalable, and not widely used now), so the answer to any question was a hash table. You'd never use a btree, you'd never use a sorted list. My favorite from that chunk of code was a three-way indexed table, which was moderately innovative coding for the time.

Today's common availability of hash tables is less about scripting languages and more about the wide availability of excellent support classes built into languages. I credit Java for starting the mad rush to excellence, and dun C++ for retarding innovation. The STL classes are poorly designed and don't include some of the most useful structures, while Java entered with a strong collection framework and worked hard to fix design flaws (example: early versions forced synchronization when not needed, later versions created a synchronization wrapper class). I don't count Java as a scripting language, but perhaps you do.

Thus, I agree with your premise, although I believe you are too quick to judge "low level languages" and too slow to judge the programmer. A programmer who doesn't follow tools improvements, and improve themselves, is a poor programmer.

My professional programming has turned to Perl, Java, and C. If I want the complexity of objects, I want the structure of Java. I may, someday, migrate from Perl, as I'm still struggling with the syntax for references, and at some point you just try another language.

In any case, it's a good time to be a programmer. A hash table used to be an annoying, scary construct, hand coded, now it's something you simply new up, insert to, and go to town. Sweet.

I really wasn't judging the language. If I were writing the same program in C++ I would have come up with pretty much the same algorithm and probably code pretty close to what Vince Huston had. The point is that having experience in higher level languages or different programming paradigms can improve your skills even if you're coding in a lower level language because it gives you new tools for thinking about the problem. The obvious algorithm to a C++ programmer involved a bunch of nested loops while the obvious algorithm for someone with scripting experience involves a hashtable and the obvious solution for a Haskell programmer involves reusable higher order functions.

Anyone reading this post and seeing anything controversial is, I think, reading too much into it. It pretty much boils down to "having a broad understanding of programming is good."

I have nothing against C++ or Java programmers or anyone else unless they have the mindset of "well, I know $LANGUAGE and I can get work done in it so there's no point in me learning any other language that might make my head hurt." A Lisp programmer who thinks they would have nothing to learn from studying assembly programming I would consider equally foolish.

Don't dis it until you've tried it. Many BASIC programmers don't see the point of pointers (or, for the politically correct, references) or real data structures. Indeed, if you explain the things you can do with a linked list, they'll show you how to emulate one in BASIC. In fact, although both languages are Turing-complete, and so can do the same things as the other, both have different mindsets because they make different things easy. Similarly, many C/Java programmers (such as yourself) don't see the point of closures, functional programming, and dynamic typing. It is a different way of programming, and a different way of thinking about programming. Coming from a background of C, Java, and a little bit of Perl, it is unlikely you'll be able to grasp the power that comes from being able to fundamentally modify your programming language to the task at hand.

There's a big difference between having a hash table class (Java), and being able to integrate a hash table into the syntax of your language (dynamic, functional programming languages).
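To illustrate what syntax-level support means in concrete terms (my example, not the commenter's): in Python, literals, indexing, and membership tests for hash tables are part of the language itself, whereas the Java equivalent goes through a class (`new HashMap<>()`, `put()`, `containsKey()`):

```python
# dictionary (hash table) support baked into the language syntax
counts = {"apple": 2, "banana": 1}   # literal syntax
counts["cherry"] = 3                 # insert via indexing
if "apple" in counts:                # membership test as an operator
    counts["apple"] += 1
print(counts["apple"])
```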

I'll add that while Perl has much of the functionality of modern scripting languages, it has a very different mindset.

30 second getting started guide if you decide to pursue this: If you're interested in what you gain, the best book is SICP by Abelson and Sussman. If you're interested in using it, the best language is Ruby, and a good demo of it is Rails.

It took me a while to understand what the puzzle was asking me to solve. Is that a sign I'm a poor programmer? I mean, I had a suspicion which turned out to be correct, but it took me probably 15-20 minutes to confirm it (after rewriting the author's script in Ruby).

I'm always curious how great programmers think and solve problems, and to be honest, it worries me that you were able to "immediately stop reading," take five minutes and have it solved. In other words, I feel pretty slow sometimes.

i was just talking with someone about learning spoken languages, and how the things i appreciate most about it is the way it allows you to think in different ways, simply because you now have vocabulary (or language structures) you lacked before. it's definitely true of programming languages, as well. the point isn't to argue about whether high- or low- level languages are "better" (as it of course depends on what you're trying to accomplish), but to add more tools -- and more flexibility -- to the toolkit in your head.

btw, i thought about it for a second and came up with the North/South thing, too. is there a valid answer aside from that obvious one, or did you over-complicate things? :)

>> northcarolina + southdakota = northdakota + southcarolina

Here's another Haskell implementation. Like you, I didn't bother to read ahead before I solved it, and I'm not a proficient Haskell programmer, so I did not think of making a clusterBy. Here goes anyway:

I'd like to contribute a ruby solution. While the logic of using a "seen" hash is re-used, this is fairly concise due to the ruby libraries' tendency to allow cascading calls, and the flexibility of specifying the default not-found-key-handler of the Hash object:

I'm not so sure about your assertion that Mark Nelson isn't a bad programmer. If you put me in C and didn't let me use my usual hashtable library, I still would have instinctually gone for an array sorted by keys. Since the list is short, a binary search isn't going to take much longer than a hashtable lookup, and it doesn't require implementing a hashtable from scratch. I probably would have done that on my first shot, and certainly if I'd thought about the problem long enough to optimize and write a blog post about it.
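In Python terms, the sorted-array-with-binary-search idea the commenter describes might be sketched like this with the standard bisect module (a hedged illustration of the technique, not his C code):

```python
import bisect

def find_collision(pairs, key):
    # keep (signature, pair) entries in a list sorted by signature,
    # using binary search for both lookup and insertion: O(log n) each
    table = []
    for pair in pairs:
        k = key(pair)
        # (k,) sorts before any (k, pair) entry, so this finds the
        # first slot whose signature is >= k
        i = bisect.bisect_left(table, (k,))
        if i < len(table) and table[i][0] == k:
            return table[i][1], pair  # collision found
        table.insert(i, (k, pair))
    return None
```

With only 50 states the table stays tiny, so, as the commenter notes, the binary search is not meaningfully slower than a hashtable lookup here.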

Maybe you could claim that my instinct there comes from knowledge of scripting languages, but I was doing that kind of thing in C before I'd ever worked full-time in a scripting language. And isn't knowledge of your trade part of what makes you a good programmer?

I had heard about the `groupby` function, but never about `clusterBy` (which seems much more useful). The first solution that you posted looks pretty much like how I would have done it. Although, I have to admit, Tom Moertel's solution is quite elegant.

Anyway, I slightly improved your translation of Moertel's solution. I think you will appreciate the few tweaks I made:

yuck, the indentation got clobbered, and the underscores turned into font changes. in the groupby, use (a,x)(b,y) instead of what you see there. in the original I had underscores for x and y but they melted. x and y work fine, they just get ignored. also you may have to fix the indentation to get the thing to parse.