Here’s an interesting mental exercise that I’ve recently found more and more valuable:

Test drive your code (duh).

Before each time you run your tests, no matter how small the changes you’ve made, ask yourself why the tests will fail. Don’t just gloss over this; be explicit. Say it out loud or write it down, if that helps. If you have a pair, tell your pair.

You might be surprised how often your assumptions turn out wrong. And, you might be surprised what you learn when you explicitly state those assumptions and then have to justify them when they do turn out wrong.

If you don’t test drive, or don’t always test drive (and you’re still reading), ask yourself this: what happens on those occasions when my assumptions are wrong and I don’t have tests to protect me?

Multithreaded programming was a hot topic at RubyConf this year, and a common theme in many talks was the use of functional languages to prevent contention between threads. This totally makes sense to me, to the limited extent that I can wrap my head around truly functional programming, and I’m sure it’s an excellent approach. However, imagine a case in which we can’t just drop in a new language, so we need to write some multithreaded code in Ruby. I’m sure you won’t have to think too long or hard.

Now, one point several speakers at RubyConf made that I would like to reiterate is this: multithreaded programming is difficult. Gosh darn difficult. People who write software tend to thrive on determinism and linearity. After all, computers always do what we tell them to, right? They don’t make mistakes or change their minds; not like those silly, silly humans. But now big, bad concurrent programming comes along and suddenly computers can come up with different answers depending on, oh, the alignment of the planets. Chaos.

So, functional programming languages aside for the moment, what tools do we have to rein in the inevitable entropy that will, more than likely, eventually bring the planets into alignment against us?

Several years ago Andrei Alexandrescu wrote this excellent article on how to use the C++ type system (stick with me, it’s all Ruby after this paragraph) to automatically prevent race conditions at compile time. I recommend you read it; it’s quite short. Now, however you feel about static typing, you must admit that the approach he describes is a beautiful use of the expressiveness of the C++ type system. The question is, can we do something analogous in Ruby?

First off, let’s start with some thread-unsafe code, similar to what Jim Weirich used in his talk on threads at RubyConf:
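Something along these lines (a sketch with names of my own choosing, not Jim’s actual code):

```ruby
# A sketch of the kind of thread-unsafe code in question: a non-atomic
# read-modify-write on a shared balance, raced by many threads.
class Account
  attr_reader :balance

  def initialize
    @balance = 0
  end

  def deposit(amount)
    current = @balance           # read
    sleep(rand / 1000)           # widen the race window so the bug shows up reliably
    @balance = current + amount  # write back a possibly stale value
  end
end

account = Account.new
1000.times.map { Thread.new { account.deposit(1) } }.each(&:join)
puts account.balance  # somewhere between 1 and 1,000
```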

This code results in an account balance somewhere between 1 and 1,000. The exact value depends, of course, on the planets.

Now, to make this thread-safe. I came up with a few attempts using #alias_method and #undef_method, but didn’t find anything satisfying. After that, I figured I’d try a simple proxy to approximate the effect. Here’s a first cut:
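(A reconstruction of the idea rather than the original code; the names are approximate.)

```ruby
# The raw account is reachable only inside a #locked_scope block, which
# holds a mutex while it yields the underlying object.
class Account
  attr_reader :balance

  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
  end
end

class LockingProxy
  def initialize(target)
    @target = target
    @mutex  = Mutex.new
  end

  # All access to the underlying object funnels through here.
  def locked_scope
    @mutex.synchronize { yield @target }
  end
end

locked = LockingProxy.new(Account.new)

1000.times.map do
  Thread.new { locked.locked_scope { |account| account.deposit(1) } }
end.each(&:join)

locked.locked_scope { |account| puts account.balance }  # reliably 1,000
# locked.balance raises NoMethodError -- no access outside a locked_scope
```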

The final call to #balance will actually fail, since it’s not in a locked_scope; it would be nice to be able to declare individual functions as volatile without too verbose a syntax.

It’s tempting to give the locked account object the same name as the unlocked account object, but doing so will cause one to overwrite the other (fixed in Ruby 1.9, of course, but until then…)
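For reference, the 1.9 fix is block-local variables: a name declared after a semicolon in the block’s parameter list shadows the outer variable instead of clobbering it. A quick sketch:

```ruby
account = :unlocked

# Under 1.8, a block parameter named `account` would overwrite the outer
# variable. 1.9 makes block parameters local, and also lets you declare
# explicit block-locals after a semicolon:
[1, 2, 3].each { |n; account| account = n * 10 }

account  # => :unlocked -- the outer variable survives untouched
```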

The unlocked object is easily available. Given the opportunity to circumvent the lock, someone will do something horrible.

I really like the idea of using blocks to scope behavior like this. This particular example doesn’t feel particularly clean to me yet, but hopefully it will give some people something to think about. If you have a better approach, please don’t be afraid to shout it out.

I wrote a bit about function objects here. However, if you don’t buy that the persistent state of function objects provides something that anonymous functions cannot, how about this: readability. In some cases.

Anonymous functions are boss and cool, and extremely common in idiomatic Ruby. However, in some cases they can get a little… esoteric. Consider:
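(A contrived example of the flavor I mean, not the original snippet:)

```ruby
# Nested anonymous functions do the job, but the reader has to unpack them.
scaler = lambda { |n| lambda { |x| x * n } }
[1, 2, 3].map(&scaler.call(10))  # => [10, 20, 30]

# The same idea as a small function object reads more plainly:
class Scaler
  def initialize(n)
    @n = n
  end

  def to_proc
    lambda { |x| x * @n }
  end
end

[1, 2, 3].map(&Scaler.new(10))  # => [10, 20, 30]
```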

I’ve been a C++ developer ever since I discovered the language in the early ’90s and realized that my beloved Pascal had nothing on objects. I’ve spent plenty of time working with other languages, of course, and over the past year or so I’ve written almost exclusively Ruby. But one thing I’ve missed about C++ is the ease with which you can make objects act like functions.

This may not seem particularly compelling for generating Fibonacci numbers, but consider generators that may carry more complex state, or references to state owned by other objects. Consumers can also set initial state, such as a seed value for a random number generator, via the ctor.

Also, consider the generate method above. It expects a third parameter that supports function call semantics with arity of zero. And nothing else. That parameter could be a function pointer (should you desire statelessness and impenetrable syntax), a functor of any type, or anything else that supports operator(). That’s duck typing, my friends. In a statically typed language. Dogs and cats sleeping together, and all that.

Again, you cry, why would anyone care? Well, blocks in Ruby carry around the state of the context in which they were created, but sometimes you want more. For instance, if you pass your Proc object around, your code may be clearer with the state explicitly encapsulated. The initial state may simply not make sense as local variables at the point where you create the Proc. You may want to save some secondary value that a consumer can query the functor for (how many Fibonacci numbers has this generator generated?). Or perhaps you want consumers to be able to mutate the state in some way.
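A Ruby rendition of the idea (a sketch; the names are mine, not from the C++ article): a callable object whose state persists between calls, can be seeded up front, and exposes secondary values for consumers to query.

```ruby
class FibGenerator
  attr_reader :count  # how many numbers this generator has generated

  def initialize(a = 0, b = 1)  # consumers can seed the initial state
    @a, @b, @count = a, b, 0
  end

  def call
    value = @a
    @a, @b = @b, @a + @b
    @count += 1
    value
  end
end

fib = FibGenerator.new
5.times.map { fib.call }  # => [0, 1, 1, 2, 3]
fib.count                 # => 5
```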

In any case, this functor approach whacked me over the head recently while I was looking at some code that used the Rails Symbol#to_proc. We all know that Rails adds voodoo to symbols so that

User.find(:all).collect(&:name)

is equivalent to

User.find(:all).collect { |u| u.name }

And, we all know that this works because the & operator, when applied to an object in a parameter list, will implicitly call #to_proc on that object and then convert the result to a block. This is vanilla Ruby functionality; Rails just adds #to_proc to the Symbol class.
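For the curious, the Rails implementation of the era boiled down to something like this (a close paraphrase, not the exact source; Ruby itself ships Symbol#to_proc as of 1.8.7):

```ruby
class Symbol
  def to_proc
    # The yielded receiver becomes the message target; any remaining
    # yielded values are passed along as arguments.
    Proc.new { |*args| args.shift.__send__(self, *args) }
  end
end

%w(foo bar).collect(&:upcase)  # => ["FOO", "BAR"]
```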

After some investigation we uncovered changes in how rake 0.8.3 parses command line arguments. In particular, it doesn’t remove rake-specific arguments, like --trace, from ARGV. So, when test tasks invoke the Test::Unit::AutoRunner class, it receives these arguments, fails to recognize them, and complains. Messily.

Unfortunately, we don’t have an immediate fix for you. We submitted a patch to the project, and Jim Weirich has already filed a bug report, so version 0.8.4 should resolve the problem.

Interesting Things

We have a project that generates a lot of highly precise floating point numbers. However, we primarily want to display these numbers with only two decimal places of precision. In addition, we want to display these numbers with standard comma delimiters to the left of the decimal point.

Sadly, the Ruby #sprintf method provides the former functionality, but not the latter. What to do? Use a Rails helper, of course.

The NumberHelper module from ActionView provides some useful functionality, so we used that. As it turns out, the best way we found to get the formatting we want is to call the #number_to_currency function with no denomination.

Also, rather than mixing the entire helper into the Float class for just one method, we chose to mix the helper into a nested class and expose only the functionality that interests us. The result looks something like this:
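(Reconstructed sketch: a plain-Ruby formatter stands in for Rails’ #number_to_currency so the example runs anywhere; the real code includes ActionView::Helpers::NumberHelper in the nested class.)

```ruby
# Mix the helper into a nested class rather than into Float itself, and
# expose only the one method we care about.
class Float
  class Formatter
    # In the real code: include ActionView::Helpers::NumberHelper

    def format(value)
      whole, decimals = ("%.2f" % value).split(".")
      # Insert a comma after every group of three digits, right to left.
      whole = whole.reverse.gsub(/(\d{3})(?=\d)/, '\1,').reverse
      "#{whole}.#{decimals}"
    end
  end

  def to_delimited_s
    Formatter.new.format(self)
  end
end

1234567.8912.to_delimited_s  # => "1,234,567.89"
```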

Interesting Things

ARMailer and ExceptionNotifier: A match not so much made in heaven.

If you use the ExceptionNotifier plugin (and if you don’t, why not?) and you install the ARMailer plugin, your app will stop sending the exception notifications. You have been warned. Initial reports suggest that fixing the problem is relatively straightforward.

For those not in the know, ARMailer is a plugin that queues all outbound email in your database, to be sent later by a cron job or something similar. Good for not clogging up your server with email processing during peak load.

As I mentioned in this post, we’ve decided to set aside some of our weekly brown bags to spread around some knowledge on different technologies via a relatively informal presentation/discussion format. This past week we talked a bit about Solr.

This post covers much of what we discussed, ranging from the introductory to the somewhat arcane. If you’re a seasoned Solr user, this may not have much for you. But, you never know.

Solr: wtf?

For people who have never used Solr (me, for instance), I’ll start with the obvious question: what is it? At its most basic, Solr simply provides a web interface to the Lucene search engine. It’s written in Java and runs as a servlet inside a servlet container such as Tomcat or Jetty. The example application included in the distribution package includes Jetty, so you can get up and running relatively easily. You use Solr by sending your requests in the form of XML over HTTP; the responses also contain XML.
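To give a feel for the wire format, indexing a document and querying it look roughly like this (Solr 1.3-era XML; the field names here are made up):

```xml
<!-- POST to /solr/update: add (or replace) a document -->
<add>
  <doc>
    <field name="id">physician-42</field>
    <field name="name">Dr. Example</field>
    <field name="gender">F</field>
  </doc>
</add>

<!-- A query is just a GET, e.g. /solr/select?q=gender:F,
     and the response comes back as XML: -->
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">physician-42</str>
      <str name="name">Dr. Example</str>
    </doc>
  </result>
</response>
```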

For those of you looking for sense in the world, I’m sorry: Solr isn’t an acronym, and to our knowledge doesn’t stand for anything in particular. It’s just a name with a vowel shortage.

You can find the home page for Solr here, a wiki for discussion of all things Solr here, and a tutorial to get you started here. Finally, you can download the distribution (the current release version is 1.3.0) here.

But, why?

Let’s say you’re working on a site to help people find a physician. Users of this site might care about location, age, or gender of each physician. Your site might include how many pending malpractice suits each physician has, how patients have rated their bedside manner, or what magazines they stock in their waiting rooms. As a good citizen of the web community, you want to provide your users the ability to search for any combination of these criteria. You have all the information sitting in your database, so you should be able to search it, right?

Sure, no problem, but in order to ensure quick response times you’ll want to add indices on the columns in your physicians table. But, which indices to add? If your table has columns for age, gender, and rating, and you want to allow users to search on any combination of fields, then you need three indices to match all searches:

age, gender, rating

rating, age, gender

gender, rating, age

Keep in mind that indices match from left to right, and will only match on columns included in the query. Thus, if you allow searching on another column you’ll need to have eight indices:

age, gender, rating, mortality rate

age, gender, mortality rate, rating

age, rating, mortality rate, gender

age, mortality rate, gender, rating

gender, rating, mortality rate, age

gender, mortality rate, age, rating

rating, mortality rate, age, gender

mortality rate, age, gender, rating

So, we quickly discover that we need n! / (n - 1) indices to search n columns, and this doesn’t take into account range queries. This could quickly get out of hand; Solr to the rescue.
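To see the left-to-right rule concretely (illustrative SQL; the table and index names are made up):

```sql
-- An index on (age, gender, rating) serves any query whose predicates
-- form a leftmost prefix of that column list:
CREATE INDEX idx_age_gender_rating ON physicians (age, gender, rating);

-- Served by the index: (age) and (age, gender) are leftmost prefixes.
SELECT * FROM physicians WHERE age = 45 AND gender = 'F';

-- Not served: rating alone is not a leftmost prefix, so without another
-- index this query scans the whole table.
SELECT * FROM physicians WHERE rating > 4;
```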

Solr will build your indices for you based on the columns you tell it you want to search, it will keep these indices up to date as you add or change records, and it will do it fast.

More accurately, Lucene will do these things for you. However, Solr allows you to put Lucene on its own server that your application talks to via HTTP. This way all of your production servers can share the same Solr server, keeping searches consistent for all instances of your application.

At Pivotal we generally have a talk each Wednesday on some topic of professional interest. These “brown bag” talks usually involve interesting projects or new technologies that we might want to use; sometimes we invite speakers from outside Pivotal, sometimes Pivots do all the talking. We nearly always invite folks from outside Pivotal to attend, and we always feed everyone lunch.

Unfortunately, keeping up a schedule of interesting talks takes a not insignificant effort, and most of us here have one or two other things taking up our time. So, sometimes a week rolls around without a scheduled speaker. Or, on rare occasions, a speaker bails or has to reschedule. When this happens we have to cancel our talk for the week, everyone misses a learning opportunity, and no one gets a free lunch.

So, we decided to fill in the gaps with less formal, but still informative, discussions of various topics. We plan to focus on technologies that we use on some of our projects, that some Pivots may know a lot about while others may know less. In short, a chance to spread around some knowledge.

This past week we had our first of these discussions on Solr. I took some notes and will publish some of what we discussed in subsequent posts. Lunch was buffet-style Burmese. Delicious.