Month: October 2008

Some time ago, I started to develop programs using tiny, super general, higher-order functions and I began to borrow a little from functional programming. I noticed interesting things about these new programs. They called for special language features that I had until then heard of but never really needed. I’m referring to language features like support for first-class functions, closures, and lexical binding. The programs I produced were smaller and easier to debug than programs written in a traditional procedural style. Quite suddenly, I was able to write programs faster than ever before, I was delivering much higher-quality code, and I had found a whole new love of programming.

I work with data sets with millions of records, so often I need to write a program that I use only once. Sometimes, not by choice, I have to use PHP, Java, XQuery, C#, Visual Basic.NET, the old ASP, and even C or C++ to create or maintain applications. To deliver code or accomplish tasks at the highest possible rate, I use Perl most of all, together with a number of widely employed programming techniques, such as object-oriented programming, extensive use of libraries (I try as much as possible to avoid the not-invented-here syndrome), and revision control. But I also use techniques that are not in such widespread use, despite the fact that they’ve been around for quite a while. I am going to describe one of those techniques here.

If you are a Lisp programmer, you probably already use this technique. For me, learning Lisp elevated the idea of abstraction and conciseness in code to a whole new level. I am a far better Perl programmer for having learned a tiny bit of Lisp. This is all the more impressive if you consider that I’ve never even put a Lisp program into production. Most of what I learned I learned by solving problems on projecteuler.net and by extending my text editor, Emacs.

If you are not a Lisp programmer, however, you might be wondering what the advantages of concise code are. By “concise code”, I don’t mean code that uses fewer characters or lines because of shorter variable names, less white space, longer lines, or other such economizing measures. Instead, I mean code that uses super general functions to achieve a high degree of abstraction and that borrows from functional programming to reduce the use of intermediate variables to generate simpler, more reliable modules.

Functional Programming

Functional programming means writing code such that the return value of the function depends completely on the function’s parameters and nothing else. Functions have no side effects. Coding in a functional style provides a number of advantages. Functional programs are trivial to parallelize or distribute among large clusters. They are easier to maintain, enhance, and debug. Large functional programs are easier to transform. You’d be lucky to learn that a program that you need to diagram or convert to another language was written in a functional style. And functional programs require fewer intermediate variables because functions are easy to chain together.
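As a rough illustration of the difference, here is a minimal sketch. The function names and values are invented for this example. The first function is pure, the second has a side effect, and the last line shows pure functions chained without intermediate variables:

```perl
use strict;
use warnings;

# Pure: the result depends only on the arguments; nothing outside changes.
sub add_tax {
    my ($price, $rate) = @_;
    return $price * (1 + $rate);
}

# Not pure: reads and modifies a variable outside the function.
my $running_total = 0;
sub add_to_total {
    my ($price) = @_;
    $running_total += $price;
    return $running_total;
}

# Pure: formats a number with two decimal places.
sub round_cents { return sprintf('%.2f', $_[0]) }

# Pure functions chain without intermediate variables.
my $price_with_tax = round_cents(add_tax(10, 0.25));
print "$price_with_tax\n";   # prints "12.50"
```

Calling add_tax twice with the same arguments always gives the same answer, while add_to_total returns something different on every call.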

Another interesting thing about functional programming is that the order in which functions are executed flows obviously from the interdependencies of the functions.

To grasp the concept of functional programming, just think of a spreadsheet. You enter functions in the cells, never really worrying about which function is going to execute first or how the cells that your function references arrive at their values. There’s a reason why spreadsheets evolved to use functional programming: they had to be easy to maintain. Users could not be made aware that they were actually programming. Everyone knows that programming is difficult, and few users would have adopted spreadsheets if it meant that the users had to program spreadsheets in a traditional procedural fashion.

The code I describe here is not strictly functional. It is multi-paradigm code. The functions that the code defines are all actually methods that sometimes change the value of attributes of the objects to which they’re attached. But the code borrows from functional-style programming.

Super General Functions

A super general function is one that performs a task that is general enough that the function can be used for a whole class of problems rather than just a specific kind of problem. Typically, a super general function is a higher-order function, which means that the function accepts one or more other functions as parameters.
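A minimal sketch of a higher-order function in Perl follows. The apply_to_each helper is hypothetical, written only for this illustration (it does roughly what the built-in map already does). It accepts a code reference and calls it on every element of a list:

```perl
use strict;
use warnings;

# Hypothetical helper: calls $function on every element of @items
# and returns the list of results.
sub apply_to_each {
    my ($function, @items) = @_;
    my @results;
    push @results, $function->($_) for @items;
    return @results;
}

my @doubled = apply_to_each(sub { $_[0] * 2 }, 1, 2, 3);
my @shouted = apply_to_each(sub { uc $_[0] }, 'a', 'b');
print "@doubled\n";   # prints "2 4 6"
print "@shouted\n";   # prints "A B"
```

The same helper handles numbers and strings because the caller supplies the part that varies.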

Loop keywords are super general functions in my view, because they are like functions that accept code, in the form of the body of the loop. But when I think of super general functions in Perl, I’m thinking more about the built-in super general functions map, grep, and sort. And also about the myriad of other such functions and methods within the libraries of CPAN, like reduce and part. I will discuss several Perl functions that are super general and then I’ll write about how to code custom super general functions, because identifying the need for such functions and then coding them is the most effective way to make your code extremely concise.

sort

A good example of a super general function is the Perl sort function. It sorts. That’s what makes it general. It doesn’t just sort string values alphabetically, in ascending order. And it doesn’t just sort numbers either. It’s a super general function because it can sort anything, in any which way. You can provide code to the sort function to indicate how to sort the array of strange objects that you are giving to the function.

The Perl sort function (like the sort functions of many other languages) allows you to specify whether you want to sort the elements of the array by name or by age, and provides a mechanism for specifying how to compare the elements of the array. Here’s how you could sort the array by age, in ascending numerical order:

my @sorted_list= sort {$a->{age} <=> $b->{age}} @list;

To sort the list by name, in descending alphabetical order, you could do this:

my @sorted_list= sort {$b->{name} cmp $a->{name}} @list;

Of course, you can use Perl’s flexible sort function in a more complex expression or combine it with other functions to perform a great many tasks. For example, the following code selects the name of the oldest and youngest persons on the list:
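A minimal sketch of such a selection is shown here, assuming @list is an array of hash references with name and age keys. The sample records are invented for illustration:

```perl
use strict;
use warnings;

# Invented sample data; the structure (hash references with name and
# age keys) is assumed from the sort examples above.
my @list = (
    { name => 'Ana',   age => 41 },
    { name => 'Boris', age => 29 },
    { name => 'Chen',  age => 45 },
    { name => 'Dara',  age => 40 },
);

# Sort by age, then take the names at the two ends of the sorted list.
my ($youngest, $oldest) =
    map { $_->{name} } (sort { $a->{age} <=> $b->{age} } @list)[0, -1];

print "Youngest: $youngest, oldest: $oldest\n";
# prints "Youngest: Boris, oldest: Chen"
```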

While this code works well for small lists, I would never use such code to find the oldest and youngest persons if I had a very large list of records. Instead, I might use the reduce function or maybe a combination of the minmax function and a hash. But this example certainly shows how general Perl’s sort function can be.
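One possible shape for the reduce-based alternative is sketched below, again with invented sample records. Each reduce call walks the list once instead of sorting it:

```perl
use strict;
use warnings;
use List::Util qw/reduce/;

# Invented sample data with the same assumed structure as before.
my @list = (
    { name => 'Ana',   age => 41 },
    { name => 'Boris', age => 29 },
    { name => 'Chen',  age => 45 },
);

# Each reduce call walks the list once, keeping the better of two records.
my $oldest   = reduce { $a->{age} >= $b->{age} ? $a : $b } @list;
my $youngest = reduce { $a->{age} <= $b->{age} ? $a : $b } @list;

print "$oldest->{name} $youngest->{name}\n";   # prints "Chen Boris"
```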

Now imagine that you need a list of distinct ages with person counts next to each age and with the ages sorted by that count, like this one:

41 => 3
40 => 2
45 => 1

This result shows that 41 is the prevailing age in the group. Results such as these are useful when graphing the age distribution of a group. To obtain such a result set from the array in Listing 1, you could do this:

# Create a hash where the keys are the ages of people in @list and the
# values are the count of people in @list with the given age.
my %count;
$count{$_->{age}}++ for @list;
# Take the keys from the new hash, sort them by the people counts
# associated with those keys, then create strings showing the keys (ages)
# and their corresponding values (counts)
my @age_counts= map {"$_ => $count{$_}"} sort {$count{$b} <=> $count{$a}} keys %count;
print join("\n", @age_counts), "\n";

Think about how many lines of Java code, for example, would be required to implement the functionality of that Perl code. Yet that functionality is trivial to follow if you understand what the sort and map functions are doing.

In the same fashion, understanding a program with custom super general functions can be trivial if you first take the time to understand what the custom super general functions do. With that understanding, the rest of the program will seem short and simple.

By providing sort functionality in the form of a higher order function, Perl has given you the ability to perform sorts of arbitrary complexity. You could sort a list of strange objects by some field first, then another, and then another. And you could make the sort function assume a default value when it encounters a field with a null value or an object that doesn’t even have the field you’re sorting on.
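A multi-field sort with a default value might look like the following sketch. The records are invented, and treating a missing age as 0 is just one possible choice of default:

```perl
use strict;
use warnings;

# Invented records; one has no age key and one has an undefined age.
my @people = (
    { name => 'Chen',  age => 45 },
    { name => 'Ana',   age => 41 },
    { name => 'Boris' },
    { name => 'Dara',  age => undef },
    { name => 'Abel',  age => 41 },
);

# Sort by age (missing or undefined ages count as 0), then by name.
# The // (defined-or) operator requires Perl 5.10 or later.
my @sorted = sort {
    ($a->{age} // 0) <=> ($b->{age} // 0)
        or $a->{name} cmp $b->{name}
} @people;

print join(', ', map { $_->{name} } @sorted), "\n";
# prints "Boris, Dara, Abel, Ana, Chen"
```

The or operator moves on to the name comparison only when the age comparison returns 0, which is how further sort fields can be chained.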

The flexibility that the sort function provides by virtue of being a higher-order function comes with the added benefit of allowing you to write concise, readable code. You can sort things in complex ways using a small fraction of the code that would be required in other languages and your code will also be easier to follow.

grep

Like the sort function, the grep function performs a simple task in a very general way that allows you to apply the function to a great variety of problems. The grep function is like the Lisp remove-if-not function. It provides the simplest mechanism for choosing some of the items in a list. Given a list and a test expression (in the form of some code), the grep function returns a new list that contains the elements from the first list that pass the given test.

Let’s take a look at an example. EAN stands for European Article Number. It’s like a UPC, the number that you see in bar codes on packaged cheese and DVD players. EANs are not so European any more, but rather an international standard. Any standard 12-digit UPC in the US can be represented as an EAN by prepending a 0 (zero) to the UPC. Most bar code readers these days understand EANs and see UPCs as EANs.

An ISBN (International Standard Book Number), which you’ll find on the cover of most books in print, is also an EAN. ISBNs start with the digits 978 or 979. So if you see an EAN that starts with those digits, you know that you’re looking at an ISBN.

If you needed to choose the ISBNs from a given list of EANs, you could accomplish the task with procedural code like this:

my @isbns;
foreach (@eans) {
    push @isbns, $_ if /^(978|979)/;
}

Those who like Perl like it in part because of how clearly and succinctly most tasks can be represented in code. The above code is a good example of this. But Perl allows you to improve on that greatly. I could use the grep function to perform the same task, like this:

my @isbns= grep {/^(978|979)/} @eans;

There are two really nice things about using the grep function. Typically, it is faster than the equivalent loop. And, the code is much shorter and clearer, which results in code that is easier to follow, to embed into or combine with other code, and to debug. The only requirement it places on the reader of your code is that the reader must understand what the grep function does.

Listing 2 shows a couple more contrived examples of selecting specific EANs.

As you can see, the grep function can make the task of coding filters almost like not coding at all. And the resulting code is clear, more so than almost any custom code that could accomplish the same task.
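To make the idea concrete, here is a small sketch of two such filters. The sample EANs are for illustration only, and no check digits are validated:

```perl
use strict;
use warnings;

# Sample values for illustration; check digits are not validated here.
my @eans = qw/9780596000271 0012345678905 9791234567896 4006381333931/;

# EANs that represent 12-digit UPCs (13 digits with a leading zero).
my @upcs = grep { /^0\d{12}$/ } @eans;

# EANs that are not ISBNs.
my @non_isbns = grep { !/^97[89]/ } @eans;

print "@upcs\n";        # prints "0012345678905"
print "@non_isbns\n";   # prints "0012345678905 4006381333931"
```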

reduce

The reduce function will serve as my last example of a super general function that is distributed with Perl (it’s in the List::Util Perl module).

Most people I’ve worked with have never used the reduce function, which is surprising because it applies to such a large and commonplace set of problems. I didn’t use it for a large portion of my career. The reduce function is useful any time you need to arrive at a single value given an array of values. For example, you can use the reduce function if you need to find the largest or smallest value in an array, or if you need to add all the values in an array.

Suppose you need to find the average of all the numbers in array @numbers. Here’s some procedural code to do that:

Don’t forget to include the following use line at the top of your file if you want to use the reduce function:

use List::Util qw/reduce/;
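The averaging task can be sketched both ways for comparison. This is a minimal example that assumes @numbers is not empty; the sample values are invented:

```perl
use strict;
use warnings;
use List::Util qw/reduce/;

my @numbers = (3, 8, 4, 5);   # invented sample values

# Procedural: accumulate the sum in a loop.
my $sum = 0;
$sum += $_ for @numbers;
my $loop_average = $sum / @numbers;

# With reduce: fold the list into its sum, then divide by the count.
my $reduce_average = (reduce { $a + $b } @numbers) / @numbers;

print "$loop_average $reduce_average\n";   # prints "5 5"
```

In both versions, @numbers in numeric context yields the number of elements.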

What if you wanted to find the largest value in the @numbers array?

my $largest= reduce {$a > $b ? $a : $b} @numbers;

The reduce function, like the other super general functions I’ve described so far, tells the programmer reading your code exactly what you’re trying to do. And it does so at first glance. You’ll have shorter, faster, clearer code at once if the problem you’re tackling falls into the broad category that the reduce function can solve. And the same is true for many other super general functions accessible to Perl, especially those that are implemented in C.

Custom Super General Functions

Though using super general functions can definitely make your code significantly more succinct, it is writing super general functions that can shorten, simplify, and enhance the quality of your code the most. When a super general function contains a bug, that bug is more likely to surface than a bug that some lone loop in the program might contain. The loop will probably run once and will perform one specific task. The super general function, on the other hand, will likely have to run more than once and might perform slightly different functions during each run. Many different parts of the program might call on the super general function to perform slightly different tasks. Accordingly, in the end, the stress on the code in the super general function is higher and a program is less tolerant of faults in the super general function than of faults that exist in other parts of the program.

You might tend to reuse super general functions. Once you’ve written one, you’ll typically start viewing more problems in terms of that super general function. So you might end up using it in more than just one program. When you write a super general function, you’re solving a much wider range of problems than when you write a regular function, so you’re more likely to need the super general function again and again. This creates an effective natural pressure to make the super general function work correctly, without bugs.

Let’s create a program that demonstrates the usefulness of custom super general functions. We’ll start with a simple program that sends a file to an FTP server in a reliable fashion. Then we’ll add the ability to reliably retrieve and delete files from the FTP server. We’ll create put, get, and delete methods to perform those functions. Once all of that is working, we’re going to factor out the code that makes the put, get, and delete methods reliable. The factored-out code will go in a super general function that we’ll be able to use to make any operation more reliable.

Our example program is going to start as two files. We’re going to create a Perl module called ReliableFtp that will work much like the CPAN Net::FTP module works, except that it will retry an operation in a reasonable fashion when the operation fails. The other file is going to be ftp.pl, which will simply instantiate the ReliableFtp class and call its methods.

Later, we’ll change things around a bit and we’ll end up with a simple with_retries method that will allow us to perform any operation reliably. The put, get, and delete methods will become very simple when they start using the with_retries function.

The ReliableFtp module consists of one constructor, two methods, and AUTOLOAD and DESTROY functions. The connect method connects to the FTP server, logs in, and changes to the default remote directory, all using values that you specify when you instantiate the ReliableFtp class with the new method.

The put method tries to send a file to the server (using Net::FTP’s put method). If the send fails, the method waits a little while and then tries to send the file again. Each time the send fails, the method waits a little longer before trying again. But the method gives up altogether after a few tries, returning a false value. If the send succeeds at any time, however, the method returns a true value.
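The retry behavior described above can be sketched roughly as follows. This is not the ReliableFtp code itself: send_file is a stand-in that fails twice and then succeeds so that the example runs without a server, and the put_with_retries name, the number of tries, and the delay are all assumptions made for this illustration:

```perl
use strict;
use warnings;

# Stand-in for the real transfer so the sketch runs without a server:
# it fails twice, then succeeds. In ReliableFtp this would be a call to
# Net::FTP's put method.
my $attempts_made = 0;
sub send_file {
    my ($file) = @_;
    return ++$attempts_made >= 3;
}

# Hypothetical retry loop in the spirit of the put method described above.
sub put_with_retries {
    my ($file, $max_tries, $base_delay) = @_;
    for my $try (1 .. $max_tries) {
        return 1 if send_file($file);
        # Wait a little longer after each failure, but not after the last.
        sleep($base_delay * $try) if $try < $max_tries;
    }
    return 0;   # gave up
}

my $ok = put_with_retries('report.csv', 5, 0);   # 0-second delay for the demo
print $ok ? "sent after $attempts_made tries\n" : "gave up\n";
# prints "sent after 3 tries"
```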

The AUTOLOAD and DESTROY functions simply ensure that the methods that the caller is calling exist. If not, ReliableFtp exits with a message that includes a list of valid methods.
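One common way to get that behavior is sketched below. The package name, the placeholder put body, and the wording of the message are assumptions for illustration, not the actual ReliableFtp source:

```perl
use strict;
use warnings;

package ReliableFtpSketch;   # hypothetical stand-in for ReliableFtp

our $AUTOLOAD;

sub new { return bless {}, shift }
sub put { return 1 }         # placeholder body

# Perl calls AUTOLOAD only when a method is not found in the class.
sub AUTOLOAD {
    my ($self) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;   # strip the package name
    my @valid_methods = qw/put/;
    die "Unknown method '$name'. Valid methods: @valid_methods\n";
}

# Defining DESTROY keeps object destruction from reaching AUTOLOAD.
sub DESTROY { }

package main;

my $ftp = ReliableFtpSketch->new;
eval { $ftp->rename('a', 'b') };   # rename is not defined in the sketch
print $@;
# prints "Unknown method 'rename'. Valid methods: put"
```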

At this stage, there’s very little functionality that you could factor out. But let’s add a get method to the ReliableFtp module. This get method should work much like the put method, trying and retrying to fetch a file, returning a true value when successful and a false value after a number of repeated failures.

We’ll add a delete method too, for good measure.

Listing 5 shows the additional code for the ReliableFtp module, to support the new get and delete methods.

We’ll need to tell the AUTOLOAD function about these new methods, but doing so is trivial. Just locate the line with the list of methods that ReliableFtp supports, then add the new methods to that list. The modified line in the AUTOLOAD function should end up looking like this:

# This is a list of the methods that ReliableFtp supports
qw/put get delete/;

And that’s it! Our program is now able to reliably put, get, and delete files on a remote FTP server. It works just as advertised. If we had to add another method, we could simply copy the code of one of the existing methods, put, get, or delete, and then change the code slightly to implement the new method. Or, we could factor out some code. Let’s think about that.

In the ReliableFtp module, put, get, and delete all have calls to the connect method. This makes sense because otherwise we’d have to include the code from the connect method in each of put, get, and delete, and then the program would be significantly longer and harder to follow and it would just look silly. Instead, the code to connect to a remote server exists in a separate function.

Still, the put, get, and delete methods retain code that is quite similar among the three methods. The only difference in the code between the methods is that each method calls a different method of the Net::FTP object. The put method calls Net::FTP’s put method, the get method calls Net::FTP’s get method, and so on. There’s no other difference. To factor out the code in the put, get, and delete methods, you’d have to create a function that contains all the code of one of the methods, but that accepts a parameter indicating the specific function to run. The idea is that with the help of such a function (we’ll call it with_retries), the put method would look a lot simpler. Listing 6 provides an example.

The put function in Listing 6 is only 4 lines long. That’s a tremendous improvement over the original, already succinct put function in Listing 4, which required 13 lines. But let’s take a closer look at the definition of put in Listing 6. Several subtle but interesting things happen in that tiny function. To start, the with_retries function is a super general function. Also, note that put passes a parameter-less function to with_retries, so that with_retries is able to execute the function like this: $function->(). The $self and $file variables resolve to values at the time the definition of the anonymous sub executes and the anonymous sub doesn’t look at any parameters (@_) that might be passed to it later, when it is called.
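The closure behavior can be seen in isolation in this minimal sketch. The make_sender helper is hypothetical and does no real sending. Each anonymous sub remembers the $file that was in scope when it was created and ignores any arguments passed later:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a parameter-less function that remembers
# the $file it was created with.
sub make_sender {
    my ($file) = @_;
    return sub { return "sending $file" };   # ignores @_
}

my $send_a = make_sender('a.txt');
my $send_b = make_sender('b.txt');

print $send_a->(), "\n";            # prints "sending a.txt"
print $send_b->('ignored'), "\n";   # prints "sending b.txt"
```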

The with_retries function looks a lot like the original put method from Listing 4, but with_retries accepts a function as a parameter and executes that function, returning true upon success.

If the with_retries function encounters a problem executing the function, with_retries simply waits a little while and then tries executing the function again. Repeated failures will eventually cause with_retries to fail by returning a false value.

The with_retries function requires two things from the function that you pass in. The function must be parameter-less. And, the function must return a true value upon success and a false value upon failure. With a little code, almost any function can be made to behave like that, so with_retries should work with an enormous variety of functions.
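A with_retries method along those lines might look like the sketch below. The RetrySketch package, the max_tries and delay attributes, and the transfer placeholder (which fails twice and then succeeds) are all assumptions made so that the example runs on its own. A real version would call Net::FTP inside the closure:

```perl
use strict;
use warnings;

package RetrySketch;   # hypothetical stand-in for ReliableFtp

sub new {
    my ($class, %args) = @_;
    return bless { max_tries => 4, delay => 0, %args }, $class;
}

# Runs $function until it returns true or the tries run out.
# Returns 1 on success and 0 after repeated failures.
sub with_retries {
    my ($self, $function) = @_;
    for my $try (1 .. $self->{max_tries}) {
        return 1 if $function->();
        # Wait a little longer after each failure, except the last one.
        sleep($self->{delay} * $try) if $try < $self->{max_tries};
    }
    return 0;
}

# Placeholder for the real transfer: fails twice, then succeeds.
sub transfer {
    my ($self, $file) = @_;
    return ++$self->{calls} >= 3;
}

# The public method shrinks to a closure handed to with_retries.
sub put {
    my ($self, $file) = @_;
    return $self->with_retries(sub { $self->transfer($file) });
}

package main;

my $ftp = RetrySketch->new;
print $ftp->put('report.csv') ? "ok\n" : "failed\n";   # prints "ok"
print "calls: $ftp->{calls}\n";                        # prints "calls: 3"
```

Because with_retries knows nothing about FTP, the same method can wrap any operation that reports success with a true value.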

Now that we have the with_retries method, we can easily rewrite the put, get, and delete methods. If we need to add other methods in the future, we can do so with minimal code (probably 4 lines) and a reduced chance of introducing a bug.

The natural thing to do now might be to move the with_retries function to some kind of utility module, where it could co-exist with other super general functions.

Languages such as Perl, with full support for first-class functions and lexical closures, make the sort of factoring work described here much easier than other, more rigid languages do. These language features can help you to build truly elegant programs and to avoid the typical boilerplate code that you see in less flexible programming languages.

Conclusion

It can take a little while to become accustomed to thinking in terms of super general functions and writing programs in a functional style. But once you start doing so, you begin to simplify otherwise complex tasks. And you start to feel like you’re capable of tackling more complex problems more elegantly, with less code that is more readable and reliable.