Lisp and Java

Why learn a new programming language? Among other excellent reasons
(such as good, old-fashioned intellectual curiosity), there's the
opportunity to pick up useful techniques, tricks, and idioms that you can
apply in your day-to-day programming life. At its best, studying a new
language can give you the kind of conceptual shift that illuminates thorny
problems in a new light. Even if your mainstream language of choice doesn't
provide the special-purpose syntax that you find in a language you're
exploring, you can often find a way to implement the underlying technique
in a useful manner.

In this article, we're going to steal an idea from one of the most
theft-worthy languages out there: Lisp. We're going to pick out one of its
most useful features -- the ability to treat functions as data -- and talk
about how to apply this feature, in a slightly different form, in Java. In the
course of doing so, we'll give a very (very) brief introduction on how to read
Lisp code. We'll also develop a small but useful library for JDBC and
collections programming that you are welcome to use, abuse, and extend as
you see fit. We're going to use the Scheme dialect of
Lisp for our discussion, because it expresses the ideas we're interested in
in a particularly clear and elegant way.

Lisp Code and How To Read It

(this (is what)
(lisp code
(looks)
(like (more (or less)))))

A casual observer of Lisp code will notice the dazzling collection of
parentheses, without a heck of a lot else to visually break up the code.
Where in a language with C-descended syntax you get parentheses, curly
braces, commas, colons, semicolons, and a whole horde of other syntactical
signposts to keep you on your way, in Lisp, you get pretty much nothing but
( and its friend ). All other tokens are
separated by white space, and indentation, though extremely important
culturally, has no significance as part of the language itself.

For example, in Lisp, if you want to, say, call a function that doubles a
number, you write:

(double 7)

Side note: when discussing Lisp, it's common to show the result of evaluating an expression as follows:

(double 7)
=> 14

Which can be read as "(double 7)evaluates to 14."

In Java, there are basically two ways you can call a function: as an
instance method or as a class method. For the former, you'd have something
like:

aNumberSeven.double()

For the latter:

aClass.double(7)

For both of these cases in Java, you learn to see the word just before
the opening parenthesis as "hot" -- it's the verb of the phrase you're reading.
One nice thing about Java's instance method-call syntax is that makes it
very clear what the subject of that verb is (by matching the English
language's subject-verb-object order). In Lisp, the word immediately
after the opening parenthesis is what you learn to read as the verb (in
most cases), and there is no distinguished subject of that verb. This
pattern is used for functions that are infix operators in other languages,
such as plus and minus:

(+ 5 2)
=> 7

Or assignment (which has an undefined value, so we won't show it
evaluating to anything):

(define a 11)

which is more or less equivalent to:

int a = 11;

Note that Lisp doesn't specify the type of variable a
-- in Lisp, a variable can hold any type.

The intense regularity to the syntax can be forbidding. Parenthetical
digression: the payoff comes through the ability to represent Lisp programs
as Lisp data, which means that Lisp programmers can easily write programs
to manipulate other programs. Although that is indeed a trick worth
stealing or even spending a lifetime studying, it would take more of a book
than an article to explore it fully. For such a book, check out Paul
Graham's ANSI Common Lisp
or On Lisp (available
as a PDF on his web site), which contain many inspiring uses of the mighty
Lisp macro. End parenthetical digression.

Moving on through the rudiments of the language: the basic data structure
in Lisp is the list, which can be created in a variety of ways, one of the
simplest of which is via the list function.

(define lst (list 2 3 5 7 11))
lst => (2 3 5 7 11)

Treating Functions as Data In Lisp

Okay, now that we've got basic Lisp syntax and Lisp data structures
under our belts, we're ready to talk about the idea we're going to steal:
in Lisp, functions are first-class objects, meaning they have all
the "rights and privileges" accorded to other objects in Lisp. You can
name them with variables, pass them into functions, return them from
functions, and store them in data structures. They are no different from
"basic" data. This facility turns out to be enormously powerful.

As a simple example, you can define a function that takes another
function as an argument. Such a super function is known as a
higher-order function. If this is a new concept, the key thing to
understand is that you can, in Lisp, refer to a function without
calling it, much as you can, in Java, refer to a variable without
immediately using its value. In fact, in the Scheme dialect, the syntax
for function and variable reference is identical. To show this, we'll
introduce our first big higher-order function, map. The
map function takes another function and a list as arguments and creates a new
list by applying the function to each element of the original list, like
so:

Which looks great, but is, unfortunately, nonsense. In Java, you can't
pick up a method and pass it into another method. There is no simple way
to refer to the method at runtime -- you can only invoke it. You
can't easily pass it to another function, return it from a function, or
store it in a data structure. Through complex reflection tricks, these
things can be done, but it's not for the faint of heart. In Lisp, you can
do all of these things quite simply, and, in Lisp culture, you do them all
the time.

Why First-Class Functions Are Useful

One place where first-class functions come in particularly handy is in
dealing with collections. It's very common to have a collection of stuff
and to want to produce another collection of stuff. You'd like to ignore
the nitty-gritty of the looping code and just keep your mind on the
collection as a whole. A higher-order function such as map lets
you do just that.

An example: if you've done any work with aggregate data in Java, you've
gotten used to seeing your ideas disappear under a haze of iterator code.
In particular, one type of aggregate data that is tough to handle cleanly
is the JDBC ResultSet. In my day-to-day life, I write a lot of
Java code that pulls data out of a database and then displays it on a web
page. For all of you JDBC fans, the following type of code probably looks
all too familiar:

To a Lisp programmer, it would be natural to break down the above chunk of
code into something that generates a list of rows, and a function that is
then mapped over each row. How can we do something similar in Java?

Approximating First-Class Functions in Java

The only parts of our JDBC example that would change if we were collecting
a different type of object would be:

The connection, which was obtained outside of the block shown.

The string used in the query.

The body of the while loop.

The connection and the string are easy, since we can pass them in to a
method that captures the pattern. The body of the while loop is tougher,
because it's well, a chunk of code, which is hard to manipulate in Java.
In general, if you want to pass around code in Java, you have to create an
instance of a class. Thanks to interfaces and anonymous classes, we can
create an anonymous class just for the code in question.

To start our translation of the Lisp map idiom into Java, we define the
following interface:

An object that implements RowFunc is sort of like a function
in Lisp, in that it can be called (by invoking the of method),
passed to another function, returned from a function, and stored in a data
structure. Of course, we have to explicitly apply it, but it's a start. We
call RowFunc's method of so that you can read
an invocation of a RowFunc named f as
f "of" whatever it's applied to. I have a mathematical
background (and a possibly too-great love of concision). In any event,
here's how we use the interface to abstract our "collect all of the objects in
the ResultSet" pattern:

You'll note that we don't first pull all of the results into a
List and then map over that. Although that would be a
perfectly valid way to work, JDBC ResultSets are common enough
(and there's enough interesting ResultSetMetaData in them),
that it seemed worthwhile to have a special purpose method to map from a
ResultSet to a List. Later in the article, we
develop a nice way to go from a ResultSet to a
List of HashMaps, which is about as general as
you can get when dealing with unknown data.

With these general definitions in place, we can now rewrite our earlier
specific query using an anonymous inner class as:

Now, in this one case, we haven't saved much in the way of typing, though
it is certainly nice that the JDBC boilerplate has disappeared from our
code. However, if we need to create a list of users again somewhere else,
we can just pass in makeUser again:

If "create a User from a row of a ResultSet"
turns out to be a useful concept, having abstracted it into a function
that can be easily manipulated will prove its value over time. Speaking
of easy manipulation, wouldn't it be nice if we stored the
makeUser function with the User class? As
mentioned above, we can store these function-like objects in data
structures:

Note that before, User had a constructor that took three
params, whereas now we're just directly passing in the ResultSet. This eliminates the possibility of confusing the order of the args. Now our call looks like this:

Let's keep improving things. Every class we're going to fit into this
paradigm is going to have a nearly identical make static
method. That seems a bit wasteful. What if we could write a function
that returned a function to do it for us? We can do that, using a class
constructor and a bit of reflection:

The complexity of using reflection here is a clear demonstration of why
it's too tricky to use in day-to-day programming. As part of an
infrastructure (which is how we can think of RFMaker), it's
worth the effort, but the above multi-exception monstrosity should never
appear in normal code. It's not clear exactly how to best handle the host
of checked exceptions. We opt for a fail-fast runtime exception, which
seems reasonable.

User and any other classes that are going to be loaded from the database need only define a constructor that takes a
ResultSet as a param. At that point, you might want to put the SQL strings into the classes as well, so that all of the information about how to load, for instance, a User from the database is stored in one place.

Extending the First-Class Function Approach

"Why not use an object/relational bridge?" we hear you say. Yes, yes,
yes, you could use one of the innumerable object/relational bridges out
there, possibly specifying the table-to-class mapping via a set of XML
files (and can we just say, ugh). Yes, it's possible that that would make
some of the simple queries above disappear altogether. However, the really
nice thing about the first-class function approach is its flexibility: for
example, it can be easily extended to handle more complex queries (as we
saw above for the newUsers), or even new kinds of loops.

To make this concrete, let's say you need to use a prepared statement
for your SQL query. If you're still building Users, you can
write a new mapping function, and use the sameRFMaker
constructor. This is where the power of having abstracted the
User creation code away from the looping code really starts to
shine. Et voila:

Or, you could decide that you'd like to create HashMaps
from each row in your ResultSet, rather than building
instances of a specific class. This can be a very nice approach, if all
you're going to do is display the resulting list via a template system such
as Velocity or Freemarker. Taking advantage
of some ResultSetMetaData, we can write the following
RowFunc:

Note: The above code suffers from some inefficiency because it
rereads all of the column names for each row. If this proved to be a problem,
it would be fairly easy to create a variant of makeHash that
was initialized with an array of column names once per ResultSet.

We've just scratched the surface of what you can do with first-class
functions. As we confront problems in our programming practices, we should
always be striving to write code that states what it does as clearly and
simply as possible. The first-class function approach, by giving you new
tools to capture patterns in your code, can lead to clear and elegant
solutions to a wide variety of problems. The new abstractions that you
can build will often let you step back and see your overall program in a
new light. And that's something always worth learning.

Resources

Lisp has a long and complex history, with enough brilliant innovations,
bitter rivalries, and failed startups to keep a team of historians in
tenure. For some details on where the bodies are buried, check out
Richard Gabriel and Guy Steele's excellent paper, "The Evolution of Lisp" (PDF).

Dan Milstein
is an independent programmer and consultant in the Boston area.