Guerrilla DevelopmentMessaging may not be the way to Build a Distrbuted Systemby B. Scott AndersenJuly 31, 2005

Summary
Eric Armstrong's recent article on Artima caught my eye and made me
think a bit about the evolution of distributed programming techniques.
In the beginning, there was the message. But what now?

Advertisement

In the beginning...

... there was the message. Eric Armstrong's recent article on Artima
Messaging is the Right Way to Build a Distributed System
caught my eye the other day. I've not had a chance to write anything for
Artima in quite a while and perhaps responding to this piece is as
good an excuse as any to get back in the habit.

Eric's article made me begin thinking about the evolution of
the distributed programming techniques I've seen and done in
the last two decades. While a message passing paradigm started
the ball rolling, I believe we've progressed far from that mark.

CSPs

I first started thinking about distributed computing using
messaging in about 1985 while working on a project that was
trying to make a loose collection of
devices communicate over a twisted pair network.
It was pretty primitive compared to what we do today, but the
problems were all the same: remoteness implies longer latencies,
the possibilities of partial failure, managing parallelism,
and dealing with the pure tedium of message formats and the
protocols that carry those messages.

The model we used then was reflected in Tony Hoare's book
Communicating Sequential Processes. CSPs are indeed
an excellent way to organize distributed components with simplicity.
Each service need only have a single thread (it could have more, but
more aren't required for this model), they take a message, perform an
action, and send a reply message to report the results to the
initiator.

Message paradigm wrapped in an RPC

A few years later, I was running a project building a piece of
semiconductor fabrication equipment and I employed the same
approach. The machine had a large number of moving parts and
several racks of computers providing the computing power to handle
the motion control and supervisory functions. The CSP approach
made managing parallelism about as easy as it was going to be.
The model was so intuitive that I constructed a programming language
called MIL (the Machine Interface Language)
which I had hoped technicians might be able to use to calibrate
and diagnose problems with the machine.

MIL could format messages and send them on the machine's message bus.
One of the mantras I'd chant was, "everything had to be on the bus!"
While this probably drove the engineers nuts, the approach ensured that
the power of the network and this message passing paradigm extended into
all of the nooks and crannies of the machine. If it works for the big
stuff, it works for the small stuff, too.

The message handling primitives of send and receive were
quickly wrapped into procedure forms: one form for a regular RPC
(send, wait for the reply), and call-no-wait. (The majority of the code
written for the machine was in C and the C programmers also wrapped
their message stuff in RPC wrappers.) The RPC form looked
something like this in MIL:

status := ssl_move(x:=5, y:=10, timeout:=40000)
if (status != 0) then
...
end if

To do things in parallel, you would use the no wait
form of the subroutine call which would just send the message
and not wait for the reply. Waiting for the reply was done in
a separate step. MIL's syntax for using the no wait
form looked something like this:

In the above code, we sent messages to both the
left substrate stage (ssl) and the reticle stage
to have them initialize. The _no_wait
forms of these routines just sent the messages to
the respective subsystems and returned a
handle that could be used later to
recognize the arrival of the return message. So,
in this example, we started the initialization
process for two of the machine's subsystems and
then, once all this activity was started (in
parallel), we could wait for their completion with
the wait command. The
wait command takes a variable used as
the status return value from the send
call as the first parameter, and takes a time out
value (in milliseconds) as the second value. In
this case, we just start waiting for the return
messages. In the normal case, both return messages
will arrive, both wait calls will
succeed before their timeout kicks in. The
if statement after the two
wait calls asks if either of those
two waits had failed by seeing if we
are still waiting. All of this is
pretty simple and it provided a very powerful,
intuitive model that most everybody, engineers,
technicians, test designers, etc., could use.

Notice, though, that the underlying mechanism, message passing,
eventually gathered another layer of abstraction on top of it.
This wasn't to obscure the remoteness of the various
subsystems inside the machine; it was to provide some syntactic
sugar that made the concepts clearer and more palatable to
the developers.

Moving more than data in our messages

After this project it was time to waste some career cycles doing
stuff for the web. I'd done some SQL stuff in previous jobs
but it was interesting to me how much programming was being
done in the database layer. It wasn't just all the SQL query
strings that were being passed around, though there were
plenty of those, it was also the number of stored procedures
that were being used. It seemed like some combination of SQL
and the stored procedures were a programming platform in its
own right. The lines for when code was installed and used
was starting to get blurred in my mind. Were SQL statements
data or code? How about stored procedures? If you install
stored procedures and then exercise them along with a bunch
of other SQL code, is that mobile code? Or, is it
just code that was subject to a lightweight install
or temporary install. Was there a difference?

Is a connection to a database, and the passing of SQL
statements just business as usual in Eric's messaging
model? Or, is it more along the line of mobile code?
Clearly, the purpose of moving the SQL statements to the
database is to have the processing done closer to the
data with only the results passed back to the initiator.
That sounds like mobile code to me. In fact, the whole
point of the database connector is so I don't need to
know how the magic of the transport and messaging
between my program and the database works.

Why should I care about the protocol?

When I'm trying to talk to a service like a database
I really don't want to care about the underlying
transport or how they organize the messaging beneath.
I've been given a different, higher-level view than
all that and I'm happy for it. My program has to be
able to handle database errors (including a lost
connection to the database), but I don't need to
understand any of that stuff the database connector
provided.

In fact, the particular transport and protocol for
the interactions with the database are a private
problem between the database vendor and the provider
of the connector technology. If they release a new
version of the database and connector, they could
even change this protocol completely and I, after a
quick recompile and relink, shouldn't care a bit.
Wouldn't it be nice if more stuff worked like that?

More stuff does work like that

The idea of moving code isn't new. Back in 1993
and 1994 General Magic carved out a nifty idea of
having a scripting language regular users could use
to help organize their personal communication. Here's
an example of what this could have done.

Let's say you set up some rules for how you wish to be contacted:

If email is received during 8-6 M-F route to office mail.

If email is marked urgent, route to office mail and copy to
my Blackberry.

If I haven't picked up a message during normal business hours
and a message is 4 hours old (or older), send a message to my
Blackberry with the subject only.

If I receive a message from "boss" during normal business hours
that is not picked up in 30 minutes or less, call me on my
cell phone and do a text-to-speech of the message
(so I can pick it up at the ballpark... Hey, I don't want to
be fired...)

Telescript was going to do stuff like this and much more.
Here's the blurb from a
Byte magazine article
from that era:

"Telescript, a communications-oriented programming
language comparable to C or Pascal, will let developers
create network-independent intelligent agents and distributed
applications. Some of these applications, in turn, may be tools
that let ordinary users create intelligent agents without
programming. What PostScript did for cross-platform,
device-independent documents, Telescript aims to do for
cross-platform, network-independent messaging. General
Magic hopes Telescript will become a lingua franca for
communications."

We didn't get a world filled with Telescript. But, we do
have a world filled with Java and we've been moving Java
code since the beginning of the Java movement. Applets,
those sometime cutesy and even occasionally useful pieces
of Java code that get loaded into web browsers are an
example of mobile code. Sometimes those applets talk back
to the computer that served them up to our web browser.
Again, the browser provides a place for the applet to
run and the applet provides a service within the page,
but whatever protocols or messages it exchanges with
the remote computer are invisible, and uninteresting,
to us. Like the database connection utilities, I don't
really care how it does its job.

More stuff should work like this

Moving code around, whether it be SQL commands,
telescript, or Java code, has much to recommend it.
There are a couple of things that moving code
accomplishes:

Computation happens in the right place, and

details like protocol specifics and message formats can be hidden.

Jini, a distributed computing model
for Java, does move code around and for both of these reasons. But,
there is some reluctance even in this forward-looking group to fully
embrace moving code to solve some problems. I had a long running
conversation on the JavaSpaces list where I lamented that some of
the proposed changes to the JavaSpaces specification made this Spring
were trying to move data when moving code made more sense to me.
You can read my prattling
here.
Who knows, maybe I was off my rocker--but it seemed like if moving SQL
commands to the database for data selection made sense, moving code to a JavaSpace
for data selection made at least as much sense.

The relationship between the JavaSpaces client and the JavaSpaces
service is managed by the mobile code of the service's proxy.
Just as the database connector technology provided a local proxy
for the facilities supplied by the database, the JavaSpaces proxy
supplies a facade for the JavaSpaces service. The only difference
between these two cases is when the linkage-editor functionality
happens. If you believe "later is better", you're gonna love Jini
and JavaSpaces.

The evolution from the messaging paradigm

We've come a long way since the simple exchange of messages.
We started with just simple messages, protocols that had
to be agreed upon up-front, and Hoare's CSP model. But
as time progressed, there was pressure on that model.

Pressure to move the computation closer to the data
The SQL commands you send in every database query are an
example of moving the computation closer to the data. You
can buy more computing power; it is hard to buy more network
bandwidth.

Pressure to keep a familiar program model like subroutine calls
Messages are fine but the RPC programming model is often what
you're trying to do anyway: make a call, get an answer, move on.
Why not use a something that looks like a procedure or
function to do that?

Pressure to hide unnecessary details
I don't care about the protocol between the database and
whatever connector technology I'm using. In fact, the
less I know the better off everybody is. If they want to
change it in their next release, that's fine with me.

Pressure to control both ends of the connection
If you have to declare message formats and protocols up-front,
that means a lock in that could be uncomfortable later. If
instead you simply offer an API for the service (as the database
connection utility or Jini service would do), you can control
both ends of the protocol and the message formats because both
of these things are implementation details below the visibility
of your end-users.

Distributed computing's growth over the last half decade has
been astonishing to me. And, while I'm still a fan of Hoare's
simple model, I recognize that pressures have encouraged an
evolution of tools, techniques, and attitudes that have made
distributed system development easier.

It may have all started with the message... but we've come a long
way since then.

Resources

Communicating sequential processes (Prentice-Hall International
series in computer science) (Paperback), C. A. R. Hoare, 1985,
found on Amazon
here. This was one of the first texts I had seen on the topic.
It was a bit heavy on the symbology and the math, but once I got it,
I was hooked. I still have the hard-bound edition on my shelf.

RSS Feed

About the Blogger

B. Scott Andersen has 20+ years of experience in software
development splitting his time between individual contributor and management. He is now a Principal Software Engineer with Verocel, Inc., a company specializing in helping safety-critical system developers attain certification for their products. The opinions expressed here are his own and he takes full responsibility for them... unless, of course, they are worth money, at which point they belong to his employer.