Summary
Ken Arnold, the original lead architect of JavaSpaces, talks
with Bill Venners about loose coupling in JavaSpace-based systems,
why fields in entries are public, RPCs to nowhere, and
building systems that sway with failure.

Ken Arnold has done a lot of design in his day. While at Sun Microsystems, Arnold
was one of the original architects of Jini technology and was the original lead architect of
JavaSpaces. Prior to joining Sun, Arnold participated in the original Hewlett-Packard
architectural team that designed CORBA.
While at UC Berkeley, he created the Curses library for terminal-independent screen-oriented programs.
In Part I of this
interview, which is being published in six weekly installments, Arnold explains why
there's no such thing as a perfect design, suggests questions you should ask yourself when
you design, and proposes the radical notion that programmers are people. In Part II,
Arnold discusses the role of taste and arrogance in design, the value of other
people's problems, and the virtue of simplicity.
In Part III,
Arnold discusses the concerns of distributed systems design, including the need to
expect failure, avoid state, and plan for recovery.
In this fourth installment, Arnold describes the basic idea of a JavaSpace,
explains why fields in entries are public, why entries are passive,
and how decoupling leads to reliability.

Bill Venners: What is a JavaSpace?

Ken Arnold: The basic idea of a JavaSpace is to introduce
loose coupling between actors in a protocol. For example, you have a
question. If you find a server that can answer your question and you
send it a message, you are tied directly to that server and have to deal
with its failure modes. With a JavaSpace, you instead write an object,
called an entry, into the space. Somebody who can answer
the request that entry represents then extracts it and writes back the
response.

The kind of loose coupling JavaSpaces enables has several advantages,
as loose coupling always does. The simple, direct advantage is
scalability. Say you have one server and many clients. The clients
write requests, and the server extracts them and writes the responses
back. Now let's say the server gets overburdened; in that case, you
need to accelerate it. With a directly coupled system, you have to make
"the place people talk to" faster. Or you have to change people's logic
so that something in the system knows how to distribute the load.
Distributing the load is an interesting problem: Who is busy? Who is
not? It's sometimes hard to tell.

In the JavaSpaces model, you just start up a second server. Now you
have two things retrieving requests from the space and writing results
back, so throughput roughly doubles. In fact, the scaling is
distressingly close to linear when you do this, for a long period.
Because you decouple requesters from request handlers and how
requests are handled, you can just have multiple request handlers. You
can break the request down into two parts; then something that can
complete one part efficiently will retrieve it and write back an
intermediate result that somebody else knows how to finish. You can
partition the result in different ways. The customer only cares that the
results come back.
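The scaling argument above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: a BlockingQueue and a map stand in for the space, and the class name HandlerPoolSketch, the run method, and the squaring "work" are all hypothetical. The point it shows is that the requester's code is identical whether one handler or several are running.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in: a queue and a map play the role of the space.
class HandlerPoolSketch {

    // Starts `handlers` request handlers (at least 1), writes `requests`
    // requests, and returns once every response has been written back.
    static Map<Integer, Integer> run(int handlers, int requests) throws InterruptedException {
        BlockingQueue<Integer> requestSpace = new LinkedBlockingQueue<>();
        Map<Integer, Integer> responseSpace = new ConcurrentHashMap<>();
        CountDownLatch allAnswered = new CountDownLatch(requests);

        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        for (int h = 0; h < handlers; h++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        int n = requestSpace.take();   // remove one request
                        responseSpace.put(n, n * n);   // write the response back
                        allAnswered.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // handler was told to stop
                }
            });
        }

        // The requester only writes requests; it never names a handler.
        for (int n = 0; n < requests; n++) {
            requestSpace.put(n);
        }
        allAnswered.await();
        pool.shutdownNow();
        return responseSpace;
    }

    public static void main(String[] args) throws InterruptedException {
        // Same requester code, one handler versus two: same answers.
        System.out.println(run(1, 5).equals(run(2, 5))); // prints true
    }
}
```

Adding capacity here means changing only the handlers argument; nothing on the requesting side knows how many handlers exist.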

JavaSpaces has essentially three primary operations:
write, which puts an entry into the space;
read, which reads an entry from the space; and
take, which is equivalent to reading except it removes the
entry as well as reads it. The entry is a simple kind of object. It has a
set of public fields that all have an object type, with no primitive
types. Writing an entry into the space is as simple as creating one of
these entry objects and writing it. To do a read or a take, you create an
entry of a preferred type, you fill in the fields whose values you care
about, and you pass it to a read or take
method. A filled-in field has to match exactly. A field that is not filled
in—a field left null—is ignored. If I ask for one type, I
can get a subtype. JavaSpaces provides a simple way for me to ask
you to do something, for you to do things you know how to do, and
then for me to look for the results. You can design protocols on top of
this basic set of operations.
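The three operations and the matching rule described above can be modeled with a small in-memory toy. This is a sketch, not the real JavaSpaces API: ToySpace, Question, and UrgentQuestion are hypothetical names, the toy returns null instead of blocking, it hands back the stored object rather than a copy, and it compares fields with equals, which is a simplification of how a real space matches.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy model of a space; not the real JavaSpaces interface.
class ToySpace {
    private final List<Object> entries = new ArrayList<>();

    // write: put an entry into the space.
    synchronized void write(Object entry) {
        entries.add(entry);
    }

    // read: return a matching entry and leave it in the space (null if none).
    synchronized Object read(Object template) {
        for (Object e : entries) {
            if (matches(template, e)) return e;
        }
        return null;
    }

    // take: like read, but the matching entry is also removed.
    synchronized Object take(Object template) {
        Iterator<Object> it = entries.iterator();
        while (it.hasNext()) {
            Object e = it.next();
            if (matches(template, e)) {
                it.remove();
                return e;
            }
        }
        return null;
    }

    // An entry matches if it is of the template's type (or a subtype) and
    // every non-null public field of the template equals the entry's field.
    static boolean matches(Object template, Object entry) {
        if (!template.getClass().isInstance(entry)) return false;
        for (Field f : template.getClass().getFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            try {
                Object wanted = f.get(template);
                if (wanted != null && !wanted.equals(f.get(entry))) return false;
            } catch (IllegalAccessException ex) {
                throw new IllegalStateException(ex);
            }
        }
        return true;
    }
}

// Entry-style classes: public fields, object types only, no primitives.
class Question {
    public String topic;
    public Integer id;
    public Question() {}
    public Question(String topic, Integer id) { this.topic = topic; this.id = id; }
}

class UrgentQuestion extends Question {
    public UrgentQuestion() {}
    public UrgentQuestion(String topic, Integer id) { super(topic, id); }
}

class ToySpaceDemo {
    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        space.write(new UrgentQuestion("jini", 17));

        Question template = new Question();
        template.id = 17; // topic is left null, so it is ignored

        System.out.println(space.read(template) != null); // true: subtype matches, entry stays
        System.out.println(space.take(template) != null); // true: entry is removed
        System.out.println(space.take(template) == null); // true: nothing left to match
    }
}
```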

JavaSpaces is sometimes called a kissing cousin to Linda, the work on
which it is based. Whereas Linda was structural, JavaSpaces has added
objects to the system. JavaSpaces is a distributed, object-based
system. JavaSpaces has transactions like some Linda systems, but it
has distributed transactions. So JavaSpaces clearly is an
inspired work. We took the insights David Gelernter and his crew used
to create Linda, and applied them to a new domain, with differences
that make sense to that domain.

Bill Venners: Why are fields in entries public?

Ken Arnold: We could have used typical accessors, such as
get and set methods. In any pair of
get and set methods, such as in a
JavaBean, there's a contract. The documentation of the JavaBean says
that setting this value will result in the following behavior. One option,
for example, could define get as "get next," where the
returned value monotonically increases. set could mean
"set the starting point," that is, reset the point from which the value
returned by get monotonically increases. That is a legitimate
get and set contract. In random number
generators, for example, you set the seed and get the next random
number.

The contract for get and set methods of
entries would essentially be: if you call set with a given
value, and then return later and call get with no
intervening set, the value would be the same—it would
be unmodified. Furthermore, remember the matching is exact. When
you call set on your template to set a particular value
to 17, you are asking for an entry where the value is 17. When you
receive an entry and call its get method, you better get
17, not 18. Incrementing is not OK.

So if you examine the contract description for an entry's
get and set methods, you would see it
describes a field. get and set would
have to act exactly like a field. Therefore, we asked ourselves, why
should we have get and set methods
whose behavior is exactly like this other language construct called a
field? Why not just make it a field? If we make it a field, it will have
the correct behavior. Nobody can accidentally screw up their
get and set methods. Making it a field
eliminates a source of error.
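The difference between the two contracts can be made concrete with a short sketch. The class names NextValue and ValueEntry are hypothetical and chosen only for illustration: the first has a legitimate "get next" accessor contract that would break exact matching, while the second is a plain public field that cannot misbehave.

```java
// A legitimate get/set contract that does NOT behave like a field:
// set chooses the starting point, get returns the next value.
class NextValue {
    private int next;
    public void setValue(int start) { next = start; }
    public int getValue() { return next++; }
}

// Entry style: a public field of an object type. What you stored is what you read.
class ValueEntry {
    public Integer value;
}

class FieldContractDemo {
    public static void main(String[] args) {
        NextValue bean = new NextValue();
        bean.setValue(17);
        System.out.println(bean.getValue()); // 17
        System.out.println(bean.getValue()); // 18: fine for a counter, wrong for matching

        ValueEntry entry = new ValueEntry();
        entry.value = 17;
        System.out.println(entry.value); // 17
        System.out.println(entry.value); // 17: always what was stored
    }
}
```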

Now this sometimes makes people uncomfortable because they've
been told not to have public fields; that public fields are bad. And
often, people interpret those things religiously. But we're not a very
religious bunch. Rules have reasons. And the reason for the private
data rule doesn't apply in this particular case. It is a rare exception to
the rule. I also tell people not to put public fields in their objects, but
exceptions exist. This is an exception to the rule, because it is simpler
and safer to just say it is a field. We sat back and asked: Why is the
rule thus? Does it apply? In this case it doesn't.

Bill Venners: I'd like to ask you about some quotes
from JavaSpaces Principles, Patterns, and Practice, which
you coauthored with Susanne Hupfer and Eric Freeman.

Ken Arnold: Eric and Susanne did most of the writing and I
reviewed it. They gave me the privilege of putting my name on the
book, for which I am grateful to them.

Bill Venners: Here's a quote: "An entry in a space is a
passive data object that can't be changed or altered unless it is first
retrieved from the space. This distinction has a powerful effect when
developing distributed applications." What is important about the fact
that in the space the object is passive?

Ken Arnold: In the sense it is meant there, an entry is passive
because it doesn't change state on its own. That
means, in effect, that by having a written entry, you can turn your
back on it. Six years from now, if it is still in the space, it will have the
same value. This reduces the number of actors in the system, because
the space is not an actor. It remains idle. The more actors a distributed
system has, the more complicated the interactions are. Also, for
something to change, somebody must take responsibility. Instead of
having something that itself and other people change, someone has to
step in and say, I will change it. Therefore, all the changes have a
responsible actor.

In some sense, JavaSpaces is like an RPC (remote procedure call) to
nowhere. You write an entry into a space, which effectively
will invoke a method.
You just don't know on what, how it will happen, or when you will get
a result. It is an asynchronous method invocation. What would the
world be like if you made a method invocation, and while the method
invocation traveled to the destination, somebody came in and altered it?
How would you live in that universe? You would live in a very
different way. And so this kind of static existence means you can view
it as an RPC to nowhere, because it isn't touched.

Bill Venners: Here's another quote from your book:
"Uncoupling senders and receivers leads to protocols that are simple,
flexible, and reliable." How does decoupling senders and receivers help
you build simple, flexible, and reliable systems?

Ken Arnold: Decoupling leads to that; it doesn't guarantee it.
Nothing prevents people from doing bad things, except straitjackets
and rubber balls—no sharp edges. If something is useful, it can be
abused. But JavaSpaces leads in that direction. Programmers of the
various actors in a system need not understand, nor rely upon, the way
other actors in the system are structured. Is it one actor or three that
performs this particular operation? Are there one or multiple actors of
the same kind in the system? Might writing one entry actually result in
a cascade of 37 other entries written in by other actors? Those things
are out of the requester's sight, as long as the result comes back.

I mentioned idempotency in regard to distributed system design (see
Part III of this interview). It is useful
here too. You want to make your algorithms for using JavaSpaces
idempotent. You want to be able to write the same entry again and,
as a general rule, have it be harmless. Because if you don't receive an
answer in some humanly defined reasonable amount of time, you might
decide: Hey, somebody dropped the ball. Better write another one in.
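One possible way to make a rewritten request harmless is to key the work by a request identifier, so that a duplicate finds the existing result instead of triggering the work again. The interview does not say how this should be done; the sketch below is an assumption, and IdempotentHandler, handle, and the doubling "work" are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler: results are keyed by request ID, so handling the
// same request twice leaves exactly one result and does the work once.
class IdempotentHandler {
    private final Map<String, Integer> results = new ConcurrentHashMap<>();
    final AtomicInteger computations = new AtomicInteger();

    int handle(String requestId, int input) {
        return results.computeIfAbsent(requestId, id -> {
            computations.incrementAndGet(); // counts real work, for illustration
            return input * 2;
        });
    }

    int resultCount() {
        return results.size();
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        System.out.println(handler.handle("req-1", 21)); // 42
        // The requester got impatient and wrote the same request again.
        System.out.println(handler.handle("req-1", 21)); // 42
        System.out.println(handler.computations.get());  // 1
    }
}
```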

You can think about it this way: If you design a traditional system
using RPCs, or whatever you want to use, in the end you design an
architecture of actors and determine how they communicate with each
other. You decide what messages they can send to each other, what the
error states are, and what the responses are. When you design a system
using a JavaSpace, you essentially do the same amount of work, except
you replace messages with entries.

The JavaSpace provides you with this robustness mechanism. You
write entries into it. You can replicate the space. There is at least one
commercial implementation of a replicated JavaSpace. The space can
be fault tolerant. It can survive crashes. You can have transactions so
things don't get dropped on the floor. If I remove something to
compute your results and then I crash, the transaction will time out,
abort, and then entries will appear for somebody else to take.
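The recovery behavior described here, where a take under a transaction is undone if the taker crashes, can be modeled with a toy lease. This is a sketch under stated assumptions, not the Jini transaction API: LeasedSpace and its methods are hypothetical, time is passed in explicitly so the example is deterministic, and entries are plain strings assumed to be unique.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model: a take is provisional until committed. If the lease
// runs out first, the entry becomes visible again for somebody else.
class LeasedSpace {
    private final Deque<String> visible = new ArrayDeque<>();
    private final Map<String, Long> inFlight = new HashMap<>(); // entry -> lease deadline

    void write(String entry) {
        visible.addLast(entry);
    }

    // Take the next entry under a lease that expires at now + leaseMillis.
    String take(long now, long leaseMillis) {
        expire(now);
        String entry = visible.pollFirst();
        if (entry != null) inFlight.put(entry, now + leaseMillis);
        return entry;
    }

    // Commit makes the removal permanent; returns false if the lease already expired.
    boolean commit(String entry, long now) {
        expire(now);
        return inFlight.remove(entry) != null;
    }

    // Abort every provisional take whose deadline has passed.
    private void expire(long now) {
        Iterator<Map.Entry<String, Long>> it = inFlight.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= now) {
                it.remove();
                visible.addLast(e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        LeasedSpace space = new LeasedSpace();
        space.write("job");
        System.out.println(space.take(0, 100));     // job (first server takes it, then crashes)
        System.out.println(space.take(50, 100));    // null (still held by the crashed server)
        System.out.println(space.take(100, 100));   // job (lease expired, entry reappeared)
        System.out.println(space.commit("job", 150)); // true (second server finished in time)
    }
}
```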

There is a great story about a group building a project called
Viper. The project basically uses a JavaSpace as a compute model.
You write a task into the space that represents a large complicated
simulation. Then there are compute servers that will take out the tasks,
whatever they are, and invoke their run method. The
servers don't know what the run method does, they
just invoke it. Whatever the run method returns, the
servers write back in the space. Then they retrieve something else.
Essentially they just donate cycles to run these jobs. The servers
download the code associated with the job and they execute it.

When the system is in beta, a guy puts in a large fluid dynamics
calculation, and he realizes he put in the wrong one. Rather than wait
two hours to get the wrong answer back, he finds the compute server
executing his entry, and he kills it. He kills off the virtual machine. So
the transaction times out, and the request returns to the space. It is
now visible for someone else to take, because the transaction has
aborted. So someone else takes it and starts executing it. And he goes
and he kills that one. He follows this thing around the network, and he
cannot kill the job. This is not the typical problem people have in a
distributed system, right?

You asked how JavaSpaces helps you make reliable systems. It can do
it like that. If those servers were failing because the
hardware was flaky, it would also work. You don't have to have some
guy going around shooting things. The project Viper people learned
they have to have a Cancel button. You can cancel jobs in several
ways. I don't know how they did it, but they solved it. I can imagine
five designs to solve it. But the main point is, using exactly the
technology they have now, they could have built a system with 100
compute services where each one is down 50 percent of the time. And
they would still be able to get work done. The client wouldn't know
that only half of the compute servers were working. It might be too
slow. They might decide that 50 percent is a bad down time. If they
start replacing the systems, the performance would get better. But it
would be robust.

People talk about five nines, six nines reliability. (Five nines is 99.999
percent reliability.) They usually try to reach the desired number of
nines by making each component more reliable. But if you design a
system like Viper, you can make it reliable with unreliable components
that are much cheaper, more plentiful, and easier to come by. At some
point, every component is unreliable. I would much rather build a
system on that principle than try to build a system that never goes
down. It is the difference between trying to survive an earthquake by
building a sturdy structure that is hard to break and building a structure
that sways with the movement. You can survive much bigger
earthquakes by swaying with the movement, even though your instinct
is to build a sturdy structure. People are now following the instinct to
build a sturdier structure. When building with JavaSpaces and Jini,
you sway with the earthquake, and you can do much better.
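A rough back-of-the-envelope calculation shows why many unreliable components can add up to a reliable system. The sketch below assumes that servers fail independently and that the system counts as "up" whenever at least one compute server is up; neither assumption comes from the interview, and AvailabilitySketch and its methods are hypothetical names.

```java
class AvailabilitySketch {

    // Probability that all n servers are down at once,
    // ASSUMING failures are independent.
    static double allDown(int n, double downFraction) {
        return Math.pow(downFraction, n);
    }

    // Smallest number of servers for which allDown is at most 10^-nines.
    // downFraction must be strictly between 0 and 1.
    static int serversForNines(int nines, double downFraction) {
        double target = Math.pow(10, -nines);
        double p = 1.0;
        int n = 0;
        while (p > target) {
            p *= downFraction;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 100 servers, each down half the time: chance that none is available.
        System.out.println(allDown(100, 0.5));
        // Servers needed, each down half the time, for "five nines" in this model.
        System.out.println(serversForNines(5, 0.5)); // 17
    }
}
```

Under these assumptions the chance of finding no working server among 100 is vanishingly small, which is the sense in which the system stays robust even though it may run slowly.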
