For the past several years, Java's creator James Gosling has been working at Sun Labs, researching ways to
analyze and manipulate programs represented as annotated parse trees, a project called Jackpot.
Compilers have long built parse trees when they translate source code into binary. But traditionally,
programmers have worked with source code primarily by manipulating text with editors. The goal of the Jackpot
project is to investigate the value of treating the parse tree as the program at development
time, not just at compile time.

In this interview, which is being published in multiple installments, James Gosling talks about many aspects
of programming.

In Part I: Analyze this!, Gosling describes the ways in which Jackpot
can help programmers analyze, visualize, and refactor their programs.

In Part II: Failure and Exceptions, Gosling talks about how to build solid apps, organize your catch clauses,
scale checked exceptions, and deal with failure.

In this final installment, Gosling talks about visualizing software designs and
understanding large-scale distributed systems.

Visualizing Software Designs

Bill Venners: One thing I think could help people design software is
being better able to visualize what they're designing. If you design a chair, you can see it.
You can sit in it and get the feel of it. But when you design an object, you can only see
what's on the screen, which is often just code.

You have said about your Jackpot research project that if the notion of truth in a program
is the abstract syntax tree, not text, you can display the program in a lot of interesting
ways. I would imagine that if the abstract syntax tree is stored persistently, then the code
is just one view. In fact, you could have many different code views. If someone doesn't
like curly braces, for example, they wouldn't have to look at curly braces. I could also
imagine that programs could be viewed at more abstract levels that would help people see
problems with their design. Is that kind of visualization what you're thinking about in
your research?

James Gosling: Yes. Historically, people have used all kinds of ways
to try and visualize their programs. Some are simplistic and kind of obvious, others are
not. The mathematics world, for example, has a notation for expressing mathematical
formulas that is much richer and more evocative than plus, minus, star, slash,
parentheses, and variable names. People are pretty good at looking at a page of
mathematics that has square root signs and exponents. So if you've got a piece of code
full of gnarly math, it's probably a lot more comprehensible to the people who understand
gnarly math if you actually display it in something that looks like conventional
mathematics. That's one of the things our system will do. There have also been a
number of attempts to come up with notations for visualizing control flow that are
different from just code. For example, Nassi-Shneiderman diagrams are pretty good for
representing decision trees.

In our research, we've been working to build a framework into which you can plug in
various kinds of visualization techniques. The extent to which these visualization
techniques are helpful is often context dependent. If you are writing a ray tracer, for
example, the ability to do mathematical layout of code probably helps a lot. If you are
writing a banking application, on the other hand, mathematical layout probably doesn't
help at all. But in a banking application, specialized visualizations for state diagrams,
database modeling, and database access might help. If you crack open database
textbooks, you'll see diagrams of what databases look like. Maybe your program ought to
look like that. Maybe your program would be a lot more comprehensible if it looked like
that.
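
As an illustration of the idea that code is only one view of the tree, here is a minimal sketch in Java. It is not Jackpot's design or API: the Expr interface and the Var, Pow, Add, and Sqrt node classes are hypothetical names invented for this example. One small expression tree is rendered twice, once as ordinary Java source text and once as LaTeX-style mathematical notation.

```java
// Hypothetical sketch: one expression tree, two views of it.
// None of these types come from Jackpot; they are invented for illustration.
class TwoViewsDemo {
    public static void main(String[] args) {
        // The tree for sqrt(x^2 + y^2), built directly rather than parsed.
        Expr length = new Sqrt(new Add(new Pow(new Var("x"), 2),
                                       new Pow(new Var("y"), 2)));
        System.out.println(length.asCode()); // Math.sqrt(Math.pow(x, 2) + Math.pow(y, 2))
        System.out.println(length.asMath()); // \sqrt{x^{2} + y^{2}}
    }
}

interface Expr {
    String asCode(); // view 1: conventional Java source text
    String asMath(); // view 2: LaTeX-style mathematical notation
}

final class Var implements Expr {
    private final String name;
    Var(String name) { this.name = name; }
    public String asCode() { return name; }
    public String asMath() { return name; }
}

final class Pow implements Expr {
    private final Expr base;
    private final int exponent;
    Pow(Expr base, int exponent) { this.base = base; this.exponent = exponent; }
    public String asCode() {
        return "Math.pow(" + base.asCode() + ", " + exponent + ")";
    }
    public String asMath() {
        // Parenthesize anything that is not a plain variable, so (x + y)^2 reads correctly.
        String b = (base instanceof Var)
                ? base.asMath()
                : "\\left(" + base.asMath() + "\\right)";
        return b + "^{" + exponent + "}";
    }
}

final class Add implements Expr {
    private final Expr left;
    private final Expr right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public String asCode() { return left.asCode() + " + " + right.asCode(); }
    public String asMath() { return left.asMath() + " + " + right.asMath(); }
}

final class Sqrt implements Expr {
    private final Expr arg;
    Sqrt(Expr arg) { this.arg = arg; }
    public String asCode() { return "Math.sqrt(" + arg.asCode() + ")"; }
    public String asMath() { return "\\sqrt{" + arg.asMath() + "}"; }
}
```

A real tool would presumably keep the renderers separate from the tree nodes so that new visualizations can be plugged in; putting both views on each node just keeps the sketch short.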

The Abstract Syntax Tree of Truth

Bill Venners: What kind of things would change if the abstract syntax
tree is the truth, not text? Would my source be binary? Wouldn't comments need to
become first class parts of the language? Right now, comments are just part of the text
and they're thrown away.

James Gosling: Oddly enough, one of the most painful things inside
our system right now is dealing with comments. Javadoc comments, which people use in very
clearly stylized ways, are basically in the grammar. Because of that, Javadoc comments
are completely straightforward to deal with. But general comments that people put in
random places are an unbelievable pain.

Bill Venners: Because you don't know what to attach them to?

James Gosling: Yes. We do a pretty good job with comments,
but some people do very bizarre things with comments. Currently, we don't even try
to guarantee perfect fidelity for arbitrarily bizarre comment usage.

Also, we actually don't represent the programs as binary in their persistent form. We
represent them as a Java source file. We actually use the Java source file as the way to
represent the parse tree.

Bill Venners: If I also edited the Java source by hand, would it break
your tool?

James Gosling: No, you can do arbitrary editing and we figure
it out. It became clear that any parse tree representation that we would come up with
would be almost certainly slower to parse and take more disk space than Java source
code. We can derive almost everything we have as annotations from the source code
directly. Certainly all the type information is generated by inferencing in the type system.
So that's pretty straightforward. We can discover many other things by doing various
kinds of pattern matching. And we have also been attaching attributes to methods.

Bill Venners: How do you add attributes?

James Gosling: We're about to start using the 1.5 metadata
specification explicitly as our mechanism for attaching persistent information to source
code. The traditional way to do that, used for example by Borland JBuilder, is to put the
metadata in comments. Often people use Javadoc comments as the way to store their
metadata, and that actually worked remarkably well. But now there's a real metadata
facility in 1.5.

Bill Venners: What kinds of metadata would you be adding? If I'm
visualizing a class in UML, and I click and drag it to a new position, it seems like the
positions would need to be attached to the classes themselves.

James Gosling: Yes. Somebody doing a UML editor could certainly
attach metadata about the location of the boxes on the screen. Also, in a UML diagram you
often distinguish between the important concepts and the fluffy concepts. If you want to make
the diagram small, you leave out the fluffy concepts and just represent the important
ones. So you could, for example, certainly have a piece of metadata attached to each
field that says whether it's fluffy or important.
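
To make the metadata idea concrete, here is a minimal sketch using Java annotations, the language feature that the 1.5 metadata facility introduced. The annotation names DiagramPosition and Important, and the Account class, are hypothetical and chosen only for this example; nothing here is taken from Jackpot or from any UML tool. The sketch uses runtime retention so it can be run and inspected with reflection, which is an assumption made for the demo: a tool that works on source files might well use source-level retention instead.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class DiagramMetadataDemo {
    // Hypothetical annotation: where a diagram editor last drew this class's box.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DiagramPosition {
        int x();
        int y();
    }

    // Hypothetical marker annotation: the field is "important", not "fluffy".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Important {
    }

    @DiagramPosition(x = 120, y = 40)
    static class Account {
        @Important long balanceInCents;
        @Important String owner;
        String lastLoginHost; // unmarked, so a compact diagram would leave it out
    }

    // Collects the names of the fields a compact diagram would keep.
    static List<String> importantFields(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Important.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        DiagramPosition pos = Account.class.getAnnotation(DiagramPosition.class);
        System.out.println("Account box at (" + pos.x() + ", " + pos.y() + ")");
        System.out.println("Compact view keeps: " + importantFields(Account.class));
    }
}
```

Because the position and importance travel with the class declaration itself, hand edits to the rest of the source file do not disturb them, which fits the approach of using the Java source file as the persistent form.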

Understanding Distributed Systems

Bill Venners: How can you visualize and understand the complexity of
distributed systems? For example, in enterprise systems today it is often hard to turn
something off, because you don't know what's been connected to it over the years.
Understanding the complexity of one big application seems like a hard problem, but more
manageable than understanding a...

James Gosling: ...sea of things. Boy, there are a whole bunch of PhD
theses ready to be had about that topic. It is really hard. For example, in the Web services
model, you may publish a service descriptor that says, "Hi, I'm a service. This is what I
take. Talk to me." Eventually, something needs to change in that service, or maybe the service
has to go away for some reason. If you need to track down the dependencies in a large-scale
system where dependencies get established on a completely dynamic and ad hoc
basis, there's nothing that's as good as just maintaining a log of who has ever talked to
you.
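
The "log of who has ever talked to you" can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the LoggedService class, its handle method, and the idea that each caller supplies a string identifier are all hypothetical, and a real service would need to persist the log and authenticate callers.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class CallerLogDemo {
    // Hypothetical service wrapper that remembers every caller it has ever seen.
    static class LoggedService {
        // Caller identifier -> number of requests received from that caller.
        private final Map<String, Long> callsByCaller = new ConcurrentHashMap<String, Long>();

        String handle(String callerId, String request) {
            callsByCaller.merge(callerId, 1L, Long::sum); // record the dependency first
            return "echo:" + request;                     // stand-in for the real work
        }

        // Everyone who would be affected if this service changed or went away.
        Set<String> knownCallers() {
            return new TreeSet<String>(callsByCaller.keySet());
        }

        long callCount(String callerId) {
            Long count = callsByCaller.get(callerId);
            return count == null ? 0L : count;
        }
    }

    public static void main(String[] args) {
        LoggedService service = new LoggedService();
        service.handle("billing", "getRate");
        service.handle("reports", "getRate");
        service.handle("billing", "getRate");
        System.out.println(service.knownCallers());       // [billing, reports]
        System.out.println(service.callCount("billing")); // 2
    }
}
```

The log only answers the question "who has depended on me so far"; it cannot tell you who will call tomorrow, which is part of why the general problem stays hard.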

In some sense this is kind of a hopeless problem, and maybe that's OK. And I say,
"maybe that's OK," because it really is a deeply difficult problem. For example, look at URLs, which make the Web work. Hypertext wasn't invented with
the Web. Hypertext had been around as a concept for 20 or 30 years. The earliest popular
description of this was a book called Computer Lib, written a long time ago by Ted Nelson.
That book really was about what you could do with hypertext, and he had
this project called Project Xanadu that was trying to do that. But they went off and did
the usual computer science thing, which is to try to solve all the hard problems and make
it perfect.

One of the hard problems is exactly what you were just asking about concerning distributed
systems. You've got a reference to a remote resource. What happens if that remote
resource moves? Should you keep the backtracking information? How do you keep the
backtracking information? Solving that problem is really, really, really hard. Lots of
people went running at that brick wall over, and over, and over again, trying to find a way
to make these large scale distributed references really work. In the computer science
academic world, it was generally considered that an internet link just wasn't of any
value unless it could handle resource moving and renaming and issues like that.

In some sense, the brilliant thing that Tim Berners-Lee did was simply to say, "I don't
care." For 20 years people had been failing to solve these problems in any large-scale
way. Berners-Lee decided to just do the simple obvious thing that solves the problem he
needed, namely, getting ahold of a resource. And that's actually an easy problem.
Coming up with those names, URLs, is a relatively straightforward thing. He did that,
and that enabled a lot of what the Web is today. But the Web has all these problems.
What happens if a Web page moves or gets deleted? That is exactly the problem of
maintaining or managing the configuration of any large scale distributed system. On the
one hand, the URL design has made the Web somewhat fragile. Broken links are all over
the place. On the other hand, if they had tried to really solve that problem, the Web never
would have happened, because the problem is just too hard.

So philosophically, I really don't know. Dealing with dynamic systems with pieces that
come and go is a really hard problem. There are all kinds of specialized solutions for
specialized situations, but I've never seen anything like a set of general solutions. In some
sense, this particular problem feels like one where unreliability may be a good thing, just
because it makes the whole enterprise possible. Maybe people should just get over it.

Next Week

Come back Monday, November 17 for Part II of a conversation with
Ruby's creator Yukihiro (Matz) Matsumoto. I am now staggering
the publication of several interviews at once, to give the reader
variety.

Talk Back!

Have an opinion about refactoring tools, program visualization, or Javadoc?
Discuss this article in the Articles Forum topic,
Visualizing Complexity.