Oracle Blog

Jonathan Gibbons' Weblog

Thursday Jan 28, 2010

One of the more subtle aspects of javac syntax trees is that every tree node has position information associated with it. This information is used to identify the location of errors within the source text, and is used by IDEs when refactoring or reformatting code. Ensuring the information is accurate is tricky, and with a number of projects ongoing to update the Java language, and hence the trees used by the compiler, the time has come for some better test support in this area.

The new test is called TreePosTest, and can be run either by the jtreg test harness or as a standalone utility. It can read one or more files or directories given on the command line, looking for source files. It reads each file in turn, ignoring those with syntax errors (there are a lot of those in the javac test directories!) For each file, it scans the tree, checking invariants for the position information for every tree node. Any errors that are found are reported.

Each tree node has three nominal positions, identified as character offsets from the beginning of the file. (Older versions of javac used line-number/char-on-line coordinates, packed into a single integer.) The three positions are the start of the node within the source text, the end of the node within the source text, and a position used to identify that node -- typically the first non-white character in the source text that is unique to that node. The last of these is stored directly in every tree node. The start position is always available and is recursively computed by TreeInfo.getStartPos. The end position requires a table to be maintained on the side; for performance reasons, this table is not normally enabled; it must be enabled if end positions are going to be required. When enabled, the end position for a node is recursively computed by TreeInfo.getEndPos, using the endPosTable. (Certain nodes also store an end position directly, when such a position may be required for an error message.)

Given these three positions, we can identify various invariants.

For any node:
start <= pos <= end

Any node must be enclosed in its parent node:
parent.start <= start && end <= parent.end

The position of a parent node should not be within any of its children:
parent.pos <= start || end <= parent.pos
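
These invariants can be exercised from outside javac with the public Compiler Tree API. Here is a minimal sketch of the idea behind such a test; the class name, the sample source string, and the choice to skip nodes with no recorded position are all mine, while Trees, SourcePositions and TreeScanner are real API:

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

import java.net.URI;
import java.util.List;

public class TreePosCheck {

    /** An in-memory source file (name and content are illustrative). */
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String code) {
            super(URI.create("string:///Demo.java"), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreErrors) {
            return code;
        }
    }

    /** Parses the source and counts nodes that violate start <= end. */
    static int countViolations(String code) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                List.of(new StringSource(code)));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        int[] violations = { 0 };
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override public Void scan(Tree tree, Void unused) {
                    if (tree != null) {
                        long start = positions.getStartPosition(unit, tree);
                        long end = positions.getEndPosition(unit, tree);
                        // Skip nodes with no recorded position (NOPOS, i.e. -1).
                        if (start >= 0 && end >= 0 && start > end)
                            violations[0]++;
                    }
                    return super.scan(tree, unused);
                }
            }.scan(unit, null);
        }
        return violations[0];
    }

    public static void main(String... args) throws Exception {
        System.out.println(countViolations("class Demo { int x = 1 + 2; }"));
    }
}
```

Note that tasks created through the API keep end positions, so getEndPosition is usable here; a plain command-line javac run would normally not record them.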

The first surprise was that the test program found a number of faults within its own source text. Ooops. Running the test program over the source files in the langtools test/ directory, it found 6000 errors in over 2000 files. More ooops. Fortunately, many of those errors are repetitions, but what started as a proactive exercise to test new compiler code was turning out to have more payoff than expected.

I don't know about you, but I tend not to think in character positions very easily, and error messages like the following leave a little to be desired:

But, you know what they say: a picture is worth a thousand numbers, so the test program now has an optional GUI mode, in which it becomes clearer that the reported range for the parent wildcard node (in red) incorrectly omits the type bound kind (in green). In fact, the type bound kind and therefore the enclosing wildcard node both actually begin at the preceding '?'.

Here is another example. Here, it becomes clear that the position for the parent AND node is incorrectly within the expression on the right hand side of the &&, instead of at the && operator. In fact, this is an instance of a previously reported bug, 6654037.

Issues

Most of the issues that have arisen have been reasonably easy to fix, and bug fixes are already underway. However, there are some problem cases.

Enum declarations

These are desugared right in the parser into equivalent declarations of static final fields within the enclosing class. The question then becomes: what position information should be recorded for these desugared nodes? On the one hand, one might argue for using the "invalid position" constant, NOPOS, since these nodes do not directly correspond to source text; on the other hand, it is important to record a position in case of errors.
(See 6472751.)

Array declarations

Array declarations are complicated by support for legacy syntax that allows constructions like:

int[] array[];
int[] f()[] { return null; }
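
For reference, those legacy bracket placements mean exactly the same as the usual forms; a tiny demonstration (the names are mine):

```java
public class LegacyArrays {
    // Legacy C-style brackets after the variable name...
    static int[] legacy[] = new int[2][3];
    // ...mean exactly the same as the usual form:
    static int[][] usual = new int[2][3];

    // Brackets may even follow a method's parameter list;
    // this declares a method returning int[][]:
    static int[] makeGrid()[] {
        return new int[2][3];
    }

    public static void main(String... args) {
        System.out.println(legacy.length + " " + makeGrid()[0].length);  // prints "2 3"
    }
}
```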

Annotations

A number of issues have been observed with the positions recorded for annotations, but these have not yet been fully investigated.

Currently, these issues are addressed by making allowances within the test program.

Summary

The test program can easily be applied to large code bases, such as JDK or JDK test suites.
Despite some outstanding issues within javac, the test program has proved its worth in identifying errors within the existing javac code, and should prove useful in future to ensure that any support for new language features will also satisfy the expected invariants for tree positions. And even if the bar is not currently at 100%, at least we know where the bar actually is, by virtue of the specific allowances made in the test program.

Friday Nov 20, 2009

Back in August, Kelly posted a blog entry about the
Anatomy of the JDK build. However, upcoming new features for javac mean that building the JDK is about to get more interesting. More specifically, building the langtools component of the JDK is about to get a whole lot more challenging.

Background

Currently, it is a requirement that we can build a new version of JDK using the previous version. In other words, we can use JDK 6 to build images for JDK 7. However, we also want to be able to use new JDK features, including new language features, throughout most of the JDK source code. This means we need to be able to use the new version of javac to compile the Java code in the new version of JDK, which in turn imposes a restriction that we must at least be able to compile the new version of javac with the previous version of javac.

In practice, this means the langtools build uses the boot JDK to build bootstrap versions of javac, javah and javadoc, which understand the latest version of Java source code and class files, but which can be run by the boot JDK. These bootstrap tools will be used through the rest of the JDK build. In addition, the langtools build uses the new bootstrap javac to compile all of the langtools code for eventual inclusion in the new version of JDK. This is shown here, in Figure 1.
This directly corresponds to step 1 in Anatomy of the JDK build.

Figure 1: Building langtools today

The main body represents the langtools build; inputs are on the left, and deliverables (for the downstream parts of the build) are shown on the right.

Problem

In a recent blog entry, JSR 199 meets JSR 203, I described a new file manager for use with javac that can make use of the new NIO APIs now available in JDK 7. Separately, Project Jigsaw is working to provide a Java module system that is available at both compile time and runtime, with consequential changes to javac. These two projects have one thing in common: they both require that the new version of javac be able to access and use new API that is only available in JDK 7, which is at odds with the restriction that we should be able to compile the new javac with the previous version of javac.

The problem, therefore, is, How do we build javac for JDK 7?

Using the source path

One might think we could simply put the JDK 7 API source files on the source path used to compile javac. If only it were that simple! Various problems get in the way: some of the new API already uses new language features which will not be recognized by earlier versions of javac -- for example, some of the Jigsaw code already uses the new diamond operator. Also, javac ends up trying to read the transitive closure of all classes it reads from the source path, and when you put all of JDK on the source path, you end up reading a whole lot of JDK classes! Even though the new javac may directly reference only the NIO classes, to compile those classes the transitive closure eventually leads you to AWT (really!) and to a couple of show stoppers: some of the classes are platform specific (i.e. in src/platform/classes instead of src/share/classes) and worse, some of the source files do not even exist at the time javac is being compiled -- they are generated while building the jdk repository, which happens much later in the JDK build process. (Step 6 in Anatomy of the JDK build.) So, simply putting the JDK 7 API source files on the source path is not a viable solution -- and reorganizing the build to generate the automatically generated source code earlier would be a very big deal indeed.

Using an import JDK

So, clearly, you can no longer build all of a new javac using the previous version of javac. But, we could leave out the parts of the new javac that depend on the new API, provided that we can build a bootstrap javac that functions "well enough" to be able to build the rest of javac and the JDK. However, we would still need to be able to build the new version of javac to be included in the final JDK image.

If you temporarily ignore chickens and eggs and their temporal relationships, the problems would all go away if you could put the classes for JDK 7 on the (boot) class path used to compile javac. This is very similar to the use of an "import JDK" used elsewhere by the JDK build system when performing partial JDK builds: an import JDK is used to provide access to previously built components when they are not otherwise part of the current build environment, which is somewhat the case here.

This is shown here, in Figure 2, and is not so different from what we are currently doing.

Figure 2: Building langtools with an import JDK

Stub files

In a full JDK build, we cannot compile against the JDK source code on the source path, and we cannot assume the availability of an import JDK to use on the (boot) class path. The solution is to provide stub files for the necessary JDK 7 API, which are sufficient for the purpose of compiling javac. Stub files have the same public signature as the files they represent, but none of the implementation detail, so they do not suffer from the same extensive transitive closure problem as occurred when trying to compile against the real JDK 7 API source code. And, we only need stub files for those classes required by javac that are either new or changed from their JDK 6 counterparts. This also simplifies the problem substantially.
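
As an illustration of what such a stub might look like, here is an entirely hypothetical API class as a stub generator might emit it: the signatures are preserved, every body is replaced with a throw, and no implementation detail (and hence no transitive dependency beyond the declared types) remains:

```java
// Hypothetical new JDK 7 API class, reduced to a stub.
// Only the public signature survives; the bodies are placeholders.
public class FancyFileManager {
    public FancyFileManager(String root) { throw new RuntimeException("stub"); }
    public String resolve(String relative) { throw new RuntimeException("stub"); }
    public static boolean isSupported() { throw new RuntimeException("stub"); }
}
```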

The number of files involved, and the rate at which some of the files are changing, makes it impractical to create and maintain such stub files manually. The solution is to generate the stub files automatically from the latest JDK 7 API that would otherwise be used instead. The stub generator is built from parts of javac -- it reads in the JDK 7 source files to create javac ASTs, it rewrites the ASTs by removing as many implementation details as possible, then writes out the modified AST in Java source form to be used in place of the original. And, as a minor added complication, although the output stub files must be readable by a JDK 6 compiler, the input source files may contain JDK 7 artifacts (remember I said that the Jigsaw code already uses the diamond operator), so the stub generator must be built on top of the new javac -- or at least, those parts of the new javac that can be compiled by the old javac.

The final result is shown here, in Figure 3.

Figure 3: Building langtools using generated stubs

Implementation details

The langtools build.xml file uses three new properties. Two are statically defined in build.properties, and specify the langtools source files that depend on new JDK 7 API, and the API that is depended upon; the third is provided by the user and can specify the location of either an import JDK or a jdk repository.

When building a full JDK, the langtools build.xml must be given the location of the jdk/ repository. The langtools build will create and compile against stub files generated from the necessary JDK source code. [Figure 3, above.] In a full JDK control build, the location of the jdk/ repository is passed in automatically by the Makefile from the JDK_TOPDIR make variable, which exists for this purpose.

When building langtools by itself, a developer may choose to pass in the location of an import JDK. In this case, the langtools build will compile against rt.jar in the import JDK, thus precluding the need to generate and use stub files. [Figure 2, above.]

If no value is passed in for the jdk/ repository or import JDK, the langtools build will not build those classes that require the use of JDK 7 API. This allows a developer to create a compiler that is just "a better JDK 6" compiler. [Figure 1, above.]

It is also worth noting that the compiler options are quite tricky for these different cases, and specifically for the boxes in the diagrams labelled "compile product classes".

javac itself is run using the bootstrap javac classes on the JVM boot class path (-J-Xbootclasspath/p:).

If being used, the stub files go on the compiler's source path (-sourcepath), together with -implicit:none and -Xprefer:source. Together, these mean that the stub files are used in preference to any files from the boot JDK, and that class files are not generated for the stub files. Other JDK API comes from the normal boot class path. Note that unlike other situations when overriding the standard JDK API, the stub files cannot go on the boot class path because source files are not read from that path.

If an import JDK is being used, it is used together with the javac output directory for the compiler's boot class path (-Xbootclasspath). This completely replaces the normal boot class path used by the compiler, so all JDK classes are read from the import JDK.

Unless an import JDK is being used, the javac output directory is prefixed to the normal boot class path (-Xbootclasspath/p:). This means that langtools classes are used in preference to classes on the normal boot class path, while not hiding any classes not defined by langtools.

Summary

With these build changes, it is possible to allow limited references from javac into new JDK 7 API, which are forward references in terms of the normal build process. Furthermore, this can be done without changing the overall structure of the JDK build pipeline.

Thursday Sep 24, 2009

The Compiler API, JSR 199, added in JDK 6, provided the ability to invoke a Java compiler via an API. Now, in JDK 7, there is a new feature, More New I/O APIs for the Java Platform, JSR 203, which provides a rich file system abstraction. This past week I've put together some code to connect the two.

Friday Jun 12, 2009

As we reported at JavaOne, a lot has been going on for javac over the past year.

Under the auspices of Project Coin, various small language changes are being considered: strings in switch, ARM blocks, binary literals, large arrays, and the diamond and "elvis" operators. Project Jigsaw is investigating the use of modules; JSR 292 is providing support for the "invoke-dynamic" bytecode, and JSR 308 will provide support for annotations on types.

In addition, within javac, we've been finding and fixing bugs, including issues in the type system, and improving the diagnostics that may be generated. We've worked to produce an ANTLR-based grammar for the compiler, and we've worked with the OpenJDK release team to release javac as part of OpenJDK 6.

Monday Dec 08, 2008

Recently, we've been working to raise the quality bar for
the code in the OpenJDK langtools repository.

Before OpenJDK, the basic quality bar was set by the JDK's
product team and SQE team. They defined the test suites to be
run, how to run them, and the target platforms on which they
should be run. The test suites included the JDK regression
tests, for which the standard was to run each test in its
own JVM (simple and safe, but slow), and the platforms were
the target platforms for the standard Sun JDK product.

Even so, the bar was somewhat higher in selected areas. The
javac team has pushed for running the javac regression
tests in "same JVM" mode, because it is so much faster.
Starting up a whole JVM to compile a three line program to
verify that a particular error message is generated is like
using a bulldozer to crack an egg. Likewise, since the compiler
and related tools are pure Java programs, it has been reasonable
to develop them, and to run the regression tests, on non-mainstream
supported platforms.

With the advent of OpenJDK, the world got a whole lot bigger,
and expectations got somewhat higher, at least for the langtools
component. If nothing else, there's a bigger family of developers
these days, with a bigger variety of development environments,
to be used for building and testing OpenJDK.

We've been steadily working to make it so that all the langtools
regression tests can be run in "same JVM" mode. This has required
fixes in a number of areas:

in the regression test harness (jtreg)

in tools like javadoc, which used to be neither reusable nor
re-entrant, which made it hard to run different tests against
it in the same VM. javadoc is now reusable; re-entrancy
is coming soon

in the tests themselves: some tests we changed to make them
same-VM safe; others, like the apt tests, we simply
marked as requiring "othervm" mode. Marking a test as
requiring "othervm" allows these tests to succeed when
the default mode for the rest of the test suite is "samevm".

We've also made it so that you can run the langtools tests without
building a full JDK, by using the -Xbootclasspath option. For a
while, that left one compiler test out in the cold (versionOpt.sh)
but that test was finally rewritten, recently.

We've been working to use Hudson to build and test the langtools
repository, in addition to the standard build and test done by
Release Engineering and QA teams. This allows us (developers) to perform
additional tests more easily, such as running FindBugs, or
testing "developer" configurations as well as "product" configurations.
(i.e. the configurations an OpenJDK developer might use.)
This has also made us pay more attention to the documented way
to run the langtools regression tests, using the standard Ant
build file. In practice, Sun's "official" test runs are
done using jtreg from the command line, and speaking for myself,
I prefer to run the tests from the command line as well, to have
more control over which tests to run or rerun, and how to run them.

The net result of all of this is that the langtools regression tests
should all always pass, however they are run. This includes

as part of testing a fully built JDK

as part of testing a new version of langtools, using an earlier
build of JDK as a baseline

from the jtreg command line in "other vm" mode

from the jtreg command line in "same vm" mode

from the <jtreg> Ant task, such as used in the
standard build.xml file

Tuesday Oct 07, 2008

We've reached an interesting milestone for the regression tests for the OpenJDK langtools repository. You can now run all the tests using the jtreg -samevm option. In this mode, jtreg will run all the tests in the same JVM whenever possible. This means that the test run completes much faster than if every test creates one or more separate JVMs to run.

This has been a background activity ever since I joined the JDK Language Tools group. It has required work on a number of fronts.

We've had to fix a number of bugs in jtreg itself. There was a chicken and egg problem: there was no demand for the feature since it didn't work well enough, and with no demand, there were no engineering cycles to fix it. But, I wanted to run the javac tests fast, and I took over jtreg, so the chicken and egg became one.

We converted a lot of javac tests so that they could be run in the same VM. These tests mostly started out as "shell tests" (i.e. written in ksh.) They executed javac and then compared the output against a golden file. By fixing jtreg, and by adding hidden switches to javac, we converted the tests to use jtreg's built-in support for golden files.

This was enough to get the javac tests to run in samevm mode, which has been the state for a while now, but there was still the issue of the other related tools, such as apt, javadoc, javap, javah. And, the fact that not all the tests could be run in samevm mode meant that anyone wanting to run all the tests had to use the lowest common denominator: slow mode.

javap was the next to get fixed: it needed a rewrite anyway, and when that happened, it was natural to fix those tests to run fast as well.

javah was similar: there's a partial rewrite underway (to decouple javah from javadoc), and so those tests have been fixed.

javadoc has always been the tough one, and has the second largest group of tests after javac. A while back, I took the time out to figure out why the tests failed in samevm mode. It turns out to be a number of factors, mostly relating to the fact that javadoc is a fairly old program, and is somewhat showing its age. Internally it was using static instances a lot, and as a result, was neither reentrant nor reusable. In addition, there were classloader issues when creating the classloader for the doclet: javadoc was not following the recommended "parent classloader" pattern. Having identified those issues, we've been working to fix them, and it is the result of fixing the last of those issues that gives the milestone today.
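
The recommended "parent classloader" pattern amounts to giving the doclet's loader an explicit parent to delegate to, so classes shared between the tool and the doclet (such as the doclet API itself) resolve to the same Class objects. A minimal sketch (the method name and empty doclet path are mine):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DocletLoaderDemo {
    /** Creates a loader that delegates to a parent before searching its own URLs. */
    static ClassLoader newDocletLoader(URL[] docletPath) {
        // Passing an explicit parent is the "parent classloader" pattern:
        // shared classes are loaded once, by the parent, not duplicated per loader.
        return new URLClassLoader(docletPath, DocletLoaderDemo.class.getClassLoader());
    }

    public static void main(String... args) throws Exception {
        ClassLoader loader = newDocletLoader(new URL[0]);
        // Classes not found on the doclet path are delegated to the parent:
        System.out.println(loader.loadClass("java.lang.String") == String.class);
    }
}
```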

In parallel with the work on javadoc, and with the milestone in sight, we checked out the final tool, apt. There were 24 test failures there when using samevm mode, and since the tool is scheduled to be decommissioned in Java 8, we "fixed" those tests simply by marking them as requiring othervm mode. That doesn't make them run any faster, but it does mean they don't fail if you run the tests with samevm mode as the default mode.

What does all this mean? It means the tests run in less than a quarter of the time they took before. Using jtreg samevm mode, and using my laptop, 1421 tests run and pass in a little over 5 minutes, compared to 22 minutes in the standard othervm mode. As a developer waiting to move on to the next step, that's a big saving :-) Although all the supporting work has not yet made it back to the master workspace, that's all underway, and at some point we'll throw the switch so that samevm mode is the default for the langtools repository. So far, we've been targeting the JDK 7 repository, but Joe Darcy is keen to see as much of this work as possible in the OpenJDK 6 repository as well.

Can we do the same for the main jdk repository? In principle yes, but it will take someone to do it. The good news is that jtreg now supports the samevm feature well, and the payoff is clear. It does not have to be done all at once: as we did in the langtools workspace, all it takes is someone interested to work on it section by section. The payoff is worthwhile.

Monday Oct 06, 2008

A while back, we created a new OpenJDK project,
Compiler Grammar, so that we could investigate integrating an ANTLR grammar for Java into javac. Thanks to some hard work by our intern
Yang Jiang, with assistance from
Terence Parr,
the initial results of that work are now available.

The grammar currently supports Java version 1.5, although the goal is to fully support the -source option and support older (and newer) versions of the language as well. Right now, the performance is slower than that of standard javac, so this will not be the default lexer and parser for javac for a while, but even so, it should prove an interesting code base for anyone wishing to experiment with potential new language features. And, it does mean that the grammar files being used have been fully tested* in the context of a complete Java compiler.

We are also looking to align the grammar more closely with the grammar found in JLS.

This version of javac is in the langtools component of the compiler-grammar set of Mercurial OpenJDK repositories.

*There are currently a few test failures in the regression test suite. Some are to be expected, because the error messages generated by the parser do not match the errors given by the standard version of javac; the other failures are being investigated.

Sunday Jul 06, 2008

I just finished a vacation with my family, and in the in-between times, I made significant progress towards a multi-threaded javac.

Before you get too excited, let me qualify that by saying that there is some low-hanging fruit for this task, and then there is a complete rewrite of the compiler. I'm only talking about the former; there are no plans to do the latter.

The difficulty with a multi-threaded javac is that the Java language is quite complicated these days, and as a result the compiler is also quite complicated internally, to cope with all the consequences of interdependent source files. The current compiler is not set up for concurrent operation, and adapting it would be error-prone and destabilizing. (For more details on the compiler's operation, see
Compilation Overview
on the
OpenJDK Compiler Group web pages.)

The low hanging fruit comes by considering the compilation pipeline in three segments: input, process, and output. The source files can all be read and parsed in parallel, because there are no interdependencies there. (Well, almost none. More on that later.) Likewise, once a class declaration has the contents of its class file prepared internally, it can be written out in the background while the compiler begins to work on the next class.

We've known about this low hanging fruit for a while, and Tom Ball recently submitted a patch with code for parsing source files in parallel. So, faced with a family vacation, I loaded up my laptop with the bits needed to explore this further.

Parallel parsing

If you're parsing files in parallel, the primary conflict is access to the Log, the main class used by the rest of the compiler to generate diagnostics. It is reasonably obvious that even if you're parsing the source files in parallel, you don't want any diagnostics that might be generated to appear interleaved: you want to see all the diagnostics for each file grouped together. Initially, I was thinking of creating custom code in the parser to group parser diagnostics together, but since the scanner (lexer) can also generate diagnostics, it seemed better and less intrusive to give each thread its own custom Log that could save up diagnostics until they can all be reported together.

The previous work on the Log class made it somewhat "hourglass shaped", with a bunch of methods which are used by the rest of the compiler to create and report diagnostics, and a back end that knows how to present the diagnostics that are generated. In between is a single "report" method, which was originally introduced to make it easy to subtype Log to vary the way that diagnostics are presented. Now, however, that method provided an excellent place to divide Log in two: an abstract BasicLog, which provides the front end API used by the body of the compiler, and subtypes to handle the diagnostics that are reported. The main compiler uses Log as it always did -- one of the Big Rules for the compiler is to minimize change -- but the threads for the new parser front end can now use a new subtype of BasicLog that buffers up diagnostics and reports them together when the work of the thread is complete.
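
The same grouping effect can be sketched with the public compiler APIs: give each parse task its own diagnostic collector, and report each file's diagnostics as one batch when its task completes. This is only an illustration of the idea, not javac's internal Log refactoring; the file names and source strings are made up:

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

import java.net.URI;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelParse {

    /** An in-memory source file. */
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String name, String code) {
            super(URI.create("string:///" + name), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreErrors) {
            return code;
        }
    }

    /** Parses one file with its own collector, so its diagnostics stay grouped. */
    static List<Diagnostic<? extends JavaFileObject>> parse(String name, String code)
            throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(null, null, diags, null, null,
                List.of(new StringSource(name, code)));
        task.parse();
        return diags.getDiagnostics();
    }

    public static void main(String... args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Each future yields one file's diagnostics as a single batch:
            Future<List<Diagnostic<? extends JavaFileObject>>> good =
                    pool.submit(() -> parse("Good.java", "class Good { }"));
            Future<List<Diagnostic<? extends JavaFileObject>>> bad =
                    pool.submit(() -> parse("Bad.java", "class Bad { int x = ; }"));
            System.out.println("Good.java: " + good.get().size() + " diagnostic(s)");
            System.out.println("Bad.java: " + bad.get().size() + " diagnostic(s)");
        } finally {
            pool.shutdown();
        }
    }
}
```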

This refactoring forced one other cleanup in Log, which was an ugly hangover from the introduction of JSR 199, the Compiler API. The Diagnostic objects that get created had an ugly hidden reference to the Log that created them which, if used incorrectly, could provoke NullPointerExceptions or other problems when you tried to access the source code containing the diagnostic. For those that are interested, it's because of interaction with the Log.useSource method, which sets temporary state in Log; but the bottom line is that, one more refactoring later, the DiagnosticSource interface became a much better DiagnosticSource object, providing a much cleaner standalone abstraction for information about the source code containing the location of the diagnostic.

(Log used to be one big do-everything class; it has slowly been getting better over the years, and watch out for the upcoming exciting new work that Maurizio is doing to improve the quality and presentation of diagnostics. Luckily, these refactorings I'm describing here will not interfere with that work too much.)

There are some other shared resources used by the Parser: most notably, the compiler's name table, but these were easily fixed by synchronizing a few strategic methods.

That, then, was sufficient for the first goal — to parse source files in parallel. :-) Writing the class files concurrently was somewhat more interesting.

Background class generation and writing

Apart from the general refactoring for Log, the work to parse source files in parallel turned out to be very localized, almost to a single method in JavaCompiler, which is responsible for parsing all the source files on the command line. That one method can choose whether to parse the source files sequentially, as before, or in parallel. There is no such easy method for writing out class files. This is because the internal representation of a generated class file may be quite large, and the compiler pipeline is normally set up to write out class files as soon as possible, and to reclaim the resources used. Because of the memory issues and the primarily serial nature of the upstream pipeline, the general goal was not to write all the class files in parallel, but merely to be able to do the file IO in the background. Thus the design goal was to write classes using a single background thread fed by a limited-capacity blocking queue; improving the flexibility of the compiler pipeline would then improve the ability to write out class files in the background. In particular, it was also desirable to fix an outstanding bug such that either all the classes in a source file should be generated, or none should. The current behavior of generating classes for the contents of a source file until any errors are detected does not fit well with simple build systems like make and Ant that use simple date stamps to determine if the compiled contents of a class file are up to date with respect to the source file itself.
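
The shape of that design -- one bounded queue feeding one writer thread -- can be sketched independently of javac. All the names here are mine, and the "class files" are just byte arrays collected in memory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackgroundWriter {
    private static final byte[] STOP = new byte[0];   // poison pill

    // Bounded queue: a slow writer applies back-pressure to the compiler pipeline.
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
    final List<byte[]> written = new ArrayList<>();   // stands in for the file system
    private final Thread writer = new Thread(() -> {
        try {
            for (byte[] classFile; (classFile = queue.take()) != STOP; )
                written.add(classFile);               // "write" the class file
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });

    void start() { writer.start(); }

    /** Called from the compiler pipeline; blocks if the writer falls behind. */
    void enqueue(byte[] classFile) throws InterruptedException { queue.put(classFile); }

    /** Signals end of compilation and waits for pending writes to finish. */
    void finish() throws InterruptedException { queue.put(STOP); writer.join(); }

    public static void main(String... args) throws InterruptedException {
        BackgroundWriter bw = new BackgroundWriter();
        bw.start();
        for (int i = 0; i < 10; i++)
            bw.enqueue(new byte[] { (byte) i });      // pretend class file contents
        bw.finish();
        System.out.println(bw.written.size());        // prints 10
    }
}
```

The capacity of 4 is arbitrary; the point is only that a full queue makes the producing thread block, bounding the memory held by pending class files.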

There were already some ideas for reorganizing the compiler pipeline within the main JavaCompiler class. Previously, a big "compile" method in JavaCompiler had been broken up into methods attribute, flow, desugar and generate, representing the different stages of processing for each class to be compiled. These methods could be composed in various ways depending on the compilation policy, which is an internal control within the compiler. The methods communicated via lists of work to be processed, and although the concept was good, it never paid off quite as well as anticipated because of the memory required to build all of the items on the lists before handing the list to the next stage. The latest idea that had been developing was to use iterators or queues to connect the compilation phases, rather than lists.

Another refactoring later, it turned out that queues were the way to go (as in java.util.Queue), because they fit the abstraction required and caused less change elsewhere in the compiler.

In a related improvement, the main "to do" list was also updated. Previously, it was just a simple list of items to be processed, using a simple javac ListBuffer. It was updated to implement Queue, and more importantly, to provide additional access to the contents grouped according to the original source file. This made it easier to process all the classes for a source file together, including any anonymous inner classes. Previously, anonymous inner classes were handled much later than their enclosing classes, because while top level and nested classes are discovered and put on the "to do" list very early, anonymous inner classes are not discovered until much later.
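
The idea of a "to do" list that is both a FIFO queue and groupable by source file can be sketched like this; the types and names are mine, not javac's actual ToDo class:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Sketch of a to-do list that is a queue but also groupable by source file. */
public class ToDoSketch {
    record Item(String sourceFile, String className) { }

    private final Queue<Item> queue = new ArrayDeque<>();
    private final Map<String, List<Item>> byFile = new LinkedHashMap<>();

    void add(Item item) {
        queue.add(item);   // normal FIFO view of the work
        // Secondary view: all work for one source file, kept together.
        byFile.computeIfAbsent(item.sourceFile(), f -> new ArrayList<>()).add(item);
    }

    /** All items for one source file, so its classes can be processed together. */
    List<Item> itemsFor(String sourceFile) {
        return byFile.getOrDefault(sourceFile, List.of());
    }

    public static void main(String... args) {
        ToDoSketch todo = new ToDoSketch();
        todo.add(new Item("A.java", "A"));
        todo.add(new Item("B.java", "B"));
        todo.add(new Item("A.java", "A$1"));   // anonymous class discovered later
        System.out.println(todo.itemsFor("A.java").size());   // prints 2
    }
}
```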

However, an earlier bug fix got in the way of being able to effectively complete processing the contents of a single source file all together.

Normally, the compiler uses a very lazy approach to the overall compilation strategy, advancing the processing of each class as needed, with a "to do" list to make sure that everything that needs to be done eventually really does get done. However, limitations in the pipeline precluded that approach in the desugaring phase. If the supertypes of a class are being compiled in the same compilation as the subtype, they need to be analyzed before the subtype gets desugared, because desugaring is somewhat destructive. The previous implementation could not do on-demand processing of the supertypes, so instead the work on the subtypes was deferred by putting them back on the "to do" list to be processed later, after any supertypes had been processed, thus defeating any attempt to process these files together. The new, better implementation is simply to advance the processing of the supertypes as needed.
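The "advance the supertypes as needed" fix amounts to a recursive dependency check before desugaring. The following sketch uses invented names and a trivial class hierarchy to show the shape of it: before a class is desugared, any supertype in the same compilation is desugared first, instead of the subtype being pushed back onto the queue.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch: desugar supertypes on demand, before their subtypes.
public class SupertypesFirst {
    // Hypothetical model of "supertype compiled in the same compilation".
    static Map<String, String> superOf = new LinkedHashMap<>();
    static Set<String> desugared = new LinkedHashSet<>();

    static void desugar(String cls) {
        if (desugared.contains(cls)) return; // already processed
        String sup = superOf.get(cls);
        if (sup != null) desugar(sup);       // advance the supertype on demand
        desugared.add(cls);
        System.out.println("desugar " + cls);
    }

    public static void main(String[] args) {
        superOf.put("Sub", "Base");
        desugar("Sub"); // Base is desugared first, then Sub
    }
}
```

The earlier approach would instead have re-queued "Sub" to run after "Base", which is what scattered the classes of one source file across the processing order.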

All this refactoring was somewhat easier to implement than to describe here, and again per the Big Rules, the work was reasonably localized to the JavaCompiler and ToDo classes, with little or no change to the main body of the compiler. The net result is more flexibility in the compiler pipeline, with a better implementation of the code to generate class files source file by source file, rather than class by class. And, to bring the story back to the original goal, it makes it easier to adapt the final method of the pipeline so that it can do its work serially, or with a background queue for writing class files in the background. :-)

And now ...

So where is this work now? Right now, it's here on my laptop with me in a plane somewhere between Iceland and Greenland, so let's hope for a safe journey the rest of the way back to California. The work needs some cleaning up, and more testing, on more varied machines. I've been running all the compiler regression tests and building OpenJDK with this new compiler, and it looks good so far. Finally, it will need to be code reviewed, and pushed into the OpenJDK repositories, probably as a series on smaller changesets, rather than one big one. So watch out for this work coming soon to a repository near you ...

It is my pet peeve message type ("can't apply method...") and also includes wildcards, captured type variables, and a <nulltype>. The text of the error message, excluding source file name and highlighted lines is a whopping 577 characters :-) Who says we don't need to improve this?

We have various ideas in mind. This first list is about the content and form of the messages generated by javac.

Omit package names from types when the package is clear from the context.
For example, use Object instead of java.lang.Object, String instead of java.lang.String, and so on, when those are the only classes named Object, String, etc., in the context.

Method name cannot be applied to given types
required: types
found: types
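The package-shortening idea can be sketched as a small lookup: a simple name is used only when it refers to exactly one fully qualified name in the current context. The "context" map below is a hypothetical stand-in for whatever the compiler knows about the names visible at the error site.

```java
import java.util.Map;

// Sketch: drop the package name when the simple name is unambiguous
// in the current context.
public class ShortNames {
    // simple name -> the single fully qualified name it denotes in context
    static final Map<String, String> context = Map.of(
        "Object", "java.lang.Object",
        "String", "java.lang.String");

    static String shorten(String fqName) {
        String simple = fqName.substring(fqName.lastIndexOf('.') + 1);
        // use the simple name only when it denotes exactly this type here
        return fqName.equals(context.get(simple)) ? simple : fqName;
    }

    public static void main(String[] args) {
        System.out.println(shorten("java.lang.String")); // unambiguous: shortened
        System.out.println(shorten("java.util.List"));   // not in context: kept as-is
    }
}
```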

Don't embed captured and similar types in signatures, since they inject wordy non-Java constructions into the context of a Java signature.
Instead, use short placeholders, and a key.

For example, in the message above, replace java.lang.Iterable<capture#81 of ? extends javax.tools.JavaFileObject>
by
java.lang.Iterable<#1>
with a note following the rest of the message:
where #1 is a capture of ? extends javax.tools.JavaFileObject

It is a lot shorter (less than half the length of the original message, if you're counting), and more importantly, it breaks the message down into segments that are easier to read and understand, one at a time. It still has a long file name in it, and I'll address that below.
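The placeholder-and-key transformation above could be approximated with a simple substitution pass. This is only an illustration: the regular expression and output formatting are mine, not javac's, and a real implementation would work on the compiler's typed diagnostic objects rather than on strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: replace "capture#NN of ..." constructions in a signature
// with short #n placeholders, collecting a "where" key to print after
// the message.
public class CapturePlaceholders {
    public static void main(String[] args) {
        String sig = "java.lang.Iterable<capture#81 of ? extends javax.tools.JavaFileObject>";
        Pattern capture = Pattern.compile("capture#\\d+ of ([^<>]+)");

        List<String> notes = new ArrayList<>();
        Matcher m = capture.matcher(sig);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            notes.add("where #" + (notes.size() + 1) + " is a capture of " + m.group(1));
            m.appendReplacement(out, "#" + notes.size());
        }
        m.appendTail(out);

        System.out.println(out);               // the shortened signature
        notes.forEach(System.out::println);    // the key, printed as a note
    }
}
```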

The following ideas are more about the presentation of messages. javac is typically used in two different ways: batch mode (output to a console), and within an IDE, where the messages might be presented as "popup" messages near the point of failure, and in a log window within the IDE.

When used in batch mode, either directly from the command line or in a build system, the compiler could allow the user to control the verbosity of the diagnostics. If you're compiling someone else's library, you might not be worried about the details in any warnings that might be generated. If you're compiling your own code, you might be comfortable with a quick summary of each diagnostic, or you might want as much detail as possible.

When used in an IDE, it would be good to provide the IDE with more access to the component elements of the diagnostic, so that the IDE could improve the presentation of the message. For example,

display the base name of the file containing the error, and link it to the compilation unit, instead of displaying the full file name as above

use different fonts for the message text and the Java code or signatures contained within it

hyperlink types used in the diagnostic to their declaration in the source code

given the resource key for the message, an IDE could use the key as an index into additional documentation specific to the type of the error message, explaining the possible causes for the error, and more importantly, what might be done to fix the problem.

To support these suggestions, the compiler could be instructed to generate diagnostics in XML, so that they could be "pretty-printed" in the IDE log window.

Here's how these ideas could be used to improve the presentation of the example message.

OK, I'll leave the real presentation design to the UI experts, but I hope you get the idea of the sort of improvements that might be possible.

Finally, we'll be looking at improving the focus of error messages. For example, this means that if the compiler can determine which of the arguments is at fault in a particular invocation, it should give a message about that particular argument, instead of about the invocation as a whole. However, care must also be taken not to narrow the focus of an error message incorrectly, so that the message becomes misleading. A typical example of that is when the compiler is parsing source code, and having determined that the next token is not one of A or B, it then checks C. If that is not found the compiler may then report "C expected", when a better message would have been "A, B or C expected." This means that such optimizations have to be studied carefully on a case by case basis, whereas all of the preceding suggestions can be applied more generally to all diagnostics.

So, do you have any "pet peeve" messages you get from the compiler? Do you have suggestions on how the messages could be improved, or how they get presented? Add a comment here, or mail your suggestions to the OpenJDK compiler group mailing list, compiler-dev at openjdk.java.net.

Thanks to Maurizio and others for contributing some of the suggestions here.

See Also

Tuesday May 06, 2008

As some of you may know, we've made changes recently to the KSL project that was started last year.

We thought it would be fun to share some of the ideas we had along the way.

For inspiration, we decided to throw a bunch of ideas into a kitchen sink, for real, to see what that might inspire. See how many of your favorite language ideas are represented here.

Hint: There are no wrong answers, but there have been some creative ones. :-)

Of course, as soon as we did that, this little guy wanted to get in on the act. He may not quite understand the gist of KSL, but you can't fault his enthusiasm: there's a keyboard, a couple of mice, a couple of monitors and even a KVM cable in the sink, if you look carefully!

Friday May 18, 2007

Well, the phrase may not be entirely apt, but it does carry the right sentiment, especially to the monarchists amongst us.

As you may have read, Peter is moving on to new adventures, but you can be sure that there are plenty of adventures still to be had with javac, and the rest of us on the javac team are looking forward to carrying on with everything that is coming up.

With the compiler being open sourced last year, and the recent announcements regarding OpenJDK, and JDK 7 ahead, these are exciting, if somewhat turbulent, times. So, stay tuned for more details, and in the meantime, thanks go to Peter for all his contributions.