Oracle Blog

On the design and specification of Java

Thursday May 29, 2008

Stanley Ho and I have published a two-part blog entry on how the Java Module System tries to strike a balance between flexibility and readability in its versioning scheme. The first part is about the format of version numbers and the lessons drawn from 12 years of JDK version policy. The second part is about version ranges. If you have comments, it's best to leave them on the two entries, rather than here.

Thursday May 22, 2008

Adding modules to the Java language has an interesting interaction with protected accessibility.

First, consider how protected accessibility is described in JLS 6.6.2: "A protected member or constructor of an object may be accessed from outside the package in which it is declared only by code that is responsible for the implementation of that object." The "responsible for implementation" phrase is a piece of morality which translates to the accessing code being in a subtype of the type which declares the protected member. Then, ignoring private, there is a total ordering of accessibility levels:

public
|
protected
// other pkg if subtype
|
package

For module accessibility, the JLS will say something like: "A module-private member or constructor of an object can be accessed from outside the package in which it is declared by code that belongs to the same module." There will then be two accessibility levels greater than package and less than public: protected and module.

Is a total ordering possible? Consider the accessibility of a protected member:

same package, or

subtype in different package

When protected was invented, the types "responsible for implementation" of an object could be as far away as a different package. With modules, they could be as far away as a different module. This gives two options for which subtypes can access a protected member:

subtype in different package in same module

subtype in different package in any module

Suppose we decided, on a moral basis, that a single module is the unit "responsible for implementation". Then, protected members should be accessible to subtypes only in the same module. This induces a total ordering:

public
|
module
// other pkg if same module
|
protected
// other pkg if subtype and in same module
|
package

But this decision is not very pragmatic. It means two packages where one has a protected member and one has an accessing subtype cannot be factored into two modules. We have no statistics on the popularity of putting related packages in different JAR files, but it seems a reasonable thing to do. Making protected strongly respect a module boundary seems likely to cause pain when moving types into modules. Elsewhere, we have gone to some lengths to ensure that publicly accessible types remain publicly accessible if they're thrown into modules. I conclude that multiple modules can jointly be "responsible for implementation", and that the appropriate interaction of protected and modules is for protected members to be accessible to subtypes in different packages even in different modules.

What about the accessibility ordering now?

For module to be more accessible than protected, the accessors of a protected member would have to be a subset of the accessors of a module member. This is not true - an accessor of a protected member can be in a different module, hence outside the set of accessors of a module member.

For protected to be more accessible than module, the accessors of a module member would have to be a subset of the accessors of a protected member. This is not true - an accessor of a module member can be any type in the module, not just certain subtypes like the set of accessors of a protected member in that module.
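The two subset arguments above can be checked mechanically. What follows is only a sketch under stated assumptions: the Accessor and AccessOrdering classes are hypothetical, an accessor is modelled by three flags and nothing else, and module accessibility is taken to mean "any type in the same module", as proposed in this entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: an accessor is described only by where it sits
// relative to the type that declares the member.
final class Accessor {
    final boolean samePackage;
    final boolean sameModule;
    final boolean subtype;

    Accessor(boolean samePackage, boolean sameModule, boolean subtype) {
        this.samePackage = samePackage;
        this.sameModule = sameModule;
        this.subtype = subtype;
    }

    // protected: same package, or a subtype anywhere (even in another module).
    boolean canAccessProtected() {
        return samePackage || subtype;
    }

    // module: any type in the same module.
    boolean canAccessModule() {
        return sameModule;
    }
}

final class AccessOrdering {
    // Enumerate every consistent accessor. A package is never split across
    // modules, so samePackage implies sameModule.
    static List<Accessor> allAccessors() {
        List<Accessor> result = new ArrayList<>();
        boolean[] flags = {false, true};
        for (boolean pkg : flags) {
            for (boolean mod : flags) {
                for (boolean sub : flags) {
                    if (!pkg || mod) {
                        result.add(new Accessor(pkg, mod, sub));
                    }
                }
            }
        }
        return result;
    }

    // True only if every accessor of a protected member could also access a module member.
    static boolean protectedWithinModule() {
        for (Accessor a : allAccessors()) {
            if (a.canAccessProtected() && !a.canAccessModule()) {
                return false;
            }
        }
        return true;
    }

    // True only if every accessor of a module member could also access a protected member.
    static boolean moduleWithinProtected() {
        for (Accessor a : allAccessors()) {
            if (a.canAccessModule() && !a.canAccessProtected()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("protected accessors within module accessors: " + protectedWithinModule());
        System.out.println("module accessors within protected accessors: " + moduleWithinProtected());
    }
}
```

Both checks come out false in this model: a subtype in another module breaks the first inclusion, and a non-subtype in another package of the same module breaks the second.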

So we're stuck with a partial ordering, albeit with a better understanding of what protected means w.r.t. modules:

Finally, protected is really a meta-modifier - it modifies the package modifier (spelled '' of course) to add accessibility from subtypes. It's not unreasonable for protected to modify module too, such that module protected means accessible from:

The bane of protected is the obscure relationship between the type declaring the protected member and the qualifying type of the reference to the member, enforced during compilation and verification. But this relationship is independent of whether the accessing subtype is in a different package or a different module, so it is not a prima facie reason against module protected. Whether module protected is worthwhile is an open question - it can't be removed if it's somehow 'wrong', the classfile format would have to be updated, tools would need to parse it, etc, etc. module protected would however be an interesting counterpart to Peter Kriens' multi-module accessibility level, which one might denote module exported. Watch this space.

Tuesday Apr 22, 2008

Peter Kriens, the OSGi spec lead and official evangelist, takes a positive view of language-level modularity. His focus on "requirements, not solutions" is especially helpful. Here are some responses to his points:

Module-private interfaces

There is little difficulty in allowing an interface to be module-private, since it can already be package-private or public. As for interface members, it was a nice simple approach back in the day to make them automatically "accessible outside my package", since that's exactly what 'public' meant. (Of course the cost was excessive exposure of implementation methods.) Now that public is no longer the only "outside my package" level, it makes sense to allow module-private interface members:

Now that JSR 277 is the place for Java modularity, look for a unified module reflection API soon. Stanley Ho and I do understand why dynamic membership is important.

Module export

Peter identifies an additional accessibility level, 'export', between module-private and globally public. I might call it multi-module-private, since a type/member marked 'export' is accessible in its own module and from any module which imports that module. The difficulty is that the VM won't know about module imports so can't determine which other modules can access a multi-module-private type/member. The same is true of javac - it generally won't know if the caller's module imports the callee module where the 'export' type/member lives. (There will be a way to compile programs in the context of a runtime module system, but it will not be mandatory because people should be able to use modules simply as "better packages" without deployment overhead like module dependencies and packaging.)

Scoping

Peter proposes to qualify type names in classfiles with their declaring module. This is not strictly necessary because when the VM resolves a module-private type/member, it must compare the caller's module with the callee's module. It cannot rely on the caller's classfile claiming that the callee's module is so-and-so, hence I don't see the benefit in embedding the callee module in the caller. Indeed, this is the kind of excessive two-way dependency we're now trying to avoid. Also, having a caller commit to which module declares a type seems contrary to having the runtime obtain types from arbitrary modules (e.g. in the context of an import-by-package dependency).

Versioning

A standard version schema for Java types would be very interesting but not in the Java SE 7 time frame. We all need to think about this more. (See Alexander Krapf at JavaPolis and "UpgradeJ: Incremental Typechecking for Class Upgrades" by Bierman, Parkinson and Noble at ECOOP 2008.)

Thursday Mar 27, 2008

With my JSR 294 spec lead hat on, I recently proposed a change to the superpackage model which JSR 294 defines in the service of JSR 277's deployment modules. Early feedback has been positive, but where to declare module membership in source code is an ongoing issue.

When module membership is decentralized, i.e. 'module M;' appears in compilation units, each compilation unit that declares 'package P;' can declare a different module. But a package must not be "split" among different modules. If two types in the same package could be in different modules, then each type could access the other's package-private members but not its module-private members. This is stupid; types in different packages in a module can access each other's module-private data so types in the same package ought always to be able to access each other's module-private data. For analogous reasons, deployment module systems frown on split packages too.

The question for JSR 294 is how to enforce consistent module membership across all the compilation units which declare a given package. I could just write a declarative statement in the JLS and let javac worry about enforcing it:

But making javac inspect every source/class file in a package whenever a compilation unit in that package is recompiled will hardly be popular. Instead, a common idea is to declare module membership in package-info.java:

// P/package-info.java
@Imports(...)
module M;
@Foo
package P;

Module membership is now not declared near a type declaration, but it's clear from the presence of a 'module' modifier in a compilation unit that a predictable package-info file should be consulted by the compiler or developer. And there's no issue with types in an unnamed package (i.e. no package-info), since they can never be module members anyway.

However, things aren't as neat as they look. The moral argument against using package-info is that an artifact in a classfile, such as module membership, should have a direct representation in the corresponding source file. This is the kind of argument you think you can ignore in the short term but that you rue ignoring in the long term. The other argument against package-info is that you should not actually annotate a module declaration there even though it feels natural to do so. Enumerating module-level annotations is important for JSR 277, and the right solution is to declare them in a module-info.java file, following the precedent of package-info.java. So now you have module-info which says 'module M;' and package-infos which say 'module M;' and maybe we should just centralize the module's package list in module-info and leave package-info and normal source/class files alone? Sadly this doesn't work because the compiler and developer have no way of easily finding the "correct" module-info file for a given package.

In summary, when viewing a module bottom-up (compiler or developer reading a single source/class file), module membership should be in package-info for convenience; when viewing a module top-down (277 tool packaging a module given its name and constituent classfiles), module attributes should be in module-info for completeness. The moral argument, that module membership should be in normal compilation units for clarity, loses. Both module-info and package-info should be able to say 'module M;', just as compilation units and package-info can say 'package P;'. Annotations on 'module M;' in a package-info file are strongly discouraged.

Thursday Feb 14, 2008

People sometimes think that 'new T()' would be possible iff generics were reified. This is not true. Consider:

class Foo<T> {
T f = new T();
}

With erasure, you implement 'new T()' as 'new Object()', since Object is the bound of T. With reification, you instantiate an object whose class is the dynamic binding for T in 'this'. Either way, you must execute a no-args constructor.

But Foo doesn't require that a type bound to T (a.k.a. a witness of T) has a no-args constructor. 'new Foo<Integer>()' is perfectly legal, but Integer doesn't have a no-args constructor, so how is the instance initialization expression supposed to call 'new T()'? It can hardly make up a default value to pass to Integer's constructor.

'new T()' is fundamentally not possible in the context of nominal type bounds. (Or, if you prefer, in a context of separate compilation, since a global compilation could compute that 'new T()' is sound for all observed instantiations of Foo.) C# 2.0 introduced a structural type bound called the new() constraint to permit 'new T()'. However, they already had a need for interesting rules about which types can witness a type parameter, and in that context the "public parameterless constraint" is straightforward. C++ "concepts" go further in allowing a structural description of the types able to witness a type parameter.

Java is not going to get structural type bounds any time soon. Nominal type bounds of the form C&I (an intersection type) are complicated enough. Consequently, neither erasure nor reification alone can support 'new T()'.
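One common workaround, offered here only as a sketch and not as anything proposed in this entry, is to have the caller pass in the knowledge of how to build a T. The example assumes a later Java release (java.util.function.Supplier arrived in Java SE 8); Holder and NewTDemo are hypothetical names. The reflective check simply confirms the point made above about Integer.

```java
import java.util.function.Supplier;

// Hypothetical holder: the caller supplies a factory, because the nominal
// bound on T says nothing about which constructors a witness has.
final class Holder<T> {
    final T value;

    Holder(Supplier<? extends T> factory) {
        this.value = factory.get();
    }
}

final class NewTDemo {
    // Reports whether a class declares a no-args constructor at all.
    static boolean hasNoArgsConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // StringBuilder has a no-args constructor, so a constructor reference works.
        Holder<StringBuilder> text = new Holder<StringBuilder>(StringBuilder::new);
        // Integer has none, so the caller must choose a value explicitly.
        Holder<Integer> number = new Holder<Integer>(() -> 0);

        System.out.println(text.value.length());
        System.out.println(number.value);
        System.out.println(hasNoArgsConstructor(StringBuilder.class));
        System.out.println(hasNoArgsConstructor(Integer.class));
    }
}
```

The factory plays the role that a structural bound would play: it is evidence, supplied at each use site, that a T can actually be constructed.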

Wednesday Jan 23, 2008

As you know, the syntax and semantics of a legal Java program are described in the JLS, so javac cannot just accept any program it likes. Patches to javac that change the space of accepted programs will always be contentious, but patches that improve the usability of javac itself would be well received by many people. There is obvious "low-hanging fruit" in diagnostics, because javac's reporting of type errors is less than ideal. Here are some ideas for mini-projects which would have a huge impact on programmer understanding and productivity:

I recall numerous bug reports where javac appeared to get accessibility wrong, whereas in fact the submitter was confused about the members present in some types. These reports inevitably featured multiple packages, public types, package-private and protected members, and pathological inheritance. Only careful reasoning about inheritance, and hence the exact membership of a type, would explain javac's (correct) behavior. javac knows this membership already - why not display it? (The same can be said for intersection types.)

Formatting

This is a simple one: display messages in a hierarchical way, and have heuristics to simplify or abbreviate qualified class names. You could also imagine having short-form and long-form versions of some error messages, where the long-form version suggests a way out of the problem.

Capture conversion and type inference

Those 'capture-of-451#...' messages are tough. Why not display the type of an expression before and after capture, including the upper and lower bounds of synthetic type variables? Constraints for the formal type parameters of a method would be good to know as well. In fact, any additional info about the operation of overload resolution is valuable when no most specific method is available.

I wouldn't be surprised if the utility of an error message is inversely proportional to the amount of code within javac which must be unravelled to create it. In any case, these ideas will hopefully stimulate some discussion and experiments in the OpenJDK compiler group.

Friday Sep 21, 2007

Everyone knows that java.lang.Object is the common superclass of all Java classes. It is also the common supertype of all interfaces, which do not 'extend' Object but do support the Object protocol. This makes it the Top type, useful for programming generic algorithms.

Top represents all values in a programming language. It ensures that the type hierarchy is a complete partial order by providing an upper bound for every pair of types. Computing the upper bound of types is what makes assignment and method call work (via widening reference conversion), so a well-founded type hierarchy is important.

(Ignore that the complete partial order for primitive types is distinct from the complete partial order for reference types. Sigh.)

The counterpart to Top is Bottom, a type that is the common subtype of all other types. Bottom makes the type hierarchy into a lattice because it ensures every pair of types has a lower bound. Lower bounds play a role in Java wildcards - specifically, capture conversion and type inference - so it could be useful to know that every type has a lower bound.

Java has the null type. Pre-JLS3, the null type was not officially the subtype of any type, and the null reference was not officially a value of any type except the null type. A decree made the null reference castable to any reference type for pragmatic reasons. (This is similar to the decree that makes List<?> assignable to a List<T> formal parameter even though List<?> is not a subtype of List<T>. You know the decree as capture conversion.) JLS3 defines the null type as a subtype of every type, so it looks an awful lot like Bottom.

(Strictly, JLS3 restricts the null type to be a subtype of every reference type. Again, just ignore primitive types.)
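The capture conversion decree mentioned above is easy to see in a small sketch. CaptureDemo and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CaptureDemo {
    // The formal parameter is List<T>; T names the element type.
    static <T> void swapFirstTwo(List<T> list) {
        T first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }

    // List<?> is not a subtype of List<T>, yet this call compiles: the
    // wildcard is captured as a fresh type variable and T is inferred from it.
    static void swapUnknown(List<?> list) {
        swapFirstTwo(list);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
        swapUnknown(names);
        System.out.println(names);
    }
}
```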

The null type is expressible, i.e. can be the type of a term. The compiler will expose it if necessary, e.g. int x = true?null:null;. But it is not denotable, i.e. cannot be written as the type of a term. You can't write NullType v = null;. An RFE asks for a name for the null type. Is this a good idea?

Beyond the use case in the RFE, being able to denote NullType would be useful in certain situations where type inference fails, because NullType may be a better actual type argument than Object. So that's in NullType's favor.

Bottom is usually not a denotable (or even expressible) type in textbook type systems because type rules must be special-cased to ignore it. (See Pierce 15.4, 16.4.) But in Java, the presence of a value for the null type means expression evaluation has always had to consider the null type, responding with a NullPointerException. (Indeed, the null reference means that the null type is not a true Bottom type.) Introducing NullType would allow more variables to store the null reference, but such variables evaluate to the null reference just like any variable of reference type can.

Statements would need tweaking. Consider the if statement: "The Expression must have type boolean or Boolean, or a compile-time error occurs." A type system with Bottom would allow the expression to have the Bottom type by subsumption, so traditionally an extra rule would catch that case and assign Bottom as the type of the whole statement. We just want if ([expression of null type]) ... to be illegal, so would need an "exactly" before "type boolean or Boolean".

[The first version of this blog entry said this wasn't necessary because final types didn't have any subtypes, not even the null type. Prompted by Remi's comment, I changed my mind. While a final class has no further implementations, special subtypes are possible.]

So, since Java already has the null reference, there is less problem adding NullType than if the null reference didn't exist.

Arrays cause a slight pain. A NullType[] can store only null values, which appears useless but someone will want it. On the face of it, we need the null type to be reified to enforce array covariance:
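The run-time check that array covariance relies on can be sketched with existing types. This is an illustration only: String[] stands in for the hypothetical NullType[], and ArrayCovarianceDemo is a made-up name. A NullType[] viewed as an Object[] would need the same kind of store check, which is why the element type would have to be known at run time.

```java
final class ArrayCovarianceDemo {
    // Returns true if the JVM rejected the store at run time.
    static boolean storeIsRejected() {
        // Legal by covariance: String[] is a subtype of Object[].
        Object[] objects = new String[1];
        try {
            // Compiles, but the array remembers that it may hold only Strings.
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeIsRejected());
    }
}
```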

To avoid reification, we could define NullType[] as a static equivalent of List<? super NullType>. Then, elements could be added to a NullType[] but not removed (except as Object). A more drastic idea is to make arrays of NullType denotable but uninstantiatable, like arrays of generic types. The value of all this is becoming questionable.

Denoting the null type is less useful in Java than might be expected. Consider the classic uses of the Bottom type:

A return type for a function which doesn't return. Since the Bottom type is empty, the function has no possible return value. Those of you now hoping that NullType could indicate a method which tail-calls itself are out of luck, because the method could just return null;. What you need is Neal's Unreachable type, which is a true Bottom type because it's a subtype of everything and it's empty.

Signalling errors. Java has exceptions. Next.

A stand-in where no other reference type will do, as exemplified in the RFE. Here, the special subtyping properties of Bottom are less interesting than its emptiness. Neal's Void type is useful here.

To summarize, the null reference makes NullType in Java weaker than Bottom, which in turn makes NullType less problematic than Bottom but also less useful. No other major programming language denotes NullType, let alone Bottom, so it is hard to claim that Java is falling behind by not having NullType. It doesn't make things simpler, nor radically expand the space of programs that can be easily written, so don't look for it in JLS4.

Thursday May 03, 2007

People talk about the "conceptual weight" of language features. Here's a way to make that a more precise concept, and more accurate too.

The JLS is structured bottom-up: universal elements like grammar, types, values and names; then Java artefacts both major (packages, classes, interfaces) and minor (arrays, exceptions); then back to universal concepts like statements and expressions. (The chapters on execution and binary compatibility are in the wrong place; they should be at the end with assignment analysis and the memory model.)

We can exploit this structure when adding a new language feature. By asking which JLS chapters would be affected by the feature, we can gain an idea of the feature's semantic impact, and thus its complexity. For example, adding a new statement form is done near the end, so it's a minor addition. Adding a new keyword is done up front, implying there are many artefacts the new syntax could interact with - such as the names of all your variables.

So if a new language feature L needs changes in a set of chapters S, the complexity of L is:

where degree(S) is the number of cross-references between chapters necessary to fully describe L. An approximation of this factor is P(S,2) - note a permutation not a choice because a forward reference and a backward reference each add complexity.
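To make the permutation-versus-choice point concrete, here is a small sketch that counts both for a given number of affected chapters. The class name and the chapter count of 4 are hypothetical; only the P(n,2) approximation comes from the text above.

```java
final class CrossReferences {
    // Ordered pairs of distinct chapters: P(n, 2) = n * (n - 1).
    // A forward reference and a backward reference count separately.
    static int orderedPairs(int n) {
        return n * (n - 1);
    }

    // Unordered pairs of distinct chapters: C(n, 2) = n * (n - 1) / 2.
    static int unorderedPairs(int n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        int chapters = 4; // hypothetical size of S
        System.out.println("P(" + chapters + ",2) = " + orderedPairs(chapters));
        System.out.println("C(" + chapters + ",2) = " + unorderedPairs(chapters));
    }
}
```

For four chapters this gives 12 ordered pairs against 6 unordered ones, so the permutation always doubles the choice.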

S should never include 1, the introduction, or 18, which is really an appendix. (And a syntactic grammar is already introduced in chapters 3-15.) You'd have to watch out for a couple of things: some chapters (notably 6) trivially recap or preview other chapters, and example code should not generate vacuous cross-references.

Since b is probably higher than a, 1.75b will be higher than 2.33a, reflecting the sense that the complexity of generics is higher than the complexity of enums. However, to really capture how much higher, you'd want to improve the granularity of S's elements from chapters to sections or subsections, so that |S| shoots up for generics. Also you'd want to force the measure's range into a reasonable value set by means of constant factors. (Different factors would be needed for different granularities of S.) Finally, since I'm claiming that a total order exists for the chapters, the breadth of the changes - i.e. max(S)-min(S) - could be used in the formula.

I don't use this measure in real life but it is fairly plausible. Comparing an abstract view of a single feature against the abstract view of the whole language might serve as a proxy for the effort needed to implement and test the feature in a compiler.

Tuesday Feb 06, 2007

With all the blogging about Java 7 language features, I thought I'd point out that many ideas are already represented by proposals in the Sun Developer Network database. The comments about each proposal go back years - to a time before blogging when people left their thoughts on a sun.com site.

Why have these proposals been hanging around for so long? Mostly because the process of evolving the language can only handle a relatively small number of features per release. There are hundreds and hundreds of possible features in the database. The tough part isn't designing any single one of them (and I would encourage you not to read too much into the exact designs contained at the links below) but choosing which ones to design. We have to Do The Right Thing as well as Do The Thing Right.

Suppose there are 200 proposals and we can implement 10 per major JDK release. Now calculate C(200,10) and you'll see that every Java developer on the planet can have their own favourite combination. Which combination should make it into the JLS? (Note I say the JLS rather than javac; people can play with javac to their hearts' content, but the JavaTM Programming Language can only take so much.)
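For anyone who would rather not do the arithmetic by hand, a short sketch follows. Combinations is a hypothetical class name; BigInteger is used because the intermediate products overflow a long.

```java
import java.math.BigInteger;

final class Combinations {
    // Computes C(n, k) with the multiplicative formula.
    static BigInteger choose(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i the value is C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(choose(200, 10)); // prints 22451004309013280
    }
}
```

That is roughly 2.2 x 10^16 possible combinations of ten features.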

So, while the proposals below may be excellent in and of themselves - and it does seem like some will be getting a new lease of life - please realize that in the past, there were more-excellent features which you don't see below because they made it into Java 1.3, 1.4 and 1.5! Now, without further ado:

Friday Feb 02, 2007

In recent years, many Java language features have been developed under JSRs. Notable examples are generics (JSR14), assertions (41), annotations (175, 308), and enums, autoboxing, foreach, varargs and static import (201). Language JSRs become part of the core platform and get incorporated into the JLS.

As people adopt a new Java SE release, they explore these features for the first time and file Requests For Enhancement about them. RFEs in the scope of JSR201 are especially common. Unfortunately, it's really hard to approve any language request that once fell within the charter of a language JSR. A JSR Expert Group spends years considering every aspect of a feature, so if they design something a particular way, that's what the JLS will say. Determining an Expert Group's reasoning years after the event can be hard. However, I do try to discern it and include it in the Evaluation of any RFE that concerns a JSR-derived language feature.

But by default, and especially if history just isn't available, it's not appropriate to overturn or extend the scope of such a feature, no matter how reasonable the request. Only in an exceptional case will the JLS change in a non-trivial way.

Wednesday Jan 10, 2007

Hans writes an excellent post on the use of bound properties, and how a simple property keyword that simulates getX/setX methods wouldn't buy him much.

Properties also have a difficult interaction with access control. Declaring a property to be publicly readable but only package/protected/privately writeable would either be impossible or need some hacky syntax for the read v. write access level. This is not an improvement over getX/setX.

Now, since no-one is talking about VM support, properties would be implemented through translation to methods and fields. Obviously this makes them less amenable to reflection, but my main concern is this. We have an increasing list of language constructs implemented through translation: instance initializers, bridge methods, inner classes (and the calling convention for their constructors), enums. Specifying such translations in the JLS is rare because they are implementation details. (Notable exceptions: 15.9.3 implies the calling convention for anonymous classes and 8.9 has some info about enums.) We don't want to restrict the classes emitted by compilers except when it's essential for source and binary compatibility. (The binary representation of a class in 13.1 is rather loosely specified for this reason.) Clearly, a cross-compiler convention for representing properties would be necessary in the JLS, so no-one would ever be able to implement properties in a more lightweight fashion.
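Bridge methods are one of the translations listed above that can be observed through reflection. The following is a sketch for illustration: Version and BridgeDemo are hypothetical classes, and the count of one bridge method is what I would expect from javac for this class rather than something the JLS spells out as a translation scheme.

```java
import java.lang.reflect.Method;

// Hypothetical class used only for illustration.
final class Version implements Comparable<Version> {
    final int major;

    Version(int major) {
        this.major = major;
    }

    @Override
    public int compareTo(Version other) {
        return Integer.compare(major, other.major);
    }
}

final class BridgeDemo {
    // Counts the compiler-generated bridge methods declared by a class.
    static int countBridgeMethods(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The source declares one compareTo; the classfile also carries a
        // synthetic compareTo(Object) that forwards to it.
        for (Method m : Version.class.getDeclaredMethods()) {
            System.out.println(m + " bridge=" + m.isBridge());
        }
        System.out.println(countBridgeMethods(Version.class));
    }
}
```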

I must admit I do like the increased safety available in Stephen Colebourne's property proposal, though maybe you could get that with method references (borrowing from the Javapolis whiteboards):

binder.bind(user, User.getFirstName.method);
binder.onChange(user, User.setFirstName.method, closure);

But overall, like Peter von der Ahé, I am moving away from properties.

Wednesday Nov 08, 2006

I want to talk about the enhanced for ('foreach') loop, an immensely popular construct. Many people are surprised to find that it only accepts Iterable expressions (and arrays, which I'll ignore). Why not also Iterators?

The JSR201 Expert Group considered this issue at length. foreach is syntactic sugar; the compiler generates an iterator() and a loop variable and a basic for loop in its place. The primary reason against passing an Iterator to foreach is that the user-provided body could modify it during iteration, and so break the compiler's assumptions about its generated code:
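As a rough sketch of the concern, the first method below is approximately what the compiler generates for a foreach over an Iterable, and the second is a hand-written loop over a caller-visible Iterator whose body advances the iterator itself. ForeachDesugarDemo is a hypothetical name, and the second loop merely stands in for a foreach over Iterators, which the language does not have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ForeachDesugarDemo {
    // Roughly the expansion of: for (String s : items) { seen.add(s); }
    // The iterator is hidden from the body, so the body cannot touch it.
    static List<String> visitAll(Iterable<String> items) {
        List<String> seen = new ArrayList<>();
        for (Iterator<String> it = items.iterator(); it.hasNext(); ) {
            String s = it.next();
            seen.add(s);
        }
        return seen;
    }

    // Here the iterator is visible to the body, which advances it a second
    // time per pass. The loop header no longer sees every element.
    static List<String> visitWithInterference(Iterator<String> it) {
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            String s = it.next();
            seen.add(s);
            if (it.hasNext()) {
                it.next(); // interference from the body
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        System.out.println(visitAll(items));                         // [a, b, c, d]
        System.out.println(visitWithInterference(items.iterator())); // [a, c]
    }
}
```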

By requiring Iterables, JSR201 essentially placed safety above raw functionality. But even with Iterables, user code can still interfere with the compiler's code. A Collection passed to foreach can be modified concurrently and most Collection implementations aren't synchronized internally, so the compiler-generated iterator could break.
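Interference is possible even without a second thread. In this sketch (ConcurrentModificationDemo is a hypothetical name) the loop body modifies the very ArrayList being iterated, and the hidden iterator detects it on its next step. It is a single-threaded stand-in for the concurrent case described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ConcurrentModificationDemo {
    // Returns true if the compiler-generated iterator failed during the loop.
    static boolean modifyingDuringForeachFails() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer n : numbers) {
                if (n == 1) {
                    // Structural modification while the hidden iterator is live.
                    numbers.add(99);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyingDuringForeachFails());
    }
}
```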

So, given that interference is possible with Iterables, and given that using Iterators would be very convenient, maybe we should dial down the safety a little to add some functionality. I'm not proposing any changes now, but I am keeping an open mind on Iterators. I control for the fact that people who want Iterator support shout the loudest.

(The argument against interference from user code is also why the loop variable isn't visible. This decision is very sensible.)

Greetings one and all. You're probably here because Gilad kindly pointed the way. His contributions to Sun span two millennia and have been more immense than most people know; the Java Language and VM Specifications set the bar for modern platform documentation. It is an honour to follow in his footsteps and I wish him the best of luck with his new and dynamic endeavours.

I plan to blog about interpretation of the JLS and JVMS; design issues on JSRs that I'm involved with; and proposals for language features both old and new. I also hope to bring the JLS and JVMS into the blogging age by publishing clarifications and corrections as tagged entries.

So with Java SE 6 emerging and the lifecycle for SE 7 starting - and with the debate about Java's place in the world reaching fever pitch - it will surely be the most interesting of times.