Tuesday, 21 September 2010

Two features for #bijava

In my last post I brought up the idea of a backwards incompatible version of Java. What might two of those incompatible changes look like?

Backwards incompatible Java - #bijava

Firstly, a reminder. The key to this proposal is that these changes still result in a language that is recognisably Java. The goal is to enhance the working lives of 10 million developers, most of whom do not have the luxury of just switching to another language. Improving their productivity by just 5% would be a huge benefit globally.

Secondly, it's obviously the case that these ideas are not new. Scala, Groovy and Fantom are all trialling different approaches to a better Java right now. What hasn't been talked about is taking some of these new approaches directly into Java once you remove the absolute hurdle of backwards incompatibility.

Remove primitives, Add nullable types

The first example concerns primitives and nullable types. The approach here is interesting, in that we remove a language feature in order to add a better one.

Exposed primitives are a pain in the Java language. They were added in version 1.0 for performance and to attract developers from C/C++. Version 1.0 of Java was slow enough as it was, so performance mattered. And attracting developers is a key reason to add any new feature.

However, primitives have caused major problems as the language has evolved. Today, for example, you cannot have a generified list of a primitive type. The split between primitive and object types causes this.

The more appropriate solution is to have every type in the source code be an object. It is then the compiler's job to optimise, using primitives where it makes sense. Doing this effectively is where nullable types come in.

Nullable types allow you as a developer to specify whether any given variable can or cannot contain null. Clearly, an int is at a high level equivalent to an Integer that cannot contain null. Equivalent enough that a smart compiler could optimise the difference and use the primitive under the covers when the variable is declared as non-null.

But it turns out that academic research has shown that programmers usually intend most variables to be non-null. Thus, we need to change the meaning of variable declarations.
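A sketch of how such declarations might read (the syntax below is hypothetical #bijava, not valid Java today, and the variable names are illustrative only):

```java
// Hypothetical #bijava syntax - not valid Java today
String  forename;   // non-null by default: the compiler rejects assigning null
String? surname;    // the ? marks a variable that may hold null

if (surname != null) {
    // the compiler tracks the null check, so this dereference is allowed
    System.out.println(surname.equals("Colebourne"));
}
```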

The ? symbol is added to the type of any variable that can hold null. If a variable cannot hold null, then its type does not have the ?. By doing this, the compiler can check the null-handling behaviour of the code. For example, the line "surname.equals(...)" would not compile without a previous check ensuring that surname was non-null.

In summary, this is a classic change which cannot be made today. Removing primitives would break code, as would changing the meaning of a variable declaration such that variables default to non-null. Yet both are good changes.

The point here is that the resulting language is still effectively Java. We haven't scared off lots of developers. It's a challenging change for the language designer and compiler writer, but it results in much better code for 10 million developers.

Equals

The second example of an incompatible change is the equals method.

In Java today, we use the .equals() method all the time for comparing two objects. Yet for primitives we have to use ==. The reasons are ones we rarely think about, yet if we take a step back it's clearly odd.

Given how frequently the .equals() method is used, it makes perfect sense to have an operator for it. Clearly, the right choice for the operator is ==. But we can't make this change to Java as it is backwards incompatible.

But, with #bijava, this change can be made. The existing == operator is renamed to ===, and a new operator == is added that simply compiles to .equals(). (Technically, it has to handle null, which is another reason why nullable types help.)
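As a minimal sketch, today's Java can express what the new == might compile down to; Objects.equals (added in JDK 7) already provides the null-safe behaviour described. The method name bijavaEquals is purely illustrative:

```java
import java.util.Objects;

public class EqualsDemo {
    // what "a == b" might desugar to in #bijava: a null-safe .equals()
    static boolean bijavaEquals(Object a, Object b) {
        return Objects.equals(a, b);  // true if both null, false if only one is
    }

    public static void main(String[] args) {
        System.out.println(bijavaEquals("abc", "abc"));  // true
        System.out.println(bijavaEquals(null, "abc"));   // false
        System.out.println(bijavaEquals(null, null));    // true
    }
}
```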

As many other languages have shown, this change has a huge impact on the readability of code. If you are working today, try spending 5 minutes replacing .equals() with == in some of your code, and see the readability benefits.

Of course this is another example of a change where we need a backwards incompatible version to gain the benefit.

Summary

Neither of these changes is really that radical. It is entirely possible to write a tool that will convert source code from the old form to the new and back again. The tool, plus the JVM bytecode, provides the backwards compatibility story necessary to reassure managers.

Some will say that these examples aren't radical enough. And they're right. (They are just two proposals of many for what would be included in #bijava.) But the key point is that #bijava must be close enough to Java (post JDK 7/8) that the huge body of 10 million developers can be brought along without excessive training or learning needs. Specifically, each change above can be taught to a developer in less than five minutes just standing at their desk.

It also means that #bijava is not a threat to Scala, Groovy, Clojure, Fantom or whatever your favourite language is. Each of these has its own target area, and Java cannot and will not morph into them. Thus, they are free to continue to argue their own case as to why developers should just move away from Java completely.

#bijava isn't a panacea. It will not result in all the changes we might want. But it changes the mindset to allowing there to be some incompatibilities between versions of Java, provided the right supporting infrastructure (modules, conversion tools) is present.

Feedback welcome as always! Please use #bijava to talk about backwards incompatible Java on twitter.
PS. I'll get comments invalidly marked as spam out as soon as I can!

Comments

More on the topic of this post: when it comes to equals, I would favour going down the Scala path and just allowing what we call operators as method names, and allowing single-argument methods to be called as "instance methodname argument".

It might help if you could declare types (classes) for which the === operator is not permitted; that is, types that are always compared by value. This avoids the problem of distinguishable instances of new Integer(3).
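The distinguishable-instances problem mentioned above can be demonstrated in today's Java:

```java
public class IntegerIdentity {
    public static void main(String[] args) {
        Integer a = new Integer(3);  // explicit construction, bypassing the Integer cache
        Integer b = new Integer(3);
        System.out.println(a == b);       // false: two distinct instances
        System.out.println(a.equals(b));  // true: equal values
    }
}
```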

To me it looks like Google, not Oracle, will be in a better position to implement #bijava. They have a lot of the elements ready with the Java fork for Android, and will probably need to give up Java compatibility anyway. Why not make the language better in the process?

@Robert, I can guarantee that a language in the style of Java (as #bijava is) will not have method name operators or methods called using spaces. It's a completely different style of language, and it encourages the worst sides of operator overloading.

Sorry if this is a bad question (I am kinda new). Why don't you prefer validators instead, where you can have a @NotNull annotation? Imagine if that gets built into the Java spec and introduced in JDK 7: we could do validation directly, which could be more powerful than "?" and backwards compatible. I was introduced to validators at a seminar by Mike Keith, and I was impressed; I believe it would be awesome to have them in the JDK directly.

- All fields must be initialized before the super-constructor is called (or directly afterwards and calls to overridable methods are disallowed during construction); access to this is disallowed before then. Sound, but has a lot of slack: You can't create two objects referencing each other in final non-null fields.

@Mohamed Unfortunately the annotation JSR 305 (which has @Nonnull) is dormant, and JSR 308 which would allow such annotations in more places has been delayed until JDK8. I agree that this could deliver the 'nullable/nonnullable' feature suggested by Stephen.

So 1/3 of references will then need @Nullable - 9 characters - in front of them in a language already somewhat obfuscated by its verbosity. A one character ? distinguishing between references that can and cannot be null strikes me as simple, intuitive and light years better than all that extra noise.

Stephen, I definitely like the idea of BIJava. But I also believe that all incompatibilities should result in a compile-time error, in order not to silently change the semantics of existing code. I would therefore prefer keeping the semantics of the == operator as-is and introducing a new one for equals(). Candidates could be ~~ or ~=.

@Mark, I agree that unifying == and === for certain types like Integer makes perfect sense.

On @NotNull vs ?, I consider the class-level @NotNull to be too remote from the definition of the actual type/variable. By being close (and not verbose) it makes devs think more closely about their definitions.

@Ben, I'm interested that you find that aspect of Fantom to be a concern. Thanks for clarifying the unsound comment though.

@Stepan, The Scala/Haskell Option type requires a functional mind-set, which is a long way from mainstream. It's also a lot more verbose. If you like that style, use Scala.

* make Array a true class of Collections
* hybrid types: I once had a function that would only work with serializable collections; I had to use generics to make it work. I want "Serializable Collection sc;"
* unless/until: much easier to read than if(!a) or while(!a)
* fix up the library in all its nasty places: let's make Date et al immutable, and Stack not a subclass of Vector

Maybe a compiler could just wrap/unwrap as necessary between #bijava and regular Java then?

Another concern is that the "specification" [1] states that "[a] non-nullable type is guaranteed to never store the null value." It should go into the issues of field reads and explain why an NPE might be thrown when reading a non-null field.

Since it's gone without saying until this point, we should also make sure the other comparison basics are covered: !=, >, <, >=, <=. Aside from !=, those would obviously need a Comparable implementation to be valid. So much clearer than checking equality against the returned number.

As long as we're being backwards incompatible, I'd also have the compareTo method return an enumeration, because returning integers was confusing.
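A sketch of what an enumeration-returning comparison might look like; the names Ordering and compare below are hypothetical, not part of any proposal:

```java
public class OrderingDemo {
    // hypothetical three-valued result, replacing compareTo's int
    enum Ordering { LESS, EQUAL, GREATER }

    static <T extends Comparable<T>> Ordering compare(T a, T b) {
        int c = a.compareTo(b);
        return c < 0 ? Ordering.LESS : c > 0 ? Ordering.GREATER : Ordering.EQUAL;
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 2));      // LESS
        System.out.println(compare("b", "a"));  // GREATER
        System.out.println(compare(5, 5));      // EQUAL
    }
}
```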

As for identity equals, do we need another operator for that? Perhaps a protected final method on Object would be sufficient for anyone longing for the old == behaviour on objects. Alternatively, System.identityEquals().
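System.identityEquals() does not exist in today's JDK, but the behaviour being asked for is easy to sketch; the helper below is purely hypothetical:

```java
public class IdentityDemo {
    // hypothetical helper matching the suggested System.identityEquals()
    static boolean identityEquals(Object a, Object b) {
        return a == b;  // today's reference comparison (=== in #bijava)
    }

    public static void main(String[] args) {
        String s = "x";
        System.out.println(identityEquals(s, s));                        // true
        System.out.println(identityEquals(new Object(), new Object()));  // false
    }
}
```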

Identity equals should still be the default equals implementation, naturally, but I don't otherwise see much use for it. Perhaps I am biased, simply because I've so rarely needed it.

I think the entire problem with removing primitive types is around arrays (the real ones, which would exist somewhere behind the scenes, even if you want to hide them from the world).

You basically have the following choices:
1) As soon as an array is concerned, everything is wrapped in an object. A performance/memory nightmare.
2) You have some kind of reified generics and require the JVM to create multiple versions of the same code depending on the target type (at least at the JIT level). Not sure, but I think this is what C#/.NET has chosen to do?
3) Marker bits for small integers, as in many (if not all) implementations of Smalltalk. Some performance hit for every single operation, and it doesn't really help with the double type (which I'm probably most concerned about).

Unless I have misunderstood you and you just want to be able to type 10.hashCode() which is not a problem at all.

To my knowledge, there isn't a "good" solution yet to the problems of initialisation with nullable types. By "good" I mean flexible, low overhead for programmers, and sound. To take a couple of examples, having to guard every dereference with a null check is no fun for programmers who know by design that certain fields will never be null. The Delayed Types of MSR are very ingenious, but the system in the paper is far too complex for direct use at the source level. The implementation of delayed types in Spec# greatly simplifies the system, but at the expense of soundness.

As a mild shameless plug, we are hoping to publish a paper with (we think) a "good" solution :)

BTW, I think nullable types extend to generics OK, but I'm not sure anyone has formally presented the details yet.