Sunday, November 05, 2006

Many people are unsatisfied with the restrictions caused by the way generics are implemented in
Java. Specifically, they are unhappy that generic type parameters are not reified:
they are not available at runtime. Generics are implemented using erasure,
in which generic type parameters are simply removed at runtime. That doesn't render generics
useless, because you get typechecking at compile-time based on the generic
type parameters, and also because the compiler inserts casts in the code (so that you don't
have to) based on the type parameters.

Generics are implemented using erasure as a response to the design requirement that
they support migration compatibility: it should be possible to add generic type
parameters to existing classes without breaking source or binary compatibility with
existing clients.
I wrote
about this two years ago. Migration compatibility is exploited widely in JDK5; all
of the collection classes and interfaces use generics, yet existing code using
collections continues to work. Without migration compatibility, the collection APIs
could not be retrofitted use generics; we would probably have added a separate, new
set of collection APIs that use generics. That was the approach used by C# when
generics were introduced, but Java did not take this approach because of the huge
amount of pre-existing Java code using collections.

While solving one set of problems, erasure adds a set of its own problems. For a
type parameter T, you can't write a class literal T.class. You
can't use instanceof to test if an object is of type T. And you
can't create an array of T. But it gets worse. You also can't write class
literals for generic types like List<String>.class, or test if an
object is an instanceof List<String>, or create an array
of List<String>. The implementation
of generics using erasure also causes Java to
have unchecked operations, which are operations that would normally
check something at runtime but can't do so because not enough information is
available. For example, a cast to the type List<String> is an
unchecked cast, because the generated code checks that the object is a
List but doesn't check whether it is the right kind of list.

It isn't too late to add reified generics to Java. There are two basic approaches.
The first uses the language as it exists today but adds the generic
type information at runtime. In the ideal world, we wait until every
bit of Java code in the world has been modified to use generics safely,
and then go through a transition in which we start recording type parameters at
runtime by using a new javac and VM. There are two main difficulties with this
approach. First, it is not
compatible. It is not source compatible because you would no longer be allowed to
use raw types (i.e., it is impractical to wait until all code has been
generified); it is not binary compatible because new clients would invoke new
kinds of constructors for generic classes that also record the type parameters,
but existing classes do not have those constructors. (A hybrid approach
is being investigated in which the system records type parameters
only when they are available, but I haven't yet seen a workable proposal along
these lines.) The second problem is more
insidious: lots of existing code uses generics but uses them in an unsafe
way. For example, I have seen code that creates an ArrayList<Object>, but later casts
it to a List<String> where the author knows that it only contains objects of type
String. I would not be surprised to find such code in the JDK.
Such code must fail at runtime when generics are reified, so some existing
(but working) code would be broken.

The other approach modifies the language so that the declaration of a generic parameter
distinguishes reified type parameters from erased ones. It is a pure extension of the
language. The existing syntax for generics would continue to designate
generics by type erasure, but a newly introduced syntax would be used to declare
reified type parameters. Perhaps something like this:

The rules for these reified type parameters are straightforward. When using a
reified generic class, the type argument has to be a reified type. You would not be
allowed to create a NewCollection in its raw form. Code that uses only reified
types would never get an unchecked warning from the compiler, because
the compiler can always generate the correct checking code. If you have a reference of
type Collection that refers to a NewCollection<String>,
you would get a runtime error if you attempt to put anything other than a
String into it. In this sense reified collections are like
java.util.Collections.checkedCollection, only better because the element
type can itself be generic and the guarantees are stronger.
You can even create an array
of E inside an implementation of this type. But there is a disadvantage: the
strategy requires a new set of reified Collections APIs parallel to the existing
erasure-based ones. This is why C# introduced new collections APIs when they introduced
generics.

As the above declaration illustrates, the new reified collection interfaces could be
extensions of the existing collection interfaces. Consequently, existing APIs that receive
old collections can be passed new collections. For example, you could pass a
NewCollection<String> to an API that is declared to receive a
Collection<String>. In addition, the new reified collection classes
(that is, the implementations) could be extensions of the existing collection
classes. That allows the implementation of the old and new collection classes to
be shared, and provides a nice migration path. Existing APIs that return old
collections can be modified to return new collections. Over time, most libraries
can be migrated to use the new APIs without breaking backward compatibility
(though it would break unsafe clients). In
this way we can have a more gradual migration than would be possible with the first
approach.

I don't mean to trivialize the work involved: this requires a significant
revision of the VM and language specifications and a significant effort from the teams
who implement VMs, java compilers (i.e. javac and Hotspot), and the core JDK
libraries. From the programmer's
point of view, however, the language change is small and mostly involves
removing restrictions.

Approaching the problem this way has a secondary benefit. Because the new reified
collection interfaces don't exist yet, it is possible for methods to be added to
them when they are introduced. Perhaps sorting methods belong in NewList?
If reified generics are added at the same time as
(or after) closures, a number of useful methods could be added to take advantage of
closures. Filters, mappers, reducers, and visitors are just some of the ideas.

58 comments:

Anonymous
said...

Maybe at the same time as new improved generics, we could have new improved arrays that fit better with generics, something like this RFE. So an array T[] could appear, at the language level, to be just syntax sugar for a List[T].

I'm no expert, but it seems to me that this would involve similar compatibility issues and VM changes as removing erasure would.

Furthermore, maybe features such as true multidimensional arrays could be added at the same time. Supposedly this could really improve Java performance in scientific computing, if that's still a goal.

Was this second solution brought up during the development of generics for Java? If so, was it merely disregarded as a way to control scope in Java 5? Why wasn't this discussed, then, for Mustang? If it was not brought up several years ago, why not?

This certainly is a very interesting proposal, but I was wondering for what style of programming reification of type parameters is required? I certainly recognize that erasure is one of the most puzzling aspects of generics for beginners, but once I got familiar enough with generics and understood the design goal of compatibility (with a good book like Java Generics and Collections), I didn't miss the reification all that much anymore. Where I had previously tried to use it, it was frequently a case of bad style (e.g. instanceof-test).

I can see that reification of type parameters is required when you depend on reflection or more advanced meta-programming, but are there compelling motivating examples besides that?

One big question for me is: should this be done in Java 1.x or should a real Java 2.0 be created (and while doing that do a real clean-up of the library, i.e. remove deprecated stuff and other breaking changes)?

I think that recent flood of RFEs for Java language indicates that we need completely new language, designed from ground up, without backward compatibility, revised libraries (remember Cloneable, Date?) etc...Adding all these new stuff to existing Java language while preserving backward compatibility just makes it more complex than it should be.

Reified types are prone to combinatorial explosion - your program will spend a lot of time constructing and taking apart types, and will use quite some heap space for representing nested parameterized. This is somewhat strange if you understand types as provable statements about your program; once they are proven, it should not be necessary to compute them again and again while the program is running.

Also, when types get more complex, so does the "instanceof" operator - you would have to carry the whole typechecking logic along at runtime.

Another interesting point is the representation of recursive types, e.g. "class Integer implements Comparable<Integer>" - reification would require a circular structure, which might make matching, interesting.

Reified generics would be much easier to explain and more intuitive to work with.

Because we'd always be working inside the type specifier, the language could give us a new keyword. "reified" would be much clearer than "class" (and I'd expect fewer collisions with "reified" than with "assert").

I'm with mileta, start a new Java-like language that happens to run on top of the JVM. Extend the JVM, if necessary, in compatible ways to support the new language. Obviously you want to allow your new language to import all Java classes as first class citizens.

Chris and Mileta: if the new language is to run on top of the JVM, support generics, and interoperate with existing Java classes, then either it implements generics by erasure or both the new language and Java are extended in this way. Given that, it isn't clear how adding a new language helps. Moreover, what is desireable to do in a new language has little impact on what ought to be done with Java.

Reified generics are what everyone expects. Languages, like user interfaces, are easier to use when they do what you expect.

That said, I worry that sentiments like "what is desireable to do in a new language has little impact on what ought to be done with Java" lead us to concepts that only make sense after 5 years of Java development, rather than making sense immediately on inspection.

If we have to add keywords, let's add them and keep things readable. Fatih Coskun's suggestion is very readable, perhaps the second-best is to break everything consistently:

I had always figured that rather than even trying to have the new reified "enforcement" which breaks the entire world into enforced and non-enforced that we'd just have more runtime information embedded in an instance of a Foo[T]. With enough extra information one could do new T(x), new T[], instanceof T, etc, in the implementation. One could also check what a Foo[?]'s parameterized type is at runtime, etc.

Rather than bifurcating the world into fully reified and not, it may make more sense just to have a set of "enforced" variations or wrappers to existing collections, etc, e.g. Collections.enforcedMap(Map[T,V]) that uses such runtime information to provide a wrapper for Map[T,V] that can be used by *all* code operating against Map but uses runtime checks to prevent type violations.

Sure the runtime checks are slower than compile time ones, but time could be spent heavily optimizing them. Most importantly the Java codebase would not be split into reified and non-reified sets.

The "checked" wrapper approach doesn't solve the problem because it can only be used for types for which you can write a type token. Those are only the reified types. That's why this requires a language change.

You mention the possibility of adding new methods (sort() etc.) to NewList.

Wouldn't it be better do allow "extension methods"?So if you had a static method m1(a1, a2, a3) - probably with an appropriate annotation - then you can call this method as a1.m1(a2, a3) - i.e. write it as if you were dispatching on the first argument?Is this being considered?

if the new language is to run on top of the JVM, support generics, and interoperate with existing Java classes, then either it implements generics by erasure or both the new language and Java are extended in this way.

That sounds right. To be specific, Scala is a language/implementation with all the properties you mention, and it implements generics by erasure.

Given that, it isn't clear how adding a new language helps.

I think Scala avoids some of the problems you mention with erasure by having a different type system, with definition site variance, that avoids some of the problems you mention. Some discussion of this here

http://lamp.epfl.ch/~emir/bqbase/2006/10/16/erasure.html

Count me with Chris and Mileta on supporting a new language. Since Scala is a working proof of concept, maybe it should just be renamed to Java 2 or 6 or whatever.

Even a new language using reified generic is introduced to JVM. How about existing Java library? Should this new Language with erasure generic be able to interact with this new language ? If so, how ? erasure <===> reifiedIf no, why ? new language will become lack of library support and lose the main advantage of using JVM platform.

Adding a new language is good. However, it doesn't help to correct generic. We still need a good idea that help us to migrate from erasure to reified.

The last paragraph of this blog entry is my favorite part. So far, I really like how carefully Sun has extended Java---always in very useful ways. Especially with the new ideas showing up for Java 7, it is still an exciting language to follow. I used to prefer Python to Java, but with the latest Java IDEs, it is now the other way around. There are now (after the improvements currently being discussed: a new module system, super-packages, closures etc.) only a few ways in which Java is losing out to other languages such as Python, Smalltalk or CLOS:

- A cleanly defined base library: This is where functional languages and Python excel. Things like the Apache Commons should not be necessary. Examples: Why can't I splice and join strings? Why can't I use negative indices for collections? Map, zip etc.

- Collection literals: All scripting languages have them. I think many people are so excited about scripting languages only because collection and string manipulation is very powerful there. But that's nothing that Java couldn't easily copy.

After reading your blog entry, i was wondering if couldnt we have a very very light version of reified generics by just maybe subclassing the generic types at runtime (ok technically not at runtime, but kinda like what anonymous inner classes are)? that way, the compiler can infer the type that is used when the type is instantiated, and through reflection, you can always know what type it is, etc. And then, maybe adding one or two shortcuts and allowing E.class or instanceof E would can have the best from the two worlds. Just wondering.best regards,takeshi

One more comment, sorry. In my version of the "Java 2.0" idea, I wouldn't recommend removing deprecated items. I'm all for backwards compatibility while removing visible cruft for common cases. A new filename extension might solve this.

Sounds good to me...BTW, how are you guys planing on interfacing with legacy code? The vast majority of the software int the world today is java, pre-generics, so if you do that wont you need multiple inheritance Vector extends Collection, but now you need a Vector<<String>>?So lets add that, and BTW, wouldent method pointers be nice? I know, we have Runnable, and reflection, but who cares, right?One last thing, why is C++, and C# not being used by most cooperations today? I mean most of them are not running red hat (though some still are)?Todd_Musheno@yahoo.comPS: If you are wondering where I am getting my opinions from I have been doing this since 1995, and I currently work as a Java Architect (So, I lead Designers, who lead Developers).

Rather than having to have java.util2 or some such, can we just admit that type safety, like any other goal, is not an absolute paragon of virtue?

I'm sure there's a more limited form of reification that is possible without having to introduce 2 collection classes in place of each one that exists today.

For instance, type information could be conditionally available at runtime in cases where the object in question is created by a class compiled with -source/-target of a some later JVM version which would capture type information during creation, etc.

This information could be used for things like T.class without a warning in cases the compiler can check apart from unsafe castes (e.g. within an instance method of SomeCollection[T], for instance) and with a warning (and possible runtime error) in other instances.

@Chris: Your proposal certainly is simple. That's because it comes without a specification of the semantics. I think you'll find that when you attempt to specify the semantics, and particularly the overriding and subtyping behavior, things aren't so simple or clean. But I'll be happy to be proven wrong. Another possible first step for be for you to provide an implementation of your idea in ksl. Either way, I look forward to seeing how you turn your idea into a language construct.

Would refeicated generics allow us to create a return for a method that would always return the type of this? This question was asked in the java desktop forums, in relation to introducing method chaining to the swing api, without having to redefine each method in each subclass

arithmetic in java seem very awful to me,object wrapper of primitive types dont have addiction, subtraction and so on...so the programmer must surf to primitive types and object and back, autoboxing help a little,but dont mix well with generics, for example, imagine a class that add 2 number, without generics i write:

Reified generics are better for a million of reasons; two priorities for me are:

1. Behave as expected (principle of least surprise). 2. Performance! Amazing that very few people mentions this one on erasure-vs-reified debates; e.g. reified generics a la NET can be heavily optimized for primitives, removing boxing overhead.

Regarding a new syntax for reified, I would prefer something uncluttered like:

public class List<T'> {}

But I really don't care about the syntax all that much as long as the feature is available!

"Reified types are prone to combinatorial explosion - your program will spend a lot of time constructing and taking apart types, and will use quite some heap space for representing nested parameterized."

People have decades of experience with reified generics, and this simply isn't true. In fact, operations on reified generics are no different from operations on generics with erasure: method calls, member access, etc. all still behave the same way.

The only thing that differs is the availability of runtime information and better static type checking.

I posted an alternate Generics proposal on http://java.dzone.com/news/java-5-sucks-according-ibatis- which I would appreciate your feedback on.

Essentially, code compiled against non-Generic List would get interpreted as List<Object> under JDK 1.5. Then if this code tries to invoke Generic code you cast from List<Object> to List<T> at runtime as necessary.

It is my understanding that this approach would provide full migration support. You gain Reification but you lose performance when running JDK 1.4 code on newer JREs. Still, I believe this is a good trade-off and as code is upgraded to use Generics the performance penalty goes away.

http://java.dzone.com/news/java-5-sucks-according-ibatis- explains what I mean in more detail, including some sample code. Please let me know if I missed anything obvious.

Dozens of language "proposals" are now generated weekly by people incompetent to the task. Sometimes the author is quietly tolerated by the community until their attention to the issue wanes. Sometimes a more persistent author attempts to pursue their proposal, and eventually comes to realize that they are out of their depth. Occasionally an individual with a complete lack of talent never comes to appreciate the futility of their approach, and attempts to recruit support from a wider community of amateurs.

For more ambitious changes that affect the type system, you should create a working prototype of your proposal to help refine your ideas. The attempt will at least help you learn if your ideas hang together. If they do, you'll want to formally specify the system (or find an interested academic who can help you do so) in a technical paper and submit it to a peer-reviewed journal or conference such as OOPSLA or TOPLAS. That's an excellent way to get some attention for a possible language change.

@Retention controls whether annotation will be included into classfile/available at runtime or not - this is actually type-related information. Isn't it something similar to what is discussed here - availability of some type information at runtime?

@Target controls applicability of annotation - is there any significant difference between compiler type error on method application to incorrectly typed parameters and annotation application to incorrect "type" of annotation target?

@Free: @Target only affects the legality of annotations. It does not affect the generated bytecodes of statements or expressions. Similarly, @Retention controls what information is preserved statically, but that doesn't affect the dynamic semantics of Java code. Reification would require a modification of the dynamic semantics of Java code and a corresponding change to the generated bytecode, as the currently generated bytecode does not provide enough information for type parameters to be preserved.

Richard: thank you for the reference to your unattributed description of my ideas, also documented at http://gafter.blogspot.com/2006/12/super-type-tokens.html and http://gafter.blogspot.com/2007/05/limitation-of-super-type-tokens.html

So I think it should be relatively safe to assume that the JDK is free of such code.

So far I haven't seen a good use case for reified generics. It seems to me like just adding bloat and I certainly hope such things are not added to Java anytime soon. If it is, I hope it is added in such a way that it will only record and run-time use extra type information in situations that can't be checked compile-time, and that the compiler can be made to issue a warning in such cases.

10 years later...still waiting for a such useful feature (scheduled for java 10, but who knows) maybe it will be "unuseful" by the time it will come, as scripting languages have won certain challenges and came close to be de facto industry standard.

About Me

Neal Gafter is a Computer Programming Language Designer, Amateur Scientist and Philosopher.
He works for Microsoft on the evolution of the .NET platform languages.
He also has been known to Kibbitz on the evolution of the Java language.
Neal was granted an OpenJDK Community Innovators' Challenge award for his design and
implementation of lambda expressions for Java.
He was previously a software engineer at Google working on Google Calendar, and a senior staff engineer at Sun Microsystems,
where he co-designed and implemented the Java language features in releases 1.4 through 5.0. Neal is coauthor of
Java Puzzlers: Traps, Pitfalls, and Corner Cases (Addison Wesley, 2005). He was a member of the C++ Standards
Committee and led the development of C and C++ compilers at Sun Microsystems, Microtec Research, and Texas Instruments.
He holds a Ph.D. in computer science from the University of Rochester.