Tuesday, July 31, 2007

Who controls the iteration? A fundamental issue is deciding which party controls the iteration, the
iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an
external iterator (C++ and Java), and when the iterator controls it, the iterator is an internal iterator
(Lisp and functional languages). Clients that use an external iterator must advance the traversal and
request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an
operation to perform, and the iterator applies that operation to every element in the aggregate.

External iterators are more flexible than internal iterators. It's easy to compare two collections for
equality with an external iterator, for example, but it's practically impossible with internal iterators.
Internal iterators are especially weak in a language like C++ that does not provide anonymous
functions, closures, or continuations like Smalltalk and CLOS. But on the other hand, internal iterators
are easier to use, because they define the iteration logic for you.
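The equality comparison mentioned above can be sketched with two external iterators advanced in lockstep. This is only an illustration using the standard `java.util.Iterator`; the class name `Lockstep` and the method name `elementsEqual` are hypothetical.

```java
import java.util.Iterator;
import java.util.Objects;

final class Lockstep {
    /** Returns true if both sequences yield equal elements in the same order. */
    static boolean elementsEqual(Iterable<?> a, Iterable<?> b) {
        Iterator<?> ia = a.iterator();
        Iterator<?> ib = b.iterator();
        // The client advances both traversals itself, one step at a time.
        while (ia.hasNext() && ib.hasNext()) {
            if (!Objects.equals(ia.next(), ib.next())) {
                return false;
            }
        }
        // Equal only if both sequences ran out together.
        return !ia.hasNext() && !ib.hasNext();
    }
}
```

With an internal iterator the client cannot pause one traversal while stepping the other, which is why the same comparison is so awkward there.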

To make this very concrete, one might define a collection-like interface
using external iterators like this:
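A minimal sketch of what such interfaces might look like follows. The names `ExternalIterator`, `ExternalIterable`, and `ArrayBag` are hypothetical and chosen only for illustration; they are not taken from any library.

```java
import java.util.NoSuchElementException;

/** External style: the client drives the traversal. */
interface ExternalIterator<T> {
    boolean hasNext();
    T next();
}

/** A collection-like type that hands out external iterators. */
interface ExternalIterable<T> {
    ExternalIterator<T> iterator();
}

/** A tiny array-backed collection, used here only to exercise the interfaces. */
final class ArrayBag<T> implements ExternalIterable<T> {
    private final T[] items;

    @SafeVarargs
    ArrayBag(T... items) {
        this.items = items.clone();
    }

    @Override
    public ExternalIterator<T> iterator() {
        return new ExternalIterator<T>() {
            private int index = 0;  // the iteration state lives in the iterator object

            @Override
            public boolean hasNext() {
                return index < items.length;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return items[index++];
            }
        };
    }
}
```

Note that the iterator has to carry its position explicitly in the `index` field; that bookkeeping is the price of letting the client control the loop.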

Languages with well-integrated support for closures (such as Scala, Smalltalk,
and Ruby) usually provide support for looping over their collections using
internal iterators - they are, after all, easier to use in most cases - while
other object-oriented languages (such as C++, Java, and C#) tend to use
external iterators. Without well-integrated language support for closures,
internal iterators would be too painful to use effectively. For that reason,
the Java collection framework uses external iterators. But once we have
closures in the language, wouldn't it be worth reversing that decision?

The answer is no, and it isn't just because it would be an incompatible change
to an existing interface. As discussed above, external iterators are more
flexible for some clients. The simpler code that clients can write using
internal iterators is already achieved in many clients (of external iterators)
due to the previous addition of the for-each loop in JDK5. For the remaining
clients, simple library methods can bridge the gap between internal and
external iterators. See, for example, the "eachEntry" method for iterating
over the entries of a map, discussed in my earlier postings on closures. To
see how easy the conversion is,
here is the code to convert from an external iterator to an internal one:
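As a minimal sketch of such a conversion, assuming Java 8's `java.util.function.Consumer` stands in for the closure type (it postdates this post), and with the hypothetical names `Iterations` and `forEachElement`:

```java
import java.util.Iterator;
import java.util.function.Consumer;

final class Iterations {
    /** Drives the external iterator on the caller's behalf, applying action to each element. */
    static <T> void forEachElement(Iterable<T> source, Consumer<? super T> action) {
        Iterator<T> it = source.iterator();
        while (it.hasNext()) {
            action.accept(it.next());
        }
    }
}
```

The whole conversion is a loop: the library method owns the traversal, and the client supplies only the operation.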

Iteration using internal iterators is often much easier to implement, because
the iterator implementation doesn't have to explicitly store and manage the
state of the iteration. Much of the complexity in the implementation of the
iterators for Java's HashMap and TreeMap (and their Set cousins) would simply
vanish if the iterators were internal. For that reason, it is interesting to
see if it is possible to have the iterator implemented internally, but exposed
to the client externally, by writing a utility method that converts between
the two iterable interfaces. This is the reverse of the conversion above. How
easy this is to implement depends on the features of your programming
language.

You can solve the problem in Java by resorting to the use of a separate thread
to simulate coroutines. The result is messy and expensive, as each converted
external iterator requires its own thread. Here is my implementation;
can you do better?
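One way the thread-based approach might be sketched is shown below. This is an illustration under stated assumptions, not the original listing: `InternalIterable` and `ThreadedIterator` are hypothetical names, `java.util.function.Consumer` (Java 8) stands in for the closure type, and a `SynchronousQueue` hands elements from the producer thread to the client one at a time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.SynchronousQueue;
import java.util.function.Consumer;

/** Hypothetical internal-iteration interface: the collection drives the loop. */
interface InternalIterable<T> {
    void forEach(Consumer<? super T> action);
}

/** Exposes an InternalIterable as a java.util.Iterator by running it on a helper thread. */
final class ThreadedIterator<T> implements Iterator<T> {
    private static final Object[] END = new Object[0];  // end-of-traversal marker
    private final SynchronousQueue<Object[]> handoff = new SynchronousQueue<>();
    private Object[] pending;                            // null until the next box is fetched

    ThreadedIterator(InternalIterable<T> source) {
        Thread producer = new Thread(() -> {
            try {
                // Each element is boxed in a one-slot array so that null elements survive.
                source.forEach(item -> put(new Object[] { item }));
            } finally {
                put(END);
            }
        });
        producer.setDaemon(true);  // an abandoned iterator must not keep the JVM alive
        producer.start();
    }

    private void put(Object[] box) {
        try {
            handoff.put(box);      // blocks until the client asks for an element
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = handoff.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return pending != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) pending[0];
        pending = null;
        return value;
    }
}
```

The sketch shows the cost described above: every iterator owns a thread, and an iterator abandoned mid-traversal leaves that thread blocked until the JVM exits. It also does not propagate an exception thrown by the source; the client simply sees the traversal end.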

Thursday, July 05, 2007

One of the ideas for improving the Java Programming Language is "type
inference" on variable declarations. The idea is to simplify a
pattern of code that now appears in programs due to generics:

Map<String,List<Thing>> map = new HashMap<String,List<Thing>>();

Surely we shouldn't have to give the same type parameters twice?
The simplest proposal to relieve this redundancy allows

map := new HashMap<String,List<Thing>>();

This introduces the new colon-equals token and the
declaration-assignment statement. The variable appearing on the
left-hand-side of the statement is implicitly defined by this
statement, and its type is the type of the expression on the
right-hand-side. I don't like this proposal. It both goes too far and
not far enough.

It goes too far in that it allows the programmer to elide the type
in a variable declaration. The type in a variable declaration is
valuable documentation that helps the reader understand the program,
and this proposal reduces the readability of programs by allowing it to
be elided. Worse, it assigns the wrong type to the variable. Following
Effective Java (first edition, item 34),
the type of a declared variable should be an interface type. This
statement form forces the variable to be of the (likely more specific)
type of the right-hand-side. Consequently, the programmer may
inadvertently depend on features of the concrete implementation class
when using the variable. That would make it more difficult to modify
the program later by selecting a different implementation type.
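To illustrate the concern, here is a small sketch. The proposed `:=` form is not valid Java, so the first method spells out the concrete type that the proposal would infer; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DeclaredTypes {
    static int viaConcreteType() {
        // The type the := form would infer: the concrete class of the initializer.
        ArrayList<String> names = new ArrayList<String>();
        names.ensureCapacity(100);  // ArrayList-only method: compiles, and ties the code to ArrayList
        names.add("x");
        return names.size();
    }

    static int viaInterfaceType() {
        // Declared as the interface: only List operations are available,
        // so switching to LinkedList later changes just the initializer.
        List<String> names = new ArrayList<String>();
        // names.ensureCapacity(100);  // would not compile: not a List method
        names.add("x");
        return names.size();
    }
}
```

Both methods behave the same today; the difference shows up only when someone tries to change the implementation class.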

This syntax doesn't go far enough because the verbosity of creating
generic classes is worth eliminating in other contexts as well.
Programmers today work around the verbosity by providing static factory
methods corresponding to constructors:

static <K,V> HashMap<K,V> makeHashMap() { return new HashMap<K,V>();}

This addresses the immediate problem:

Map<String,List<String>> map = makeHashMap();

Unfortunately, this idiom replaces one form of boilerplate (in
variable initialization) with another: trivial static factories. A
generic class is typically created more than once, so adding a single
static factory can simplify the code at every creation site. But with
language support, we can do better.

I propose a new form of class instance creation expression:

Map<String,List<Thing>> map = new HashMap<>();

Using empty type parameters on a class instance creation
expression asks the language/compiler to perform type inference,
selecting appropriate type parameters exactly as it would in the
invocation of the equivalent trivial static factory.
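The intended equivalence can be sketched side by side. The example assumes a compiler that accepts the empty-type-parameter form (Java 7 and later do); `DiamondSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiamondSketch {
    // The trivial static factory from the text.
    static <K, V> HashMap<K, V> makeHashMap() {
        return new HashMap<K, V>();
    }

    static Map<String, List<String>> viaFactory() {
        Map<String, List<String>> map = makeHashMap();    // K and V inferred from the declared type
        map.put("a", new ArrayList<String>());
        return map;
    }

    static Map<String, List<String>> viaDiamond() {
        Map<String, List<String>> map = new HashMap<>();  // same inference, no factory needed
        map.put("a", new ArrayList<String>());
        return map;
    }
}
```

In both methods the variable keeps its interface type, and the type parameters are written only once.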

Type inference today works on the right-hand-side of an
assignment. I also propose that we enable this new form to be used in
more situations by improving type inference for expressions appearing
in other contexts:

the argument of a method call

the receiver of a method call

the argument of a constructor

the argument of an alternate constructor invocation

This would enable generic methods to be invoked in these
contexts without providing explicit type parameters.
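For the first of these contexts, the argument of a method call, the effect might look like the following sketch. It assumes a Java 8 or later compiler, which infers type parameters from a method's parameter type; `ArgumentInference` and `count` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ArgumentInference {
    static int count(List<String> names) {
        return names.size();
    }

    static int demo() {
        // Generic method in argument position: T is inferred as String from count's parameter type.
        int a = count(Collections.emptyList());
        // Empty type parameters in argument position: the element type is inferred the same way.
        int b = count(new ArrayList<>());
        // Explicit type parameters, as required when inference works only for assignments.
        int c = count(Collections.<String>emptyList());
        return a + b + c;
    }
}
```

The third call shows the form the proposal aims to make unnecessary.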

About Me

Neal Gafter is a Computer Programming Language Designer, Amateur Scientist and Philosopher.
He works for Microsoft on the evolution of the .NET platform languages.
He has also been known to kibitz on the evolution of the Java language.
Neal was granted an OpenJDK Community Innovators' Challenge award for his design and
implementation of lambda expressions for Java.
He was previously a software engineer at Google working on Google Calendar, and a senior staff engineer at Sun Microsystems,
where he co-designed and implemented the Java language features in releases 1.4 through 5.0. Neal is coauthor of
Java Puzzlers: Traps, Pitfalls, and Corner Cases (Addison Wesley, 2005). He was a member of the C++ Standards
Committee and led the development of C and C++ compilers at Sun Microsystems, Microtec Research, and Texas Instruments.
He holds a Ph.D. in computer science from the University of Rochester.