5 things you didn't know about ... the Java Collections API, Part 2

Mutables to watch out for

You can take Java™ Collections anywhere, but don't take
them for granted. Collections hold mysteries and can make trouble if you don't
treat them right. In this installment of 5 things, Ted Neward
explores the complex and mutable side of the Java Collections API, with tips
that will help you do more with Iterable, HashMap,
and SortedSet, without introducing bugs to your code.

The Collections classes in java.util were designed to help,
namely by replacing arrays and, thus, improving Java performance. As you
learned in the previous article, they're also malleable, willing to be
customized and extended in all kinds of ways, in service of good, clean
code.

Collections are also powerful, however, and mutable: use them with care and
abuse them at your own risk.

1. Lists aren't the same as arrays

Java developers frequently make the mistake of assuming that
ArrayList is simply a replacement for the Java array.
An ArrayList is backed by an array, which leads to good performance when
looking up items randomly within the list. And, like arrays,
lists use integer ordinals to obtain particular items. Still, a
collection isn't a drop-in replacement for an array.

The trick to differentiating collections from arrays is knowing the
difference between order and position. For example,
List is an interface that preserves the order in
which items are placed into a collection, as Listing 1 shows:
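
What follows is a minimal sketch of that behavior, not the article's
original listing. The class name ListRemovalDemo, its helper methods, and
the sample strings are illustrative assumptions. It removes the third
element from a List and, for contrast, clears the third slot of an array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names and sample data are assumptions.
class ListRemovalDemo {

    // Removing from a List shifts every later element down by one index.
    static List<String> removeThird(List<String> source) {
        List<String> copy = new ArrayList<>(source);
        copy.remove(2); // index 2 is the third element
        return copy;
    }

    // "Removing" from an array only overwrites the slot; the length stays the same.
    static String[] clearThird(String[] source) {
        String[] copy = Arrays.copyOf(source, source.length);
        copy[2] = null;
        return copy;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three", "four", "five");
        // prints [one, two, four, five]
        System.out.println(removeThird(items));
        // prints [one, two, null, four, five]
        System.out.println(Arrays.toString(clearThird(items.toArray(new String[0]))));
    }
}
```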

When the third element is removed from the above List, the
other items "behind" it slide up to fill the empty slot. Clearly, this
collection behavior differs from that of an array. (In fact, removing an
item from an array is itself not quite the same thing as removing it from
a List: "removing" an item from an array means
overwriting its index slot with a new reference or null.)

2. Iterator, you surprise me!

There's no doubt that Java developers love the Java Collections
Iterator, but when was the last time you really looked at the
Iterator interface? Most of the time, we just slap
Iterator inside a for() loop or enhanced
for() loop and move on, so to speak.

But, for those who go digging, Iterator has two surprises in
store.

About this series

So you think you know about Java programming? The fact is, most
developers scratch the surface of the Java platform, learning just
enough to get the job done. In this series, Ted Neward digs beneath the core functionality of the
Java platform to uncover little-known facts that could help you solve
even the stickiest programming challenges.

First, Iterator supports the ability to remove an object from
a source collection safely, by calling remove() on the
Iterator itself. The point here is to avoid a
ConcurrentModificationException, which signals precisely what its
name implies: that a collection was modified while an
Iterator was open against it. Some collections will let you
get away with removing or adding elements to a Collection
while iterating across it, but calling remove() on the
Iterator is a safer practice.
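
A short sketch of the difference, assuming an ArrayList of integers. The
class name IteratorRemoveDemo and both helper methods are hypothetical and
exist only for illustration. The first method removes through the
Iterator; the second mutates the list directly inside an enhanced for
loop and reports whether that attempt failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; names and sample data are assumptions.
class IteratorRemoveDemo {

    // Drops every even number while iterating, using Iterator.remove().
    static List<Integer> dropEvens(List<Integer> source) {
        List<Integer> numbers = new ArrayList<>(source);
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes the element last returned by next()
            }
        }
        return numbers;
    }

    // Same intent, but modifies the list behind the iterator's back.
    // Returns true if the iteration failed with ConcurrentModificationException.
    static boolean unsafeRemovalThrows(List<Integer> source) {
        List<Integer> numbers = new ArrayList<>(source);
        try {
            for (Integer n : numbers) {
                if (n % 2 == 0) {
                    numbers.remove(n); // List.remove(Object), a structural change
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(dropEvens(input));           // prints [1, 3, 5]
        System.out.println(unsafeRemovalThrows(input)); // prints true
    }
}
```

With ArrayList, the direct removal can slip through unnoticed when it
happens on the second-to-last element, which is one reason the failure
mode is easy to miss in testing.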

Second, Iterator supports a derived (and arguably more
powerful) cousin. ListIterator, only available from
Lists, supports both adding and removing from a
List during iteration, as well as bidirectional scrolling
through Lists.

Bidirectional scrolling can be particularly powerful for scenarios such as
the ubiquitous "sliding set of results," showing 10 of many results
retrieved from a database or other collection. It can also be used to
"walk backwards" through a collection or list, rather than trying to do
everything from the front. Dropping in a ListIterator is much
easier than using downward-counting integer parameters to
List.get() to "walk backwards" through a
List.
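
As a rough illustration, the sketch below uses ListIterator to walk a list
backwards and to insert elements during iteration. The class name
ListIteratorDemo, the helper methods, and the sample values are
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo class; names and sample data are assumptions.
class ListIteratorDemo {

    // Walks the list from the end to the start with hasPrevious()/previous().
    static List<String> reversed(List<String> source) {
        List<String> result = new ArrayList<>();
        ListIterator<String> it = source.listIterator(source.size());
        while (it.hasPrevious()) {
            result.add(it.previous());
        }
        return result;
    }

    // Inserts a marker right after each element equal to target, mid-iteration.
    static List<String> insertAfter(List<String> source, String target, String marker) {
        List<String> copy = new ArrayList<>(source);
        ListIterator<String> it = copy.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.add(marker); // placed before the cursor, so next() skips it
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        System.out.println(reversed(letters));              // prints [c, b, a]
        System.out.println(insertAfter(letters, "b", "X")); // prints [a, b, X, c]
    }
}
```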

3. Not all Iterables come from collections

Ruby and Groovy developers like to brag about how they can iterate across a
text file and print its contents to the console with a single line of
code. Most of the time, they say, doing the same thing in Java programming
takes dozens of lines of code: open a FileReader, then a
BufferedReader, then create a while() loop to
call readLine() until it comes back null. And, of
course, you have to do all this in a try/catch/finally block
that will handle exceptions and close the file handle when finished.

It may seem like a silly and pedantic argument, but it does have some
merit.

What they (and quite a few Java developers) don't know is that not all
Iterables have to come from collections. Instead, an
Iterable can create an Iterator that knows how
to manufacture the next element out of thin air, rather than blindly
handing it back from a pre-existing Collection:
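
One way this could look is sketched below. It is not the article's
original listing: the class name FileLines and the choice to wrap
IOException in UncheckedIOException are assumptions made for this
example. Each call to next() reads one more line from the file, so no
backing Collection is ever built.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical Iterable that produces the lines of a text file on demand.
class FileLines implements Iterable<String> {
    private final String path;

    FileLines(String path) {
        this.path = path;
    }

    @Override
    public Iterator<String> iterator() {
        final BufferedReader reader;
        try {
            reader = new BufferedReader(new FileReader(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Note: this sketch never closes the reader.
        return new Iterator<String>() {
            // Read one line ahead so hasNext() can answer without side effects.
            private String nextLine = readLine();

            private String readLine() {
                try {
                    return reader.readLine(); // null at end of file
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public String next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                String current = nextLine;
                nextLine = readLine();
                return current;
            }
        };
    }

    // Usage: java FileLines.java some-file.txt
    public static void main(String[] args) {
        for (String line : new FileLines(args[0])) {
            System.out.println(line);
        }
    }
}
```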

This approach has the advantage of not holding the entire contents of a
file in memory, but with the caveat that, as written, it doesn't
close() the underlying file handle. (You could fix this by
closing whenever readLine() returns null, but that won't
solve cases where Iterator doesn't run to completion.)

4. Beware the mutable hashCode()

Map is a wonderful collection, bringing us the niftiness of
key/value pair collections often found in other languages like Perl. And
the JDK gives us a great Map implementation in the form of
the HashMap, which uses hashtables internally to support fast
key lookups for corresponding values. But therein lies a subtle problem:
Keys that support hash codes dependent on the contents of mutable fields
are vulnerable to a bug that will drive even the most patient Java
developer batty.

Assuming the Person object in Listing 3 has a typical
hashCode() (which uses the firstName,
lastName, and age fields, all non-final,
to calculate the hashCode()), changing any of those fields after the
object has been stored as a key means the get()
call to Map will fail and return null:
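
The sketch below reproduces the problem under stated assumptions; it is
not the article's original listing. The Person class here is a
hypothetical stand-in with the three non-final fields named in the text,
its hashCode() is built with Objects.hash(), and the sample values are
invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class; Person and the sample values are assumptions.
class MutableKeyDemo {

    static class Person {
        String firstName; // deliberately non-final
        String lastName;
        int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return age == other.age
                    && Objects.equals(firstName, other.firstName)
                    && Objects.equals(lastName, other.lastName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(firstName, lastName, age); // depends on mutable state
        }
    }

    // Returns the lookup result before, during, and after a key mutation.
    static String[] lookups() {
        Map<Person, String> jobs = new HashMap<>();
        Person key = new Person("Jane", "Doe", 30);
        jobs.put(key, "engineer");

        String before = jobs.get(key); // found: hash matches the stored entry
        key.age = 31;                  // hashCode() now differs from the stored hash
        String during = jobs.get(key); // null, although the entry is still in the map
        key.age = 30;                  // restoring the field restores the hash
        String after = jobs.get(key);  // found again
        return new String[] {before, during, after};
    }

    public static void main(String[] args) {
        for (String result : lookups()) {
            System.out.println(result); // prints engineer, null, engineer
        }
    }
}
```

The entry never leaves the map; it simply cannot be reached through a key
whose hash code no longer matches the one recorded at put() time.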

5. equals() vs Comparable

When cruising through the Javadocs, Java developers frequently happen
across the SortedSet type (and its lone implementation in the
java.util package, the TreeSet). Because SortedSet is one of the few
Collection types in the java.util package that keeps its elements
sorted, developers often begin using it without questioning
the details too closely. Listing 4 demonstrates:
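
A minimal sketch of that kind of usage follows; it is not the article's
original listing. The Person class is a hypothetical stand-in whose
equals() compares all three fields while its compareTo() looks only at
the last name, and the sample people are invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; Person and the sample values are assumptions.
class SortedSetDemo {

    static class Person implements Comparable<Person> {
        final String firstName;
        final String lastName;
        final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }

        // Equality looks at every field.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return age == other.age
                    && firstName.equals(other.firstName)
                    && lastName.equals(other.lastName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(firstName, lastName, age);
        }

        // Ordering looks at the last name only.
        @Override
        public int compareTo(Person other) {
            return lastName.compareTo(other.lastName);
        }

        @Override
        public String toString() {
            return firstName + " " + lastName + " (" + age + ")";
        }
    }

    // Five people, no two of them equal according to equals().
    static List<Person> people() {
        return Arrays.asList(
                new Person("Ann", "Smith", 40),
                new Person("Bob", "Smith", 12),
                new Person("Cid", "Jones", 35),
                new Person("Dee", "Jones", 33),
                new Person("Eve", "Brown", 29));
    }

    static Set<Person> sorted(List<Person> source) {
        return new TreeSet<>(source); // uses compareTo(), not equals()
    }

    public static void main(String[] args) {
        System.out.println(new HashSet<>(people()).size()); // prints 5
        System.out.println(sorted(people()).size());        // prints 3
        System.out.println(sorted(people()));
    }
}
```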

After working with this code for a while, you might discover one of the
Set's core features: that it disallows duplicates. This
feature is actually described in the Set Javadoc. A
Set is a "collection that contains no duplicate elements.
More formally, sets contain no pair of elements e1 and e2 such that
e1.equals(e2), and at most one null element."

But this doesn't actually seem to be the case — although none of the
Person objects in Listing 4 are equal
(according to the equals() implementation on
Person), only three objects are present within the
TreeSet when printed.

Contrary to the stated nature of the set, the TreeSet, which
requires objects to either implement Comparable directly or
have a Comparator passed in at the time of construction,
doesn't use equals() to compare the objects; it uses the
compare or compareTo methods of
Comparator/Comparable.

So, objects stored in a Set will have two potential means of
determining equality: the expected equals() method and the
Comparable/Comparator method, depending on the context of who
is asking.

What's worse, it isn't sufficient to simply declare that the two should be
identical, because comparison for the purpose of sorting isn't the same as
comparison for the purpose of equality: It may be perfectly acceptable to
consider two Persons equal when sorting by last name, but not
equal in terms of their contents.

Always ensure that the difference between equals() and the
Comparable.compareTo()-returning-0 is clear when implementing
Set. By extension, the difference should also be clear in
your documentation.

In conclusion

The Java Collections library is scattered with tidbits that can make your
life much easier and more productive, if only you know about them.
Unearthing tidbits often involves some complexity, however, like
discovering that you can have your way with HashMap, just as
long as you never use a mutable object type as its key.

So far, we've dug beneath the surface of Collections, but we haven't yet
hit the gold mine: Concurrent Collections, introduced in Java 5. The next
five tips in this series will focus on java.util.concurrent.

"Introduction to the Collections Framework" (MageLang Institute,
Sun Developer Network, 1999): This old but good tutorial is a complete
introduction to the Java Collections Framework prior to concurrent
collections.
