Macs, Modularity and More

Eclipse memory optimisation

Apologies for the dearth of posts recently – 2015 will go down as one of my
lowest blog posting years, from a high of a hundred+ per year a few years ago.
Partly that’s because I’ve been writing books, changed job, and have been
hosting the Docklands LJC. But
never mind that now, onto Eclipse …

I’ve been looking at the performance of Eclipse over the last few months,
specifically the start-up time and a micro memory optimisation. I’ve been
promising myself to blog about this for a while, but haven’t had time until
now.

The first thing I’ll talk about is the use of new Boolean in the codebase. This
turned up by accident when I was looking at memory usage and whether String
de-duplication would be beneficial to Eclipse (there are a lot of Strings
in a runtime Eclipse instance). Side note: Eclipse Memory Analyser Tool (MAT)
is excellent; and it’s part of the Eclipse Mars release, so you should install
it right away. Go on, I’ll wait.

String de-duplication can be turned on with -XX:+UseStringDeduplication in an
eclipse.ini file, or with -vmargs -XX:+UseStringDeduplication on the command
line (on Java 8 it only takes effect with the G1 collector, -XX:+UseG1GC).
During garbage collection, Strings are compared against previously seen
values, and if two Strings hold the same data then the backing array of one
is replaced with the backing array of the other. You still have two
independent String instances (so a == b is false) but the underlying char
array is the same (so a.value == b.value).
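To make the identity point concrete, here is a minimal sketch. The class and method names are made up for illustration. It only shows the part that is observable from ordinary code: two Strings with the same characters remain distinct objects. The shared backing array is a private detail of String, so it is not inspected here.

```java
final class StringIdentityDemo {

    /** Builds a fresh String instance holding the same characters as the argument. */
    static String copyOf(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = copyOf("www.eclipse.org");
        String b = copyOf("www.eclipse.org");

        System.out.println(a == b);      // false: two separate String objects
        System.out.println(a.equals(b)); // true: identical character data
    }
}
```

De-duplication leaves both of these results unchanged; it only affects how much memory the two objects retain between them.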

It turned out that there were a heck of a lot of references to
www.eclipse.org (several tens of thousands, if I recall). Now Eclipse doesn’t
need to be that vain; all of these references were created by new URI calls
that were indirectly driven by references in P2 files like content.xml and
artifacts.xml (or their compressed counterparts). If you cache these in a Map
based on the hostname, then repeated references can share one object instead
of allocating a new one each time, which prevents excessive memory
usage/recycling. This change
was merged in for Eclipse Mars.
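The caching idea can be sketched roughly as follows. This is not the actual P2 change: UriCache and its get method are hypothetical names, and the sketch keys the Map on the whole location string rather than on the hostname, purely to keep it short.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UriCache {

    // Hypothetical cache: one shared URI instance per distinct location string.
    private static final Map<String, URI> CACHE = new ConcurrentHashMap<>();

    /** Returns a shared URI for the location, parsing it only on first use. */
    static URI get(String location) {
        return CACHE.computeIfAbsent(location, URI::create);
    }

    public static void main(String[] args) {
        URI first = get("http://www.eclipse.org/");
        URI second = get("http://www.eclipse.org/");
        System.out.println(first == second); // true: the same instance is reused
    }
}
```

A cache like this grows without bound, so it only makes sense when the set of distinct values is small compared with the number of lookups, as it was for the repeated eclipse.org references.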

Anyway, whilst in the MAT view I ran the ‘Boolean instances’ check, and it
showed that there were around ten instances of Boolean in the heap, and this
was from a relatively empty Eclipse instance. Now, a boolean only has two
possible values, so finding ten instances was a little confusing. It turns out
that most of these are from code that looks like new Boolean(value) where
value is either a String (e.g. "true" or "false") or a plain boolean value
being wrapped. The former is used widely for representing options and preferences
in Eclipse (e.g. use tabs or use spaces) and so the code used new Boolean()
to do the parsing. In some cases, the booleanValue() was being used to then
convert the object wrapper into its boolean counterpart, for use in an if
statement or a local boolean value.
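A small sketch of why the heap can end up with more than two Boolean objects: every constructor call allocates a new instance, even when the value is the same. The class and method names below are illustrative only. The constructor has been deprecated since Java 9, hence the suppressed warnings.

```java
final class BooleanAllocationDemo {

    /** Mirrors the legacy pattern: parse a preference string via the constructor. */
    @SuppressWarnings({"deprecation", "removal"})
    static Boolean viaConstructor(String value) {
        return new Boolean(value); // allocates a fresh object on every call
    }

    public static void main(String[] args) {
        Boolean a = viaConstructor("true");
        Boolean b = viaConstructor("true");

        System.out.println(a == b);            // false: two separate objects
        System.out.println(a == Boolean.TRUE); // false: not the canonical instance
        System.out.println(a.equals(b));       // true: same logical value

        // The wrapper is often unwrapped again straight away.
        if (a.booleanValue()) {
            System.out.println("option enabled");
        }
    }
}
```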

The main use, then, of new Boolean was to perform parsing on the string
value, so that it could be used in a test; or in some cases, stored as a value
in another collections class. (There are a few places where Boolean is being
used as a tri-state: true, false and null.) When Java was originally
created, it didn’t have a separate parse method; and when Eclipse was written,
Java 1.2 had no way of parsing truth values into a primitive boolean without
going through a Boolean object, typically via the constructor.

Fortunately Java 1.5 added Boolean.parseBoolean() which does the same parsing
as the constructor, and returns a boolean value from a String. (In fact
the constructor now delegates to that static method to do its work.) However,
by that time large quantities of Eclipse code had been written using the
constructor, and with no warnings raised by Eclipse itself these went
undetected for a long time. There is also Boolean.valueOf(), which takes the
same arguments as the constructor, either a String or a boolean (the boolean
overload arrived in Java 1.4), and returns one of the canonical Boolean.TRUE
or Boolean.FALSE instances. In fact, several of the changes turned up things
like Boolean.valueOf(true) which could trivially be replaced with Boolean.TRUE
and Boolean.valueOf(something).booleanValue() which could be replaced with
Boolean.parseBoolean(something) that has exactly the same effect, but without
object creation.
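The replacements described above can be sketched as two small helpers. The class and method names are hypothetical; the calls they wrap are the standard java.lang.Boolean methods.

```java
final class BooleanReplacements {

    /** Replacement for new Boolean(text): same parsing, but a canonical instance. */
    static Boolean wrapped(String text) {
        return Boolean.valueOf(text);
    }

    /** Replacement for Boolean.valueOf(text).booleanValue(): no wrapper involved. */
    static boolean primitive(String text) {
        return Boolean.parseBoolean(text);
    }

    public static void main(String[] args) {
        System.out.println(wrapped("true") == Boolean.TRUE); // true: shared instance
        System.out.println(wrapped("no") == Boolean.FALSE);  // true: anything but "true" is false
        System.out.println(primitive("TRUE"));               // true: parsing ignores case
        System.out.println(primitive(null));                 // false: null parses as false
    }
}
```

Code that relies on Boolean as a tri-state still needs its own null check before calling either helper, since both map null to false.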

It’s primarily because of Eclipse’s age and size that these usages existed.
Eclipse 3.0 was released the same year (2004) that Java 1.5 came out, and
had almost two million lines of code already in place by the time that
happened; it wouldn’t be until Eclipse 3.3 was released in 2007 that support
for Java 1.4 was dropped and Java 1.5 became the minimum, so that was the
earliest time such a change could have taken place, by which time there were
17 million lines of code.

In any case, thanks to a number of successful code reviews, many of the places
where new Boolean is called have now been weeded out.

Many of these were found by running Eclipse and doing a search for references
to the new Boolean constructor (Cmd+Shift+G) when importing projects, but
once the set of repos was known I did a git grep "new Boolean(" to find
the locations, followed by a sed -i~ "s/new Boolean(/Boolean.valueOf(/g" to
do the rewrites. Places where true and false were seen in the diffs were
then replaced with Boolean.TRUE and Boolean.FALSE, and combinations of
Boolean.valueOf().booleanValue() were replaced with Boolean.parseBoolean()
by inspection.
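For illustration, the mechanical part of that rewrite can be expressed as a purely textual transformation. This is a sketch, not the tooling that was actually used: the class and method names are invented, and the parseBoolean step is left out because it was done by hand.

```java
final class BooleanRewriteSketch {

    /** Applies the textual replacements to a snippet of Java source. */
    static String rewrite(String source) {
        return source
                // Step 1: swap the constructor for the factory method.
                .replace("new Boolean(", "Boolean.valueOf(")
                // Step 2: collapse literal arguments into the canonical constants.
                .replace("Boolean.valueOf(true)", "Boolean.TRUE")
                .replace("Boolean.valueOf(false)", "Boolean.FALSE");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("map.put(key, new Boolean(true));"));
        // prints: map.put(key, Boolean.TRUE);
        System.out.println(rewrite("return new Boolean(text);"));
        // prints: return Boolean.valueOf(text);
    }
}
```

Being plain string replacement, this misses variants with extra whitespace and will also rewrite matches inside comments or string literals, which is one reason the resulting diffs still needed reviewing.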

It turns out that Sonar is good at spotting these things as well, with its
Boolean Instantiation rule; a list of all projects (that are covered by Sonar)
that have new Boolean() calls can be found by
running a search and putting Boolean Instantiation as a More Criteria field;
apparently there are some 244 references that are still present in the Eclipse
codebase – though this won’t contain any in-flight reviews or some of the
recent changes. It looks like I need to submit a patch for CDT next …

Thanks to Mickael Istria for pointing out the Sonar results to me (see his
blog post for more details), and of course to everyone who has been reviewing
patch-bombs from me. Hopefully, using this, we’ll be able to banish new
Boolean from Eclipse completely.

PS: 99% of the time micro-optimisations are not worth doing, but code cleanups
may be worth it for their own sake.