interned Strings : Java Glossary

Interned Strings avoid duplicate Strings. Interning saves RAM (Random Access Memory)
at the expense of more CPU (Central Processing Unit) time to detect and replace
duplicate Strings. There is only one copy of each String that has been interned, no
matter how many references point to it. Since Strings are immutable, if two different
methods incidentally use the same String (even if they concocted it by totally
independent means, e.g. one might use the String "sin" in the context of Moses and
another in the context of trigonometry), they can share a copy of the same String.
The process of converting duplicated Strings to shared ones is called interning.
String.intern() gives you the address of the canonical master String. You can compare
interned Strings with simple == (which compares pointers) instead of equals, which
compares the characters of the String one by one. Because Strings are immutable, the
intern process is free to save further space, for example, by not creating a separate
String literal for "pot" when it exists as a substring of some other literal such as
"hippopotamus".
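
As a minimal sketch of the idea, the class below builds the same text twice by independent means and compares the results before and after interning. The class name InternBasics and the helper build are made up for this illustration.

```java
class InternBasics {
    /** Builds a String at run time, so the result is not a compile-time constant. */
    static String build(String head, String tail) {
        return new StringBuilder(head).append(tail).toString();
    }

    public static void main(String[] args) {
        String moses = build("s", "in"); // one method's "sin"
        String trig = build("si", "n");  // another method's "sin"

        System.out.println(moses == trig);                   // false: two distinct objects
        System.out.println(moses.equals(trig));              // true: same characters
        System.out.println(moses.intern() == trig.intern()); // true: one canonical copy
    }
}
```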

Why Intern?

To speed up String equality compares. Interned Strings will compare faster even if you
use equals instead of ==, because equals first checks whether the two references point
to the same object.

For example, if you wanted to read CSV (Comma-Separated Value) files containing the
party affiliation of 20,000 people into a HashMap, you would have 20,000 Strings
floating around in memory to record the affiliations. If you interned the affiliation
String, there would only be a dozen or so. Every Democrat would safely share the same
copy of the immutable "democrat" String.
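
The saving can be made visible by counting String objects by identity rather than by content. This is only a sketch: the class AffiliationDemo, the record layout "personN,party" and the three party names are assumptions invented for the illustration, not taken from any real file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class AffiliationDemo {
    /** Counts distinct String objects by identity (==), not by content. */
    static int distinctObjects(List<String> strings) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        seen.addAll(strings);
        return seen.size();
    }

    /** Simulates parsing the affiliation field out of one CSV line. */
    static String parseAffiliation(String csvLine) {
        // split creates a fresh String object for the field
        return csvLine.split(",")[1];
    }

    public static void main(String[] args) {
        String[] parties = {"democrat", "republican", "independent"};
        List<String> raw = new ArrayList<>();
        List<String> interned = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            String line = "person" + i + "," + parties[i % parties.length];
            String affiliation = parseAffiliation(line);
            raw.add(affiliation);
            interned.add(affiliation.intern());
        }
        System.out.println(distinctObjects(raw));      // one object per record
        System.out.println(distinctObjects(interned)); // one object per party
    }
}
```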

Interning and String.substring

In older JVMs (before Java 7 update 6), when you use String.substring the JVM (Java
Virtual Machine) allocates a new String descriptor, but it just points into the
original String. It does not need to allocate space for the substring. It does not
copy any characters. String.substring does not intern the result. The original base
String cannot be garbage collected as long as there are any live references to
substrings inside it.

Empty Strings resulting from String.substring are not automatically interned either.
Because of this, the resulting empty substring can still indefinitely encumber a long
base String, preventing it from being garbage collected.

Interning and the void String

To ensure you don’t accidentally encumber base Strings and to avoid the confusion of
using a mixture of blank (i.e. x.length() != 0 && x.trim().length() == 0, e.g. " "),
empty (i.e. x.length() == 0, e.g. "") and null (i.e. x == null) to represent the void
String, you may want to use

The Intern Gotcha

All String literals present at compile time are automatically interned. It
is only Strings generated on the fly as the program runs
that might not be interned. A nasty side effect of this behaviour is that a program
will work fine for some simple cases, but fail on complex ones. The problem comes if
you used == to test for String equality where you should have used equals. The wrong code will still work much of the time because most
String literals are naturally interned.
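
A small sketch of the gotcha follows. The class InternGotcha and its two methods are hypothetical names chosen for the illustration. The buggy check passes when it is handed a literal and fails when it is handed a String with the same characters built at run time.

```java
class InternGotcha {
    static final String EXPECTED = "hello";

    /** Buggy check: compares references, so it only works for interned Strings. */
    static boolean buggyMatch(String s) {
        return s == EXPECTED;
    }

    /** Correct check: compares the characters. */
    static boolean correctMatch(String s) {
        return EXPECTED.equals(s);
    }

    public static void main(String[] args) {
        String literal = "hello"; // interned automatically
        String computed = new StringBuilder("hel").append("lo").toString(); // built on the fly

        System.out.println(buggyMatch(literal));    // true: same interned object
        System.out.println(buggyMatch(computed));   // false: equal text, different object
        System.out.println(correctMatch(computed)); // true
    }
}
```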

Intern and new String( String )

Newbies often write

String s = new String("hello");

instead of

String s = "hello";

This is the opposite of interning. You are deliberately creating a duplicate distinct
(but identically valued and definitely not interned) "hello" String object. There are
two legitimate uses for doing that:

To provide a unique String synchronization
object.

Unencumbering the huge base String in which a substring is embedded. By making a copy
with new String( String ), the original String is free to be garbage collected. It can
pay to use new String( String ) if you have only a few short substrings into a common
mother base String. Then garbage collection can let go of the mother String. If you
have a large number of substrings so that the entire mother String is represented in
some substring, then there is no point in doing that. It is more efficient to just
reference into the common mother String with the substring.

Is new String compelled to create a brand new underlying String when you use
new String( String )? Yes! You might imagine a clever JVM that always interned every
new String or that simply passed back the original reference, treating it as a no-op.
The language specification says that it is in fact compelled: new String must create a
new unique reference. However, the JVM could theoretically do that by allocating only
a new String descriptor that points at the original's characters, much as substring
does, and avoid actually making a physical copy.

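This brings up yet another related question. Is s == s.substring( 0 ) compelled to be
false? No. In Oracle's implementation, substring( 0 ) simply returns s itself, so the
comparison comes out true.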

One other place you will see new String used legitimately is

String password = new String(jpassword.getPassword());

getPassword returns a char[], so it is not the silliness it first appears to be. It
does this to permit you to empty the char array after use in high security situations.

Consider a piece of code like this: String s = new String( "Hello" ); The compiler puts
the literal "Hello" in the class file in such a way that it will become an interned
String when the class is loaded. When you stupidly use new String you create a new
String on the heap, one with an address different from the interned version. (In
Oracle’s JVMs before Java 7, the interned Strings are stored in a special pool of RAM
called the perm gen, where the JVM also loads classes and stores natively compiled
code. However, the interned Strings behave no differently than had they been stored in
the ordinary object heap.) Had you written sensible code like this:
String s = "Hello"; you would not have created a duplicate String Object. You would
not have defeated the interning. s would point directly to the interned String
"Hello".

Intern and Garbage Collection

In the early JDKs (Java Development Kits), any String you interned could never be
garbage collected because the JVM had to keep a reference to it in its Hashtable so it
could check each incoming String to see if it already had it in the pool. With Java
version 1.2 came weak references. Now unused interned Strings will be garbage
collected.

With JDK 1.2+, an interned String can be garbage collected if there are no more
references to it and it is not a compile time constant. This means if you
programmatically recreate the String (e.g. with a StringBuilder) and reintern it, a
new different String object, with a different identityHashCode, will become the master
unique String object. This quirk does not cause any practical problems. When you
compare two interned Strings containing the same characters with ==, they still always
come out true.

Overflow

java.lang.OutOfMemoryError: String intern table overflow means you have too many
interned Strings. Some older JVMs may limit you to 64K Strings, which leaves perhaps
50,000 for your application. The IBM (International Business Machines) Java 1.1.8 JRE
(Java Runtime Environment) has this limit. This is an Error, not an Exception, if you
want to catch it. Here is the source for a simple Java program called InternTest.

Also be aware interning inhibits garbage collection of interned Strings.

Under the Hood

This is a simplified version of how interning works under the hood. Inside the JVM is
the heap where all allocated Objects reside. This includes Strings both interned and
ordinary. (In Oracle’s JVMs before Java 7, the interned Strings (which include String
literals) are stored in a special pool of RAM called the perm gen, where the JVM also
loads classes and stores natively compiled code. However, the interned Strings behave
no differently than had they been stored in the ordinary object heap.) In addition,
interned Strings are registered in a weak HashMap.

The collection of Strings registered in this HashMap is sometimes called the String
pool. However, they are ordinary Objects and live on the heap just like any other
(perhaps in an optimised way since interned Strings tend to be long lived). The String
Object lives on the heap and a reference to it lives in the HashMap. There is no
separate pool of interned String objects.

Whenever a String is interned, it is looked up in the HashMap to see if it exists
already. If so, the user gets passed a reference to the master copy. Normally he will
use that copy in preference to his. His duplicate copy then will likely soon have no
references to it and will eventually be garbage collected. If the String has never
been seen before, a reference to it will be added to the HashMap and intern will hand
him a reference to his own String, now registered as the unique master. Note that the
intern process does not make a copy of the String; it just keeps a reference to the
unique master copies.

All the Strings, interned and ordinary, live on the heap. When there are no references
left to a String except the intern HashMap registry reference, it will be garbage
collected since intern keeps only a weak reference to it.
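
The lookup-and-register logic described above can be modelled in a few lines. This is a simplified sketch and not the JVM's actual implementation. The class WeakInterner and its methods are hypothetical, and it relies on java.util.WeakHashMap holding its keys weakly.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakInterner {
    // Key and value both refer to the master copy only weakly,
    // so the registry alone never keeps a String alive.
    private final Map<String, WeakReference<String>> registry = new WeakHashMap<>();

    /** Returns the master copy equal to s, registering s itself if none exists yet. */
    synchronized String intern(String s) {
        WeakReference<String> ref = registry.get(s);
        String master = (ref == null) ? null : ref.get();
        if (master == null) {
            registry.put(s, new WeakReference<>(s)); // registers s; no copy is made
            master = s;
        }
        return master;
    }

    /** Number of entries currently registered. */
    synchronized int size() {
        return registry.size();
    }
}
```

The first caller gets back a reference to his own String. Later callers with equal text get the first caller's String, and their duplicates become garbage once they drop them.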

When you say new String, it is not automatically interned. Thus there may then be
duplicates on the heap. If you later use intern on that String, those duplicates won’t
be cleaned up. Only when you intern all copies of a String and discard references to
the uninterned versions do you maintain but a single copy.

Manual Interning

The big problem with intern is once you intern a String, you are stuck with it in RAM
until the program ends. It is no longer eligible for garbage collection, even if
there are no more references to it. If you want a temporary interned String, you might consider interning manually.

However, in the most recent JVMs, the interned String cache is now usually implemented
with weak references, so that interned Strings may become eligible for garbage
collection as soon as they are no longer strongly referenced. Here is how you might
manage the deduplicating intern process yourself, similar to the way the JVM does it.

For example, let us assume you were reading a CSV file of names and addresses and
storing it internally in a Collection of some sort. Since many people live in the same
city, RAM will soon become cluttered with hundreds of duplicate String object copies
of the names of local cities.

Create a HashMap (not a HashSet) to look up by city a master String
object for each city. Every time you get a city, you look it up in the HashMap. If it is there, replace your reference with a reference to
the master copy. Your String object duplicate will then
become eligible for garbage collection. If it is not in the HashMap, add the city String to the
HashMap.

When you are finished adding cities, you can discard the HashMap. The master city
Strings you put in the HashMap will still exist, will still be unique and will still
behave as if they had been interned with String.intern, except that those without any
other references will become eligible for garbage collection.
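
The manual scheme described above might look something like this. It is a minimal sketch. The class CityInterner and its method canonical are names invented for the illustration, and Map.putIfAbsent requires Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

class CityInterner {
    // Maps each city name to its master String object. A HashMap rather than
    // a HashSet, because a Set cannot hand back the element it already holds.
    private final Map<String, String> masters = new HashMap<>();

    /** Returns the master copy for this city, registering the argument if it is new. */
    String canonical(String city) {
        String master = masters.putIfAbsent(city, city);
        return (master == null) ? city : master;
    }

    /** Number of distinct cities seen so far. */
    int size() {
        return masters.size();
    }
}
```

You would call canonical on each city field as you read it and store the returned reference. Once the file is loaded, you drop the CityInterner and the master Strings live on only as long as your own Collection refers to them.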