Friday, February 26, 2016

JEP 254 proposes changing the internal representation
of strings inside the JVM. As most readers surely know,
Java strings are stored as UTF-16, which uses two bytes per
char (code unit). The proposal suggests using a more compact,
one-byte-per-character representation internally: “Data
gathered from many different applications indicates that
strings are a major component of heap usage and, moreover,
that most String objects contain only Latin-1 characters.
Such characters require only one byte of storage,
hence half of the space in the internal char arrays of such
String objects is going unused,” says the JEP.
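
To make the arithmetic concrete, here is a minimal sketch of the check
the quote implies. It is not the JDK's implementation: the class and
method names (CompactStringSketch, fitsLatin1, payloadBytes) are
hypothetical, and the sizes count only character data, ignoring object
and array headers.

```java
public class CompactStringSketch {

    // True if every UTF-16 code unit of s lies in the Latin-1 range
    // (U+0000 through U+00FF), so each could be stored in one byte.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Bytes needed for the character data alone. With compact == false,
    // or for strings that do not fit Latin-1, this is two bytes per char.
    static int payloadBytes(String s, boolean compact) {
        return (compact && fitsLatin1(s)) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9";        // four chars, all Latin-1
        String other = "\u65e5\u672c";     // two chars outside Latin-1

        System.out.println(payloadBytes(latin, false)); // 8
        System.out.println(payloadBytes(latin, true));  // 4
        System.out.println(payloadBytes(other, true));  // 4 (no saving)
    }
}
```

A string that contains even one character outside Latin-1 gets no
saving in this sketch, which is why the proportion of Latin-1-only
strings matters to the proposal.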

Changing to the more compact form would not affect
existing code or any APIs; it would be a purely internal
change inside the JVM and not visible to programmers.
Interestingly, the JEP’s web page reveals that a string
compression feature was tested in Java 6. It changed the
String.value field to an Object reference that pointed
either to an array of 7-bit characters or to an array of
regular Java characters. That feature, though, was
subsequently removed.
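
The dual-representation idea described above can be sketched roughly as
follows. This is an illustration only, not the code that shipped: the
class name CompressedStringSketch and its methods are hypothetical, and
it assumes the 7-bit characters are held in a byte array.

```java
public class CompressedStringSketch {

    // Either a byte[] (every character is 7-bit) or a char[] (anything else).
    private final Object value;

    public CompressedStringSketch(String s) {
        boolean sevenBit = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                sevenBit = false;
                break;
            }
        }
        if (sevenBit) {
            // Safe narrowing: every char is in the range 0..127.
            byte[] bytes = new byte[s.length()];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            value = bytes;
        } else {
            value = s.toCharArray();
        }
    }

    public boolean isCompressed() {
        return value instanceof byte[];
    }

    public int length() {
        return isCompressed() ? ((byte[]) value).length
                              : ((char[]) value).length;
    }

    // Callers see ordinary chars regardless of how the data is stored.
    public char charAt(int index) {
        return isCompressed() ? (char) ((byte[]) value)[index]
                              : ((char[]) value)[index];
    }
}
```

Every accessor has to test which kind of array it holds before reading
from it, which hints at the bookkeeping such a design adds to each
string operation.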