Modern Java applications do a lot of string manipulation: web service API calls (e.g. JSON, REST, SOAP), external data source calls (SQL, data returned from a database), text parsing, text building, and so on. As a result, String objects can easily occupy 30% or more of the heap, and a large share of those String objects are often duplicates, so a considerable amount of memory is wasted. JEP 192, which targets the memory wasted by duplicate String objects, is therefore a welcome enhancement to Java.

What Does JEP 192 Do?

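When the G1 GC algorithm runs, it removes garbage objects from memory. With string deduplication enabled, it also looks for String objects that hold identical content and makes them share a single backing character array, so the redundant copies of the character data can be reclaimed. The String objects themselves stay in place; only their duplicated contents are removed. This feature can be activated by passing the following JVM arguments: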
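To make "duplicate strings" concrete, here is a minimal sketch of two String objects that are equal in content yet separate on the heap. The class name, helper method, and sample value are illustrative assumptions. As described in JEP 192, deduplication lets such strings share their backing array while remaining distinct objects, which is different from String.intern(), where both references end up pointing at one object.

```java
public class EqualButDistinct {

    // Builds a fresh String with its own copy of the characters (hypothetical helper).
    static String copyOf(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = copyOf("order-status:SHIPPED");
        String b = copyOf("order-status:SHIPPED");

        // Same content ...
        System.out.println(a.equals(b));              // true
        // ... but two separate objects. Deduplication does not change this result.
        System.out.println(a == b);                   // false
        // Interning, by contrast, maps both to one canonical object.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

Because object identity is preserved, code that compares or synchronizes on String references behaves the same with or without the flag.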

-XX:+UseG1GC -XX:+UseStringDeduplication

Note 1: In order to use this feature, you need to run on Java 8 update 20 or later versions.

Note 2: In order to use ‘-XX:+UseStringDeduplication’, you need to be using the G1 GC algorithm.
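To confirm that the flags actually took effect in a running JVM, the values can be read back at runtime. The sketch below assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name DedupFlagCheck and the "unknown" fallback are illustrative choices.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlagCheck {

    // Returns the current value of a HotSpot VM option,
    // or "unknown" if this JVM does not expose it.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "unknown"; // not a HotSpot-style JVM
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unknown"; // option does not exist in this JVM version
        }
    }

    public static void main(String[] args) {
        System.out.println("UseG1GC = " + vmOption("UseG1GC"));
        System.out.println("UseStringDeduplication = " + vmOption("UseStringDeduplication"));
    }
}
```

Running it once with and once without the two flags should show the reported values change accordingly.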

Let’s Study With an Example

Let’s validate this feature with a simple program. The example was chosen to study how the JVM handles duplicate strings.

We ran this program twice, each time with a different set of JVM arguments.
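A program of this kind could look like the following sketch. It is not the program used for the measurements in this article. The class name, the string prefix, the counts, and the optional hold-time argument are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateStringDemo {

    // Builds `count` separate String objects drawn from only `distinct` different values,
    // so most of them are duplicates of one another.
    static List<String> buildDuplicates(int count, int distinct) {
        List<String> strings = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // StringBuilder.toString() creates a new String object on every call.
            strings.add(new StringBuilder("customer-record-").append(i % distinct).toString());
        }
        return strings;
    }

    public static void main(String[] args) throws InterruptedException {
        // Counts are arbitrary; adjust them to fit the heap size under test.
        List<String> strings = buildDuplicates(100_000, 100);
        System.out.println("Holding " + strings.size() + " strings");

        // Optional first argument: milliseconds to keep the process alive
        // so that a heap dump can be captured (assumed convention, default 0).
        long holdMillis = args.length > 0 ? Long.parseLong(args[0]) : 0L;
        Thread.sleep(holdMillis);

        // Reference the list again so it stays reachable during the hold.
        System.out.println("Done, still referencing " + strings.size() + " strings");
    }
}
```

Deduplication happens as part of garbage collection, so strings created shortly before a heap dump is taken may not have been processed yet; holding the process for a while and letting a few collections run gives a more representative picture.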

Run #1

The first time, we ran the program with the ‘-XX:+UseStringDeduplication’ JVM argument, i.e.:

-Xmx20M -XX:+UseG1GC -XX:+UseStringDeduplication

Run #2

The second time, we ran the same program without the ‘-XX:+UseStringDeduplication’ argument:

-Xmx20M -XX:+UseG1GC

During both runs, we captured heap dumps and analyzed them with the heap dump analysis tool HeapHero.io. HeapHero.io reports the amount of memory wasted due to various inefficient programming practices, including the amount wasted due to duplicate strings.

Even though the same code was executed, in Run #1 (where ‘-XX:+UseStringDeduplication’ was passed) the overall heap size was 7.94 MB, whereas in Run #2 (where ‘-XX:+UseStringDeduplication’ was not passed) the overall heap size was considerably larger: 15.89 MB.

Both runs contained an equivalent number of String objects (206k), but the amount of memory wasted due to duplicate strings was 5.6 MB in Run #1, whereas in Run #2 it was 13.81 MB.

This dramatic reduction in memory consumption was made possible by the ‘-XX:+UseStringDeduplication’ argument, which eliminated a significant amount of duplicated string data from the heap while leaving the number of String objects unchanged.
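To put the two runs in relative terms, the reported figures can be turned into percentage reductions. The short sketch below does only that arithmetic on the numbers quoted above; the class and method names are illustrative.

```java
public class DedupSavings {

    // Percentage by which `after` is smaller than `before`.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures in MB as reported for Run #2 (no deduplication) and Run #1 (deduplication).
        double heapReduction = reductionPercent(15.89, 7.94);
        double wasteReduction = reductionPercent(13.81, 5.6);

        System.out.printf("Overall heap size reduced by about %.0f%%%n", heapReduction);
        System.out.printf("Duplicate-string waste reduced by about %.0f%%%n", wasteReduction);
    }
}
```

With these inputs the overall heap size is roughly halved, and the memory attributed to duplicate strings drops by a little under 60%.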

Thus we encourage you to take advantage of ‘-XX:+UseG1GC -XX:+UseStringDeduplication’ and reduce memory wastage caused by duplicate strings. This change has the potential to reduce the overall memory footprint of your application.