I am facing a strange performance problem in our production environment when I
use the conversion pattern "C:L" to output class name and line number. I DO
expect the performance to be slower but what I got was a gradual slowdown where
our application would literally slow down to a grinding halt after a few days of
running. The application would get back to normal if I force it to run a full
GC. After working with HP JVM engineer on this issue, the problem was found to
be the use of String.intern() in CategoryKey class AND the use of
java.lang.Throwable.getStackTraceElement() (to get line number and classname).
The use of intern() causes the StringTable in JVM (in the Permanent heap) to
fill up very quickly as each call to Logger.getLogger() will create an instance
of CategoryKey object. The method getStackTraceElement() will use the
StringTable for lookup and it will take longer for it to run as time goes. The
StringTable may contain Strings which are not live in the java program.
However, the collection of Strings in StringTable is done only in Full GC.
That's the reason a Full GC recovered the application performance. I have
attached a sample program that will reproduce the problem easily on any
platforms using 1.4.2 JVM.
My question to the logging group is whether the call to intern() is necessary in
creating the CategoryKey? Is it used to speed up hash lookup? This seems like a
VM bug but I doubt it will get fixed by Sun anytime soon and I am worried that
other system will experience the same problem I do. (took us a few weeks to
figure out the problem!)
My workaround is to adopt the use of Apache Common Logging API which will cache
each instance of logger and eliminate the need to call Logger.getLogger() which
in turn eliminate the need for CategoryKey to call String.intern().
The code that will duplicate the problem is here, run it as followed:
"java --XX:MaxPermSize=512m -XX:PermSize=512m st" and telnet to port 12345 to
trigger garbage collection
import java.io.*;
import java.net.*;
public class st {
public static void main(String[] args) {
gc_monitor_thread gcm = new gc_monitor_thread();
gcm.start();
trace_thread tt = new trace_thread();
tt.start();
for (int i = 0; i < 4; i++) {
string_thread st = new string_thread();
st.start();
}
}
}
class string_thread extends Thread {
boolean alive = false;
String tstr = "abc";
public void run() {
alive = true;
long lcnt = 0;
int mcnt = 0;
while (alive) {
String s = getName() + Long.toHexString(lcnt);
if (s.equals(tstr)) mcnt++;
s.intern();
lcnt++;
if (lcnt % 100000 == 0) {
try { Thread.sleep(10); } catch (InterruptedException ie) {}
}
}
}
}
class trace_thread extends Thread {
boolean alive = false;
public void run() {
alive = true;
while (alive) {
long t0 = System.currentTimeMillis();
for (int i = 0; i < 100; i++) {
do_stack_trace_op();
}
long t1 = System.currentTimeMillis();
System.out.println("time for trace: " + (t1 - t0));
try { Thread.sleep(1000); } catch (InterruptedException ie) {}
}
}
void do_stack_trace_op() {
Throwable t = get_throwable();
PrintWriter pw = null;
// try {
pw = new PrintWriter(new StringWriter());
t.printStackTrace(pw);
// } catch (IOException ioe) {
// System.out.println(ioe);
// System.exit(2);
// }
}
Throwable get_throwable() {
return new Throwable("dummy");
}
}
class gc_monitor_thread extends Thread {
boolean alive = false;
public void run() {
alive = true;
int try_limit = 100;
int try_count = 0;
while (alive) {
int pn = 12345;
ServerSocket ss = null;
while (ss == null) {
try {
ss = new ServerSocket(pn);
} catch (IOException ioe) {
pn++;
try_count++;
if (try_count > try_limit) {
System.out.println("too many failures while creating server socket.");
System.out.println("exitting...");
System.exit(2);
}
}
}
if (ss == null) {
System.out.println("unexpected situation. server socket null");
System.exit(3);
}
System.out.println("GC monitor thread started on port " +
ss.getLocalPort() + ". Connect to this port to run GC.");
while (alive) {
Socket s = null;
try {
s = ss.accept();
} catch (IOException ioe) {
System.out.println("GC monitor thread got an exception in accept.");
ioe.printStackTrace();
System.exit(4);
}
if (s == null) {
System.out.println("accept returned a null Socket.");
System.exit(5);
}
System.out.print("Performing System.gc()...");
System.out.flush();
System.gc();
System.out.println("done.");
try {
s.close();
} catch (IOException ioe) {
System.out.println("error while closing socket.");
ioe.printStackTrace();
}
}
}
}
}
Vincent

It doesn't seem like it makes much sense for intern() to be used... I find it
does save a lot on memory, if you are reading values from a file or database and
expect the same values to appear multiple times, but this is not that use case.
In any case, the typical usage pattern is Logger.getLogger() is called once and
the returned value is kept in a static variable.
I don't see any other use of String.intern() in mainline, which is good.

So, this is the code for CategoryKey:
class CategoryKey {
String name;
int hashCache;
CategoryKey(String name) {
this.name = name.intern();
hashCache = name.hashCode();
}
final
public
int hashCode() {
return hashCache;
}
final
public
boolean equals(Object rArg) {
if(this == rArg)
return true;
if(rArg != null && CategoryKey.class == rArg.getClass())
return name == ((CategoryKey)rArg ).name;
else
return false;
}
}
I think we are using the String.intern() call so that we can do the == between
the two CategoryKey objects. Theoretically the String.intern() call will
return the same String object for the same string value. I am assuming that
using this instead of String.equals() is the speed increase. But I don't
understand why this is really an issue. Only one version of the String should
be added/created in the StringTable. Once a given logger has been created, no
more entries to the StringTable should be added related to that logger. Are
you creating that many unique Loggers over time?

I am not creating many unique loggers over time, but Logger.getLogger() will
still call new CategoryKey() even thought I am accessing the same logger.
e.g. String.intern() will be called twice if I do the following:
Logger.getLogger("logger1");
Logger.getLogger("logger1");

Right. But this "should" not matter, as per the String Javadoc for intern():
"A pool of strings, initially empty, is maintained privately by the class
String.
When the intern method is invoked, if the pool already contains a string equal
to this String object as determined by the equals(Object) method, then the
string from the pool is returned. Otherwise, this String object is added to the
pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true
if and only if s.equals(t) is true."
So, for any given logger, only one String object should be in the StringTable,
not multiple. Unless String.equal() is returning not equal for Strings with
the same values? I wonder if this problem is related to CategoryKey or
something more to do with the "C:L" conversion pattern.

I can't see any problem with the use of intern here either.
Class CategoryKey is used in only one way: as the key to the hashtable of named
Logger objects maintained by a Hierarchy. And under normal patterns of use, even
a very large program is only likely to have a few hundreds of differently-named
Logger objects.
And as mark said, calling
String.intern("foo")
any number of times still creates just one entry in the interned-string table.
That's the whole point of intern!
Are you perhaps dynamically generating category names, eg by
Logger l = Logger.getLogger("some.category." + currentDateTime);
This would cause large numbers of distinctly-named Logger objects to be created,
and of course all those distinct names *would* clutter the interned-string
table. However it's a pretty odd way to use log4j..

I agree with both of you that in most cases, the String.intern() should not
cause any problems. I have to revisit my application to make sure I am not
creating so many unique loggers which led to my original problem. However, if
you using a small fixed size permanent heap, a few thousands different loggers
would start to create problem in the stringtable.

A few thousand distinct logger names implies one monstrous app. I wouldn't ever
expect to see something like that running with "a small heap". What kind of app
are you writing that has "thousands" of distinct logger names but only a small heap?
Note that the number of *loggers* is irrelevant. It's the *names* of those
loggers that go into the String.intern pool, so a thousand Logger objects with
the same name --> one entry in the intern pool.
In addition, the String.intern method says this:
"All literal strings and string-valued constant expressions are interned."
I've checked this and it is definitely true. So if these loggers are created like
Logger l = new Logger("some.literal.string")
then that constant string is automatically interned anyway; it happens when the
class is loaded. So the CategoryKey call to String.intern isn't adding any new
entries to the constant string pool; it simply locates the literal value that
was put there by the classloading process.
Only when the logger name is dynamically generated like:
l = new Logger("foo" + someParam)
will the ConstantKey's call to String.intern force the constant string pool to
get a new value. And that just isn't a common logging idiom.
I really think your initial problem diagnosis is wrong and you should look
again. You've clearly got a problem but I find it hard to believe it's due to
calls to String.intern. NB: this is all just my personal opinion of course.

NB: your example code does show that a problem can occur when many hundreds of
thousands of *unique* strings are generated and interned. But that isn't the
case for logging - unless you're generating logger names via some process like
Logger l = Logger.getLogger("some.category." + currentDateTime);
as I described earlier. If you are -- don't do that!

It sounds to me that getting rid of intern doesn't hurt performance considerably.
Consider these three cases:
1. You're doing:
Logger.getLogger("logger1");
...
Logger.getLogger("logger1");
Using a string constant here, "logger1" was already interned and
"logger1".equals("logger1") is fast. No need to explicitly do an intern()
2. You're evaluating the string, you're doing:
int i = 1;
Logger.getLogger("logger" + i);
...
Logger.getLogger("logger" + i);
In which case, the cost of intern() is MUCH more expensive than intern anyway,
so why bother?
3. You're storing the evaluated string and a combination of above. Again,
intern() is more expensive than equals() so why bother.
I don't understand the need for intern() at all. Sounds like a premature
optimization bug.

(In reply to comment #10)
> Logger.getLogger("logger1");
>
> Using a string constant here, "logger1" was already interned
true
> "logger1".equals("logger1") is fast. No need to explicitly do an intern()
I agree that the literal strings in code are automatically interned, and the
String.equals method should be checking for identity before bothering to do a
loop checking the string content [1]. There's still the overhead of a
(non-virtual) method invocation and return though.
However the main point of this is to optimise the comparisons executed as part
of Hashtable.get when looking up the logger from the map of all loggers. And
Hashtable.get may well do several *failed* comparisons before finding the right
logger to return.
The operation "string1" == "string2" is fast always; if the references are
different then the result is false.
However "string1".equals("string2") has a fair bit of overhead. The
String.equals method needs to do:
* if (param instanceof String)
* cast param to String
* if string lengths differ: return false
* otherwise compare each byte [1]
[1] or potentially compare hashcodes.
The reference comparison starts to look nice..
How many such non-equal comparisons occur in the Hashtable lookup? Well that
depends upon the hashcode collision rate inside the hashtable; obviously two
strings whose hashcodes cause them to be allocated to different buckets never
get compared. The java.util.HashMap class has a default loadFactory of 0.75. So
at a wild guess I would think only about 30% of lookups would hit a bucket with
more than one entry, and in 50% of those cases the right entry would be first.
So I agree this optimisation is not critical, but it could have a measurable
improvement.
> 2. You're evaluating the string, you're doing:
>
> int i = 1;
> Logger.getLogger("logger" + i);
> ...
> Logger.getLogger("logger" + i);
>
> In which case, the cost of intern() is MUCH more expensive than intern anyway,
> so why bother?
True. However such usage really is a little bizarre.
>
> 3. You're storing the evaluated string and a combination of above. Again,
> intern() is more expensive than equals() so why bother.
>
> I don't understand the need for intern() at all. Sounds like a premature
> optimization bug.
A very common pattern is:
Logger.getLogger(someClass.getName());
I've checked with Sun java 1.3.1 and 1.5 and the string returned by
someClass.getName does get interned automatically like literal strings in the
code. The question is whether that behaviour should be relied on or not. I'm not
aware of anything in the java specs that mandate this string going into the
intern pool (unlike literals in the code which *must* go there) but I think it
likely that all Java implementations *will* do this; when the JVM is loading the
raw .class file into memory it needs to store the literal strings somewhere and
putting them in the intern pool seems as good a place as any.
So: first of all, string comparisons in the logger Hashtable lookup will not be
common (maybe 30% of lookups). And in all the scenarios where interning makes
sense, the strings seem to already be interned; comparing two equal interned
strings with .equals should be close to the speed of comparing them with ==.
Where the two strings are actually different (maybe 15% of lookups) there are
two cases:
* strings have different length (most of the time)
--> minor win for the interned comparison
* strings have same length (pretty rare)
--> major win for the interned comparison
Summary: From a theoretical point I tend to agree that this use of String.intern
won't have much effect and could have been omitted. Of course actually testing
this might be wise -- real stats win over theory any day :-).
But on the other hand, String.intern only does active harm when the user is
generating large numbers of logger names dynamically - and I can't see any sane
reason for anyone to do that.
[1] Actually, GNU Classpath doesn't! Currently comparison of a string object
with itself does a reasonable amount of work (though not a scan of the actual
string data). I've sent off an email querying this.

The String.intern (potentially mal-) optimization appears to be the only reason that CategoryKey exists.
Switching to a String-keyed hashtable would eliminate the need to create and reclaim CategoryKey objects
on the calls to getLogger(). Unfortunately, Hierarchy.Hashtable is package-visible, so it might be
necessary to maintain a distinct string-keyed hashmap/hashtable in preference and preserve ht for classes
squatting in our package.

This is ASF Bugzilla: the Apache Software Foundation bug system. In case
of problems with the functioning of ASF Bugzilla, please contact
bugzilla-admin@apache.org.
Please Note: this e-mail address is only for reporting problems
with ASF Bugzilla. Mail about any other subject will be silently
ignored.