Memory Utilization Question

I am trying to get an idea of how much memory our application will use in a particular scenario. For this I have written some multi-threaded test code. The code segment below is executed by each thread (it is in the run() method). You can see that I am creating 100 new BitSet objects (of 50 bits each) and persisting them in a database. Outside the for loop, a showMemoryUsage() method is called, which prints how much memory is in use after creating these objects:
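The original code segment isn't included, so here is a minimal sketch of what such a test might look like. The class name, the loop body, and the use of `Runtime` for `showMemoryUsage()` are assumptions; the actual database persistence call is omitted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class MemoryTestThread implements Runnable {
    @Override
    public void run() {
        List<BitSet> bitSets = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            BitSet bits = new BitSet(50);   // a 50-bit bitmap
            bits.set(i % 50);               // touch a bit so the set isn't empty
            bitSets.add(bits);
            // persist(bits);               // hypothetical database call, omitted here
        }
        showMemoryUsage();
    }

    // Reports used heap as (total - free), per the java.lang.Runtime API
    private static void showMemoryUsage() {
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Used memory: %.2f MB%n", usedBytes / (1024.0 * 1024.0));
    }
}
```

Note that `totalMemory() - freeMemory()` includes garbage not yet collected, which is exactly why such measurements keep climbing until a GC runs.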

As you can see, memory utilization goes up with more threads. I have noticed that increasing or decreasing the size of the bitmaps (from 50 bits to 100, or down to 10) does not make much difference, but changing the number of BitSet objects from 100 down to 10, or up to 200, makes a huge difference. So it is the number of Java objects in memory that is critical.

I have a question, if you can help:

Q - We know that we will be creating a lot of Java objects to perform search operations in our app. Our idea was to keep objects local to methods so that their scope stays small. But I suspect this may not make much difference, because we still depend on the garbage collector to visit and free "out of scope" local objects. So even with very small scopes, in the worst case we will need to allocate enough memory to hold all of those objects, across all threads, at the same time to get even performance. Is my understanding correct?

Yes. Even if the scope of an object is small, it still has to be picked up by the garbage collector. The best you can do is keep scopes small so objects become eligible for GC quickly, which is exactly how it works. If you have a lot of concurrent threads running, you will have to plan for that. The best way to size your heap is to run some tests with verbose GC enabled (`-verbose:gc`) and study your GC logs: their pattern, GC frequency, and so on.

One thing to remember about GC is that it is designed to be lazy. If there is space available in the heap, it won't run, even if there are lots of objects to be collected. The reason for this is twofold:
a) GC is a costly operation. Just scanning the references in the object tree is expensive.
b) Running GC very often leads to fragmentation of the heap. It's better for the GC to wait until many objects need to be cleared or moved, and then do one big cleanup that yields larger free blocks.

So yes, if you increase the number of threads, you can expect heap usage to go up. However, that is not necessarily a bad thing. If you take an application that is doing heavy processing and watch heap usage in a tool like JConsole, you will see memory usage climb until it approaches the maximum heap size, at which point GC triggers, releases memory, and usage drops back down. Each cycle looks like a right-angled triangle, and its slope depends on how fast you are allocating: the more threads, and the more memory used per thread, the steeper the slope. Run a memory-intensive application for a while and the usage graph looks like a saw-toothed blade.

In your test application, you are using only about 7 MB of memory per thread, so it is highly unlikely that your heap will ever fill up. In periods of low memory activity, GC runs only periodically, and it is highly unlikely to run in the 2 seconds it takes your threads to finish. Your observation that you never saw memory being released is spot on, and entirely expected. I wouldn't be worried about what you see in your test.

If you are going to run a lot of threads, you should continue to limit the number of Java objects being created, within reason, of course. You can gain a lot of performance by pooling large objects rather than creating and releasing them. However, pooling comes at its own cost and complexity in code, so you really want to pool only the objects that give you the most bang for the buck, rather than trying to pool every little object.
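To illustrate the pooling idea, here is a minimal, thread-safe pool sketch. The class and method names are my own invention, not from your code; a real application would likely use an established pooling library instead, and would also need to reset pooled objects' state on release.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal fixed-capacity object pool: borrow() reuses a pooled instance when
// one is available, otherwise creates a new one; release() returns an
// instance for reuse (and silently drops it if the pool is already full).
public class SimplePool<T> {
    private final BlockingQueue<T> pool;
    private final Supplier<T> factory;

    public SimplePool(int capacity, Supplier<T> factory) {
        this.pool = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    public T borrow() {
        T obj = pool.poll();                 // non-blocking: null if pool is empty
        return (obj != null) ? obj : factory.get();
    }

    public void release(T obj) {
        pool.offer(obj);                     // returns false (drops) when full
    }
}
```

Usage would look like `SimplePool<StringBuilder> pool = new SimplePool<>(16, StringBuilder::new);` — the point being that pooling only pays off for objects that are expensive to construct, which is the "bang for the buck" trade-off above.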