We are seeing 5-minute shutdown times on a 10 GB BDB JE environment (version 4.1.17), and we are trying to reduce this to under 2 minutes.
I have looked around and did not find anything specifically helpful, so a few questions along these lines:

-- What exactly causes this delay during shutdown? I assume it's mainly writing all the dirtied nodes back to disk.
-- Does some log cleaning also happen, to improve space utilization? In most cases we bring the environment back up right away during a server bounce, so that overhead is actually not helping us.

In other words, during shutdown we just want to flush all dirty in-memory nodes to disk and be able to come back up quickly. Is there some option/configuration we can leverage?

Please take some thread dumps during shutdown to see where the bulk of the 5 minutes is being spent. With that information I can give you better advice. It may not be log cleaning; it may be the final checkpoint.

I can believe that the checkpoint may take a while. But it is strange if you see that exact stack trace often -- removing an item from the hash map of dirty nodes. Or maybe that's not what you mean?

The last checkpoint won't be very different from other checkpoints, so the thing to look at is why checkpoints are taking a long time in general.

Are you using cleaner lazy migration (which is on by default in JE 4.1 and earlier)? If so, that's the first thing to do: turn it off. With lazy migration, much of the work of log cleaning is done by the checkpoint.
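For reference, a sketch of what turning it off might look like in je.properties -- the parameter name below is my best recollection of the JE 4.1 EnvironmentConfig name, so please verify it against the javadoc for your exact release:

```properties
# je.properties in the environment home directory.
# Assumed parameter name (EnvironmentConfig.CLEANER_LAZY_MIGRATION in JE 4.1):
je.cleaner.lazyMigration=false
```

The same value can also be set programmatically via EnvironmentConfig.setConfigParam before opening the environment.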

Also, please look at your configured checkpoint interval (env config) and the actual checkpoint interval -- "DbPrintLog -S" will list the checkpoint intervals, and you can use the "-s 0xFILENUMBER" option to look only at the last 10 or 20 files (to speed up the run).
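A rough sketch of those invocations (the classpath, environment path, and 0x file number are placeholders; DbPrintLog is com.sleepycat.je.util.DbPrintLog, shipped in je.jar):

```
# Summarize the whole log, including checkpoint intervals:
java -cp je.jar com.sleepycat.je.util.DbPrintLog -h /path/to/env -S

# Speed up the run by starting at a later log file (placeholder file number):
java -cp je.jar com.sleepycat.je.util.DbPrintLog -h /path/to/env -s 0x1a2b -S
```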

This is a problem with our old JE 4.0.92 servers, where I think the cleaner lazy migration setting did not exist. What is the behaviour on those versions? I will be upgrading our servers to 4.1.17 shortly and turning off cleaner lazy migration, though.

Sorry, my question wasn't clear. If you take a number of thread dumps, how often do you see this method on the stack (including methods it calls, of course)? Earlier you implied that it was often, but you didn't actually say so, or how often. The majority of the time? Also note that this is not related to cleaner lazy migration.

The cleaner lazy migration feature did exist in JE 4.0.92, but was always on (there was no config option). To be more accurate, it was always on when high priority checkpointing was off, which is the default. The lazy migration config option was added in 4.0.117.

Do you expect cleaner lazy migration=off will significantly improve the shutdown time?

Yes. I'm not sure how to say that emphatically enough. Lazy migration was useful in the distant past, but it slows down checkpoints significantly, which has all kinds of negative side effects. It should no longer be used. I wish we had changed the config setting's default to false much sooner than we did (in JE 5).

I've noticed this frequently as well (DirtyINMap.removeNextNode() in stacktraces during checkpointing), and I have long checkpointing times, so I'd be very interested in any fix you come up with for this. (This is under JE 4.1.17.)

I also saw in the implementation that DirtyINMap.removeNextNode() allocates an iterator for one-time use on each call, but I don't think that's the inefficiency, since for me these stacktraces always end inside java.util.HashMap$HashIterator.remove(). That implies to me that not much else is happening during the loop, which I think means we are failing the currentLevelVal <= maxFlushLevel test in Checkpointer.flushDirtyNodes(). I wonder if that test could somehow be done above the removeNextNode call. Just a thought.
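To make the pattern concrete, here is a minimal standalone sketch (plain java.util, not the actual JE source) of a removeNextNode-style method that allocates a fresh iterator per call and removes one entry via HashIterator.remove(); the class and method names here are mine, for illustration only:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of the pattern described above: each call creates a
// fresh iterator just to remove one arbitrary entry, mirroring how
// DirtyINMap.removeNextNode() is described in this thread.
public class RemoveNextNodeSketch {
    // Removes and returns one arbitrary key, or null if the map is empty.
    static Long removeNext(Map<Long, String> dirtyMap) {
        Iterator<Map.Entry<Long, String>> it = dirtyMap.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<Long, String> entry = it.next();
        it.remove(); // HashMap$HashIterator.remove() -- the hot frame in the stacktraces
        return entry.getKey();
    }

    public static void main(String[] args) {
        Map<Long, String> dirtyMap = new HashMap<>();
        for (long i = 0; i < 5; i++) {
            dirtyMap.put(i, "IN-" + i);
        }
        int removed = 0;
        while (removeNext(dirtyMap) != null) {
            removed++;
        }
        System.out.println("removed=" + removed + " remaining=" + dirtyMap.size());
        // prints: removed=5 remaining=0
    }
}
```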

Also, just to clarify: Turning off cleaner lazy migration helps with checkpointing times, but it's a separate issue from this, right? So in the case of lazy migration slowing down checkpointing, is there a distinctive way that would show up in stacktraces?

One more observation (since I just had the opportunity to watch this again carefully): during the checkpointing, though the stacktraces almost always lead to this HashMap method, the process manages to write to the filesystem the whole time (at a gradually decreasing rate). That is, judging by I/O activity monitored at the OS level, there are no noticeable periods during which it stops writing. That seems to imply that it's not iterating through an entire level with no work to do. Thought I'd pass this along, in case it tells you something.

Thanks. We are testing a fix (Charles Lamb is doing this) and will be getting back to you shortly. The problem is an inefficiency with the removal of elements from a large HashMap and we believe the solution is simply to replace the HashMap with a TreeMap. If you'd like to try that out yourself, as an additional test to confirm this, you can do the following (this is from Charlie):

Go to com.sleepycat.je.recovery.DirtyINMap.addIN, and change the assignment to nodeMap from a HashMap to a TreeMap (i.e. s/Hash/Tree/).
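A self-contained illustration of why that one-line swap is safe (a toy sketch, not the actual DirtyINMap code): both HashMap and TreeMap implement java.util.Map, so code written against the Map interface compiles unchanged after the substitution:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the suggested change: a dirty-node map keyed by
// node ID becomes a TreeMap instead of a HashMap. Callers that only use
// the Map interface are unaffected by the swap.
public class TreeMapSwapSketch {
    static Map<Long, String> newNodeMap() {
        // Was: new HashMap<Long, String>();
        return new TreeMap<Long, String>();
    }

    public static void main(String[] args) {
        Map<Long, String> nodeMap = newNodeMap();
        nodeMap.put(42L, "IN-42");
        nodeMap.put(7L, "IN-7");
        // TreeMap iterates in key order, unlike HashMap's bucket order.
        System.out.println(nodeMap.keySet()); // prints: [7, 42]
    }
}
```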