We are trying to migrate a 4.0.92 database (with duplicates) to 5.0.34, but it keeps running out of memory during the migration, which happens the first time we open it with the new version. We already gave it 60GB, but that isn't enough.
The database is roughly 2TB, with 225M entries and an average value size of ~4KB. (I know that's too big, and we are trying to make it smaller, but first we need an efficient way to iterate over all the entries. Currently that is veeeery slow, and we hope DiskOrderedCursor will make it much faster.)
As far as I can tell, it is trying to preload all the internal nodes into memory, but our keys are at most 30 bytes long, so why isn't 60GB enough?
This is the relevant part of the stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:477)
at java.util.HashMap.addEntry(HashMap.java:768)
at java.util.HashMap.put(HashMap.java:402)
at com.sleepycat.je.dbi.SortedLSNTreeWalker.addEntryToLsnMap(SortedLSNTreeWalker.java:677)
at com.sleepycat.je.dbi.SortedLSNTreeWalker.addToLsnINMap(SortedLSNTreeWalker.java:662)
at com.sleepycat.je.dbi.SortedLSNTreeWalker.accumulateLSNs(SortedLSNTreeWalker.java:442)
at com.sleepycat.je.dbi.SortedLSNTreeWalker.fetchAndProcessLSN(SortedLSNTreeWalker.java:527)
at com.sleepycat.je.dbi.SortedLSNTreeWalker.processAccumulatedLSNs(SortedLSNTreeWalker.java:353)
at com.sleepycat.je.dbi.SortedLSNTreeWalker.walkInternal(SortedLSNTreeWalker.java:341)
at com.sleepycat.je.dbi.EnvironmentImpl$PreloadLSNTreeWalker.walk(EnvironmentImpl.java:3131)
at com.sleepycat.je.dbi.EnvironmentImpl.preload(EnvironmentImpl.java:2985)
at com.sleepycat.je.tree.dupConvert.DupConvert.preloadAllDatabases(DupConvert.java:210)
at com.sleepycat.je.tree.dupConvert.DupConvert.convertDatabases(DupConvert.java:151)
at com.sleepycat.je.dbi.EnvironmentImpl.convertDupDatabases(EnvironmentImpl.java:1758)
at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:676)
at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:210)
at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:246)
at com.sleepycat.je.Environment.<init>(Environment.java:227)
at com.sleepycat.je.Environment.<init>(Environment.java:170)

Although it is only loading the internal nodes, in databases with duplicates the internal nodes contain both the key and the data. In any case, you clearly don't have enough memory to preload all of your duplicate databases.
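To put rough numbers on this, here is a back-of-the-envelope estimate using the figures from the question (225M records, ~4KB average value, keys of at most 30 bytes). It assumes that every record lives in a duplicates database and that, in the old duplicate format, each DBIN slot carries the record data. It counts raw bytes only, ignoring JVM object and Btree overhead, so real usage would be higher. The class name and constants are illustrative and are not part of JE.

```java
/**
 * Back-of-the-envelope preload estimate (illustrative, not JE code).
 * Assumes all records are in duplicates databases and that each old-format
 * DBIN slot key holds the record data. Raw bytes only; object overhead ignored.
 */
public class DupPreloadEstimate {
    public static final long RECORDS = 225_000_000L;  // ~225M entries
    public static final long MAX_KEY_BYTES = 30L;     // keys are at most 30 bytes
    public static final long AVG_DATA_BYTES = 4_096L; // ~4KB average value
    public static final long HEAP_BYTES = 60L << 30;  // 60GB heap, read as -Xmx60g (60 GiB)

    /** Raw bytes if the internal nodes held only the (max-size) keys. */
    public static long keyOnlyBytes() {
        return RECORDS * MAX_KEY_BYTES;
    }

    /** Raw bytes if every DBIN slot key is the record data. */
    public static long dupSlotBytes() {
        return RECORDS * AVG_DATA_BYTES;
    }

    private static double gib(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        System.out.printf("keys only:     %.1f GiB%n", gib(keyOnlyBytes()));
        System.out.printf("dup slot keys: %.1f GiB%n", gib(dupSlotBytes()));
        System.out.printf("heap:          %.1f GiB%n", gib(HEAP_BYTES));
    }
}
```

Under these assumptions the keys alone come to about 6.3 GiB, which would fit in a 60GB heap, while the record data carried in the DBIN slots comes to about 858 GiB, more than 14 times the heap.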

This note from the change log has a suggestion that should help:
>

To make the conversion predictable during deployment, users should measure the conversion time on a non-production system before upgrading a deployed system. When duplicates are converted, the Btree internal nodes are preloaded into the JE cache. A new configuration option, EnvironmentConfig.ENV_DUP_CONVERT_PRELOAD_ALL, can be set to false to optimize this process if the cache is not large enough to hold the internal nodes for all databases. For more information, see the javadoc for this property. [#19165]
>

Thanks for the answer, Mark. I forgot to mention that I had already tried setting that flag to false, with no luck.

With the old log format (before log version 8, which was introduced in 5.0.34), I thought DINs and DBINs didn't contain the entry values. This is taken from the javadoc of the DIN dupKey attribute: "Full key for this set of duplicates. For example, if the tree contains k1/d1, k1/d2, k1/d3, the dupKey = k1". DBIN has the same attribute. Can someone clarify why the "internal" nodes would contain the entry values for databases with duplicates?

Is there any other way to "migrate" the database contents? Should I export/import? (As I mentioned before, export is really slow because it relies on key-ordered traversal.)
Thanks,

Diego

If it's of any help, this is a fragment of the heap histogram right before running out of memory:

>
With the old log format (before log version 8, which was introduced in 5.0.34), I thought DINs and DBINs didn't contain the entry values. This is taken from the javadoc of the DIN dupKey attribute: "Full key for this set of duplicates. For example, if the tree contains k1/d1, k1/d2, k1/d3, the dupKey = k1". DBIN has the same attribute. Can someone clarify why the "internal" nodes would contain the entry values for databases with duplicates?
>

In the old log format, the key of each slot in a DBIN is the record data. In other words, the record data is in the tree. But this is an academic issue. It is not going to lead to a solution.

>
Is there any other way to "migrate" the database contents? Should I export/import? (As I mentioned before, export is really slow because it relies on key-ordered traversal.)
>

You are running out of memory in the preload phase, and most of the memory is allocated by the preload process itself, not by the Btree. I will think about possible solutions and get back to you later today.
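The stack trace hints at where that memory goes: judging by the method names, SortedLSNTreeWalker accumulates child LSNs in a HashMap (addEntryToLsnMap) so that it can fetch nodes in log order rather than key order. The toy model below is hypothetical and is not JE's implementation; it only illustrates the trade-off of capping such a buffer. A smaller cap bounds memory, but LSNs are sorted only within each batch, so reads become less sequential. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not JE code) of an LSN-sorted walk with a bounded buffer. */
public class BoundedLsnWalk {
    /**
     * Returns the order in which LSNs would be fetched when at most
     * maxBuffered of them may be held and sorted in memory at once.
     * In this model a smaller LSN means an earlier position in the log.
     */
    public static List<Long> fetchOrder(long[] discovered, int maxBuffered) {
        if (maxBuffered < 1) {
            throw new IllegalArgumentException("maxBuffered must be >= 1");
        }
        List<Long> order = new ArrayList<>();
        long[] buffer = new long[maxBuffered];
        int size = 0;
        for (long lsn : discovered) {
            buffer[size++] = lsn;
            if (size == maxBuffered) { // buffer full: sort and "fetch" this batch
                flush(buffer, size, order);
                size = 0;
            }
        }
        flush(buffer, size, order); // drain the final partial batch
        return order;
    }

    private static void flush(long[] buffer, int size, List<Long> order) {
        Arrays.sort(buffer, 0, size); // log order within this batch only
        for (int i = 0; i < size; i++) {
            order.add(buffer[i]); // stand-in for reading the node at this LSN
        }
    }

    public static void main(String[] args) {
        long[] lsns = {50, 10, 40, 20, 30};
        System.out.println(fetchOrder(lsns, 5)); // [10, 20, 30, 40, 50]
        System.out.println(fetchOrder(lsns, 3)); // [10, 40, 50, 20, 30]
    }
}
```

With room for all five LSNs the fetch order is fully sorted; with a cap of three it becomes two sorted runs, [10, 40, 50] and then [20, 30].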

I think the solution is for us to give you a way to specify the PreloadConfig that is used during the duplicate conversion process. Please send me email, mark.hayes at o.com (o == oracle), so I can work with you on this.

If anyone is still running into this problem, please be sure to use the new 5.0.48 release and the JE 4.1.20 pre-upgrade utility. If you still have out-of-memory problems during the upgrade to JE 5, post here and I will suggest solutions.

Note that DiskOrderedCursorConfig.setInternalMemoryLimitVoid is a good solution if you're using a DiskOrderedCursor -- thanks for posting this -- but it doesn't apply to an out-of-memory error that occurs while you're upgrading to JE 5 (in the Environment constructor).