We have a Java ETL program, built around a HashMap, that pulls records from a SQL database and uploads them into an Oracle database. It pulls around 500,000 records per hour. Once the record count exceeds 3 million, the heap fills up and the ETL fails with a Java heap memory error. I see there are options to rewrite the ETL using an ArrayList instead of a HashMap, which might consume less memory, but I am not sure it can handle data as large as ours. Around 30 million records are collected per day, and we have to find a way to accommodate this in our ETL.

The maximum heap size recommended by our application vendor and by Oracle is 4 GB, and we have already increased the heap to 4 GB. The Oracle version is 11g.

Note: scheduling the ETL to run every hour is not an option either, because in some hours it pulls in so many records that the heap fills up again.

Collecting all the records in memory is a huge design flaw, no matter which collection you use. You need to load the data in batches, or better yet stream it from point A to point B so you use almost no memory at all.
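To make the batching idea concrete, here is a minimal, generic sketch (the class and method names are my own, not from any post above): rows are consumed one batch at a time, each batch is handed to a loader and then discarded, so heap use is bounded by the batch size rather than the total record count.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedEtl {

    // Consumes the source one batch at a time, hands each full batch to the
    // loader, then drops it -- memory use is bounded by batchSize, not by
    // the total number of records. Returns the number of batches loaded.
    static <T> int copyInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> loader) {
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                loader.accept(batch);
                batch = new ArrayList<>(batchSize); // discard the loaded batch
                batches++;
            }
        }
        if (!batch.isEmpty()) {           // flush the final partial batch
            loader.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

In a real ETL the `source` iterator would wrap a JDBC `ResultSet` (with `Statement.setFetchSize` set so the driver also streams rows in chunks), and the `loader` would call `PreparedStatement.addBatch`/`executeBatch` against the Oracle side.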

Running the ETL (Extract, Transform, Load) program in batches, i.e. every few hours, doesn't solve the problem, as the number of records still grows large and in a sporadic manner. Streaming directly from point A to point B is not possible because our application (BMC Capacity Optimization) requires the data in a specified format. There is no escape from using a Hashtable, HashMap, LinkedList, or ArrayList.

rakshan wrote:
There is no escape from using a Hashtable, HashMap, LinkedList, or ArrayList.

Let's suppose that you have an API that takes a Hashtable.

You can create a subclass of Hashtable, override its methods, and have the subclass read its data from a file, bypassing the in-memory storage of the Java Hashtable entirely (such an implementation would use almost zero heap). So yes, it is in fact possible to do something entirely different.
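A minimal sketch of that idea follows. The class name and the "key=value, one pair per line" file format are assumptions for illustration; every `get()` scans the file on disk, which is far slower than a real Hashtable but keeps heap use near zero regardless of record count.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Hashtable;

// A Hashtable stand-in that stores nothing in memory: lookups scan a
// "key=value" file on disk instead of an in-memory table.
public class FileBackedTable extends Hashtable<String, String> {

    private final Path dataFile;

    public FileBackedTable(Path dataFile) {
        this.dataFile = dataFile;
    }

    @Override
    public synchronized String get(Object key) {
        try (BufferedReader reader = Files.newBufferedReader(dataFile)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq > 0 && line.substring(0, eq).equals(key)) {
                    return line.substring(eq + 1); // found the key's value
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("lookup failed for key " + key, e);
        }
        return null; // key not present in the file
    }
}
```

An API that declares a `Hashtable` parameter will accept this subclass unchanged; in practice you would also override `put`, `size`, and friends to delegate to the file, or to throw if the API only reads.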

You need to start focusing on the problem that needs to be solved rather than on the existing solution, and the reason is simple: the current solution has hit its memory ceiling due to growth. If you pick a fix that only frees up a small amount of additional memory, you will face the SAME failure in the near future.

You might be able to buy time with a custom HashMap, depending on the data and how it is stored. Such a solution might free a small amount of additional memory, but you should only undertake it knowing that it might not succeed and will only hold for a short time.
You can probably also use multiple workers, where each one processes a portion of the total input.
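The multiple-worker suggestion can be sketched as follows, assuming (my assumption, not stated in the thread) that the input can be partitioned by a numeric id range. Each worker thread processes only its own contiguous slice, so no single worker ever holds the full data set.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class PartitionedLoad {

    // Splits the id range [0, total) into `workers` contiguous slices and
    // processes each slice on its own thread. Returns the number of ids
    // actually handled, for verification.
    static long processInParallel(long total, int workers, LongConsumer rowHandler)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong processed = new AtomicLong();
        long slice = (total + workers - 1) / workers; // ceiling division
        for (int w = 0; w < workers; w++) {
            final long start = (long) w * slice;
            final long end = Math.min(start + slice, total);
            pool.submit(() -> {
                for (long id = start; id < end; id++) {
                    rowHandler.accept(id); // e.g. fetch + upload this record
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

This only helps the heap if each worker also loads its slice in bounded batches; parallelism alone spreads the work across CPUs but does not reduce total memory if every slice is still held at once.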