I have a Java HashMap with String keys and POJO values in a long-running application, and it takes up a large chunk of memory: over 500 MB today, and I estimate it will exceed 2 GB within two to three months. The map memoizes the results of an expensive calculation (typically 2-4 seconds, but up to 20 seconds). Because I expect an external lookup to be cheaper than recalculation, I'd like to offload the map to the hard drive rather than replace it with a [Soft/Weak]HashMap. I'd also like the map to be persistent in case the app crashes.

My only experience with NoSQL databases has been with DynamoDB, but I'd like a free database rather than restricting myself to DynamoDB's free tier.

The app is written in Java, so I'll need a Java API for the database

The app runs on a single machine, with no expectation of migrating to a distributed architecture

I prefer the database be strongly consistent, but eventual consistency is acceptable

The machine has a traditional (non-SSD) hard drive

The map's keys are strings (length < 40), and its values are POJOs; if need be I can serialize the POJOs to strings with Jackson before persisting them, though I'd prefer the database handle this

The POJOs belong to several different subclasses with a common abstract parent class; all of the fields are in the parent class (the subclasses only add/override methods, any fields they add are transient)

There aren't any security requirements - the data I'd be storing doesn't need to be password-protected or anything

The values in the database won't expire (I'll take care of stale values in application code - if POJO.someProperty != someOtherProperty then I recalculate the POJO)
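
The stale-value check from the last requirement could look roughly like the sketch below. The names `StaleAwareMemo` and `Result`, the `payload` field, and the shape of the calculation function are hypothetical stand-ins for my real classes, and the properties are compared with `Objects.equals` rather than `!=`, assuming they are strings.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical sketch: memoization with an application-level staleness check.
final class StaleAwareMemo {

    /** Stand-in for the abstract parent POJO; only the fields needed here are shown. */
    static final class Result {
        final String someProperty; // compared against the caller's current value
        final int payload;         // placeholder for the expensive result

        Result(String someProperty, int payload) {
            this.someProperty = someProperty;
            this.payload = payload;
        }
    }

    private final Map<String, Result> store; // a HashMap today, a disk-backed map later
    private final BiFunction<String, String, Result> calculation;
    private int recalculations = 0;

    StaleAwareMemo(Map<String, Result> store,
                   BiFunction<String, String, Result> calculation) {
        this.store = store;
        this.calculation = calculation;
    }

    /** Returns the cached value unless it is missing or stale, in which case it is recomputed. */
    Result get(String key, String someOtherProperty) {
        Result cached = store.get(key);
        if (cached != null && Objects.equals(cached.someProperty, someOtherProperty)) {
            return cached; // fresh hit, no recalculation
        }
        Result fresh = calculation.apply(key, someOtherProperty);
        recalculations++;
        store.put(key, fresh); // overwrite the missing or stale entry
        return fresh;
    }

    int recalculations() {
        return recalculations;
    }
}
```

Because the class only depends on the `Map` interface, whatever replaces the in-memory HashMap would ideally be usable through the same `get`/`put` calls.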

3 Answers

I use Apache CouchDB often, accessing it through Ektorp. Ektorp uses Jackson JSON serialization and de-serialization natively. It makes reading and writing POJOs very easy. Each document's _id would likely be your hash map key.

I don't know what your algorithm is, but you might even benefit from using CouchDB's map-reduce views to implement all or part of the algorithm. The results of a view in CouchDB are persisted, and the database handles updating the results for you.

You could probably use MongoDB just as well, but I've not used it personally.

Also, Apache Cassandra is a wide-column store (in effect a partitioned key-value store) whose design draws on Amazon's Dynamo and Google's Bigtable papers, so it fits the map paradigm well. It does not store data as JSON, but you can do this yourself by serializing each value to a JSON string and saving that. It is similar to DynamoDB, but is freely available as open source.

You just need a cache with disk persistence. Ehcache is a glorified HashMap that supports disk persistence with a host of configuration options. It will probably be the simplest option because your code will change very little.
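
To make the idea of a "cache with disk persistence" concrete, here is a minimal sketch using only the JDK. It is not Ehcache's API and is far less efficient than a real cache library: `DiskBackedCache` is a hypothetical class that writes one file per key (a poor fit for large maps on a spinning disk) and uses Java serialization, which assumes the POJOs implement `Serializable`. Java serialization skips transient fields, which matches the subclasses described in the question.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of a write-through, disk-persistent key-value cache.
final class DiskBackedCache<V extends Serializable> {
    private final Path dir;

    DiskBackedCache(Path dir) {
        try {
            this.dir = Files.createDirectories(dir); // reuses the directory if it already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hex-encodes the key so any string maps to a safe, case-insensitive file name. */
    private Path fileFor(String key) {
        StringBuilder name = new StringBuilder("k-");
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            name.append(String.format("%02x", b));
        }
        return dir.resolve(name.toString());
    }

    void put(String key, V value) {
        try {
            // Write to a temp file first, then rename, to reduce the chance
            // that a crash mid-write leaves a half-written entry behind.
            Path tmp = Files.createTempFile(dir, "tmp", null);
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(value);
            }
            Files.move(tmp, fileFor(key), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored value, or null if the key has never been written. */
    @SuppressWarnings("unchecked")
    V get(String key) {
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("cached class is no longer on the classpath", e);
        }
    }
}
```

A second `DiskBackedCache` pointed at the same directory after a restart sees the entries written earlier, which is the crash-survival property the question asks for. A real cache library adds what this sketch lacks: an in-memory tier for hot entries, eviction, and a compact on-disk format.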