ragowthaman has asked for the
wisdom of the Perl Monks concerning the following question:

Hi Monks,
I am trying to load millions of records into a Hash. I tried to lean it up, still its couple Gigs of data. So, wondering, how do i manage it. Someone suggested tie hash. But, wondering, how can I use that.......ANY tutorials or pointers to that?
Or any other suggestions....will be very much appreciated.
Gowthaman

If using Perl's in-memory hashes means you are running out of memory (or into swapping), then a file-based hash (e.g. BerkeleyDB) will get you around that problem, but it will be slow: slower to access than a memory-based hash, and much slower to construct.
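For reference, a minimal sketch of a file-based hash using the core DB_File module, which ties an ordinary Perl hash to a Berkeley DB file on disk (the file name `records.db` is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;     # for the O_RDWR / O_CREAT open flags
use DB_File;   # core module; ties a hash to a Berkeley DB file

# Tie %big to a disk file: entries are stored on disk, not in RAM,
# so memory use stays flat no matter how many records you add.
my %big;
tie %big, 'DB_File', 'records.db', O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie records.db: $!";

# Use %big exactly like an ordinary hash.
$big{"key$_"} = "value$_" for 1 .. 1_000;

print 'stored ', scalar(keys %big), " records\n";   # prints "stored 1000 records"

untie %big;
```

Every store and fetch is a disk operation (modulo Berkeley DB's own caching), which is where the slowdown mentioned above comes from.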

Depending on what you are doing with the hash, there are sometimes more compact alternatives that provide hash-like functionality and can avoid the move to disk. But you would need to describe the data, and the use you are making of the hash.


If you're finding that you have a "doesn't scale well" problem (i.e., your hash or array is simply getting too large, and could well get larger), one option is to move to a database solution. You're already pondering that option when you talk about 'tie hash'; that will probably lead you to a hash tied to a database. But you may find that rethinking your needs and designing a solution around a database, forgetting about the hash altogether, gives you a better approach.

Of course this is speculation; we don't know what you're really trying to do, and thus can only take a blind stab at how to help. But a lightweight database such as SQLite can really help when you find it's just not practical to slurp a big hash into memory.
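As a sketch of that idea, assuming DBI and DBD::SQLite are installed (the table and column names below are made up for illustration), you can keep the data on disk and query only what you need:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=records.sqlite', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS records (k TEXT PRIMARY KEY, v TEXT)');

# Bulk-load inside one transaction -- far faster than committing
# each row individually.
$dbh->begin_work;
my $ins = $dbh->prepare('INSERT OR REPLACE INTO records (k, v) VALUES (?, ?)');
$ins->execute("key$_", "value$_") for 1 .. 1_000;
$dbh->commit;

# Look up a single "hash entry" without loading everything into memory.
my ($v) = $dbh->selectrow_array(
    'SELECT v FROM records WHERE k = ?', undef, 'key42');
print "key42 => $v\n";   # prints "key42 => value42"

$dbh->disconnect;
```

The PRIMARY KEY gives you indexed lookups, so single-key access stays fast even with millions of rows.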

Something else to think about: do you need to store all your data at once?

If your script is going to be called more than once and each invocation only needs to access part of the data, perhaps you can filter the data somehow before loading it into a hash for further processing.
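A minimal sketch of that filtering idea, assuming the data lives in a tab-separated file (the file name, keys, and values here are invented for the demonstration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Create a small sample data file so the sketch is self-contained.
open my $out, '>', 'data.tsv' or die "Cannot write data.tsv: $!";
print {$out} "alice\t1\nbob\t2\ncarol\t3\n";
close $out;

# Only this invocation's keys get loaded; everything else is skipped.
my %wanted = map { $_ => 1 } qw(alice carol);

my %hash;
open my $fh, '<', 'data.tsv' or die "Cannot open data.tsv: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($key, $value) = split /\t/, $line, 2;
    next unless $wanted{$key};   # skip records this run doesn't need
    $hash{$key} = $value;
}
close $fh;

print 'loaded ', scalar(keys %hash), " of 3 records\n";   # prints "loaded 2 of 3 records"
```

Memory use then scales with the subset each run needs rather than with the whole data set.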

Without knowing more about the details of your application, it's hard to tell whether this approach is usable.