On Sat, 12 Aug 2006, Francis Cianfrocca wrote:
> With all the times I've reinvented this wheel, I've never tried
> storing millions of data elements each in its own file. Do you have
> performance metrics you can share? Did you tune it for a particular
> filesystem?
Not really. I haven't kept the results of any tests to date. The
largest test, however, created a million-item cache, and I had no
problems.
The main consideration, if one were planning to store large numbers of
elements, is making sure the filesystem is tuned to handle large numbers
of small files.
However, a lot of files can fit into a surprisingly non-messy directory
structure.
Consider a SHA512 hash for a key whose first few characters are:
dd232b224979c0e
If I am running a cache with a bucket depth of 2 and a bucket width of 2,
the file for that cache entry is going to go into the directory
dd/23/
Given an even distribution of hashes, with a million records one should
have 256 directories at the top level, 256 under each of those (65,536
leaf directories in all), and 15 or 16 files in each leaf directory.
That's pretty easy to handle efficiently.
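In Ruby, that bucketing scheme can be sketched roughly as follows (the
bucket_path helper and its keyword arguments are my own illustration,
not the API of any particular cache library):

```ruby
require 'digest'

# Hypothetical helper: map a cache key to its on-disk path by taking
# `depth` slices of `width` hex characters from the key's SHA512 hash
# and using them as nested directory names.
def bucket_path(key, width: 2, depth: 2)
  hash = Digest::SHA512.hexdigest(key)
  dirs = (0...depth).map { |i| hash[i * width, width] }
  File.join(*dirs, hash)
end
```

With the defaults, a key hashing to "dd232b22..." lands under dd/23/,
matching the layout above; widening the buckets or deepening the tree
trades directory count against files per directory.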
Kirk Haines