I am trying to find a simple way to use something like Perl's hashes in R (essentially caching), as I intended both to do Perl-style hashing and to write my own memoization of calculations. However, others have beaten me to the punch and already have packages for memoization. The more I dig, the more I find, e.g. memoise and R.cache, but the differences between them aren't readily clear. In addition, it's not clear how else one can get Perl-style hashes (or Python-style dictionaries) and write one's own memoization, other than by using the hash package, which doesn't seem to underpin the two memoization packages.

Since I can find no information on CRAN or elsewhere to distinguish between the options, perhaps this should be a community wiki question on SO: What are the options for memoization and caching in R, and what are their differences?

As a basis for comparison, here is a list of the options I've found. Also, it seems to me that all depend on hashing, so I'll note the hashing options as well. Key/value storage is somewhat related, but opens a huge can of worms regarding DB systems (e.g. BerkeleyDB, Redis, MemcacheDB and scores of others).

Other

Base R supports: named vectors and lists, row and column names of data frames, and names of items in environments. It seems to me that using a list is a bit of a kludge. (There's also pairlist, but it is deprecated.)
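As a rough illustration of the environment option, here is a minimal base R sketch of an environment used the way one would use a Perl hash. The variable names are made up for the example, and this is only one way to do it, not a recommendation over the packages above.

```r
# An environment used as a mutable key/value store.
# parent = emptyenv() keeps lookups from falling through to other scopes.
h <- new.env(hash = TRUE, parent = emptyenv())

# Insert or overwrite a value, roughly like $h{apple} = 1 in Perl.
assign("apple", 1, envir = h)
h[["banana"]] <- 2            # `[[<-` also works on environments

# Test for a key without searching enclosing environments.
has_apple  <- exists("apple",  envir = h, inherits = FALSE)
has_cherry <- exists("cherry", envir = h, inherits = FALSE)

# Fetch a value, list the keys, and delete an entry.
banana_value <- get("banana", envir = h, inherits = FALSE)
keys_before  <- ls(h, all.names = TRUE)   # all.names keeps keys starting with "."
rm("apple", envir = h)
keys_after   <- ls(h, all.names = TRUE)
```

Unlike a list, the environment is modified in place, so it can be passed to a function and updated there without being copied back.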

The data.table package supports rapid lookups of elements in a data table.

Use case

Although I'm mostly interested in knowing the options, I have two basic use cases that arise:

Caching: Simple counting of strings. [Note: This isn't for NLP, but for general use, so NLP libraries are overkill; tables are inadequate because I prefer not to wait until the entire set of strings is loaded into memory. Perl-style hashes are at the right level of utility.]

Memoization of monstrous calculations.
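For the string-counting use case, something along these lines might work with base R alone. It is only a sketch: count_strings is a hypothetical helper name, the chunk size is arbitrary, and a textConnection stands in for a real file so the example is self-contained.

```r
# Count strings chunk by chunk, so the whole input never has to be in memory.
# `count_strings` is a hypothetical helper, not from any package.
# `con` must already be open; otherwise readLines() would reopen it and
# start from the beginning on every call.
count_strings <- function(con, chunk_size = 1000L) {
  counts <- new.env(hash = TRUE, parent = emptyenv())
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break          # end of input
    # Environments cannot use "" as a name, so empty lines are skipped.
    for (s in chunk[nzchar(chunk)]) {
      old <- counts[[s]]                    # NULL if the key is new
      counts[[s]] <- if (is.null(old)) 1L else old + 1L
    }
  }
  # Convert to a named integer vector for inspection.
  keys <- ls(counts, all.names = TRUE)
  unlist(mget(keys, envir = counts))
}

# A textConnection is opened on creation, so it can be read in chunks.
con <- textConnection(c("a", "b", "a", "c", "a", "b"))
result <- count_strings(con, chunk_size = 2L)
close(con)
```

For a real file, file("strings.txt", open = "r") would play the role of the textConnection (the file name is just a placeholder).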

These really arise because I'm digging into the profiling of some slooooow code, and I'd really like to just count simple strings and see if I can speed up some calculations via memoization. Being able to hash the input values, even if I don't memoize, would let me see whether memoization can help.
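To make the last point concrete, a hand-rolled memoizer that also counts repeated inputs could look roughly like the sketch below. memoize_counted is a hypothetical helper, not how memoise or R.cache work internally, and building the key with deparse() is an assumption that only suits small, simple arguments.

```r
# Wrap a function so results are cached by argument value, and track how
# often the same inputs recur. `memoize_counted` is a hypothetical helper.
memoize_counted <- function(f) {
  cache  <- new.env(hash = TRUE, parent = emptyenv())
  hits   <- 0L
  misses <- 0L
  fn <- function(...) {
    # Build a string key from the arguments. deparse() is an assumption:
    # fine for small inputs, too long and too slow for large objects.
    key <- paste(deparse(list(...)), collapse = "")
    if (exists(key, envir = cache, inherits = FALSE)) {
      hits <<- hits + 1L
      return(get(key, envir = cache, inherits = FALSE))
    }
    misses <<- misses + 1L
    value <- f(...)
    assign(key, value, envir = cache)
    value
  }
  list(fn = fn, stats = function() c(hits = hits, misses = misses))
}

# Stand-in for a monstrous calculation.
slow_square <- function(x) x^2

m <- memoize_counted(slow_square)
results <- vapply(c(2, 3, 2, 2, 3), m$fn, numeric(1))
m$stats()   # hits = 3, misses = 2
```

A high ratio of hits to misses on real inputs would suggest that memoization is worth pursuing; if nearly every call is a miss, caching will not help.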

Puzzled about your comments re: environments. If you create a new environment, it will be hashed. See ?environment, e.g., env.profile(new.env())$size # [1] 29
– BondedDust, Aug 31 '11 at 20:01

@DWin: You are correct. I only mention it as an option for a hash capability.
– Iterator, Aug 31 '11 at 20:05

This post, by the author of 'R in a Nutshell', includes speed tests of several different options for looking up objects, including putting them in an environment (where lookup uses hashed names): broadcast.oreilly.com/2010/03/lookup-performance-in-r.html. I don't know if it's useful to you, but I thought I'd tack it on to this post for anyone else who comes along.
– Josh O'Brien, Nov 3 '11 at 15:42

I did not have luck with memoise because it caused a too-deep recursion problem with a function from a package I tried it with. I had better luck with R.cache. What follows is annotated code that I adapted from the R.cache documentation. The code shows different options for caching.