README

doRedis: A simple parallel backend for foreach using Redis.
See https://raw.githubusercontent.com/bwlewis/doRedis/master/vignettes/doRedis.pdf
for the package vignette, which contains many technical details and examples.
See https://github.com/bwlewis/doRedis for the latest examples, bug fixes and
documentation.
Here is a copy of the compiled Windows binary package (32- and 64-bit). It will
soon also be available on CRAN:
http://bwlewis.github.io/doRedis/doRedis_1.1.1.zip
IMPORTANT NOTES
Users are encouraged to upgrade to Redis versions >= 2.6 that include server-side
Lua scripting. doRedis can take advantage of that to enable new features outlined
in the package vignette. Backwards compatibility with older versions of Redis
is maintained, however.
Set the following parameter in your redis.conf file before using doRedis:
timeout 0
DESCRIPTION
Steve Weston's foreach package is a remarkable parametric evaluation device for
the R language. Similarly to lapply-like functions, foreach maps expressions to
data and aggregates results. Even better, foreach lets you do this in parallel
across multiple CPU cores and computers. And even better yet, foreach
abstracts the parallel computing details away into modular back-end code. Code
written using foreach works sequentially in the absence of a parallel back-end,
and works uniformly across a growing variety of back ends. Think of foreach as
the lingua Franca of parallel computing for R.
Redis is a powerful, fast networked database with many innovative features,
among them a blocking stack-like data structure (Redis "lists"). This feature
makes Redis useful as a lightweight backend for parallel computing. The rredis
package provides a native R interface to Redis. The doRedis package defines a
simple parallel backend for foreach that uses Redis.
Here is a quick example procedure for experimenting with doRedis:
1. Install Redis on your computer.
2. Install foreach, rredis and doRedis packages.
3. Start the redis server running (see the redis documentation). We assume
that the server is running on the host "localhost" and port 6379 (the
default Redis port). We assume in the examples below that the worker R
processes and the master are running on the same machine. In practice,
they can of course run across a network.
4. Open one or more R sessions that will act as back-end worker processes.
Run the following in each session:
require('doRedis')
redisWorker('jobs')
(The R session will display status messages but otherwise block for work.)
Note: You can add more workers to a work queue at any time. Also note
that each back-end worker may advertise for work on multiple queues
simultaneously (see the documentation and examples).
5. Open another R session that will act as the master process. Run the
following example (a simple sampling approximation of pi):
require('doRedis')
registerDoRedis('jobs')
foreach(j=1:10,.combine=sum,.multicombine=TRUE) %dopar%
4*sum((runif(1000000)^2 + runif(1000000)^2)<1)/10000000
removeQueue('jobs')
DISCUSSION
The "jobs" parameter of the redisWorker and registerDoRedis function specifies
a Redis "list" that will be used to transfer data between the master and worker
processes. Think of this name as a reference to a job queue. You are free to
configure multiple queues.
The doRedis parallel backend supports dynamic pools of back-end workers. New
workers may be added to work queues at any time and can be immediately used by
in-flight foreach computations.
The doRedis backend accepts a parameter called "chunkSize" that sets the
number of function evaluations to be doled out per job. The default value
is one. Increasing chunkSize can improve performance greatly for quick-running
function evaluations. Here is an example that sets the chunkSize to 100:
foreach(j=1:5, .options.redis=list(chunkSize=100)) %dopar% ...
Setting chunkSize too large will adversely impact load-balancing across the
workers.
The redisWorker function is used to manually invoke worker processes that
advertise for jobs on one or more queues. The function also has parameters
for a Redis host and port number. For example, if the Redis server is
running on a host called "Cazart" with the default Redis port 6379:
redisWorker('jobs', host='Cazart', port=6379)
The registerDoRedis function also contains host and port parameters. Neither
the worker nor master R session needs to be running on the same machine as
the Redis server.
The startLocalWorkers function invokes one or more background R worker
processes on the local machine (using the redisWorker function). It's
a convenient way to invoke several workers at once.
Workers self-terminate when their work queues have been deleted with the
removeQueue function.