This post describes a useful application of functional programming. Parallel processes on Windows don't have access to the global environment. A closure can carry a function's dependencies along with the function you pass to those processes, saving you from "could not find function" errors.

Your situation:

you have a big data frame

you want to apply a (pretty complex) function to each row

you are on a Windows server

For example, you know baby names are much cooler when they have no vowels and no uppercase letters.
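A minimal sketch of such a function, to make the rest of the post concrete. The names add_bang and mutate_names are the ones used below, but their bodies here are assumptions, the three-row data frame is a toy stand-in for babynames, and base R is used instead of dplyr::mutate so the snippet runs without extra packages.

```r
# Toy stand-in for babynames (assumption: only the `name` column matters here).
names_df <- data.frame(name = c("Mary", "Anna", "Emma"),
                       stringsAsFactors = FALSE)

# Helper that the main function depends on (body is assumed).
add_bang <- function(x) paste0(x, "!")

# Lowercase each name, drop the vowels, then call the helper.
mutate_names <- function(df) {
  df$name <- add_bang(gsub("[aeiou]", "", tolower(df$name)))
  df
}

result <- mutate_names(names_df)
print(result)
```

The important detail is that mutate_names does not stand alone: it only works where add_bang can be found.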

But babynames has 1,858,689 rows! It may take a long time to process everything. So your first idea is to start parallel processes, split the data frame, parLapply the function to each table in the split list, then bind_rows them back together.
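The split, apply, and recombine steps might look roughly like this. It is only a sketch: lower_names, n_workers and chunk_id are hypothetical names, the data frame is a toy stand-in for babynames, do.call(rbind, ...) stands in for bind_rows, and the function deliberately depends on nothing but base R so that it runs as written.

```r
library(parallel)

# Toy stand-in for babynames.
names_df <- data.frame(name = c("Mary", "Anna", "Emma", "Elizabeth"),
                       stringsAsFactors = FALSE)

# Uses only base R, so it needs nothing from the global environment.
lower_names <- function(df) {
  df$name <- tolower(df$name)
  df
}

n_workers <- 2

# Assign consecutive rows to the same chunk, then split into a list of tables.
chunk_id <- rep(seq_len(n_workers),
                each = ceiling(nrow(names_df) / n_workers),
                length.out = nrow(names_df))
chunks <- split(names_df, chunk_id)

# makeCluster() starts separate R processes (a PSOCK cluster by default).
cl <- makeCluster(n_workers)
pieces <- parLapply(cl, chunks, lower_names)
stopCluster(cl)

# Base R equivalent of dplyr::bind_rows().
result <- do.call(rbind, pieces)
print(result)
```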

On Mac, you can choose between parLapply and mclapply, but on Windows you effectively only have parLapply, because mclapply relies on forking and cannot use more than one core there. parLapply creates worker processes that don’t have access to the global environment (see this Stack Overflow Q for technical details on parallel processing on Windows vs Mac), so add_bang isn’t passed to the workers that are trying to run mutate_names. They don’t know anything other than what you give them. They also don’t know where to find mutate unless you use dplyr::mutate or call library(dplyr) inside the function. (Even then, if your libraries aren’t on the search path, you can’t find them no matter what, but I haven’t solved that problem yet.)
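To see the problem in isolation, here is a small sketch that should fail on purpose. The function bodies and the toy data frame are assumptions, and the error is caught and printed rather than left to stop the script.

```r
library(parallel)

# Toy stand-in for babynames.
names_df <- data.frame(name = c("Mary", "Anna"), stringsAsFactors = FALSE)

add_bang <- function(x) paste0(x, "!")
mutate_names <- function(df) {
  df$name <- add_bang(gsub("[aeiou]", "", tolower(df$name)))
  df
}

# Make the top-level assumption explicit: the function's enclosing environment
# is the global one (a no-op when this is run as a plain script).
environment(mutate_names) <- globalenv()

cl <- makeCluster(2)
outcome <- tryCatch(
  parLapply(cl, split(names_df, seq_len(nrow(names_df))), mutate_names),
  # Return the error message instead of stopping.
  error = function(e) conditionMessage(e)
)
stopCluster(cl)

# The message should complain that add_bang cannot be found on the workers.
print(outcome)
```

The function itself is sent to the workers, but the global environment it looks things up in is not, so each worker searches its own empty global environment for add_bang.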

One option is to use parallelsugar (via nathanvan), which approximates Mac’s mclapply, whose child processes do have access to the global environment. parallelsugar works by using parallel::clusterExport to export objects to the child processes. However, if your data frame is large (or if you have other large things in the global environment, or lots of unrelated packages), it can take forever to set up the cluster, because all of it has to be copied to every child process.

If you like, you can use the same logic to pick and choose a few things to export until it starts to work. Here we could use:
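A minimal sketch of that selective export, assuming the cluster object is called cl and that add_bang is the only thing mutate_names is missing. The function bodies and toy data are assumptions, as before.

```r
library(parallel)

# Toy stand-in for babynames.
names_df <- data.frame(name = c("Mary", "Anna"), stringsAsFactors = FALSE)

add_bang <- function(x) paste0(x, "!")
mutate_names <- function(df) {
  df$name <- add_bang(gsub("[aeiou]", "", tolower(df$name)))
  df
}

cl <- makeCluster(2)

# Copy just the one missing dependency into each worker's global environment.
# envir = environment() looks add_bang up wherever this code is running.
clusterExport(cl, "add_bang", envir = environment())

pieces <- tryCatch(
  parLapply(cl, split(names_df, seq_len(nrow(names_df))), mutate_names),
  finally = stopCluster(cl)  # shut the workers down whether or not this fails
)

result <- do.call(rbind, pieces)
print(result)
```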

But you don’t want to keep track of all the dependencies every time you write a new function, and you can’t do it at all when you write a wrapper around parLapply that takes a function like mutate_names as an input (because the wrapper function won’t know what the argument function depends on).

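In this case, you can’t keep adding things to clusterExport until it works, so the only export-based option left is the full parallelsugar approach (passing everything in globalenv, big and small). But a closure passes exactly what you need! A function created inside another function carries its enclosing environment with it when it is sent to a worker, so anything defined in that environment arrives too.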
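A minimal sketch of the idea, under the same assumptions as before (assumed function bodies, toy data, base R in place of dplyr). make_mutate_names and par_apply_rows are hypothetical names: the first is a function factory that returns the closure, the second is a generic wrapper that knows nothing about what its fun argument depends on.

```r
library(parallel)

# Function factory: everything the worker function needs is defined inside,
# so it lives in the environment that travels with the returned function.
make_mutate_names <- function() {
  add_bang <- function(x) paste0(x, "!")
  function(df) {
    df$name <- add_bang(gsub("[aeiou]", "", tolower(df$name)))
    df
  }
}

# Generic wrapper: split, run in parallel, recombine. No clusterExport needed.
par_apply_rows <- function(df, fun, n_workers = 2) {
  chunk_id <- rep(seq_len(n_workers),
                  each = ceiling(nrow(df) / n_workers),
                  length.out = nrow(df))
  cl <- makeCluster(n_workers)
  tryCatch(
    do.call(rbind, parLapply(cl, split(df, chunk_id), fun)),
    finally = stopCluster(cl)  # always stop the workers, even after an error
  )
}

# Toy stand-in for babynames.
names_df <- data.frame(name = c("Mary", "Anna", "Emma"),
                       stringsAsFactors = FALSE)

result <- par_apply_rows(names_df, make_mutate_names())
print(result)
```

One thing to keep in mind with this design: whatever else sits in the factory's environment is shipped to every worker as well, so it pays to keep large objects out of it.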

(N.B. the tryCatch keeps you from leaving child processes hanging, with no way to kill them, if the function throws an error.)