While writing code with BiocParallel that should run parallel computations on both Linux and Windows, I noticed the following surprising behaviour, which ultimately produces an error message:

Note that at this point I'm using Windows. After changing BPPARAM from MulticoreParam() to SnowParam(), functions declared earlier may no longer be available. This happens only when a new function is declared within the bplapply() call, and it ends in an error message.

In the end I'd like to set BPPARAM to either MulticoreParam() or SnowParam(), depending on the platform detected; the rest of the code should remain the same.
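A minimal sketch of such a platform switch, in case it is useful. The helper name choose_param() and the worker count are made up for illustration; only .Platform$OS.type, SnowParam(), MulticoreParam() and bplapply() are existing functions.

```r
library(BiocParallel)

# Hypothetical helper: pick a back-end from the detected platform.
# SnowParam() starts separate R processes and works everywhere;
# MulticoreParam() relies on forking and is only useful on non-Windows systems.
choose_param <- function(workers = 2L) {
    if (.Platform$OS.type == "windows") {
        SnowParam(workers = workers)
    } else {
        MulticoreParam(workers = workers)
    }
}

param <- choose_param()

# The rest of the code stays the same whichever back-end was chosen.
res <- bplapply(1:4, sqrt, BPPARAM = param)
```

Because the back-end is decided in one place, every later call only has to pass BPPARAM = param.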

So the only workaround I see so far is to avoid declaring new functions within bplapply().

However, I thought sharing this (to me quite unexpected) behaviour might be useful on this list.

MulticoreParam() uses a 'shared memory' model where the workers share the memory of the calling parent, so they automatically 'know' about functions that are defined in the manager R session.

SnowParam() starts separate processes that do not know about one another at all. It has rules for transferring objects from the manager environment to the worker environment. To understand the rules, one needs to know that every R symbol is defined in an environment, and that environments have 'parent' (possibly empty) environments. Working at the R prompt, one is in the .GlobalEnv environment. The rule is to NOT export symbols in the global environment to the workers.

Invoking bplapply() from inside a function, e.g., f(), works, because symbols defined in the environment where bplapply() is invoked (other than the global environment) are forwarded to the worker. The body of each function represents an environment, and the parent of that environment is the environment in which the function was defined; e.g., the parent environment of f() is the global environment.
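The environment relationships described here can be inspected with base R alone. This is only a sketch: f() and x are illustrative names, and nothing is sent to a worker.

```r
# f() returns a function created inside its body, so that function's
# environment is the body of f(), where x lives.
f <- function() {
    x <- 1
    function(i) x + i
}

fun <- f()
env <- environment(fun)

ls(env)                                     # "x"
identical(env, globalenv())                 # FALSE
# The parent of f()'s body is the environment in which f() was defined.
identical(parent.env(env), environment(f))  # TRUE
fun(1)                                      # 2
```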

Invoking bplapply() from a function g() that is itself defined inside f() also works: bplapply() exports the environment of g() and the parent environment of g() (i.e., the environment of f()), but not the parent environment of f() (the global environment).
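A base-R sketch of the nested case. It assumes that functions reach the workers through R's ordinary serialization, which copies a function's enclosing environments but only records a reference to the global environment; serialize() and unserialize() stand in for the trip to a worker, and f(), g(), x and y are illustrative names.

```r
f <- function() {
    x <- 1
    g <- function() {
        y <- 2
        function(i) x + y + i   # uses symbols from both g() and f()
    }
    g()
}

fun <- f()
env_g <- environment(fun)      # body of g()
env_f <- parent.env(env_g)     # body of f()

ls(env_g)                      # "y"
ls(env_f)                      # "g" "x"

# Round trip through serialization: the copy gets fresh copies of the
# g() and f() environments, so x and y travel with the function.
copy <- unserialize(serialize(fun, NULL))
identical(environment(copy), env_g)   # FALSE
copy(1)                               # 4
```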

The reason for 'stopping' at the global environment also illustrates a potential hazard. The global environment frequently contains many symbols, some of them large, that are irrelevant to the calculation, so it would be inefficient to export all of these. Note though that with
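The cost of exporting irrelevant symbols can be made visible with base R, again assuming that a function is shipped together with its enclosing environment via ordinary serialization. The names make_small(), make_big() and unused are invented for this sketch; the same cost would apply to anything large that happens to sit in an exported environment.

```r
# Both factories return the same one-line function; make_big() also
# leaves a large object that the function never uses in its environment.
make_small <- function() {
    x <- 1
    function(i) x + i
}
make_big <- function() {
    x <- 1
    unused <- numeric(1e6)   # about 8 MB of doubles
    function(i) x + i
}

small_size <- length(serialize(make_small(), NULL))
big_size <- length(serialize(make_big(), NULL))

# The unused vector is serialized along with the function.
big_size - small_size > 7e6   # TRUE
```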

I have been trying to understand this and have read several posts about sending data (objects, functions, whatever) to workers, and I just can't seem to get it. The way it is always explained seems impenetrable to me, even though I have read about environments etc. I have a situation where a function uses parallel processing inside it, so obviously I want to pass data, arguments etc. from the function call to the workers. I have ended up writing temporary files (with a defined file name) in the "main" part of the function and loading them in the workers, but surely this cannot be the optimal way...
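One pattern that avoids temporary files is to hand the data to the workers as explicit arguments: bplapply() forwards any extra named arguments to FUN. The sketch below is only an illustration; scale_columns(), worker_fun, mat and multiplier are made-up names, and the two-worker SnowParam() default is an assumption.

```r
library(BiocParallel)

# Hypothetical function that uses parallel processing internally.
scale_columns <- function(mat, multiplier, BPPARAM = SnowParam(workers = 2)) {
    # The worker function relies only on its own arguments.
    worker_fun <- function(j, mat, multiplier) mat[, j] * multiplier
    # mat and multiplier are passed through '...' and sent to each worker.
    bplapply(seq_len(ncol(mat)), worker_fun,
             mat = mat, multiplier = multiplier,
             BPPARAM = BPPARAM)
}

m <- matrix(1:6, nrow = 2)
res <- scale_columns(m, multiplier = 10)
res[[2]]   # 30 40
```

Since everything the worker needs arrives as an argument, the same call should behave the same way with MulticoreParam() or SnowParam().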