Replication of Parallel in R

Introduction

Today I was studying machine learning. Ng mentioned the use of multicore processing in ML, which drew my attention, because in another course, Bioinformatics Algorithms (Part 1), clump finding is a complex computation that can take hours to produce the correct result. So I dug a little into parallel computing in R via Google. One of the interesting results came from r-bloggers, so I repeated Daniel’s code on my machine.

parallel (multicore)

This is one of the most popular packages for parallel computing. Since R 2.14.0, its functionality has been included in R itself as the parallel package. Daniel’s code still uses the old name, so I had to rename every multicore to parallel.
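As a minimal sketch of what the renamed code looks like (this is not Daniel’s code; the square function and the core count are assumptions made for illustration), mclapply from the parallel package can stand in for lapply. Note that mclapply relies on forking, so on Windows it only runs with mc.cores = 1.

```r
library(parallel)  # previously: library(multicore)

# Hypothetical workload, chosen only for illustration
square <- function(x) x^2

# mclapply forks the R process; Windows does not support forking,
# so fall back to a single core there (assumed core count: 2)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_result   <- lapply(1:10, square)
parallel_result <- mclapply(1:10, square, mc.cores = n_cores)

# Both calls should return the same values in the same order
print(all.equal(unlist(serial_result), unlist(parallel_result)))
```

The call signature is the same as lapply apart from the mc.cores argument, which is why the rename was enough to get the old code running.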

snow

The snow (Simple Network of Workstations) package by Tierney et al. can use PVM, MPI, and NWS as well as direct networking sockets. It provides an abstraction layer that hides the communication details. The snowFT package provides fault-tolerance extensions to snow.
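A minimal sketch of the snow workflow, assuming the snow package is installed and using a local socket cluster with two workers (both the worker count and the squaring task are assumptions for illustration):

```r
library(snow)

# Start two local worker processes that communicate over sockets
cl <- makeCluster(2, type = "SOCK")

# Distribute the work across the cluster; the result is a list, like lapply
squares <- parLapply(cl, 1:10, function(x) x^2)

# Always shut the workers down when finished
stopCluster(cl)

print(unlist(squares))
```

The same parLapply call would work with an MPI or PVM cluster; only the makeCluster line changes, which is the abstraction the package provides.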

snowfall

The snowfall package by Knaus provides a more recent alternative to snow. Functions can be used in sequential or parallel mode.
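To illustrate the sequential/parallel switch, here is a minimal sketch assuming the snowfall package is installed. The run_squares helper and the choice of two CPUs are hypothetical and only serve the example.

```r
library(snowfall)

# Hypothetical workload, chosen only for illustration
square <- function(x) x^2

# Hypothetical helper: the same sfLapply call runs in either mode,
# depending only on how sfInit was called
run_squares <- function(use_parallel) {
  sfInit(parallel = use_parallel, cpus = 2)
  on.exit(sfStop())  # stop the cluster (if any) when the function returns
  sfLapply(1:10, square)
}

sequential_result <- run_squares(FALSE)
parallel_result   <- run_squares(TRUE)

print(all.equal(unlist(sequential_result), unlist(parallel_result)))
```

Because only the sfInit call differs, code can be debugged sequentially and then switched to parallel mode without touching the computation itself.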

Conclusion

According to this result, we should stick with default functions such as lapply until the number of loop iterations exceeds a hundred thousand. Also, as the data size increases, parallel computing becomes relatively cheaper.
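A comparison of this kind can be sketched roughly as follows. This is not Daniel’s benchmark: the task, the loop counts, and the core count are all assumptions, and the timings will differ from machine to machine.

```r
library(parallel)

# Hypothetical cheap task; a heavier task shifts the break-even point
task <- function(i) sqrt(i)

# mclapply cannot fork on Windows, so use one core there (assumed: 2 elsewhere)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Loop counts chosen for illustration only
sizes <- c(1e3, 1e4, 1e5)

# Elapsed seconds for lapply and mclapply at each loop count
timings <- t(sapply(sizes, function(n) {
  c(n        = n,
    lapply   = system.time(lapply(seq_len(n), task))[["elapsed"]],
    mclapply = system.time(mclapply(seq_len(n), task,
                                    mc.cores = n_cores))[["elapsed"]])
}))

print(timings)
```

For small loop counts the cost of starting and collecting the worker processes tends to dominate, which is one way to read the threshold mentioned above.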