Running R at Scale on Google Compute Engine

1 hour 30 minutes, 9 Credits

GSP134

Overview

This lab shows you how to run R scripts across multiple nodes on Google Cloud Platform (GCP). R is an open source programming language that statisticians and economists use extensively for modeling and data visualization. Many of these models require far more memory and computational power than a single node or virtual machine provides. Computational clusters address this by aggregating memory and computation across tens to hundreds of nodes and thousands of cores. This lab shows you how to use computational clusters with R so you can start scaling your own analytic models.

R has a number of packages that make it easy to program a cluster of nodes for your modeling and analytics.

This lab uses Rmpi largely because it supports a number of different libraries. With Rmpi, an R developer submits a job to a high-performance computing (HPC) cluster through a workload manager. The job consists of an R script that uses the Rmpi interface to create processes across the nodes in the cluster, and to send and receive messages between those processes.
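
As a rough illustration of that pattern, the sketch below spawns worker processes with Rmpi, sends each one a chunk of data, and collects the partial results on the master. It is not the script this lab uses. The names `split_work`, `worker_task`, and `sum_of_squares_on_cluster` are hypothetical, the worker count is an arbitrary choice, and the sketch assumes that Rmpi is installed on every node, that the MPI installation permits spawning processes, and that the input has at least as many elements as there are workers.

```r
# Hypothetical helper: split a vector into one chunk per worker (round-robin).
split_work <- function(x, n_workers) {
  unname(split(x, rep(seq_len(n_workers), length.out = length(x))))
}

# Runs on each worker: receive a chunk from the master (rank 0),
# compute a partial sum of squares, and send it back.
worker_task <- function() {
  chunk <- Rmpi::mpi.recv.Robj(source = 0, tag = 1)
  Rmpi::mpi.send.Robj(sum(chunk^2), dest = 0, tag = 2)
}

# Runs on the master: create the workers, distribute the data,
# and combine the partial results.
sum_of_squares_on_cluster <- function(x, n_workers = 2) {
  # Create worker R processes; where they land depends on the MPI setup.
  Rmpi::mpi.spawn.Rslaves(nslaves = n_workers)
  on.exit(Rmpi::mpi.close.Rslaves(), add = TRUE)

  # Copy the worker function to every worker, then tell them to run it.
  Rmpi::mpi.bcast.Robj2slave(worker_task)
  Rmpi::mpi.bcast.cmd(worker_task())

  # Send one chunk to each worker (workers have ranks 1..n_workers).
  chunks <- split_work(x, n_workers)
  for (rank in seq_len(n_workers)) {
    Rmpi::mpi.send.Robj(chunks[[rank]], dest = rank, tag = 1)
  }

  # Receive one partial result per worker, in whatever order they arrive.
  partials <- vapply(seq_len(n_workers), function(i) {
    Rmpi::mpi.recv.Robj(source = Rmpi::mpi.any.source(), tag = 2)
  }, numeric(1))

  sum(partials)
}
```

A script submitted through the workload manager could then call, for example, `sum_of_squares_on_cluster(1:10, n_workers = 2)` and end with `Rmpi::mpi.quit()`. The block itself only defines functions, so it can be loaded on a machine without MPI; the Rmpi calls run only when the cluster function is invoked.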