(4) Study Spark (DataFrames, RDDs, etc.), for example with the O'Reilly book "Learning Spark". I find that it always helps to understand how something works under the hood. The same holds for SparkR: you can easily find videos on YouTube that explain how it works under the hood, especially the distributed nature of SparkR and Spark.