Beginner's Guide to Spark

This eBook covers the basics of Spark's functionality and how to install it.
Apache Spark is a cluster computing framework that can run on Hadoop and handles different types of data. It aims to be a one-stop solution for many data-processing problems. Spark offers a rich set of tools for handling data and, most importantly, it can be 10-20x faster than Hadoop's MapReduce. It attains this speed through its in-memory primitives: the data is cached in memory (RAM), and the computations are performed on it there instead of rereading it from disk for every step.
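To make the caching idea above concrete, here is a minimal sketch in plain Python. It is only a toy model and does not use Spark's API: the `ToyDataset` class, its `cache()`, `count()` and `sum()` methods, and the `slow_source` function are hypothetical names invented for this illustration. The sketch counts how often the data source has to be read with and without an in-memory copy.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; a toy model, not Spark's API."""

    def __init__(self, loader):
        self._loader = loader     # function that produces the data (simulates a disk read)
        self._use_cache = False   # whether to keep the loaded data in memory
        self._cached = None       # in-memory copy, filled on first use if caching is on
        self.loads = 0            # how many times the loader actually ran

    def cache(self):
        # Ask for the data to be kept in memory after it is first loaded.
        self._use_cache = True
        return self

    def _data(self):
        # Reuse the in-memory copy when one exists; otherwise load again.
        if self._use_cache and self._cached is not None:
            return self._cached
        self.loads += 1
        data = list(self._loader())
        if self._use_cache:
            self._cached = data
        return data

    def count(self):
        return len(self._data())

    def sum(self):
        return sum(self._data())


def slow_source():
    # Assumed example data; imagine this is an expensive read from disk.
    for i in range(1, 6):
        yield i


# Without caching, every computation reloads the data.
uncached = ToyDataset(slow_source)
uncached.count()
uncached.sum()

# With caching, the data is loaded once and then served from memory.
cached = ToyDataset(slow_source).cache()
cached.count()
cached.sum()

print(uncached.loads, cached.loads)  # prints: 2 1
```

The uncached dataset reads its source once per computation, while the cached one reads it a single time and answers later computations from memory. Spark applies the same principle to data distributed across a cluster.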