
8.
Spark contd.
• Control placement of partitions of RDD
– can specify number of partitions
– can partition based on a key in each record
• useful in joins
• In-memory storage
– Up to 100X speedup over Hadoop for iterative applications
• Spark can run on Hadoop YARN and read files from HDFS
• Spark is written in Scala
DOS Lab, IIT Madras
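The partition-by-key point above can be sketched without a Spark cluster: a hash partitioner routes each (key, value) record to a partition by hashing its key, so equal keys from two datasets land in the same partition and a join needs no cross-partition shuffle. The following is a minimal simulation in plain Scala; `hashPartition` and `numPartitions` are illustrative stand-ins, not Spark's actual `HashPartitioner` API.

```scala
object HashPartitionSketch {
  // Illustrative stand-in for Spark's HashPartitioner: assign each
  // (key, value) record to a partition index derived from its key.
  def hashPartition[K, V](records: Seq[(K, V)],
                          numPartitions: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => math.abs(k.hashCode) % numPartitions }

  def main(args: Array[String]): Unit = {
    val users  = Seq(("alice", 1), ("bob", 2), ("carol", 3))
    val orders = Seq(("alice", "book"), ("bob", "pen"))
    // Because both sides use the same key hash, records with equal keys
    // get the same partition index, which is what makes joins cheap.
    println(hashPartition(users, 4))
    println(hashPartition(orders, 4))
  }
}
```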

16.
Spark: Filter transformation in RDD
val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(line => line.contains("a"))
Here is an example of the filter transformation ("give me the lines that contain 'a'"): the filter method is applied to each line, and a new RDD is returned.

17.
Count
val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(line => line.contains("a"))
numAs.count() // returns 5
The count() action returns the number of elements in the RDD: here, the number of lines that contain "a".
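The filter-then-count pattern on the two slides above can be mimicked with an ordinary Scala collection, since RDD transformations deliberately mirror the standard collection API. This is a local sketch only; the sample log lines are made up for illustration, and a real RDD would distribute the work across partitions.

```scala
object FilterCountSketch {
  // Local analogue of logData.filter(line => line.contains("a")).count():
  // on a plain Seq, count() corresponds to filter(...).length.
  def countLinesWithA(lines: Seq[String]): Int =
    lines.filter(line => line.contains("a")).length

  def main(args: Array[String]): Unit = {
    // Stand-in for sc.textFile(logFile): a local list of lines.
    val logData = Seq("spark and hadoop", "filter demo",
                      "a line", "no match here", "data lab")
    println(countLinesWithA(logData))
  }
}
```

Note the difference in execution model: on a Seq the filter runs immediately, whereas in Spark the filter is lazy and only the count() action triggers computation.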