Sunday, March 5, 2017

Developing and Running a Spark WordCount Application Written in Scala

Apache Spark runs in its standalone cluster mode, on EC2, on Hadoop
YARN, on Apache Mesos, or in the cloud. It can access data in HDFS,
Cassandra, HBase, Hive, Tachyon, S3, and any other Hadoop data source.

The WordCount example reads text files and counts how often each
word occurs. The input is a set of text files and the output is also
text files, each line of which contains a word and the number of times
it occurred, separated by a space (" ").
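The behaviour described above can be sketched as a small Spark application. This is a minimal sketch, not necessarily the exact code this tutorial builds: the object name WordCount, the helper countWords, and the argument layout (input path first, output directory second) are assumptions made for illustration. Splitting on whitespace is also an assumption; punctuation stays attached to words.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordCount {

  // Split each line on whitespace, drop empty tokens, add up a 1 per
  // occurrence, and format each result as "word count".
  def countWords(lines: RDD[String]): RDD[String] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .map { case (word, count) => s"$word $count" }

  def main(args: Array[String]): Unit = {
    // Assumed argument layout: args(0) = input path, args(1) = output directory.
    if (args.length < 2) {
      System.err.println("Usage: WordCount <input> <output>")
      sys.exit(1)
    }
    // The master URL is not set here so that it can be supplied at launch time.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    try {
      countWords(sc.textFile(args(0))).saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

Note that saveAsTextFile writes a directory of part files rather than a single file, and it fails if the output directory already exists.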

Simple Build Tool (SBT) is an open source build tool for Scala and
Java projects, similar to Java's Maven or Ant. Its main features are
native support for compiling Scala code and integration with many Scala
frameworks. sbt is the de facto build tool in the Scala community.
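For a Spark project, the sbt build definition might look roughly like the sketch below. The project name and every version number here are assumptions chosen for illustration; match them to the Scala and Spark versions installed on your cluster.

```scala
// build.sbt: a minimal sketch; the name and all versions are assumptions
name := "spark-wordcount"

version := "0.1"

scalaVersion := "2.11.8"

// %% appends the Scala binary version (here _2.11) to the artifact name
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
```

When the application jar is only ever launched on a cluster that already provides Spark, the dependency is often marked "provided"; it is left at the default scope here so that the project can also be run directly from sbt.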

This tutorial describes how to write, compile, and run a simple Spark
word count application in Scala, one of the languages supported by Spark.

NOTE:
--class: The entry point for your wordcount application
--master: The master URL for the cluster/local (e.g. spark://23.195.26.187:7077)
---------------------------------------------
Step 9: Verify the wordcount output file as mentioned in the previous step.

These examples give a quick overview of the Spark API.
Spark is built on the concept of distributed datasets, which contain arbitrary Java or
Python objects. You create a dataset from external data, then apply parallel operations
to it.
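As a rough illustration of that model, the sketch below builds a small dataset and applies parallel operations to it. It makes several assumptions: the object name DatasetSketch and the helper totalLength are hypothetical, the data comes from an in-memory collection rather than an external source, and the local[*] master runs Spark inside the current JVM.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DatasetSketch {

  // Distribute the lines, map each to its length (a lazy transformation),
  // then sum the lengths with reduce (an action that triggers the job).
  def totalLength(sc: SparkContext, lines: Seq[String]): Int =
    sc.parallelize(lines).map(_.length).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // local[*] uses all cores of the local machine; assumed for illustration.
    val conf = new SparkConf().setAppName("DatasetSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val total = totalLength(sc, Seq("to be or not to be", "that is the question"))
      println(s"total characters: $total")
    } finally {
      sc.stop()
    }
  }
}
```

To read from external data instead, sc.parallelize(lines) could be replaced with sc.textFile(path), which returns one element per line of the input.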


4) sbt run: Compiles your code and runs the main class from your project in the
same JVM as SBT. If your project has multiple main methods (or objects
that extend App), you'll be prompted to select one to run.
----------------------------------------------------------------------------------------
spb@spb-VirtualBox:~/Scala_

Additionally, you can use sbt console, which compiles the source files
in the project, puts them on the classpath, and starts the Scala
interpreter (REPL). SBT has other features that come in handy
during development, such as starting a Scala REPL with project classes
and dependencies on the classpath, continuous compilation and testing
with triggered execution, and much more.