Running Spark in Eclipse Using SBT

This article aims to give you a clear idea of how to run a Spark application in Eclipse using SBT. SBT stands for Simple Build Tool, and it is used for pulling in the dependencies required to build a project, much like Maven, Ant, or Gradle.

Before proceeding to run Spark, we recommend that you install SBT. If you have not installed it yet, you can go through this link for clear guidance on how to install SBT.

Once SBT is installed, you need to add the sbteclipse plugin details globally, so that you don’t have to repeat the same steps for every project you create. To do this, create a plugins directory inside the .sbt directory using the syntax below.

Steps to Create a Plugin Directory

mkdir -p .sbt/<sbt-version>/plugins

Example:

mkdir -p .sbt/0.13/plugins

Then, create a plugins.sbt file inside the .sbt/0.13/plugins directory and add the following details.

sudo gedit .sbt/0.13/plugins/plugins.sbt
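The file needs a single addSbtPlugin line registering the sbteclipse plugin. The plugin version shown here is an assumption chosen to match SBT 0.13; use the latest release compatible with your SBT version.

```scala
// plugins.sbt — registers the sbteclipse plugin globally for SBT 0.13.
// The version 4.0.0 is an assumption; pick the latest release for your SBT version.
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")
```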

After this, you have to create a project directory that will contain the Spark code and a build.sbt file.

Also, make sure that the project directory has the sub-directory structure src/main/scala, with the .scala file inside it.

mkdir sparkproj

Note: You can give any name to your project directory. Here, we have used sparkproj as the project directory.

Change your current working directory to sparkproj.

cd sparkproj

Initially, this project directory will be empty. Now, as stated earlier, let’s create the sub-directories src/main/scala.

mkdir -p src/main/scala

Inside the scala directory, we need to create our Spark code (.scala file). We have used the file name myspark.scala, and this file will contain the logic.

gedit myspark.scala

Let’s put the below logic in the file. The code is self-explanatory and easy to understand. We are finding the number of occurrences of the words sumit and satyam in the input file.
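A minimal sketch of such a program is shown below, assuming Spark 1.6 with Scala 2.10. The object name, the input path input.txt, and local[*] as the master are placeholder assumptions; adjust them for your own setup.

```scala
// myspark.scala — a minimal sketch, assuming Spark 1.6 and Scala 2.10.
// The input path and master URL are placeholders for illustration.
import org.apache.spark.{SparkConf, SparkContext}

object MySpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MySpark").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Read the input file and split each line into words.
    val words = sc.textFile("input.txt").flatMap(_.split("\\s+"))

    // Keep only the words we are interested in and count their occurrences.
    val counts = words
      .filter(w => w == "sumit" || w == "satyam")
      .map(w => (w, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```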

Now, save and close the .scala file, and return to the main project directory to create a build.sbt file.

The build.sbt file will contain all the dependencies required by our project. As we have already written a simple Spark program, we just need to add the Spark dependency. Below are the details that need to be present in the file.
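A minimal build.sbt might look like the following. The project name and version string are placeholders; the Scala and Spark versions are assumptions matching the 2.10.4 / 1.6 setup described in this article.

```scala
// build.sbt — a minimal sketch; versions assume Scala 2.10.4 and Spark 1.6.0.
name := "sparkproj"

version := "1.0"

scalaVersion := "2.10.4"

// Pulls in the Spark core library so the code compiles and runs inside Eclipse.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
```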

Running Spark

As the Scala and Spark versions in the Acadgild Spark VM are 2.10.4 and 1.6 respectively, the same details appear in the build.sbt file. If you are using your own setup, make sure to fill in the version details accordingly. Now we are ready to build the project using sbteclipse. Before we proceed, let’s cross-check the project structure once again.

Next, we need to go back to the main project (sparkproj) directory and run the command sbt eclipse from the terminal to build it.

Note: When you run SBT for the first time, it may take a long time to download the dependencies.

Once the project is built successfully, it can be imported into Eclipse.

Next, click Browse and select the main project (sparkproj) directory. Once the project is imported, wait for some time so that Eclipse can build the workspace. Once done, run the code inside Eclipse. After the Spark job finishes, the results can be seen in the Eclipse output console.

We hope this post helps you understand how to run Spark in Eclipse using SBT. In case of any queries, feel free to comment below and we will get back to you at the earliest.
Keep visiting www.acadgild.com for more updates on the courses.