Some Notes and Errors Encountered After Trying Out Apache Spark

January 12, 2015

I was playing around with Apache Spark a couple of weeks back. For those of you who are not familiar with Apache Spark, it has really good documentation, which you should read here. You should also check out UC Berkeley’s paper on RDDs, which will help you gain a deeper understanding of how Spark works. Don’t be daunted by the paper; it’s actually a good read.

Here are a few things I jotted down while working on it.

To assign a hostname to the master, you will need to create conf/spark-env.sh and set SPARK_MASTER_IP there. Basically, you just need to add this:

SPARK_MASTER_IP=local.paolo.com

You’ll be able to view the status of your cluster at http://localhost:8080/
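To put the above together, here’s a rough sketch of the whole setup for a standalone master (paths follow a stock Spark 1.x layout; `local.paolo.com` is the hostname from the config above, and `/path/to/spark` is a placeholder for your install directory):

```shell
cd /path/to/spark

# Create conf/spark-env.sh from the bundled template and set the master hostname
cp conf/spark-env.sh.template conf/spark-env.sh
echo 'SPARK_MASTER_IP=local.paolo.com' >> conf/spark-env.sh

# Start the standalone master; its web UI comes up on port 8080
sbin/start-master.sh

# Attach a worker to it (the master listens on port 7077 by default)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://local.paolo.com:7077
```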

I also encountered a few errors when I tried running my application. Here’s a list of the errors I stumbled on, together with steps on how I fixed them.

Initial job has not accepted any resources.

Check your cluster UI to ensure that workers are registered and have sufficient memory. I realized that I only had two cores I could connect to, and Spark Shell was already using them up. You can either update your config or shut down the other application using the resources. Source
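One way to keep the shell from grabbing every core is to cap what it requests from the standalone cluster. A sketch, assuming the master URL from earlier (the flag is standard in the Spark 1.x launcher for standalone mode):

```shell
# Limit the shell to a single core cluster-wide, leaving the rest
# for other applications
bin/spark-shell --master spark://local.paolo.com:7077 --total-executor-cores 1
```

The same cap can be set persistently as `spark.cores.max` in conf/spark-defaults.conf.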

java.lang.IllegalStateException: unread block data.

This is most likely caused by a versioning issue. In my case, my Mac was running a different version of Scala (v2.11) compared to the one my Spark download was built against (Scala v2.10). To make this work, I just updated the Scala version on my Mac.
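If you manage Scala through Homebrew on a Mac, the update could look something like this (Homebrew usage is my assumption; the point is simply to make the installed Scala match the version your Spark build expects):

```shell
brew update
brew install scala     # pick the version matching your Spark build
scala -version         # confirm the reported version now matches
```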

I also encountered this when I was trying to run my test application via Eclipse. I had a running master in the background, so I wanted my test to connect to it instead of spawning its own. To make this work, I had to create a fat jar containing all the dependencies. Using the Shade plugin from Maven will help fix the Akka issue too. Source
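For reference, a typical Shade plugin configuration for this looks like the fragment below. The AppendingTransformer for reference.conf is what resolves the Akka part: Akka ships its defaults in a reference.conf inside its jar, and without the transformer one jar’s copy silently overwrites the others when the fat jar is built (this is a common setup I’m sketching, not necessarily the exact pom from the original post):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Concatenate every reference.conf instead of keeping just one,
               so Akka's default settings survive in the fat jar -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```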

ERROR ContextCleaner: Error in cleaning thread java.lang.InterruptedException.

This can be safely ignored. Source

Hopefully, this will help someone out there who is also looking into Apache Spark.