Friday, August 12, 2016

I was following the Spark example to load data from a MySQL database (see http://spark.apache.org/examples.html).

Executing it failed with the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 233, ip-172-22-11-249.ap-southeast-1.compute.internal): java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver
The task failed on an executor, which could not find a registered MySQL JDBC driver. To force Spark to load "com.mysql.jdbc.Driver", add the JDBC "driver" option to the read options.
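A minimal sketch of the fix, assuming Spark 2.x (on Spark 1.6, replace spark.read with sqlContext.read): a JDBC read with the "driver" option set explicitly, fed to spark-shell from a shell function. The function name, the connector JAR path and the MYSQL_* variables are hypothetical placeholders. Passing the JAR with --jars ships it to the executors, which is where the error above was raised.

```shell
# Minimal sketch: read a MySQL table over JDBC from spark-shell (Spark 2.x assumed).
# MYSQL_CONNECTOR_JAR and the MYSQL_* variables are hypothetical placeholders.
load_mysql_table() {
  # --jars puts the connector on the driver and executor classpaths;
  # --driver-class-path also adds it to the driver's system classpath.
  spark-shell --jars "$MYSQL_CONNECTOR_JAR" \
    --driver-class-path "$MYSQL_CONNECTOR_JAR" <<EOF
val df = spark.read.format("jdbc").options(Map(
  "url"      -> "jdbc:mysql://${MYSQL_HOST}:3306/${MYSQL_DB}",
  "dbtable"  -> "${MYSQL_TABLE}",
  "user"     -> "${MYSQL_USER}",
  "password" -> "${MYSQL_PASSWORD}",
  // Name the driver class explicitly so each executor loads and registers it
  "driver"   -> "com.mysql.jdbc.Driver"
)).load()
df.show()
EOF
}
```

For example, set MYSQL_CONNECTOR_JAR to the location of your mysql-connector-java JAR, fill in the MYSQL_* variables, then call load_mysql_table. The Map is left open across lines on purpose: the Scala REPL keeps reading until the parentheses close, whereas a line starting with ".format(...)" would be evaluated on its own and fail.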

Friday, August 5, 2016

Whenever I launched an Amazon EMR cluster, I hit errors caused by missing permissions in EMR_EC2_DefaultRole. After some searching on the support forum, I learned that the default EMR roles may not be created automatically for you. Hence, I removed the old default role and created a new one.
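A minimal sketch of removing and recreating the role with the AWS CLI, assuming your credentials are allowed to manage IAM roles and that the old role was created by aws emr create-default-roles (so it comes with an instance profile of the same name). The function name recreate_emr_ec2_default_role is hypothetical.

```shell
# Minimal sketch: delete the old EMR_EC2_DefaultRole and let the AWS CLI recreate it.
# Assumes an instance profile with the same name, as created by "aws emr create-default-roles".
recreate_emr_ec2_default_role() {
  role=EMR_EC2_DefaultRole
  # Detach the role from its instance profile, then delete the profile
  aws iam remove-role-from-instance-profile --instance-profile-name "$role" --role-name "$role"
  aws iam delete-instance-profile --instance-profile-name "$role"
  # A role cannot be deleted while managed policies are still attached to it...
  for arn in $(aws iam list-attached-role-policies --role-name "$role" \
                 --query 'AttachedPolicies[].PolicyArn' --output text); do
    aws iam detach-role-policy --role-name "$role" --policy-arn "$arn"
  done
  # ...or while it still has inline policies
  for name in $(aws iam list-role-policies --role-name "$role" \
                  --query 'PolicyNames' --output text); do
    aws iam delete-role-policy --role-name "$role" --policy-name "$name"
  done
  aws iam delete-role --role-name "$role"
  # Ask the AWS CLI to create the EMR default roles again
  aws emr create-default-roles
}
```

Run it once, then launch the cluster with --use-default-roles so it picks up the freshly created role.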

Thursday, July 7, 2016

Apache Zeppelin and Apache Spark are a perfect match: in one console, you can do all of the following:

Data Ingestion

Data Discovery

Data Analytics

Data Visualization & Collaboration

As Zeppelin is still incubating, its error handling is not yet rock solid. I have often seen Spark jobs stuck for a long time. Restarting the Spark interpreter usually does the trick. However, sometimes this simple trick won't work, and the only way out is to restart the Zeppelin daemon. On the Amazon EMR master node console, run the following:

/usr/lib/zeppelin/bin/zeppelin-daemon.sh stop

/usr/lib/zeppelin/bin/zeppelin-daemon.sh start

If you wish to execute the scripts under the zeppelin account, note that it has a nologin shell, so you cannot simply log in as that user and run them.
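One way around this, assuming you have sudo access on the master node: sudo -u executes the command directly as the zeppelin user, without going through that account's nologin shell. The function name and the ZEPPELIN_HOME default are illustrative. An alternative is sudo su -s /bin/bash -c '...' zeppelin, which overrides the shell explicitly.

```shell
# Minimal sketch: restart Zeppelin as the zeppelin user.
# sudo -u runs the script directly, so the account's nologin shell is never invoked.
ZEPPELIN_HOME="${ZEPPELIN_HOME:-/usr/lib/zeppelin}"

restart_zeppelin_as_service_user() {
  sudo -u zeppelin "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" stop
  sudo -u zeppelin "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" start
}
```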

If you encounter the Java error "java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method)", it's probably because Zeppelin runs the Spark interpreter in a separate process, and the Zeppelin server could not connect to that process.
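A quick diagnostic sketch: check whether that separate interpreter process is actually running. It assumes the interpreter's Java command line contains Zeppelin's RemoteInterpreterServer class name, which pgrep -f matches against the full command line; the function name is hypothetical.

```shell
# Minimal diagnostic sketch: is a separate Zeppelin interpreter process running?
# Assumes the interpreter's command line contains "RemoteInterpreterServer".
check_zeppelin_interpreter() {
  # pgrep -f matches the pattern against each process's full command line
  if pids=$(pgrep -f RemoteInterpreterServer); then
    echo "Interpreter process(es) running: $pids"
  else
    echo "No interpreter process running; restart the Spark interpreter or the Zeppelin daemon"
    return 1
  fi
}
```

If no process shows up, restarting the interpreter (or the daemon, as above) is the natural next step.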

Tuesday, May 31, 2016

You can pass multiple JSON configurations when you launch a new Amazon EMR cluster. Here, I want to configure Spark to use dynamic allocation of executors and to store Zeppelin notebooks on S3. Replace "your-s3-bucket" with your own S3 bucket name, and create the folder '/user/notebook' under that bucket. As you create new Zeppelin notebooks, you'll see a new note.json for each one under that S3 folder.
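A minimal sketch of the whole flow with the AWS CLI. The spark-defaults classification enables dynamic allocation (together with the external shuffle service it depends on), and the nested zeppelin-env/export classification switches Zeppelin to S3NotebookRepo; with ZEPPELIN_NOTEBOOK_S3_USER set to "user", notes are stored under your-s3-bucket/user/notebook/. The release label, the Zeppelin-Sandbox application name (used by EMR 4.x releases), the instance settings, the cluster name and the config file name are assumptions to adjust.

```shell
# Minimal sketch: write the EMR configuration JSON and launch a cluster with it.
# "your-s3-bucket", release label, instance settings and cluster name are placeholders.
BUCKET="${BUCKET:-your-s3-bucket}"
CONFIG_FILE="${CONFIG_FILE:-./emr-config.json}"

write_emr_config() {
  cat > "$CONFIG_FILE" <<EOF
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.shuffle.service.enabled": "true"
    }
  },
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "${BUCKET}",
          "ZEPPELIN_NOTEBOOK_S3_USER": "user"
        }
      }
    ]
  }
]
EOF
}

launch_cluster() {
  write_emr_config
  # S3 has no real folders; an empty object whose key ends in "/" acts as user/notebook
  aws s3api put-object --bucket "$BUCKET" --key user/notebook/
  aws emr create-cluster --name zeppelin-spark \
    --release-label emr-4.6.0 \
    --applications Name=Spark Name=Zeppelin-Sandbox \
    --instance-type m3.xlarge --instance-count 3 \
    --use-default-roles \
    --configurations "file://$CONFIG_FILE"
}
```

Set BUCKET to your bucket name and call launch_cluster. Keeping the JSON in a file passed via file:// makes it easy to reuse the same configurations for the next cluster.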