I have a wide range of interests in IT, including Hyper-V private cloud, Remote Desktop Services, server clustering, PKI, network security, routing and switching, enterprise network management, and MPLS VPN on enterprise networks. I started this blog as a quick reference for myself and to share technical knowledge with our team members.

Thursday, July 7, 2016

Unstuck Spark/Zeppelin Jobs on Amazon EMR

Apache Zeppelin + Apache Spark is a perfect match. Basically, you can do the following in one console:

Data Ingestion

Data Discovery

Data Analytics

Data Visualization & Collaboration

As Zeppelin is still under incubation, its error handling is not yet rock solid. I have often seen Spark jobs get stuck for a long time. Usually, restarting the Spark interpreter does the trick. However, there are times when this simple trick does not work and the only way out is to restart the Zeppelin daemon. On the Amazon EMR command line, run the following:

/usr/lib/zeppelin/bin/zeppelin-daemon.sh stop

/usr/lib/zeppelin/bin/zeppelin-daemon.sh start

If you wish to execute the scripts as the zeppelin account, which has a nologin shell, execute the following instead:
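One possible way to do this is sketched below. It is an assumption rather than a confirmed recipe: it assumes the service account is named zeppelin and that your user has sudo rights. The helper names run_as_zeppelin and restart_zeppelin and the DRY_RUN switch are made up for illustration. The sketch relies on sudo -u running the command directly instead of starting the account's login shell, which is why the nologin shell should not get in the way.

```shell
#!/usr/bin/env bash
# Sketch only: restart the Zeppelin daemon as the service account.
# Assumptions: the account is called "zeppelin" and the caller may use sudo.
# The daemon script path is the one used in this post.

ZEPPELIN_DAEMON="/usr/lib/zeppelin/bin/zeppelin-daemon.sh"
ZEPPELIN_USER="zeppelin"

# Hypothetical helper: run the daemon script with one action (stop or start).
# sudo -u executes the command directly, so the account's nologin shell is
# not involved. With DRY_RUN=1 the command is printed instead of executed.
run_as_zeppelin() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo -u $ZEPPELIN_USER $ZEPPELIN_DAEMON $1"
  else
    sudo -u "$ZEPPELIN_USER" "$ZEPPELIN_DAEMON" "$1"
  fi
}

# Hypothetical helper: stop first, and start only if the stop succeeded.
restart_zeppelin() {
  run_as_zeppelin stop && run_as_zeppelin start
}
```

After sourcing the snippet, calling restart_zeppelin performs the stop and the start in order. With DRY_RUN=1 set, it only prints the two sudo commands so they can be reviewed first.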

If you encounter this Java connection error: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method), it is probably because Zeppelin starts the Spark interpreter in a separate process, and the connection is refused when that process is not running or not yet ready to accept connections.