Spark application development work flow in scala

Hi,

What's a typical work flow of spark application development in scala?

One option is to write a Scala application with a main function and re-execute the app after every change. Given the overhead of loading even a moderately sized development data set on each run, this could mean slow iterations.

Another option is to initialize the data in the REPL and do the development inside the REPL. This would mean faster iterations; however, it's not clear to me how to keep the code in sync with the REPL. Do you just copy/paste the code into the REPL, or is it possible to compile the code into a jar and keep reloading the jar in the REPL?
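For reference, one common way to get compiled code onto the REPL's classpath is to build a jar and pass it to spark-shell at startup. A rough sketch, assuming a standard sbt layout (the project name, Scala version, and paths below are placeholders):

```shell
# Package the project into a jar (hypothetical project name/version).
sbt package

# Start the REPL with the jar on the classpath. After a rebuild, the
# shell has to be restarted to pick up the new jar.
spark-shell --jars target/scala-2.11/myapp_2.11-0.1.jar

# Alternatively, inside the REPL, :load re-evaluates a source file in
# place, which avoids the restart for driver-side code:
#   scala> :load src/main/scala/MyPipeline.scala
```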

Re: Spark application development work flow in scala

I typically use the main method and a test-driven approach; for most simple applications that works out pretty well. Another technique is to create a jar containing the complex functionality and test it, create another jar just for streaming/processing that hooks into it and handles all the data flow, and then integrate the two. None of this feels like a production development process :)
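One way this test-driven split can look in practice is to keep the transformation logic in pure functions on plain collections, so it can be unit-tested without a SparkContext, and have the main method do only the Spark wiring. A minimal sketch (the object and function names are hypothetical):

```scala
// Pure logic: operates on ordinary collections, so it can be
// unit-tested without starting Spark.
object WordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}

// The main method would only handle I/O and the SparkContext; the
// equivalent RDD wiring (shown as a comment to keep this snippet
// self-contained) mirrors the pure version:
//   sc.textFile(path)
//     .flatMap(_.split("\\s+"))
//     .map((_, 1))
//     .reduceByKey(_ + _)
```

The point of the split is that the slow part of the iteration loop (launching Spark, loading data) only exercises thin glue code, while the logic that actually changes during development is covered by fast plain-Scala tests.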

Re: Spark application development work flow in scala

I have been using Vim + Conque Shell for this purpose. With this plugin you can split your Vim window in two and run the Spark shell in one of the splits. You write your code in the other editor window and execute it line by line.
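For anyone trying this setup, the workflow above boils down to something like the following (assumes the Conque Shell plugin is installed and spark-shell is on your PATH; the send-selection key may differ in your configuration):

```vim
" Open a split running spark-shell inside Vim.
:ConqueTermSplit spark-shell

" Then, in the other window, visually select some Scala code and use
" Conque's send-selection mapping (<F9> by default, if not remapped)
" to execute it in the spark-shell buffer.
```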