
# Spark JUnit

Example of Spark tests using JUnit.

Creating tests for Spark code is not an easy task, especially for Spark Streaming. I built some base classes to make it easier with JUnit. The sample code uses Gradle to manage the build and dependencies.

## Spark batch

To run Spark batch tests, you need to start a local SparkContext before the tests run and stop it when they finish.
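A minimal sketch of that lifecycle is shown below. The class and configuration values are illustrative assumptions, not the base classes from this repo; here the context is created and stopped around each test with `@Before`/`@After`, while the repo's base classes hoist this to once per suite.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.{After, Before, Test}
import org.junit.Assert.assertEquals

class WordCountTest {
  private var sc: SparkContext = _

  @Before def startSparkContext(): Unit = {
    // Local mode with two worker threads; app name is arbitrary
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("spark-junit-test")
    sc = new SparkContext(conf)
  }

  @After def stopSparkContext(): Unit = {
    if (sc != null) sc.stop()
  }

  @Test def countsWords(): Unit = {
    val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
    assertEquals(2L, counts("a"))
  }
}
```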

Your tests should first create an RDD using the `parallelize` method of the SparkContext. If your code is based on DataFrames, you can use the `toDF` method, which becomes available once you import the SQLContext implicits.
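For example, a test body could build its inputs like this. The sketch assumes a running SparkContext `sc` (e.g. provided by a shared base class) and uses the Spark 1.x `SQLContext` API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.junit.Assert.assertEquals

def testWithRddAndDataFrame(sc: SparkContext): Unit = {
  // RDD-based code: build the input with parallelize
  val rdd = sc.parallelize(Seq(1, 2, 3, 4))
  assertEquals(10, rdd.sum().toInt)

  // DataFrame-based code: toDF is available after importing the implicits
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._
  val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
  assertEquals(2L, df.count())
}
```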

## Spark Streaming

Writing a test for Spark Streaming code is much harder than writing a batch test. The main challenge is to use a manageable clock, so you can advance the virtual time and make the code execute as the tests run. We also create a queue of RDDs to simulate the order in which streaming data arrives.
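Conceptually, the setup relies on two pieces of Spark itself: the `spark.streaming.clock` property, which swaps the system clock for a manual one, and `queueStream`, which turns a queue of RDDs into micro-batches. The sketch below shows those pieces in isolation (class names follow Spark 1.x and may differ across versions); the repo wraps this machinery in its own helper classes:

```scala
import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("streaming-test")
  // Replace the system clock with a manual clock, so the test decides
  // when each batch fires instead of waiting in real time
  .set("spark.streaming.clock", "org.apache.spark.streaming.util.ManualClock")

val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(1))

// Each RDD pushed into this queue becomes one micro-batch of the stream
val queue = mutable.Queue.empty[RDD[String]]
val stream = ssc.queueStream(queue)
stream.count().print()

ssc.start()
queue += sc.parallelize(Seq("a", "b", "c"))
// At this point the test advances the manual clock by one batch interval.
// ManualClock is package private, which is why a wrapper class living in
// the org.apache.spark package is needed to reach it.
```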

I created the following helper classes:

* `org.apache.spark.ClockWrapper` and `org.apache.spark.streaming.StreamingContextWrapper`: override the regular streaming behavior.
* `werneckpaiva.spark.test.util.TestInputDStream`: creates a DStream backed by a queue.
* `werneckpaiva.spark.test.BaseTestSparkStreaming`: a utility class to start and stop the streaming context and manage the clock's asynchronicity.