Here, we want to show one key Beam feature: exactly the same code running on different execution runtimes.

For that, we will use Maven profiles: each profile will define the dependency set of a specific runner. We will then be able to execute our pipeline (exactly the same code) on a target runner, simply by specifying a JVM argument that identifies the runner.

Direct runner

Let’s start with the Direct runner. This is the preferred runner for testing: it executes the pipeline using several threads inside the local JVM.

It’s pretty easy to use, as it only requires a dependency. So we create a Maven profile with the Direct runner dependency:
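A minimal sketch of such a profile could look like this (the profile id and the `beam.version` property are naming assumptions; the artifact is the Beam Direct runner, `beam-runners-direct-java`):

```xml
<!-- Sketch of a Maven profile adding the Direct runner to the runtime classpath.
     "direct-runner" and ${beam.version} are illustrative names, not mandated by Beam. -->
<profile>
  <id>direct-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-direct-java</artifactId>
      <version>${beam.version}</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>
```

With such a profile in place, the pipeline could be launched with something along the lines of `mvn compile exec:java -Pdirect-runner -Dexec.args="--runner=DirectRunner"` (assuming the exec plugin is configured with the pipeline’s main class).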

Spark runner

Like for the Direct runner, we can directly use the spark-runner profile and the --runner=SparkRunner JVM argument to execute our pipeline on Spark. Basically, it performs the equivalent of a spark-submit:
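The corresponding profile could be sketched as follows (again, the profile id and version property are naming assumptions; the artifact is the Beam Spark runner, `beam-runners-spark`, and depending on your setup you may also need to add the matching Spark dependencies such as spark-core):

```xml
<!-- Sketch of a Maven profile adding the Spark runner to the runtime classpath.
     "spark-runner" and ${beam.version} are illustrative names. -->
<profile>
  <id>spark-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-spark</artifactId>
      <version>${beam.version}</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>
```

The pipeline itself is unchanged: only the active profile and the `--runner=SparkRunner` argument differ, e.g. `mvn compile exec:java -Pspark-runner -Dexec.args="--runner=SparkRunner"`.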

Conclusion

In this article, we saw how to execute exactly the same code (no change at all in the pipeline definition) on different execution engines. We can easily switch from one engine to another simply by changing the Maven profile and the runner argument.

In a future article, we will take a deeper look at the Beam IOs: the concepts (sources, sinks, watermark, split, …) and how to use and write a custom IO.