Anatomy of a Command Builder with Example – Cloudera Kite Morphlines


In the last post we looked at the internals of a configuration file, also known as a morphline. In this post we are going to explore the actual code that does all the work in the background. It makes no difference whether you use a built-in command (bundled with the Cloudera Kite Morphlines SDK) or write your own custom command: the basic structure and semantics of all commands are the same.

Every command in Cloudera Kite Morphlines is created by a class that implements the

org.kitesdk.morphline.api.CommandBuilder

interface. This interface declares two methods, and your CommandBuilder implementation has to provide both of them.

getNames(): This is where you name your command builder so that it can be used in the configuration file. Command names and command builders are tightly coupled to each other. When the first command (Pipe, see the previous post) is initialized, it loads all the command builders through the importCommands statements in the configuration file. If you recall the configuration file from our previous post, we listed two packages there.

During context initialization, all of those packages are scanned to find every command builder. During the scan itself, each command builder is registered under its defined command names by invoking getNames(). You can also see the complete code in the

Note: A command builder can be registered under multiple names, namely all the names returned by its getNames() implementation.
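To make the registration step concrete, here is a minimal sketch of a name-to-builder registry. It is not Kite's own code: NamedBuilder, GenerateSchemaBuilder, BuilderRegistry and both command names are hypothetical stand-ins, and the real CommandBuilder interface also declares build(...).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the naming half of a command builder.
// This is NOT the Kite interface; it only models getNames().
interface NamedBuilder {
    Collection<String> getNames();
}

// Hypothetical builder that answers to two command names.
class GenerateSchemaBuilder implements NamedBuilder {
    @Override
    public Collection<String> getNames() {
        return Arrays.asList("generateAvroSchemaFile", "genAvroSchema");
    }
}

// Hypothetical registry: maps every declared name to the builder that declared it.
class BuilderRegistry {
    private final Map<String, NamedBuilder> buildersByName = new HashMap<>();

    // Called once per builder found during scanning.
    void register(NamedBuilder builder) {
        for (String name : builder.getNames()) {
            buildersByName.put(name, builder);
        }
    }

    // Called when a command name from the configuration file has to be resolved.
    NamedBuilder lookup(String commandName) {
        NamedBuilder builder = buildersByName.get(commandName);
        if (builder == null) {
            throw new IllegalArgumentException(
                "No command builder registered for: " + commandName);
        }
        return builder;
    }
}
```

With this sketch, looking up either name returns the same builder instance, which is the behaviour the note above describes.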

build(…): The actual signature of this method can be seen above. Almost always, this method has a single line of implementation code, and its sole responsibility is to return the Command. That Command in turn overrides the doProcess() method, which is the most important actor in the whole process. Recall from the previous post how doProcess() runs the whole show and controls the chain of invocation.
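The relationship between build(…) and doProcess() can be sketched with a few simplified stand-in types. Everything below is an assumption made for illustration: SimpleCommand, SimpleCommandBuilder, UpperCaseCommand and UpperCaseBuilder are hypothetical, a record is modelled as a plain map, and build(...) takes only the child command, whereas Kite's real build(...) also receives the config, the parent command and the MorphlineContext.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Simplified stand-ins, NOT the Kite types.
interface SimpleCommand {
    boolean process(Map<String, Object> record);
}

interface SimpleCommandBuilder {
    Collection<String> getNames();
    SimpleCommand build(SimpleCommand child);
}

// Hypothetical command: upper-cases every string value, then passes the record on.
class UpperCaseCommand implements SimpleCommand {
    private final SimpleCommand child;

    UpperCaseCommand(SimpleCommand child) {
        this.child = child;
    }

    @Override
    public boolean process(Map<String, Object> record) {
        return doProcess(record);
    }

    // The per-record work lives here. Calling the child keeps the chain going;
    // returning false would signal a failure to the caller.
    protected boolean doProcess(Map<String, Object> record) {
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            if (entry.getValue() instanceof String) {
                entry.setValue(((String) entry.getValue()).toUpperCase(Locale.ROOT));
            }
        }
        return child.process(record);
    }
}

// The builder stays tiny: a name plus a one-line build method.
class UpperCaseBuilder implements SimpleCommandBuilder {
    @Override
    public Collection<String> getNames() {
        return Collections.singletonList("upperCase");
    }

    @Override
    public SimpleCommand build(SimpleCommand child) {
        return new UpperCaseCommand(child);
    }
}
```

The design point is that the builder holds no processing logic at all. It only wires the new command to its child, so each command decides in doProcess() whether the record travels further down the chain.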

Enough theory. Let's look at a couple of command builders to get a real feel for them. First we will see the

There is nothing special about the above command builder. It expects the output location of the Avro schema file as an input argument from the configuration file. It then reads a Java map that has been passed on by the previous command and writes all of its keys into the schema file with the data type "string" (this can be customized as required).
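The core of that idea, turning the keys of a map into an Avro schema whose fields are all strings, can be sketched without any Kite or Avro dependency. This is not the original GenerateAvroSchemaFileBuilder code: the class name AvroSchemaSketch, its method names and the record name parameter are hypothetical, and the sketch assumes every key is already a valid Avro field name, so no escaping is done.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.StringJoiner;

class AvroSchemaSketch {

    // Builds an Avro record schema in which every map key becomes a "string" field.
    // Assumption: keys are valid Avro names (letters, digits, underscores).
    static String buildSchema(String recordName, Map<String, ?> data) {
        StringJoiner fields = new StringJoiner(",\n", "[\n", "\n  ]");
        for (String key : data.keySet()) {
            fields.add("    {\"name\": \"" + key + "\", \"type\": \"string\"}");
        }
        return "{\n"
            + "  \"type\": \"record\",\n"
            + "  \"name\": \"" + recordName + "\",\n"
            + "  \"fields\": " + fields + "\n"
            + "}\n";
    }

    // Writes the generated schema to the output location taken from the configuration.
    static void writeSchema(Path outputFile, String recordName, Map<String, ?> data)
            throws IOException {
        Files.write(outputFile, buildSchema(recordName, data).getBytes(StandardCharsets.UTF_8));
    }
}
```

Passing a LinkedHashMap keeps the fields in insertion order, which makes the generated schema stable between runs.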

The idea behind this command builder is to avoid writing the Avro schema file by hand again and again for different structured objects. The schema file is used by the next command, when I load the data into a Hive table in Parquet format.

This is one simple use case for a command builder. We can use the Cloudera Kite Morphlines SDK to break any complex transformation into a series of small commands. The other advantage of custom command builders, as I mentioned in my previous post, is reusability. Once I have this command builder (GenerateAvroSchemaFileBuilder), I can use it in any number of configuration files.

In the last three posts (including this one), I have tried to summarize what I have learned so far about the Kite SDK. I am currently exploring the other modules of Cloudera Kite Morphlines and will keep posting my experiences and findings here.