Friday, March 21, 2014

Collaborating on Talend Open Studio Routines: Part 2 - Overview

This post describes a development process for Talend Open Studio Routines that fosters collaboration.

In Part 1, I made a case for using standard Java development technologies (Git, Maven, Jenkins) for collaborating on Talend Open Studio Routines. These technologies are not a requirement for creating Talend Open Studio Routines; for that, you can simply run Create Routine from TOS. Rather, they define a quality process by which multiple developers can interleave their changes, resolve conflicts, and fend off regression bugs.

The Bottom Line

Compare the following two screenshots. The first screenshot, Java Code in TOS-Packaged Eclipse, shows the routine being edited in Talend Open Studio (DI). This is the Eclipse Java Code Editor, which provides code completion and real-time syntax checking. Once I produce a syntactically correct Java class, I can use its static methods in a Talend Job.

Java Code in TOS-Packaged Eclipse

The following screenshot shows a version of Eclipse not packaged by Talend. It features the same Java Code Editor. However, notice the surrounding tabs. There is a History tab at the bottom showing a version history with date, author, and descriptive information. There is a JUnit tab at the bottom left showing the outcome of running 63 unit tests. Finally, there is a tab at the top left showing related projects, including one called brules-json that produces a supporting library for the Talend Routine "BRules".

Java Code in Standard Eclipse

Standard Eclipse not only allows you to edit the Routine, but also to work with the technologies used to maintain it: Git and JUnit. More importantly, there is a strong incentive to build an object-oriented design. Instead of being limited to a single file of static methods, I can define and relate other classes for use in my Routine. You don't have to stuff every scrap of functionality into the one file created by Create Routine.
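To make the design point concrete, here is a minimal sketch of a Routine whose static method delegates to an ordinary helper class. The names RoutineSketch and TextNormalizer are hypothetical and are not taken from BRules; the classes are left package-private only to keep the sketch in one file, whereas a real Routine class would presumably be public so that Jobs can call it.

```java
// Hypothetical helper class: holds logic that would otherwise be
// crammed into the single routine file.
final class TextNormalizer {

    // Trims the input and collapses runs of whitespace to one space.
    // A null input yields an empty string.
    String normalize(String input) {
        if (input == null) {
            return "";
        }
        return input.trim().replaceAll("\\s+", " ");
    }
}

// Hypothetical routine class: Talend Jobs call static methods, so the
// routine stays a thin static facade over ordinary objects.
final class RoutineSketch {

    private static final TextNormalizer NORMALIZER = new TextNormalizer();

    private RoutineSketch() {
        // static methods only; no instances needed
    }

    // The static entry point a Job would call.
    static String normalize(String input) {
        return NORMALIZER.normalize(input);
    }
}
```

A Job expression would call RoutineSketch.normalize(...) as it would any routine method, while the helper class can be unit tested and refactored on its own.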

The Process

The following Process Diagram shows a sequence of development activities and the products that are generated. This is just an overview, and future posts will break down each of the technologies.

Talend Routine Development Process

Deployment
The result of the Talend Routine Development Process is a zip file that can be loaded into Talend Open Studio DI using the Import Items command. I'm distributing the zip file -- shown here as brules-1.0.0-bin.zip -- on both the Talend Exchange and the Maven Central Repository. I expect most Talend users will pull the one from the Exchange. The Maven Central Repository copy is intended for developers collaborating on BRules.

Packaging
The zip file (brules-1.0.0-bin.zip) is created using a Java build tool called Maven. Let's black-box Maven for now as there are some advanced concepts like Assemblies and custom Plugins. Assume that it takes in Java code, third-party libraries, and some metadata information for TOS and produces the zip file. The zip file is placed in something called the Local Maven Repository which resides on your hard disk.

Source Code Control
An input to Maven is Java source. The Talend Routine is presented as BRules.java in the diagram. For collaboration, the best place to store Java source is a configuration management system. I'm using Git hosted at GitHub. Git lets me take in changes from different sources -- other developers or different activities -- merge them, and produce tagged versions. The following screenshot of the Git client tool SmartGit shows Git highlighting a recent change that adds a function "len" (right). The left side shows the original len-less class file.

Git Tracks Changes to your Routine

Testing

In the middle of the diagram, there is a Test activity. There are actually two types of testing going on in the process. The first is importing the zip file (brules-1.0.0-bin.zip) into Talend Open Studio, where the Routine is tested against Talend Open Studio jobs. The second is unit testing, which requires a unit test to accompany every change made to the Routine. This requirement can be enforced with Cobertura reports that show the test coverage for each commit by an individual developer.
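As an illustration of a unit test accompanying a change, the sketch below pairs a stand-in for the "len" function with the checks that would ship in the same commit. The behavior of len shown here (null counts as length 0) is an assumption, not the actual BRules implementation, and the class names are hypothetical. The project uses JUnit; plain Java checks are used here only so the sketch needs no extra libraries.

```java
// Hypothetical stand-in for the routine; the real BRules.len may
// behave differently.
final class LenSketch {

    private LenSketch() {
        // static methods only
    }

    // Assumed behavior: the length of the string, with null treated
    // as length 0.
    static int len(String s) {
        return s == null ? 0 : s.length();
    }
}

// Plain-Java checks standing in for JUnit test methods.
final class LenSketchTest {

    private LenSketchTest() {
        // static methods only
    }

    // Throws AssertionError with the given message if the condition
    // does not hold.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Runs every check; returns normally only if all of them pass.
    static void runAll() {
        check(LenSketch.len("abc") == 3, "three characters");
        check(LenSketch.len("") == 0, "empty string");
        check(LenSketch.len(null) == 0, "null treated as empty");
    }
}
```

Calling LenSketchTest.runAll() plays the role of the test run: it returns quietly when the change behaves as specified and fails loudly when a later edit breaks it.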

Moreover, the unit tests can be run automatically, using a tool called Jenkins, with each change made by a developer. If anyone introduces a change that breaks a test, we know right away.

Bringing standard Java development technologies into your Routine development enhances the quality and makes collaboration feasible. There is overhead and a learning curve associated with these technologies and this overhead is not justified in all instances. However, for BRules, I'm interested in involving other developers and the automated support means that I can run the project more efficiently.