Apache NiFi is a data flow management system. Apache Velocity is a template engine. I have used Velocity for many years: it is a Swiss Army knife for converting and formatting data. I have also used it in web applications to separate the design (the web pages) from the code. Groovy (I have used BeanShell as well) fetches the data and applies the logic, and Velocity is responsible for the visual display.

Now that I have spent some time with Apache NiFi, I wanted to write my first processor (think of it as a puzzle piece in a flow) for it. Since Velocity is straightforward and I know it well, I decided to start a little project.

The idea is that data arrives in NiFi and is formatted through the template engine. Here is an example of one row from the data:

To achieve this, simply create a template: replace the actual data in the JSON above with placeholders, like this:

During processing, the actual data is merged with the template to produce the desired format. It is very simple, but isn't that the beauty of some tools? Simple yet powerful. Take a look at the Apache Velocity website at http://velocity.apache.org.
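The merge step can be sketched in a few lines. The example below is not Velocity itself: it uses Python's `string.Template`, whose `$name` placeholders look similar to Velocity references, as a stand-in. The field names (`name`, `city`) and the sample values are made up for illustration and are not the data from this post.

```python
import json
from string import Template

# Hypothetical template: a JSON row whose values are replaced by placeholders.
TEMPLATE = Template('{"name": "$name", "city": "$city"}')


def merge(template: Template, attributes: dict) -> str:
    """Merge one set of attributes into the template."""
    # substitute() raises KeyError if a placeholder has no matching attribute.
    return template.substitute(attributes)


row = {"name": "Alice", "city": "Zurich"}  # made-up sample values
merged = merge(TEMPLATE, row)
print(merged)

# The merged text is valid JSON again and can be parsed back.
parsed = json.loads(merged)
```

The same template can be reused for every incoming row; only the attribute values change.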

Below is the data flow I have put together. The data is read from a file (or from several files). Each file is split into individual lines, and every line runs through the "Merge attributes with template" processor shown below, where the Velocity template is merged with the data. The results of the individual merges are then put back together into one file, which is finally written to disk again.
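Outside of NiFi, the same split, merge and join sequence could look roughly like the sketch below. It is only an illustration of the idea, not the processor's code. It assumes semicolon-delimited input lines and two hypothetical columns (`id`, `title`), and it works on a string instead of reading from and writing to disk.

```python
from string import Template

# Hypothetical template and column names, for illustration only.
TEMPLATE = Template('{"id": "$id", "title": "$title"}')
COLUMNS = ["id", "title"]


def run_flow(text: str, delimiter: str = ";") -> str:
    """Split the input into lines, merge each line with the template, rejoin."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Turn one line into named attributes, as the flow does per flowfile.
        attributes = dict(zip(COLUMNS, line.split(delimiter)))
        results.append(TEMPLATE.substitute(attributes))
    # Put the individual merge results back together into one document.
    return "\n".join(results)


sample = "1;first\n2;second\n"
output = run_flow(sample)
print(output)
```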

Give NiFi a try - it is very interesting for managing data flows. The installation is easy and straightforward.

Hope this is interesting and enjoy your day.

Carpe Diem

The last post showed how to retrieve tweets from Twitter and store them in separate folders based on an attribute. This time I take a similar approach, but the tweets are stored in a MongoDB collection and finally displayed using Highcharts.

The first step retrieves the tweets for some of my favorite bands. I extract the text, the user id and name, and the date and time from the tweet's JSON representation. Then the JSON is enriched with some extra attributes: createddate, createdyear, createdmonth, createdtime and searchtopic. Below is a sample JSON document.
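Separately from the sample document, the enrichment step itself can be illustrated with a small sketch. In the flow this is done with NiFi processors, so the code below is only a stand-in. It assumes the tweet still carries Twitter's `created_at` field in its usual format. The exact formats of the derived attributes (ISO date, numeric year and month) and the topic value `exampleband` are assumptions too.

```python
from datetime import datetime

# Twitter's classic created_at format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def enrich(tweet: dict, searchtopic: str) -> dict:
    """Return a copy of the tweet with the extra date and topic attributes."""
    created = datetime.strptime(tweet["created_at"], TWITTER_FORMAT)
    enriched = dict(tweet)  # leave the original tweet untouched
    enriched.update({
        "createddate": created.strftime("%Y-%m-%d"),  # assumed format
        "createdyear": created.year,
        "createdmonth": created.month,
        "createdtime": created.strftime("%H:%M:%S"),  # assumed format
        "searchtopic": searchtopic,
    })
    return enriched


# Made-up tweet; only the field names follow the Twitter API.
tweet = {"created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "hello"}
enriched = enrich(tweet, "exampleband")
print(enriched)
```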

Below is a screenshot of the NiFi flow. It shows three processors that retrieve the tweets. UpdateAttribute processors are then used to tag the incoming flowfiles (tweets). Finally, the three streams come together in the "Store in MongoDb" group.

Groups in NiFi let you group multiple processors (a part of the flow). This helps to create logical units out of parts that belong together. Here I put everything that is identical for all three incoming streams of data into one group.

When I double-click the "Store in MongoDb" group, its content is shown, as can be seen below.

The next thing I did was to create a MongoDB map-reduce job. Yes, MongoDB uses map and reduce as well. Because I like Groovy and can use it as a scripting language (based on Java), I chose it for retrieving the results from MongoDB. The script returns a JSON representation of the results from the MongoDB server.

Here is the JSON I generate from MongoDB. It contains the name of the band and the counts per month in the form of an array. The counting is done by the MongoDB map-reduce job inside the Groovy script.
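The counting logic can be illustrated without a database. The sketch below is plain Python, not the MongoDB job or the Groovy script from this post. It emits one key-value pair per tweet (map) and sums the values per key (reduce). The output keys `name` and `counts` are assumed names, and the band names are made up.

```python
from collections import defaultdict


def map_tweet(tweet):
    """Map step: emit ((band, month), 1) for one tweet."""
    return (tweet["searchtopic"], tweet["createdmonth"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def counts_per_band(tweets):
    """Build one entry per band with an array of twelve monthly counts."""
    totals = reduce_counts(map_tweet(t) for t in tweets)
    bands = sorted({band for band, _ in totals})
    return [
        {"name": band,
         "counts": [totals.get((band, month), 0) for month in range(1, 13)]}
        for band in bands
    ]


# Made-up input documents carrying only the attributes needed for counting.
tweets = [
    {"searchtopic": "band_a", "createdmonth": 1},
    {"searchtopic": "band_a", "createdmonth": 1},
    {"searchtopic": "band_b", "createdmonth": 3},
]
result = counts_per_band(tweets)
print(result)
```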

Then the results are merged (still in the Groovy script) into a template using the Apache Velocity template engine. The template is an HTML page with placeholders into which the data is merged. So Groovy formats the data it gets from MongoDB and inserts it into the placeholders in the HTML file. The result is a single file containing the HTML code and the data.
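As a rough sketch of this last merge, the snippet below turns per-band counts into entries with a `name` and a `data` array, which is the shape Highcharts expects for a series, and drops them into a page placeholder. It again uses Python's `string.Template` instead of Velocity. The page is a heavily shortened, hypothetical one, and the placeholder name `series` and the input keys are assumptions.

```python
import json
from string import Template

# Hypothetical, shortened page. A real page would also load Highcharts and
# pass the series to a chart configuration.
PAGE = Template(
    "<html><body>\n"
    "<script>var series = $series;</script>\n"
    "</body></html>\n"
)


def render(results):
    """Format the counts as series entries and merge them into the page."""
    series = [{"name": r["name"], "data": r["counts"]} for r in results]
    return PAGE.substitute(series=json.dumps(series))


# Made-up counts for one band.
html = render([{"name": "band_a", "counts": [2, 0, 1]}])
print(html)
```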

Here is the result - a web page showing the counts of tweets per month for my favorite bands. I only started collecting data today, so there is not much data yet.

If you run the Groovy script at regular intervals through cron, it will generate a new HTML page containing the latest numbers that the NiFi flow collected. Then just refresh the web page to see them.

I hope this post is helpful. If somebody is interested, I will be happy to share the NiFi flow as a template, and of course also the Groovy code and the Apache Velocity template.