Welcome to the first episode of the series: “Unlock your [….] data with Red Hat JBoss Data Virtualization (JDV).”

This post will guide you through an example of connecting to a Hadoop source via the Hive2 driver, using Teiid Designer. In this example, we will demonstrate connecting to a local Hadoop source. We’re using the Hortonworks 2.5 Sandbox running in VirtualBox for our source, but you can connect to another Hortonworks source using the same steps.

Hortonworks provides Hive JDBC and ODBC drivers that let you connect popular tools to query, analyze and visualize data stored within the Hortonworks Data Platform (HDP).

Note: HBase is supported as well; stay tuned for an episode “Unlock your HBase data with Hortonworks and JDV.”

What about automating configuration?

A question you might ask: can we automate the above configuration steps?
The answer is yes, we can, with Ansible by Red Hat.

As you can imagine, I like drinking our own champagne (Kool-Aid).

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It uses no agents and no additional custom security infrastructure, so it’s easy to deploy – and most importantly, it uses a very simple language (YAML, in the form of Ansible Playbooks) that allows you to describe your automation jobs in a way that approaches plain English.

For your convenience, most of the steps are automated in an Ansible playbook called hdphive2 on GitHub. To run it you only need a single command, and you should see output similar to that shown below:
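The actual hdphive2 playbook lives on GitHub; purely as an illustrative sketch (the paths, variable names, and tasks below are assumptions, not the real repo contents), a playbook automating these steps might look like:

```yaml
# Illustrative sketch only -- task names, paths, and arguments are assumptions.
- hosts: localhost
  vars:
    jdv_home: /opt/jdv            # assumed JDV install path
  tasks:
    - name: Download the Hive JDBC standalone driver
      get_url:
        url: "{{ hive_jdbc_download_url }}"   # placeholder for the HDP driver archive
        dest: "{{ jdv_home }}/standalone/deployments/hive-jdbc-standalone.jar"

    - name: Register the HDPHive2DS data source via the JBoss CLI
      command: >
        {{ jdv_home }}/bin/jboss-cli.sh --connect
        --command="data-source add --name=HDPHive2DS --jndi-name=java:/HDPHive2DS
        --driver-name=hive-jdbc-standalone.jar
        --connection-url=jdbc:hive2://localhost:10000/foodmart"
```

Running a playbook like this is then a single ansible-playbook invocation on your workstation.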

Steps to unlock your Hadoop data

Start your local JDV 6.3 environment

$ cd $JDV_HOME/bin
$ ./standalone.sh

Start your local HDP Sandbox environment
Note: Before starting the HDP Sandbox environment, please change port 8080 to 8070 in the port-forwarding section of the HDP Sandbox VM settings, since the sandbox exposes port 8080 by default for Ambari. The HDP Sandbox environment contains several databases with Hive tables out of the box, as shown below in the Hive View of Ambari.
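If you prefer the command line, the same port-forwarding change can be made with VBoxManage while the VM is stopped. This is only a sketch: the VM name “Hortonworks Sandbox” and the rule name “ambari” are assumptions, so check what your sandbox actually uses first.

```shell
# List the current NAT port-forwarding rules for the VM.
VBoxManage showvminfo "Hortonworks Sandbox" | grep Rule

# Replace the default Ambari rule (host 8080 -> guest 8080) with host port 8070.
# Rule format: <name>,<proto>,<host ip>,<host port>,<guest ip>,<guest port>
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 delete "ambari"
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "ambari,tcp,,8070,,8080"
```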

Start your local JBDS environment
Start JBDS 9.1.0 and open the Teiid Designer perspective as shown below. Note: use the menu options Window > Perspective > Open Perspective > Teiid Designer to switch JBDS to the Teiid Designer perspective.

Create connection profile and source model
We are now going to use the HDP Hive2 JDBC driver directly and import metadata using the JDBC importer. Right-click the HDPHive2Sample project, select Import, and then select JDBC Database >> Source Model as shown below. Click “Next >”.

a) Create Connection Profile
Set up a Connection Profile to connect to the local HDP Sandbox environment.
On the first page of the wizard, click “New…” to create a new Connection Profile. Select “Generic JDBC” as the Connection Profile Type, name the Connection Profile “HDPHive2DS”, and click “Next >”. Before we can proceed, we need to set up a driver definition so that we can connect to the HDP Sandbox environment from within JBDS, using the JDBC jars previously downloaded from the HDP Hive JDBC archive site: click the “Add Driver Definition” button.
Add the jars (hive-jdbc-standalone.jar, hadoop-common.jar, and hadoop-auth.jar) mentioned previously in the JAR List tab, as shown below. Now we are ready to connect to the HDP Sandbox environment using Hive by providing the correct connection details in the Properties tab, as shown below:
The Hive JDBC URL is a string with the following syntax:
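The original post showed the syntax as an image; the standard HiveServer2 form is jdbc:hive2://<host>:<port>/<dbName>. As a minimal sketch, composing the URL for the sandbox (the host is an assumption and 10000 is HiveServer2’s default port):

```shell
# Standard HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<dbName>
# Host and port below are assumptions (10000 is the HiveServer2 default).
HIVE_HOST=localhost
HIVE_PORT=10000
HIVE_DB=foodmart
echo "jdbc:hive2://${HIVE_HOST}:${HIVE_PORT}/${HIVE_DB}"
```

This prints the connection URL to paste into the Properties tab of the connection profile.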

Click “OK”. Click “Test Connection” to validate that the connection to the HDP Sandbox environment pings successfully.

Since the connection pings successfully, we are ready to select Hive tables from the HDP Sandbox environment and create a source model from them.

b) Create Source model

Click “Next >”.

Click “Next >” to select database objects. Select all tables in the foodmart database.
Click “Next >”. Make sure that the JNDI name corresponds to the one we created in the JDV environment (hint: HDPHive2DS) and that “Auto-create Data Source” is not selected. Click “Finish” to create the source models. Select the customer model and click the running-man icon to preview the data, as depicted below.
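As an optional sanity check outside JBDS, you can query the sandbox’s foodmart database directly with beeline, which ships with HDP. The host and port below are assumptions matching the sandbox defaults; the URL is the same one used in the connection profile.

```shell
# Count rows in the customer table directly against HiveServer2.
# Host and port (10000, the HiveServer2 default) are assumptions.
beeline -u "jdbc:hive2://localhost:10000/foodmart" -e "SELECT COUNT(*) FROM customer;"
```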

Conclusion

In this post we’ve shown the configuration steps needed to unlock your Hadoop data using Hive, with Hortonworks and Red Hat JBoss Data Virtualization. We have also shown that Ansible is not only targeted at system administrators: it is an invaluable tool for developers, testers, and others. I encourage you to experiment with the simple basics provided in this article and gradually expand the functionality of your playbook to create even more sophisticated provisioning scripts.

Now we are ready to add other data sources from physically distinct systems into the mix, such as SQL databases, XML/Excel files, NoSQL databases, enterprise applications, and web services.

Special thanks to Marc Zottner and Roeland van der Pol for their Ansible scripts, which gave me the inspiration to use Ansible for the configuration in this post. Do you want to get inspired as well? See https://github.com/Maarc/ansible_middleware_soe for more sophisticated Red Hat JBoss Middleware Ansible playbooks.