Implementing Apache Drill in ODI 12C – Part 1

In the previous post I explained how to use a JSON file in an ODI mapping and load the data into an Oracle table. We also modified certain properties of the Complex File technology to generate the correct syntax for JSON queries. If you missed it, please refer to that post. Today I am going to demonstrate the installation and configuration of Apache Drill. After the configuration we will run some sample queries to read data from CSV, text, and Parquet files. In Part 2, we will see the Drill implementation in ODI with simple joins between different source systems.

Prerequisites:

You are familiar with the Hadoop ecosystem. To know more about Drill, please refer to the official documentation.

You have VirtualBox installed and the appliance already imported.

The VM is running fine.

Hadoop and Hive are already installed and the required services are up and running.

For this series of blog posts I will be using Oracle BigDataLite, which is based on a Cloudera distribution. It comes as almost 15 zip files, and the total size adds up to about 30 GB. If you want a lighter version, you can download a VM directly from Cloudera, Hortonworks, or MapR.

Let's take a look at the definition from the Drill documentation:

Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language.

So it is quite similar to Cloudera Impala or Facebook's Presto. Let's proceed with the installation. If the VM is not up, start it. Once it is up and running, open a terminal and run the following command to verify that Java 7 or 8 is the version in effect:
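The steps look roughly like this. This is a sketch for the BigDataLite VM: the Drill version (1.6.0) and extraction path are examples, so substitute the release and directory that match your environment:

```shell
# Verify that Java 7 or 8 is the version in effect
java -version

# Download and extract Apache Drill (1.6.0 shown as an example release)
wget https://archive.apache.org/dist/drill/drill-1.6.0/apache-drill-1.6.0.tar.gz
tar -xzf apache-drill-1.6.0.tar.gz
cd apache-drill-1.6.0

# Start Drill in embedded (single-node) mode; this opens the sqlline shell
bin/drill-embedded
```

From the `0: jdbc:drill:zk=local>` prompt you can immediately query the sample data that ships with Drill, for example:

```sql
SELECT full_name, position_title FROM cp.`employee.json` LIMIT 5;
```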

Well, whatever we queried above was just a JSON file named employee.json, located inside ./jars/3rdparty/foodmart-data-json.0.4.jar. The cp prefix is nothing but Drill's classpath storage plugin, and the JSON file is present inside that jar file.
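In the same way, Drill can query plain CSV or text files through the dfs storage plugin without any table definition. As a sketch (the file path and column names below are hypothetical, so point it at a CSV file you actually have):

```sql
-- Querying a delimited file returns each row as an array named `columns`,
-- which you index positionally (the path here is hypothetical)
SELECT columns[0] AS emp_id,
       columns[1] AS emp_name
FROM dfs.`/home/oracle/sample/employees.csv`
LIMIT 10;
```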

Using PARQUET files:

To query a Parquet file, we need to move the sample data from the installation location to an HDFS location using the command below.
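Assuming Drill was extracted to /home/oracle/apache-drill-1.6.0 (adjust the paths to your install), the bundled sample-data directory can be copied to HDFS like this:

```shell
# Copy Drill's bundled Parquet sample data into HDFS
hdfs dfs -mkdir -p /user/oracle/sample-data
hdfs dfs -put /home/oracle/apache-drill-1.6.0/sample-data/* /user/oracle/sample-data/
```

Then, assuming the dfs storage plugin has been reconfigured to point its connection at HDFS (e.g. `hdfs://localhost:8020` on BigDataLite) instead of the local filesystem, the file can be queried directly from sqlline:

```sql
SELECT * FROM dfs.`/user/oracle/sample-data/region.parquet`;
```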

That's it for today. I would suggest you practice some basic commands and get your hands dirty with all types of joins. Once you are familiar with the commands, we can use them in ODI. Though ODI will do the work for you, knowing the working principle will help you accomplish your objective quickly and seamlessly.

About the author

Bhabani (http://dwteam.in) - Bhabani is currently working as a Sr. Development Engineer at Harman International. He has good expertise in Oracle, Oracle Data Integrator, Pervasive Data Integrator, MSBI, Talend, and Java. He has also been contributing to the ODI OTN forum for the last 5 years. He is from India. If you want to reach him, please visit the contact us page.
If you have any doubts or concerns about the above article, please put your question here. DW Team will try to respond to it as soon as possible. Also, don't forget to provide your comments, suggestions, and feedback for further improvement. Thanks for your time.

Disclosure

The views expressed on this blog are those of the author and do not necessarily reflect the views of Oracle. All content and software code on this site are offered without any warranty or promise of operational quality or functionality.