Query your Hadoop databases with HiveQL

So, you’ve got your Hadoop cluster all set up and you’re ready to begin your data analysis. You could go down the route of learning Pig, whose scripting language (Pig Latin) describes data flows step by step and whose custom functions are often written in Java, but unless you’re comfortable with that style of programming, that’s probably going to be a bit of a challenge.

So, what can you do? Well, if you’re already familiar with SQL, you can take advantage of HiveQL (Hive Query Language), which is an SQL-like language that can be used to query your Hadoop databases.
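To give a feel for how close HiveQL is to ordinary SQL, here is a minimal sketch. It assumes a hypothetical page_views table with country and duration_secs columns (made up purely for illustration), and because no Hive cluster is assumed, it uses Python’s built-in sqlite3 module as a local stand-in. The query text sticks to basic clauses that Hive should accept as well.

```python
import sqlite3

# Query text limited to plain SELECT / WHERE / GROUP BY / ORDER BY.
# The table and column names are hypothetical.
QUERY = """
SELECT country, COUNT(*) AS views
FROM page_views
WHERE duration_secs > 10
GROUP BY country
ORDER BY views DESC, country
"""


def run_demo():
    """Load a few sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here
    conn.execute("CREATE TABLE page_views (country TEXT, duration_secs INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [("UK", 30), ("UK", 5), ("US", 42), ("US", 12), ("US", 61), ("DE", 15)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for country, views in run_demo():
        print(country, views)
```

On a real cluster you would submit the same kind of query text through one of Hive’s own interfaces rather than through sqlite3.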

It should be noted that Hive is very good for batch-style queries over large datasets, but it’s not the best way to extract real-time data. Each query is turned into jobs that run across the Hadoop cluster, so even a simple query carries a noticeable level of latency.

You can access Hive in three ways: through the web user interface, through Microsoft HDInsight, or through the command line interface.

The great thing about Hive is that the user does not need to know about MapReduce (described in a little detail here), as Hive automatically converts your HiveQL scripts into MapReduce jobs.
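To see roughly what Hive is saving you from, here is a conceptual sketch of how a query like "count rows grouped by country" breaks down into map, shuffle and reduce steps. This is plain Python written for illustration only. It is not the code Hive generates, the function names are hypothetical, and real query plans are considerably more involved.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (key, 1) pair per input row, as a mapper would for a grouped count."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs):
    """Collect all values that share a key, mimicking the shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_country(rows):
    """Chain the three steps together."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    sample = [{"country": "UK"}, {"country": "US"}, {"country": "US"}]
    print(count_by_country(sample))
```

With Hive, all of this is expressed as a single GROUP BY clause, and the planning is done for you.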

Hive supports the following clauses and functions, among others:

Select

Where

Group By

Order By

Sort By

Union

Left Join

Right Join

Inner Join

Full Outer Join

Cross Join

Rank

This list is growing with each Hive release, so it’s worth checking the language manual for the version you’re running to see which commands are available to you.
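As an illustration of how two of the joins listed above differ, the sketch below runs the same query once with Left Join and once with Inner Join. The customers and orders tables are hypothetical, and Python’s built-in sqlite3 module again stands in for Hive so that the example can run anywhere. The query text uses only basic join syntax that Hive should also accept.

```python
import sqlite3

# Hypothetical tables: customers(customer_id, name), orders(order_id, customer_id).
QUERY_TEMPLATE = """
SELECT c.name AS customer, COUNT(o.order_id) AS orders
FROM customers c
{join} orders o ON c.customer_id = o.customer_id
GROUP BY c.name
ORDER BY customer
"""


def run_join(join):
    """Run the query with the given join keyword and return the result rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")]
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    rows = conn.execute(QUERY_TEMPLATE.format(join=join)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # The left join keeps Cy, who has no orders; the inner join drops that row.
    print(run_join("LEFT JOIN"))
    print(run_join("INNER JOIN"))
```

The left join keeps every customer and counts zero orders where there is no match, while the inner join returns only the customers that have at least one order.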