Oracle Blog

Inside Oracle's Data Integration community

Monday Dec 31, 2012

The Hive external table let's us do lots of cool stuff including processing data from NoSQL. We have seen how custom SerDes are used, Hive storage handlers also provide some cool capabilities. Using the Hive storage handler defined here, an external table can be defined to project data from a NoSQL key-value store. The external table can then be used as a source in ODI, very simple.

The illustration on github has the following data stored in a Oracle NoSQL Database (the key is the lastname/firstname etc):

/Smith/Bob/-/birthdate: 05/02/1975

/Smith/Bob/-/phonenumber: 1111-1111

/Smith/Bob/-/userid: 1

/Smith/Patricia/-/birthdate: 10/25/1967

/Smith/Patricia/-/phonenumber: 2222-2222

/Smith/Patricia/-/userid: 2

/Wong/Bill/-/birthdate: 03/10/1982

/Wong/Bill/-/phonenumber: 3333-3333

/Wong/Bill/-/userid: 3

Using the Hive external table and the custom storage handler for a key value store, we define a mask to project the data through the external table.

we specify the keyvalue store using TBLPROPERTIES, identify the host/port and the keystore name (kvstore).

the SerDe properties contains the mapping of the keys to column names, you will get a row for each value of birthdate, phonenumber, userID

Fairly straightforward. We can then reverse engineer this into ODI, using the same mechanism as I described in previous posts here setting the ODI_HIVE_SESSION_JARS and so forth. The data projected looks like this;

hive> SELECT * FROM MY_KV_TABLE;

OK

Smith Patricia 10/25/1967 NULLNULL

SmithPatriciaNULL2222-2222 NULL

SmithPatriciaNULLNULL2

SmithBob05/02/1975NULLNULL

SmithBobNULL1111-1111NULL

SmithBobNULLNULL1

WongBill03/10/1982NULLNULL

WongBillNULL3333-3333NULL

WongBillNULLNULL3

In ODI by defining the Hive collect_set function as an aggregation function, we can then aggregate the data and pivot the data to get it as a row;