How-to: Test HBase Applications Using Popular Tools

While Apache HBase adoption for building end-user applications has skyrocketed, many of those applications (and many apps generally) have not been well-tested. In this post, you’ll learn some of the ways this testing can easily be done.

We will start with unit testing via JUnit, then move on to using Mockito and Apache MRUnit, and then to using an HBase mini-cluster for integration testing. (The HBase codebase itself is tested via a mini-cluster, so why not tap into that for upstream applications, as well?)

As a basis for discussion, let’s assume you have an HBase data access object (DAO) that performs the following insert into HBase. The logic could of course be more complicated, but for the sake of this example, it does the job.

public class MyHBaseDAO {

    public static void insertRecord(HTableInterface table, HBaseTestObj obj)
            throws Exception {
        Put put = createPut(obj);
        table.put(put);
    }

    private static Put createPut(HBaseTestObj obj) {
        Put put = new Put(Bytes.toBytes(obj.getRowKey()));
        put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"),
                Bytes.toBytes(obj.getData1()));
        put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2"),
                Bytes.toBytes(obj.getData2()));
        return put;
    }
}

HBaseTestObj is a basic data object with getters and setters for rowkey, data1, and data2.

The insertRecord does an insert into the HBase table against the column family of CF, with CQ-1 and CQ-2 as qualifiers. The createPut method simply populates a Put and returns it to the calling method.

Using JUnit

JUnit, which is well known to most Java developers at this point, is easily applied to many HBase applications. First, add the dependency to your pom:
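For example, a typical JUnit entry looks like this (the version shown here is illustrative):

```xml
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>
```

With that in place, a test of createPut might look like the following sketch. It assumes createPut is widened to package-private (or otherwise made visible to the test class); the class name TestMyHBaseDAOData is illustrative:

```java
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class TestMyHBaseDAOData {

    @Test
    public void testCreatePut() throws Exception {
        // build a test object with known values
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");
        // createPut is assumed package-private here so the test can call it
        Put put = MyHBaseDAO.createPut(obj);
        // verify rowkey and both column values made it into the Put
        assertEquals(obj.getRowKey(), Bytes.toString(put.getRow()));
        assertEquals(obj.getData1(), Bytes.toString(put.get(Bytes.toBytes("CF"),
                Bytes.toBytes("CQ-1")).get(0).getValue()));
        assertEquals(obj.getData2(), Bytes.toString(put.get(Bytes.toBytes("CF"),
                Bytes.toBytes("CQ-2")).get(0).getValue()));
    }
}
```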

A JUnit test of this kind ensures that your createPut method creates, populates, and returns a Put object with the expected values.

Using Mockito

So how do you go about unit testing the above insertRecord method? One very effective approach is to do so with Mockito.

First, add Mockito as a dependency to your pom:

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.9.5</version>
    <scope>test</scope>
</dependency>

Then, in your test class:

@RunWith(MockitoJUnitRunner.class)
public class TestMyHBaseDAO {
    @Mock
    private HTableInterface table;
    @Mock
    private HTablePool hTablePool;
    @Captor
    private ArgumentCaptor<Put> putCaptor;

    @Test
    public void testInsertRecord() throws Exception {
        // return the mock table when getTable is called
        when(hTablePool.getTable("tablename")).thenReturn(table);
        // create a test object and call the DAO method under test
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");
        MyHBaseDAO.insertRecord(table, obj);
        // capture the Put handed to the mock table and verify its contents
        verify(table).put(putCaptor.capture());
        Put put = putCaptor.getValue();
        assertEquals(Bytes.toString(put.getRow()), obj.getRowKey());
        assertTrue(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")));
        assertTrue(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")));
        assertEquals(Bytes.toString(put.get(Bytes.toBytes("CF"),
                Bytes.toBytes("CQ-1")).get(0).getValue()), "DATA-1");
        assertEquals(Bytes.toString(put.get(Bytes.toBytes("CF"),
                Bytes.toBytes("CQ-2")).get(0).getValue()), "DATA-2");
    }
}

Here you have populated HBaseTestObj with “ROWKEY-1”, “DATA-1”, “DATA-2” as values. You then used the mocked table and the DAO to insert the record. You captured the Put that the DAO would have inserted and verified that the rowkey, data1, and data2 are what you expect them to be.

The key here is to manage HTablePool and HTableInterface instance creation outside the DAO. This allows you to mock them cleanly and test Puts as shown above. Similarly, you can now expand into all the other operations such as Get, Scan, Delete, and so on.
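Apache MRUnit extends this approach to MapReduce jobs that write to HBase. A reducer test with MRUnit's ReduceDriver might look like the following sketch. The MyReducer class is a hypothetical TableReducer that concatenates the Text values it receives and writes the result to a Put under the CF column family and CQ qualifier; that class, its key/value types, and the test values are illustrative assumptions:

```java
import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class TestMyReducer {
    private ReduceDriver<Text, Text, ImmutableBytesWritable, Writable> reduceDriver;
    byte[] CF = "CF".getBytes();
    byte[] QUALIFIER = "CQ".getBytes();

    @Before
    public void setUp() {
        // MyReducer is a hypothetical TableReducer that concatenates its
        // input values and emits them as a single Put
        reduceDriver = ReduceDriver.newReduceDriver(new MyReducer());
    }

    @Test
    public void testHBaseInsert() throws IOException {
        String strKey = "RowKey-1";
        List<Text> list = new ArrayList<Text>();
        list.add(new Text("DATA"));
        list.add(new Text("DATA1"));
        list.add(new Text("DATA2"));
        // the reducer simply appends the values it receives, so this is expected
        String expectedOutput = "DATADATA1DATA2";
        // mimic what the mapper would have passed to the reducer
        reduceDriver.withInput(new Text(strKey), list);
        // run the reducer and capture its (rowkey, Put) output
        List<Pair<ImmutableBytesWritable, Writable>> result = reduceDriver.run();
        assertEquals(Bytes.toString(result.get(0).getFirst().get()), strKey);
        Put put = (Put) result.get(0).getSecond();
        assertEquals(expectedOutput,
                Bytes.toString(put.get(CF, QUALIFIER).get(0).getValue()));
    }
}
```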

Basically, with MRUnit's ReduceDriver, you can run a reducer (say, a MyReducer that writes its results to HBase) against test input and, after its processing, verify that:

The output is what you expect.

The Put that is inserted in HBase has “RowKey-1” as the rowkey.

“DATADATA1DATA2” is the value for the CF column family and CQ column qualifier.

You can also test mappers that read data from HBase in a similar manner using MRUnit's MapDriver, or test MR jobs that read from HBase, process data, and write to HDFS.
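A mapper test follows the same pattern. The sketch below assumes a hypothetical MyMapper, a TableMapper that emits the value stored under CF:CQ-1 keyed by the rowkey; the class and the expected output are illustrative:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TestMyMapper {
    private MapDriver<ImmutableBytesWritable, Result, Text, Text> mapDriver;

    @Before
    public void setUp() {
        // MyMapper is a hypothetical TableMapper that emits (rowkey, CF:CQ-1 value)
        mapDriver = MapDriver.newMapDriver(new MyMapper());
    }

    @Test
    public void testMap() throws IOException {
        byte[] row = Bytes.toBytes("ROWKEY-1");
        // build the Result a TableMapper would receive from a Scan
        KeyValue kv = new KeyValue(row, Bytes.toBytes("CF"),
                Bytes.toBytes("CQ-1"), Bytes.toBytes("DATA-1"));
        Result result = new Result(new KeyValue[] { kv });
        mapDriver.withInput(new ImmutableBytesWritable(row), result)
                 .withOutput(new Text("ROWKEY-1"), new Text("DATA-1"))
                 .runTest();
    }
}
```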

Using an HBase Mini-cluster

Now we’ll look at how to go about integration testing. HBase ships with HBaseTestingUtility, which makes writing integration testing with an HBase mini-cluster straightforward. In order to pull in the correct libraries, the following dependencies are required in your pom:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.2-cdh4.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.2.0</version>
    <scope>test</scope>
</dependency>

Now, let’s look at how to run through an integration test for the MyHBaseDAO insert described in the introduction:

public class MyHBaseIntegrationTest {
    private static HBaseTestingUtility utility;
    byte[] CF = "CF".getBytes();
    byte[] CQ1 = "CQ-1".getBytes();
    byte[] CQ2 = "CQ-2".getBytes();

    @Before
    public void setup() throws Exception {
        utility = new HBaseTestingUtility();
        utility.startMiniCluster();
    }

    @Test
    public void testInsert() throws Exception {
        HTableInterface table = utility.createTable(Bytes.toBytes("MyTest"),
                Bytes.toBytes("CF"));
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");
        MyHBaseDAO.insertRecord(table, obj);
        Get get1 = new Get(Bytes.toBytes(obj.getRowKey()));
        get1.addColumn(CF, CQ1);
        Result result1 = table.get(get1);
        assertEquals(Bytes.toString(result1.getRow()), obj.getRowKey());
        assertEquals(Bytes.toString(result1.value()), obj.getData1());
        Get get2 = new Get(Bytes.toBytes(obj.getRowKey()));
        get2.addColumn(CF, CQ2);
        Result result2 = table.get(get2);
        assertEquals(Bytes.toString(result2.getRow()), obj.getRowKey());
        assertEquals(Bytes.toString(result2.value()), obj.getData2());
    }
}

Here you created an HBase mini-cluster and started it. You then created a table called “MyTest” with one column family, “CF”. You inserted a record using the DAO you needed to test, did a Get from the same table, and verified that the DAO inserted records correctly.

The same could be done for much more complicated use cases along with the MR jobs like the ones shown above. You can also access the HDFS and ZooKeeper mini-clusters created while creating the HBase one, run an MR job, output that to HBase, and verify the inserted records.

Just a quick note of caution: starting up a mini-cluster takes 20 to 30 seconds and cannot be done on Windows without Cygwin. However, because integration tests should only be run periodically, the longer runtime should be acceptable.
