Implementing HBase filters using Java APIs

In our previous blog we discussed the need for and working of filters in HBase. In this blog, we will implement a filtering operation on a set of rows in an HBase table.

We also recommend that readers go through our posts below on HBase, as they will help in understanding the concepts covered in this post:

Beginners Guide For HBase
Working of HBase components
Read and Write Operations in HBase
Performing CRUD Operations on HBase using JAVA API

For the example below, we will use an existing table named “customer” from the HBase default namespace. In the image below, we use the HBase “list” command to list the tables present in the default namespace.

Table “customer” contents:

As shown in the image below, the table “customer” consists of three rows, namely Kiran, Manjunath and Prateek, with a single column family named “order” and a column qualifier named “number”.

Scenario 1: Write a Java program to list the row values of the “customer” table without using a filter.

Expected Output:

We can refer to the screenshot below to see what the expected output will be.

Source Code:

package com.acadgild.hbase;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;
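The listing above stops at the imports. As a rough guide, here is a minimal sketch of what the Scenario 1 class could look like, reconstructed from the line-by-line walkthrough that follows. It is not the original source: the output format, the try/finally blocks and the exact line layout are assumptions, and it uses the older HTable client API that the imports suggest (HBase versions before 2.0).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: reconstructed from the walkthrough, not the original listing.
public class Filter_RowValue {
    public static void main(String[] args) throws IOException {
        // Loads the HBase configuration resources found on the classpath.
        Configuration conf = HBaseConfiguration.create();
        // Pre-2.0 client API: one HTable instance talks to one table.
        HTable table = new HTable(conf, "customer");
        try {
            Scan scan = new Scan();
            // Restrict the scan to the column order:number.
            scan.addColumn(Bytes.toBytes("order"), Bytes.toBytes("number"));
            ResultScanner result = table.getScanner(scan);
            try {
                // One Result per row returned by the scanner.
                for (Result res : result) {
                    byte[] val = res.getValue(Bytes.toBytes("order"), Bytes.toBytes("number"));
                    // Output format is an assumption; the original is not shown.
                    System.out.println(Bytes.toString(res.getRow()) + " : " + Bytes.toString(val));
                }
            } finally {
                result.close();
            }
        } finally {
            table.close();
        }
    }
}
```

Running it needs the HBase client library on the classpath and a reachable cluster that holds the “customer” table.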

Here’s the explanation of each line of code:

In line 1, we declare a class named Filter_RowValue.
In line 3, the create() method of the HBaseConfiguration class returns a Configuration object, conf, with the HBase configuration resources loaded.
In line 4, the HTable instance “table” lets us communicate with a single HBase table; its constructor accepts the configuration object and the table name as parameters.
In line 5, we create a Scan instance, “scan”, to perform scan operations.
In line 6, we use the addColumn method to restrict the scan to one column of the table “customer”, where “order” is the column family name and “number” is the column qualifier within that family.
In line 7, we declare a ResultScanner instance, “result”, which returns a scanner on the table “customer” as specified by the Scan object.
In line 8, a for-each loop iterates over the scanner and runs once for each row it returns from the “customer” table.
In line 9, for each row we store in the variable val the value of the cell whose column family is “order” and whose column qualifier is “number”.
In line 10, we print the value held in val, that is, the value stored under the column qualifier.
In line 13, we close the table.

Output:

Scenario 2: Write a Java program to list the rows of the “customer” table whose value under the column qualifier “number” contains “Fli”, discarding the rows whose value does not contain “Fli”, using an HBase filter.

Expected Output:

We can refer to the screenshot below to see what the expected output will be.

Source Code:

package com.acadgild.hbase;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;
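As with Scenario 1, only the imports of this listing are shown. The sketch below is one plausible reconstruction based on the walkthrough that follows, not the original source. The variable names “filter” and “filters”, the output format and the try/finally blocks are assumptions, and the code again targets the pre-2.0 client API (HTable, CompareOp).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: reconstructed from the walkthrough, not the original listing.
public class Filter_RowValue {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "customer");
        try {
            Scan scan = new Scan();
            // Only the column order:number is read.
            scan.addColumn(Bytes.toBytes("order"), Bytes.toBytes("number"));

            // Keep only cells whose value contains the substring "Fli".
            Filter filter = new ValueFilter(CompareOp.EQUAL, new SubstringComparator("Fli"));

            // Wrap the filter in a FilterList, as the walkthrough describes.
            List<Filter> filters = new ArrayList<Filter>();
            filters.add(filter);
            FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
            scan.setFilter(list);

            ResultScanner result = table.getScanner(scan);
            try {
                // Rows left with no matching cell are not returned at all.
                for (Result res : result) {
                    byte[] val = res.getValue(Bytes.toBytes("order"), Bytes.toBytes("number"));
                    // Output format is an assumption; the original is not shown.
                    System.out.println(Bytes.toString(res.getRow()) + " : " + Bytes.toString(val));
                }
            } finally {
                result.close();
            }
        } finally {
            table.close();
        }
    }
}
```

Because the filter is attached to the Scan, it is evaluated on the region servers, so only matching cells travel back to the client.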

Here’s the explanation of each line of code:

In line 1, we declare a class named Filter_RowValue.
In line 3, the create() method of the HBaseConfiguration class returns a Configuration object, conf, with the HBase configuration resources loaded.
In line 4, the HTable instance “table” lets us communicate with a single HBase table; its constructor accepts the configuration object and the table name as parameters.
In line 5, we create a Scan instance, “scan”, to perform scan operations.
In line 6, we use the addColumn method to restrict the scan to one column of the table “customer”, where “order” is the column family name and “number” is the column qualifier within that family.
In line 7, we use the ValueFilter class to filter cells based on their value. It takes a CompareFilter.CompareOp operator (EQUAL, GREATER, NOT_EQUAL, etc.) and a ByteArrayComparable comparator. Here, “order” is the column family name, “number” is its column qualifier name, and “Fli” is the value we look for in the table “customer”. We use CompareOp.EQUAL with a SubstringComparator to check whether the substring “Fli” is present in the value stored under the column qualifier “number”. SubstringComparator is case-insensitive and is only valid with the EQUAL and NOT_EQUAL operators.
In line 9, we declare a variable “list” of the FilterList class with FilterList.Operator.MUST_PASS_ONE. With MUST_PASS_ONE, all filters in the list are always evaluated and a KeyValue is included if at least one filter accepts it. With FilterList.Operator.MUST_PASS_ALL, by contrast, evaluation stops as soon as one filter does not include the KeyValue. With the single ValueFilter described here, both operators give the same result.
In line 10, we call the setFilter method on the Scan object to apply the filter list to the scan.
In line 11, we declare a ResultScanner instance, “result”, which returns a scanner on the table “customer” as specified by the Scan object.
In line 12, a for-each loop iterates over the scanner and runs once for each row it returns from the “customer” table.
In line 13, for each row we store in the variable val the value of the cell whose column family is “order” and whose column qualifier is “number”.
In line 14, we print the value held in val, that is, the value stored under the column qualifier.
In line 17, we close the table.

Output:

Thus, from the above steps we can observe how an HBase filter helped us retrieve only the rows whose value contains “Fli”. The scan was restricted to a particular column family and column qualifier, and the value to match was passed as an argument in the program, so the client did not have to fetch every row of the table and filter them itself.

We hope this post has been helpful in understanding the working of filters in HBase for retrieving results from an HBase database. In case of any queries, feel free to comment below and we will get back to you at the earliest.

Keep visiting our website for more posts on Big Data and other technologies.
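As a postscript on the two FilterList operators mentioned in the Scenario 2 walkthrough, the plain-Java model below may help; it needs no cluster. It is only an illustration of the include/exclude outcome, not HBase code: the class FilterListModel, its methods, the second substring “kart” and the sample values are all made up for this sketch, and it does not reproduce the order in which HBase evaluates the filters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical model of FilterList semantics; not part of the HBase API.
public class FilterListModel {

    /** Mirrors the two FilterList operators: AND versus OR over the filters. */
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Models SubstringComparator: a case-insensitive "contains" check. */
    static Predicate<String> substring(String needle) {
        String lowered = needle.toLowerCase(Locale.ROOT);
        return value -> value.toLowerCase(Locale.ROOT).contains(lowered);
    }

    /** Returns true when the value would be kept under the given operator. */
    static boolean include(Operator op, List<Predicate<String>> filters, String value) {
        if (op == Operator.MUST_PASS_ALL) {
            // AND: every filter has to accept the value.
            return filters.stream().allMatch(f -> f.test(value));
        }
        // OR: one accepting filter is enough.
        return filters.stream().anyMatch(f -> f.test(value));
    }

    public static void main(String[] args) {
        // Two made-up filters and three made-up cell values.
        List<Predicate<String>> filters = Arrays.asList(substring("Fli"), substring("kart"));
        for (String value : new String[] {"Flipkart", "Flight", "Amazon"}) {
            System.out.println(value
                    + " ALL=" + include(Operator.MUST_PASS_ALL, filters, value)
                    + " ONE=" + include(Operator.MUST_PASS_ONE, filters, value));
        }
    }
}
```

Under these assumptions, “Flipkart” passes both operators, “Flight” is kept only by MUST_PASS_ONE, and “Amazon” is dropped by both. With a single filter in the list, as in Scenario 2, the two operators cannot differ.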


The author works with AcadGild as a Big Data Engineer and is a Big Data enthusiast with 2+ years of experience in Hadoop development. He is passionate about coding in Hive, Spark and Scala.
Feel free to contact him at [email protected] for any further queries.