Data Migration from SQL to HBase Using MapReduce

Data migration is the process of transferring data from one system to another, whether by changing the storage, the database, or the application. In this tutorial, let us learn how to migrate the data present in MySQL to HBase, a NoSQL database, using MapReduce.

MySQL is one of the most widely used relational database systems. But due to the rapid growth of data nowadays, people are searching for better alternatives to store and process their data. This is how HBase came into existence: a Hadoop database capable of storing huge amounts of data in clusters and scaling massively.

Let us now see how to migrate the data present in MySQL to HBase using Hadoop's MapReduce.

Here, for reading the data in MySQL, we will be using DBInputFormat.

Using DBInputFormat, our MapReduce code will be able to read the data from MySQL. The table we are using in this example has two fields, emp_id & emp_name. So we will take these two fields from the MySQL table and store them in HBase.
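DBInputFormat hands each row to the mapper as an instance of a class implementing Hadoop's DBWritable and Writable interfaces. Below is a minimal sketch of such a record class for the two fields; the class name EmployeeRecord and its getters are names chosen for this example, not part of any library.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Record class mapping one row of the MySQL employee table.
// The field names match the emp_id & emp_name columns described above.
public class EmployeeRecord implements Writable, DBWritable {

    private int empId;
    private String empName;

    public EmployeeRecord() { }

    public EmployeeRecord(int empId, String empName) {
        this.empId = empId;
        this.empName = empName;
    }

    // Called by DBInputFormat with one row of the JDBC ResultSet.
    public void readFields(ResultSet rs) throws SQLException {
        empId = rs.getInt("emp_id");
        empName = rs.getString("emp_name");
    }

    // Used only when writing back to a database (unused in this job).
    public void write(PreparedStatement ps) throws SQLException {
        ps.setInt(1, empId);
        ps.setString(2, empName);
    }

    // Hadoop's own serialization, used when records move between tasks.
    public void readFields(DataInput in) throws IOException {
        empId = in.readInt();
        empName = in.readUTF();
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(empId);
        out.writeUTF(empName);
    }

    public int getEmpId() { return empId; }

    public String getEmpName() { return empName; }
}
```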

Here is our data present in MySQL. In the database Acadgild, we have an employee table, and in that table, we have two columns, emp_id & emp_name, as shown in the below screenshot.

Our DBInputFormat will read this data, so this will be the input to our mapper class. To store this data in HBase, we need to create a table in HBase. You can use the below HBase shell command to create a table.

create 'employee','emp_info'

In the above screenshot, you can see that the employee table has been created in HBase. emp_info is the column family which holds the information of the employee.

The mapper class that reads the input from the MySQL table is as follows.
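A sketch of such a mapper is shown below. It assumes a DBWritable record class, here called EmployeeRecord, that exposes getEmpId() and getEmpName() for the two columns; the class and method names are illustrative, not part of Hadoop itself.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Reads EmployeeRecord values produced by DBInputFormat and emits
// emp_id (as bytes) as the key and emp_name as the value.
public class EmployeeMapper
        extends Mapper<LongWritable, EmployeeRecord, ImmutableBytesWritable, Text> {

    @Override
    protected void map(LongWritable key, EmployeeRecord value, Context context)
            throws IOException, InterruptedException {
        // emp_id becomes the HBase row key, so convert it to bytes here.
        ImmutableBytesWritable rowKey = new ImmutableBytesWritable(
                Bytes.toBytes(String.valueOf(value.getEmpId())));
        context.write(rowKey, new Text(value.getEmpName()));
    }
}
```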

Above is the Mapper class implementation, which can read the data from a MySQL table. The output of this Mapper class is the emp_id as key and emp_name as value. HBase stores data as bytes, so we need to take the key as ImmutableBytesWritable.

So, in this mapper class, the MySQL table data is read and emitted as key and value. The key will be the same in both MySQL and HBase.

Now the key and the rest of the columns of the MySQL table will be sent to the reducer. For writing data into HBase, a reducer class called TableReducer is used. Using the TableReducer class, we write the MySQL data into HBase as follows:
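A sketch of such a TableReducer is given below. The class name EmployeeReducer is illustrative, and the Put is built in a small helper method so the loop body stays readable.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Receives emp_id (as bytes) as the key and the remaining columns as values,
// and writes each one into the emp_info column family of the employee table.
public class EmployeeReducer
        extends TableReducer<ImmutableBytesWritable, Text, ImmutableBytesWritable> {

    // Builds the Put for one employee row.
    // Put.addColumn is the HBase 1.x+ API; older releases use put.add().
    static Put putFor(byte[] rowKey, String empName) {
        Put put = new Put(rowKey);
        put.addColumn(Bytes.toBytes("emp_info"),
                      Bytes.toBytes("emp_name"),
                      Bytes.toBytes(empName));
        return put;
    }

    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Text> values,
                          Context context) throws IOException, InterruptedException {
        // Iterate over the remaining columns; in this table there is only emp_name.
        for (Text value : values) {
            context.write(key, putFor(key.get(), value.toString()));
        }
    }
}
```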

This TableReducer receives the key and the rest of the columns as values. We write a for-each loop to iterate over the rest of the columns. In this particular table, we have only two columns, so we take a variable which stores the emp_name, and using the Put class provided by HBase, we write the data into the HBase column family.
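The pieces above can be wired together in a driver class. The sketch below shows how such a driver might look: the JDBC URL, the credentials, and the EmployeeRecord/EmployeeMapper/EmployeeReducer class names are placeholders for this example, while DBConfiguration.configureDB, DBInputFormat.setInput, and TableMapReduceUtil.initTableReducerJob are the standard Hadoop/HBase hooks.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

public class SqlToHBaseDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // JDBC connection details for the Acadgild database; adjust the
        // host, user, and password for your own environment.
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",
                "jdbc:mysql://localhost:3306/Acadgild",
                "root", "password");

        Job job = Job.getInstance(conf, "sql-to-hbase");
        job.setJarByClass(SqlToHBaseDriver.class);

        job.setMapperClass(EmployeeMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Text.class);

        // Read from MySQL: record class, table name, where clause,
        // order-by clause, and the field names to select.
        DBInputFormat.setInput(job, EmployeeRecord.class,
                "employee", null, null, "emp_id", "emp_name");

        // Write to the HBase employee table through the TableReducer.
        TableMapReduceUtil.initTableReducerJob("employee",
                EmployeeReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```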

Now we can use the command mvn clean compile assembly:single to build an executable jar for this program. After running this command, you should see the success message as shown in the below screenshot.

In your project directory, inside the target folder, you will be able to see the jar file that was created.

We will now run this jar as a normal Hadoop jar, after which we can check for the output in the HBase table.

In the above screenshot, you can see that the job has been completed successfully. Let us now check for the output in our HBase table.

In the above screenshot, you can see the data in the HBase table after running the jar file. We have successfully migrated the data present in the MySQL table to the HBase table using MapReduce.

Hope this blog helped you in understanding how to transfer data present in MySQL to HBase using MapReduce. Keep visiting our site www.acadgild.com for more updates on big data and other technologies.