Monday, 22 April 2013

MySQL Applier For Hadoop: Implementation

This is a follow-up post describing the implementation details of the Hadoop Applier, and the steps to configure and install it. The Hadoop Applier integrates MySQL with Hadoop by providing real-time replication of INSERTs to HDFS, so that the data can be consumed by the data stores working on top of Hadoop. You can read more about the design rationale and prerequisites in the previous post.

Design and Implementation:

The Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library to manipulate files in HDFS. The library comes pre-compiled with Hadoop distributions.

It connects to the MySQL master (or reads a binary log generated by MySQL) and:

- fetches the row insert events occurring on the master,

- decodes these events and extracts the data inserted into each field of the row, and

- uses content handlers to get the data into the required format and appends it to a text file in HDFS.

Schema equivalence is a simple mapping: databases are mapped as separate directories, with the tables in them as sub-directories. Data inserted into each table is written into text files (named datafile1.txt) in HDFS. The data can be in comma-separated format, or any other delimiter can be used; the delimiter is configurable through command line arguments.

The diagram explains the mapping between the MySQL and HDFS schema.

The file in which the data is stored is named datafile1.txt here; you can name it anything you want. The working directory where this datafile goes is base_dir/db_name.db/tb_name.

The timestamp at which the event occurs is included as the first field in each row inserted into the text file.
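For example (the database and table names here are hypothetical), a row inserted into table t1 of database db1 ends up appended to a file laid out as below, with the event timestamp leading each line and ctrl-A (shown as ^A) as the default field delimiter:

base_dir/db1.db/t1/datafile1.txt:
1366617585^A10^Ahello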

The implementation follows these steps:

- Connect to the MySQL master using the interfaces to the binary log

#include "binlog_api.h"

Binary_log binlog(create_transport(mysql_uri.c_str()));
binlog.connect();

- Register content handlers

/* Table_index is a sub-class of the Content_handler class in the Binlog API */
Table_index table_event_hdlr;
Applier replay_hndlr(&table_event_hdlr, &sqltohdfs_obj);

binlog.content_handler_pipeline()->push_back(&table_event_hdlr);
binlog.content_handler_pipeline()->push_back(&replay_hndlr);

- Start an event loop and wait for the events to occur on the master

while (true)
{
  /*
    Pull events from the master. This is the heart beat of the event listener.
  */
  Binary_log_event *event;
  binlog.wait_for_next_event(&event);
}

- Decode the event using the content handler interfaces

class Applier : public mysql::Content_handler
{
public:
  Applier(Table_index *index, HDFSSchema *mysqltohdfs_obj)
  {
    m_table_index= index;
    m_hdfs_schema= mysqltohdfs_obj;
  }

  mysql::Binary_log_event *process_event(mysql::Row_event *rev)
  {
    int table_id= rev->table_id;
    typedef std::map<long int, mysql::Table_map_event *> Int2event_map;
    Int2event_map::iterator ti_it= m_table_index->find(table_id);

- Each row event contains multiple rows and fields. Iterate over one row at a time using Row_iterator.

    mysql::Row_event_set rows(rev, ti_it->second);
    mysql::Row_event_set::iterator it= rows.begin();
    do
    {
      mysql::Row_of_fields fields= *it;
      long int timestamp= rev->header()->timestamp;
      if (rev->get_event_type() == mysql::WRITE_ROWS_EVENT)
        table_insert(db_name, table_name, fields, timestamp, m_hdfs_schema);
    } while (++it != rows.end());

- Get the field data separated by field delimiters and row delimiters. Each row contains a vector of Value objects. The converter allows us to transform a value into another representation.

    mysql::Row_of_fields::const_iterator field_it= fields.begin();
    mysql::Converter converter;
    std::ostringstream data;
    data << timestamp;
    do
    {
      field_index_counter++;
      std::vector<long int>::iterator it;
      std::string str;
      converter.to(str, *field_it);
      data << sqltohdfs_obj->hdfs_field_delim;
      data << str;
    } while (++field_it != fields.end());
    data << sqltohdfs_obj->hdfs_row_delim;

- Connect to the HDFS file system. If not provided on the command line, the connection information (user name, password, host and port) is read from the XML configuration file, hadoop-site.xml.
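To make the libhdfs side concrete, here is a minimal, hypothetical sketch of appending one delimited row to a datafile (the host, port, file path and helper name are placeholders for illustration, not the Applier's actual code):

#include <cstring>
#include <fcntl.h>   /* O_WRONLY, O_APPEND */
#include "hdfs.h"    /* libhdfs C API, shipped with Hadoop */

/* Hypothetical helper: append one delimited row to a datafile in HDFS. */
int append_row(const char *row)
{
  hdfsFS fs= hdfsConnect("localhost", 9000);   /* placeholder namenode host/port */
  if (fs == NULL)
    return -1;
  hdfsFile file= hdfsOpenFile(fs, "/user/hive/warehouse/db1.db/t1/datafile1.txt",
                              O_WRONLY | O_APPEND, 0, 0, 0);
  if (file == NULL)
  {
    hdfsDisconnect(fs);
    return -1;
  }
  tSize written= hdfsWrite(fs, file, row, strlen(row));
  hdfsCloseFile(fs, file);
  hdfsDisconnect(fs);
  return written < 0 ? -1 : 0;
}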

Install and Configure:

Follow these steps to install and run the Applier:

1. Download a Hadoop release (I am using 1.0.4); configure and install it (for the purpose of the demo, install it in pseudo-distributed mode). The flag 'dfs.support.append' must be set to true while configuring HDFS (hdfs-site.xml). Since append is not supported in Hadoop 1.x, also set the flag 'dfs.support.broken.append' to true.
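For reference, on Hadoop 1.x the corresponding hdfs-site.xml entries would look roughly like the sketch below (property names taken from the step above; the rest of the file is omitted):

<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
</configuration>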

2. Set the environment variable $HADOOP_HOME to point to the Hadoop installation directory.

3. CMake doesn't come with a 'find' module for libhdfs. Ensure that 'FindHDFS.cmake' is in the CMAKE_MODULE_PATH. You can download a copy here.

4. Edit the file 'FindHDFS.cmake', if necessary, so that HDFS_LIB_PATHS is set to the path of libhdfs.so and HDFS_INCLUDE_DIRS points to the location of hdfs.h.

For 1.x versions, the library path is $ENV{HADOOP_HOME}/c++/Linux-i386-32/lib, and the header files are contained in $ENV{HADOOP_HOME}/src/c++/libhdfs. For 2.x releases, the libraries and header files can be found in $ENV{HADOOP_HOME}/lib/native and $ENV{HADOOP_HOME}/include respectively.

5. Since libhdfs is a JNI-based API, it requires JNI header files and libraries to build. If a FindJNI.cmake module exists in the CMAKE_MODULE_PATH and JAVA_HOME is set, the headers will be included and the libraries will be linked automatically. If not, you will need to include the headers and load the libraries separately (modify LD_LIBRARY_PATH).
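If you do need to load the JNI libraries manually, the usual approach is to add the directory containing libjvm.so to LD_LIBRARY_PATH. The exact path depends on your JDK layout, so the example below is only an assumption:

$export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
$export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH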

6. Build and install the library 'libreplication', to be used by the Hadoop Applier, using CMake.

The 'mysqlclient' library is required to be installed in the default library paths. You can either download and install it (you can get a copy here), or set the environment variable $MYSQL_DIR to point to the parent directory of the MySQL source code. Make sure to run cmake on the MySQL source directory.

$export MYSQL_DIR=/usr/local/mysql-5.6

Run the 'cmake' command on the parent directory of the Hadoop Applier source. This will generate the necessary Makefiles. Make sure to set the cmake option ENABLE_DOWNLOADS=1; this will install Google Test, which is required to run the unit tests.

$cmake . -DENABLE_DOWNLOADS=1

Run 'make' and 'make install' to build and install. This installs the library 'libreplication', which is used by the Hadoop Applier.

7. Make sure to set the CLASSPATH to include all the Hadoop jars needed to run Hadoop itself.

$export PATH=$HADOOP_HOME/bin:$PATH

$export CLASSPATH=$(hadoop classpath)

8. The code for the Hadoop Applier can be found in /examples/mysql2hdfs in the Hadoop Applier repository. To compile, simply load the libraries (modify LD_LIBRARY_PATH if required) and run the command "make happlier" in your terminal. This creates an executable file in the mysql2hdfs directory.

.. and then you are done!

Now run Hadoop dfs (namenode and datanode), start a MySQL server as master with row-based replication (you can use the mtr rpl suite for testing purposes: $MySQL-5.6/mysql-test$ ./mtr --start --suite=rpl --mysqld=--binlog_format='ROW' --mysqld=--binlog_checksum=NONE), start Hive (optional) and run the executable ./happlier, optionally providing the MySQL and HDFS URIs and other available command line options (./happlier --help for more info).
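For illustration, a bare-bones startup sequence outside of mtr might look like the sketch below (the URIs are placeholders, and the mysqld options simply mirror the row-format, no-checksum and server-id requirements discussed on this page):

$start-dfs.sh
$mysqld --log-bin=mysqlbin-log --binlog_format=ROW --binlog_checksum=NONE --server-id=2 &
$./happlier mysql://root@127.0.0.1:3306 hdfs://localhost:9000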

There are useful filters available as command line options to the Hadoop Applier.

-r, --field-delimiter=DELIM

Use DELIM instead of ctrl-A as the field delimiter. DELIM can be a string or an ASCII value in the format '\nnn'. Escape sequences are not allowed. This provides the string by which the fields in a row will be separated; by default it is set to ctrl-A.

-w, --row-delimiter=DELIM

Use DELIM instead of LINE FEED as the row delimiter. DELIM can be a string or an ASCII value in the format '\nnn'. Escape sequences are not allowed. This provides the string by which the rows of a table will be separated; by default it is set to LINE FEED (\n).

-d, --databases=DB_LIST

Import entries for some databases, optionally including only the specified tables. DB_LIST is made up of one database name, or many names separated by commas. Each database name can optionally be followed by table names; the table names must follow the database name, separated by HYPHENS. Example: -d=db_name1-table1-table2,dbname2-table1,dbname3

-f, --fields=LIST

Import entries for only some fields of a table. Similar to the cut command, LIST is made up of one range, or many ranges separated by commas. Each range is one of:

N     N'th byte, character or field, counted from 1
N-    from the N'th byte, character or field, to the end of the line
N-M   from the N'th to the M'th (included) byte, character or field
-M    from the first to the M'th (included) byte, character or field

-h, --help

Display help
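As a combined example (the database, table names and URIs below are hypothetical), importing only two tables of one database with a comma as the field delimiter might look like:

$./happlier --field-delimiter=, -d=db_name1-table1-table2 mysql://root@127.0.0.1:3306 hdfs://localhost:9000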

Integration with HIVE:

Hive runs on top of Hadoop. It is sufficient to install Hive only on the Hadoop master node.

Take note of the default data warehouse directory, set as a property in the hive-default.xml.template configuration file. This must be the same as the base directory into which the Hadoop Applier writes.

Since the Applier does not import DDL statements, you have to create a matching schema on both MySQL and Hive, i.e. set up a similar database in Hive using HiveQL (the Hive Query Language). Since timestamps are inserted as the first field in the HDFS files, you must take this into account while creating tables in Hive.

SQL Query:

CREATE TABLE t (i INT);

Hive Query:

CREATE TABLE t (time_stamp INT, i INT)
[ROW FORMAT DELIMITED]
STORED AS TEXTFILE;
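As a sketch, the default delimiters (ctrl-A between fields, LINE FEED between rows) can also be spelled out explicitly in the Hive DDL; the escape values below are assumptions chosen to match those defaults:

CREATE TABLE t (time_stamp INT, i INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;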

Now, when any row is inserted into the table on the MySQL database, a corresponding entry is made in the Hive table. Watch the demo to get a better understanding. The demo has no audio and is meant to be followed in conjunction with this blog. You can also create an external table in Hive and load data into the tables; it's your choice!

In the first version we support WRITE_ROWS_EVENTs, i.e. only insert statements are replicated.

We have considered adding support for deletes, updates and DDLs as well, but they are more complicated to handle and we are not sure how much interest there is in this.

We would very much appreciate your feedback on requirements - please use the comments section of this blog to let us know!

The Hadoop Applier is compatible with MySQL 5.6; however, it does not import events if binlog checksums are enabled. Make sure to set them to NONE on the master, and run the server in row-based replication mode.

This innovation includes dedicated contributions from Neha Kumari, Mats Kindahl and Narayanan Venkateswaran. Without them, this project would not be a success.

One question: does binlog.wait_for_next_event(&event); need the 'event' memory to be released? I find examples/mysql2hdfs/ releasing memory with 'delete event;', while the rest of the examples do not (memory leak?).

Thank you for trying out the product, and it is great to see you willing to contribute to it. Thanks!

The product is GPL, so it is open source by the definition of the Open Source Initiative, and you can definitely contribute to it. To start with patches, you have to sign the OCA, Oracle contributor agreement.

However, please note that we are refactoring the code, and I am not sure if your patches would work.

I wanted to ask how the translation is done from the MySQL schema to the Hive schema. Does that have to be done as an offline process for both systems separately, or will simply creating the schema in one system, say MySQL, allow us to replicate data to HDFS and also put an auto-generated Hive schema on top of that?

Yes, at this point, only row format is supported by the Applier. Mixed mode is not completely supported, i.e. in this mode, only the inserts which are mapped as (table map+row events) in MySQL will be replicated.

Thank you for the question. Can I please request for a use case where this is a requirement? It can help us shape the future releases of the Applier.

Sorry, the issue with the compilation is because you are using the libmysqlclient library released with MySQL 5.5.

Since the data types MYSQL_TYPE_TIME2, MYSQL_TYPE_TIMESTAMP2 and MYSQL_TYPE_DATETIME2 are defined only in the latest release of MySQL (5.6 GA), the 'make' command fails.

This is a bug, and has been reported.

In order to compile, I suggest using either the latest released version of the MySQL source code (5.6), or the latest GA release of the Connector/C library (libmysqlclient.so, 6.1.1). You can get a copy of the connector here.

Hope it helps. Please reply on the thread if you are still facing issues!

Thank you, I've successfully compiled the applier after changing MySQL to 5.6. But running the applier gives an error saying "Can't connect to the master.":

[root@localhost mysql2hdfs]# ./happlier --field-delimiter=, mysql://root@127.0.0.1:3306 hdfs://localhost:9000
The default data warehouse directory in HDFS will be set to /usr/hive/warehouse
Change the default data warehouse directory? (Y or N) N
Connected to HDFS file system
The data warehouse directory is set as /user/hive/warehouse
Can't connect to the master.

The above error means that the applier is not able to connect to the MySQL master. It can be due to two reasons:
- the MySQL server is not started on port 3306 on localhost
- you may not have the permissions to connect to the master as the root user.

To be sure, could you try opening a separate mysql client and connecting to the server with the same parameters, i.e. user=root, host=localhost, port=3306?
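For example, something along these lines from the shell (standard mysql client options):

$mysql -u root -h 127.0.0.1 -P 3306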

Good that you can run it using mtr. :) The problem is that the MySQL master is mis-configured, since you do not provide a server-id in the cnf file. Please add that to the conf file as well.

The file /etc/my.cnf should contain at least

[mysqld]
log-bin=mysqlbin-log
binlog_format=ROW
binlog_checksum=NONE
server-id=2   # please note that this can be anything other than 1, since the applier uses 1 to act as a slave (code in src/tcp_driver.cpp), so the MySQL server cannot have the same id
port=3306

1. How can the Applier connect to a server with a password?
2. If I want to collect data from more than one MySQL server at the same time, how can I implement it with the Applier? By writing a shell script to set up many Applier connections together? Can you give me some advice?

1. You need to pass the MySQL URI to the Applier in the following format: user[:password]@host[:port]. For example: ./happlier user:password@localhost:13000

2. Yes, that is possible. However, one instance of the applier can connect to only one MySQL server at a time. In order to collect data from multiple servers, you need to start multiple instances of the applier ( you can use the same executable happlier simultaneously for all the connections).

Yes, you may write a shell script to start a pair of MySQL server and applier for each source, in order to collect data from all of them.
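A minimal sketch of such a script, assuming happlier is in the current directory and using placeholder URIs:

#!/bin/bash
# Start one happlier instance per MySQL master (placeholder URIs).
HDFS_URI=hdfs://localhost:9000
for MYSQL_URI in mysql://repl:password@db1:3306 mysql://repl:password@db2:3306
do
  ./happlier "$MYSQL_URI" "$HDFS_URI" &
done
wait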

Also, I find it very interesting to improve the applier so that a single instance can connect to more than one server at a time. We might consider this for future releases. Thank you for the idea. If I may ask, can you please provide the use case where you require this implementation?

We are a game company that operates many mobile and internet games. Most of the games use MySQL as the database. The data produced by games and players grows daily. The game operation department needs information about the games in order to make marketing decisions.

In order to store and analyze this huge amount of data, we used Hadoop. First we used Sqoop to collect and import data from multiple MySQL servers, and developed a system to manage all the collecting tasks: creating tasks from a plan, viewing the progress, and reporting. However, in order not to affect the running games, collection always runs at night, so the status information we get about the games is considerably delayed. Then I found the Applier; I think the real-time data replication is great, so I want to replace our collecting process with the Applier.

This is our use case. :)

I'm looking forward to seeing the Applier's future releases. And if the Applier can support connecting to multiple servers in a single instance, maybe you can also provide a tool to manage and control the processes.

Thanks a lot for the wonderful explanation. The Applier is aimed at solving exactly the kind of delays involved in such an operation.

It is a very valid use case for the Applier, and I can clearly mark out the requirement of supporting data feed from multiple servers. Thanks once again, this will help us decide on the future road map for the Applier.

Please stay tuned for updates on the product, and provide us with any other feedback you have.

The warning is generated because the name "basic_transaction_parser.cpp" is included twice in the cmake file while setting the targets for the library (code: mysql-hadoop-applier-0.1.0/src/CMakeLists.txt, line no. 5 and line no. 7).

This is our fault, thank you for pointing this out. This will be fixed in the next release.

For now, to fix it I request you to do the following:

- please modify the file mysql-hadoop-applier-0.1.0/src/CMakeLists.txt to remove one of the two occurrences of basic_transaction_parser.cpp (i.e. either from line no. 7 or line no. 5)

-execute rm CMakeCache.txt from the base dir (/home/thilanj/hadoop/mysql-hadoop-applier-0.1.0), if it exists

I've been trying to execute the "make -j8" command as in the video tutorial, but I am getting the following set of errors. The files mentioned in the error log are already there, but I still get this error. Please advise.

Can you please mention how you are addressing the dependency on libmysqlclient: using the MySQL source code, or the Connector/C library directly?

Please make sure of the following. If you are using the MySQL source code for the mysqlclient library:
1. The MySQL source code is built (i.e. run the cmake and make commands on the MySQL source code).
2. Set the environment variable MYSQL_DIR to point to the base directory of this source code (please note, do not give the path up to the lib directory, only the base directory).
3. Check that the file 'my_global.h' is present in the path $MYSQL_DIR/include and the library libmysqlclient.so in $MYSQL_DIR/lib.
4. Delete CMakeCache.txt from the Hadoop Applier base directory (rm CMakeCache.txt).
5. Run cmake and make again.
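For example (the paths below are placeholders for illustration), the whole sequence might look like:

$cd /path/to/mysql-5.6 && cmake . && make
$export MYSQL_DIR=/path/to/mysql-5.6
$cd /path/to/mysql-hadoop-applier-0.1.0
$rm -f CMakeCache.txt
$cmake . -DENABLE_DOWNLOADS=1
$make && make install
$make happlier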

If you are using the library directly, make sure of the following (if not explicitly specified):
1. The above mentioned files (my_global.h) must be in the standard header paths where the compiler looks.
2. The library should be in the standard library paths (e.g. /usr/lib).

Have you considered replication to HBase, utilizing the versioning capability (http://hbase.apache.org/book.html#versions) to allow a high fidelity history to be maintained to support time series analysis etc?

Thank you for trying out the Applier. As I see it, the problem is while linking to the libjawt libraries.

Can you please make sure of the following:
1. Do you have JAVA_HOME set?
2. Do you have CLASSPATH set to point to the jars required to run Hadoop itself? (command: export CLASSPATH=$(hadoop classpath))
3. Can you please try running Hadoop and check if it runs fine?

First, thanks for the very interesting code. This will be very useful.

At the moment there are some problems, one of which is that I found a case where it is skipping the wrong table. The issue seems to be that mysqld can change the numerical table_id associated with a given table (I think that in my particular case this was associated with a restart of mysqld and a re-reading of the logs from before the restart). Anyway, looking at the code from Table_index::process_event, which processes the TABLE_MAP_EVENTs, two issues arise:

1) If the table_id associated with the map event is already registered, the code ignores the update. Should the update cause the map to, well, update?

Thank you for trying out the applier! It is very encouraging to see you take interest in the code base. This shall make us overcome the shortcomings faster.

Regarding the question:

1. Only committed transactions are written into the binary log. If the server restarts, we don't apply binary logs. Therefore, the table-id will not change, or updated, for a map event written in the binary log.

2. Yes, there is a memory leak in the code here. This is a bug, thank you for pointing it out.

You may report it on http://bugs.mysql.com , under the category 'Server: Binlog'. Or, I can do it on your behalf. Once it is reported, we will be able to commit a patch for the same.

I am not an expert on MySQL, but I've been reading the documentation and the source code you provided.

Looking at this URL: http://www.mysqlperformanceblog.com/2013/07/15/crash-resistant-replication-how-to-avoid-mysql-replication-errors/

It apparently is the responsibility of the replication server to keep track of where it is in the bin-logs. Otherwise, if the replication server is restarted, it will begin reading as far back as it possibly can.

The overloads to Binary_log_driver::connect seem to provision for this, but this feature does not seem to be used in the example code.

I am sorry about the delay in the reply. Thank you for trying out the Applier and looking into the code base as well as the documentation; it will be very helpful to improve the Applier!

Yes, it is the responsibility of the replication server to keep track of where it is in the bin-logs. As you correctly mention, if it does not keep track and is restarted, the server will begin reading from the start.

The Applier currently suffers from this issue: if restarted, the Applier reads again from the first log. This is a feature enhancement, and should be addressed. Thank you once again for pointing this out!

Please feel free to report it on http://bugs.mysql.com, under the category 'Server: Binlog', marking the severity as 4 (feature request). Or, I can do it on your behalf. Once it is reported, we will be able to commit a patch for it.

From the errors, it seems that the applier is not able to find the shared library 'libhdfs.so'. Which Hadoop version are you using? The library comes pre-compiled for 32-bit systems with Hadoop, but you need to compile it yourself for 64 bits.

You may please try locating libhdfs.so on your system (inside HADOOP_HOME) and make sure the path to it is in LD_LIBRARY_PATH.
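For example, to locate the library and point the loader at it (the Linux-amd64-64 path is only an assumption for a 64-bit Hadoop 1.x build):

$find $HADOOP_HOME -name 'libhdfs.so*'
$export LD_LIBRARY_PATH=$HADOOP_HOME/c++/Linux-amd64-64/lib:$LD_LIBRARY_PATH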

You may also check the contents of the file CMakeCache.txt, to see at what location the applier is trying to search for the library.

Thanks for trying the applier.
1) Yes, it works with Hadoop 2.2.x, but you might need to change the library and include paths in the FindHDFS.cmake file.
2) We have considered adding update and delete, but there are no concrete plans yet.
3) I am sorry, but we have not decided on that yet.

I am trying to understand and use the Hadoop Applier for my project. I ran through all the steps; however, I am having some problems. I don't have a strong software background in general, so my apologies in advance if something seems trivial. To make it easier, I will list all the steps concisely here with respect to the install and configure tutorial.

1) & 2) Hadoop is downloaded. I can run and stop all the hdfs and mapred daemons correctly. My hadoop version is 1.2.1. My $HADOOP_HOME environment variable is set in .bashrc file as "/home/srai/Downloads/hadoop-1.2.1"

3) & 4) I downloaded a FindHDFS.cmake file and modified it according to the patch which was listed. I placed this file under the directory "/usr/share/cmake-2.8/Modules". I thought that if I placed it under the Modules directory, CMAKE_MODULE_PATH would be able to find it. I am not sure if this is correct, or how to update CMAKE_MODULE_PATH in the CMakeLists.txt, and where.

5) FindJNI.cmake was already present in the directory /usr/share/cmake-2.8/Modules, so I didn't change or modify it. My JAVA_HOME env variable is set in the .bashrc file as "/usr/lib/java/jdk1.8.0_05". I didn't modify or touch LD_LIBRARY_PATH.

6) I downloaded the Hadoop Applier and mysql-connector-c. Since the tutorial says "the 'mysqlclient' library is required to be installed in the default library paths", I moved the files of mysql-connector-c to /usr/lib/mysql-connector-c. I also declared a variable $MYSQL_DIR to point to "/usr/local/mysql".

Could not find a package configuration file provided by "CLucene" with any of the following names:

CLuceneConfig.cmake clucene-config.cmake

Add the installation prefix of "CLucene" to CMAKE_PREFIX_PATH or set "CLucene_DIR" to a directory containing one of the above files. If "CLucene" provides a separate development package or SDK, be sure it has been installed.

-- Architecture: x64
-- HDFS_LIB_PATHS: /c++/Linux-amd64-64/lib
-- HDFS includes and libraries NOT found. Thrift support will be disabled (, HDFS_INCLUDE_DIR-NOTFOUND, HDFS_LIB-NOTFOUND)
-- Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
HDFS_INCLUDE_DIR (ADVANCED) used as include directory in directory /home/srai/Downloads/mysql-hadoop-applier-0.1.0/examples/mysql2hdfs
JAVA_AWT_LIBRARY (ADVANCED) linked by target "happlier" in directory /home/srai/Downloads/mysql-hadoop-applier-0.1.0/examples/mysql2hdfs
JAVA_JVM_LIBRARY (ADVANCED) linked by target "happlier" in directory /home/srai/Downloads/mysql-hadoop-applier-0.1.0/examples/mysql2hdfs

I don't understand how cmake is using my environment variables to find these files. As I mentioned, I am a newbie, so if someone can help me compile the Hadoop Applier I will really appreciate it.

Thank you for the detailed message, and thank you for trying out the applier!

Everything looks fine except for one issue. The error is that cmake is unable to find the libraries correctly. HDFS_LIB_PATHS is set as "/c++/Linux-amd64-64/lib", but it should be "/home/srai/Downloads/hadoop-1.2.1/c++/Linux-amd64-64/lib".

This implies that the variable HADOOP_HOME is not set, on the terminal where you are running cmake.

Before executing the cmake command, can you run echo $HADOOP_HOME and check that the output is /home/srai/Downloads/hadoop-1.2.1?

Hope that helps. Please notify us in case you are still having an error.

I double checked my hadoop_home path by doing echo $HADOOP_HOME and i see my output is /home/srai/Downloads/hadoop-1.2.1.

I am not sure why $ENV{HADOOP_HOME}/src/c++/libhdfs/ does not prefix my hadoop home to this path. Can it be because CMAKE_MODULE_PATH cannot find FindHDFS.cmake and FindJNI.cmake? I put the FindHDFS.cmake in the modules under /usr/share/cmake-2.8/Modules and FindJNI.cmake was already there. Also I don't define or use LD_LIBRARY_PATH anywhere.

I modified the FindHDFS.cmake as suggested in step 4 of the tutorial. This might seem silly; however, I am not sure what this means or where this modification goes:

--- a/cmake_modules/FindHDFS.cmake
+++ b/cmake_modules/FindHDFS.cmake

Also if you can elaborate a bit more on steps 7 and 8, i will really appreciate it.

Thanks for your response. I was able to run the 'cmake' command on the parent directory of the Hadoop Applier source. In my case i ran "sudo cmake . -DENABLE_DOWNLOADS=1 mysql-hadoop-applier-0.1.0 " and then "make happlier" from the terminal.

Could not find a package configuration file provided by "CLucene" with any of the following names:

CLuceneConfig.cmake clucene-config.cmake

Add the installation prefix of "CLucene" to CMAKE_PREFIX_PATH or set "CLucene_DIR" to a directory containing one of the above files. If "CLucene" provides a separate development package or SDK, be sure it has been installed.

CMake Warning at examples/mysql2hdfs/CMakeLists.txt:3 (find_package): By not providing "FindHDFS.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "HDFS", but CMake did not find one.

Could not find a package configuration file provided by "HDFS" with any of the following names:

HDFSConfig.cmake hdfs-config.cmake

Add the installation prefix of "HDFS" to CMAKE_PREFIX_PATH or set "HDFS_DIR" to a directory containing one of the above files. If "HDFS" provides a separate development package or SDK, be sure it has been installed.

-- Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
JAVA_AWT_LIBRARY (ADVANCED) linked by target "happlier" in directory /apps/abm/install/mysql-hadoop-applier-0.1.0/examples/mysql2hdfs
JAVA_JVM_LIBRARY (ADVANCED) linked by target "happlier" in directory /apps/abm/install/mysql-hadoop-applier-0.1.0/examples/mysql2hdfs

Looking at the error that you have pasted here, I am guessing that the file FindHDFS.cmake was not found at the locations given in CMAKE_MODULE_PATH. Can you please check that once more, and let me know if the error persists.


In file included from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/my_sys.h:26:0,
 from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/hash.h:22,
 from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/sql_common.h:26,
 from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/protocol.h:27,
 from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/binlog_driver.h:25,
 from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/access_method_factory.h:24,
 from /home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/src/access_method_factory.cpp:20:
/home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/mysql/psi/mysql_thread.h:88:3: error: 'my_mutex_t' does not name a type
   my_mutex_t m_mutex;
/home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/mysql/psi/mysql_thread.h:116:3: error: 'native_rw_lock_t' does not name a type
   native_rw_lock_t m_rwlock;
/home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/mysql/psi/mysql_thread.h:132:3: error: 'rw_pr_lock_t' does not name a type
   rw_pr_lock_t m_prlock;
/home/hamdi/Downloads/mysql2hadoop/mysql-hadoop-applier-0.1.0/include/mysql/psi/mysql_thread.h:173:3: error: 'native_cond_t' does not name a type
   native_cond_t m_cond;

Is there any way to know the column name (as a string) in the insert transaction? For example:

INSERT INTO runoob_tbl1 (runoob_title, runoob_author, submission_date)
VALUES ("Learn PHP", "John Poul", NOW());

We can get "Learn PHP" and "John Poul" from mysql::Row_of_fields; how can we get the corresponding column names, runoob_title and runoob_author? Thanks.
