Now let’s try to import that data into Hive! Without any specific parameters, Sqoop will look for the table’s primary key and build a MapReduce job that splits on that key to parallelize the transfer, but you can specify which column Sqoop uses with the --split-by parameter.
You can transfer only specific columns with --columns, specify the target directory (if you don’t want the table to end up in the default warehouse dir) with --target-dir, and so on.
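As a sketch, a basic import command could look like this (the hostname, credentials and split column are assumptions based on the employees table used here, not values from a real setup):

sqoop import \
  --connect jdbc:mysql://sandbox.hortonworks.com/employees \
  --username hive --password hive \
  --table employees \
  --split-by id \
  --hive-import -m 4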

16/10/23 13:48:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
16/10/23 13:48:12 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@38009ade is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@38009ade is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2735)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1899)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)

This is most likely caused by an old version of the mysql-connector JAR. Upgrade to version 5.1.30 or later (it should be available in /usr/share/java); if that version is already in /usr/share/java, make the JAR in the Hadoop lib directory a symbolic link to it.
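A possible fix, assuming an HDP-style layout (the exact paths and connector file names are assumptions, check what is actually installed on your box):

ls /usr/share/java/mysql-connector-java*.jar    # check the installed connector version
# point the copy used by Sqoop at the newer JAR instead of keeping the old one
ln -sf /usr/share/java/mysql-connector-java.jar /usr/hdp/current/sqoop-client/lib/mysql-connector-java.jar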

Comment: in the logs you can see the query that selects the boundaries, which determines the splits used to parallelize the import.
Here it sees ids ranging from 1 to 20 and decides (by default) to use 4 splits, so the first map task copies ids 1 to 5, the second ids 6 to 10, and so on.
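The boundary query is of this shape (table and column names taken from this example; the surrounding log prefix may differ):

SELECT MIN(`id`), MAX(`id`) FROM `employees`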

We’ll submit our first workflow, a shell action example, but first we have to modify some parameters in the job.properties file.
The two main (and mandatory) files are job.properties and workflow.xml: the former holds the parameters, the latter the workflow definition itself.

Here we have to modify the jobTracker to point to the ResourceManager, so it goes from localhost:8021 to sandbox.hortonworks.com:8050

The nameNode goes from hdfs://localhost:8020 to hdfs://sandbox.hortonworks.com:8020; the other parameters don’t need any change.
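For reference, a minimal job.properties after those changes could look like this (assuming the stock Oozie shell example; queueName and examplesRoot keep their defaults, and ${user.name} resolves to the submitting user, here ambari-qa):

nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell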

Now, to be submitted, the workflow.xml needs to be put on HDFS, because the Oozie server works only with files on HDFS (this is an important point and a common source of mistakes: for example, a custom hdfs-site.xml or hive-site.xml will also need to be put somewhere on HDFS for Oozie to see them).

In the job.properties example we set that path to NN/user/ambari-qa/examples/apps/shell, so let’s create it:
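Something like this should do (run as the ambari-qa user; the Oozie URL assumes the default port 11000):

hdfs dfs -mkdir -p /user/ambari-qa/examples/apps/shell
hdfs dfs -put workflow.xml /user/ambari-qa/examples/apps/shell/
# then submit the workflow from the directory containing job.properties
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config job.properties -run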

Put the JAR in the lib directory: either you want it “globally” for Hadoop, or, as here, just for Hive, in which case put it in /usr/hdp/2.2.4.2-2/hive/lib/.
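For example (the JAR name below is only a placeholder for whatever JAR you are deploying):

cp my-custom.jar /usr/hdp/2.2.4.2-2/hive/lib/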

Now adjust the log4j properties to use rolling.RollingFileAppender instead of DRFA (Daily Rolling File Appender), either through Ambari (in this example, under Advanced hive-log4j in the Hive service configs) or directly in Hive’s log4j.properties.
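A sketch of what the appender definition could look like in hive-log4j (this uses the log4j-extras rolling.RollingFileAppender named above; the policy and pattern values are illustrative, not stock HDP defaults):

hive.root.logger=INFO,RFA
log4j.appender.RFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.RFA.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.RFA.RollingPolicy.ActiveFileName=${hive.log.dir}/${hive.log.file}
log4j.appender.RFA.RollingPolicy.FileNamePattern=${hive.log.dir}/${hive.log.file}.%d{yyyy-MM-dd}.gz
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n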