The file hive-site.xml was copied from /etc/hive/conf into Spark's conf directory. Then the cluster was started with sbin/start-all.sh, and the logs show two workers – everything seems fine:

When I connect to Hive's address (port 10000), I see this:

But I would also like to connect to Spark SQL, and when connecting to port 10001 I run into problems:

So the first thing to do is eliminate the external components (namely Windows) and run Spark's own beeline JDBC client with the command

bin/beeline

and then

beeline> !connect jdbc:hive2://sandbox:10001

scan complete in 5ms

Connecting to jdbc:hive2://sandbox:10001

Enter username for jdbc:hive2://sandbox:10001: pdi

Enter password for jdbc:hive2://sandbox:10001: ***

Connected to: Spark SQL (version 1.3.1)

Driver: Hive JDBC (version 0.13.0.2.1.1.0-385)

Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://sandbox:10001>

So, I am connected.

My Hive installation has a table logsjava, so if I query it with

select * from logsjava limit 10;

I get the result:

This means Spark SQL is working fine: the Thrift server is up, it talks to the Hive metastore to fetch the data, and it returns the result to the beeline client (which connected to it over JDBC). Nice.

We get a Java exception – the same one we obtained when trying to connect from the Windows ODBC client. The conclusion is this: the client side is not to blame, so it is worth looking at the logs of the Spark SQL Thrift server.

Now let us open the file logs/spark-pdi-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-sandbox.hortonworks.com.out (in Spark's directory) and see what is going on:

Spark's Thrift server picked up the Hive configuration from conf/hive-site.xml, and if we open that file, we can see this:

<property>
  <name>hive.exec.pre.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
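As a side note, the hook-related properties can also be spotted programmatically. Below is a minimal Python sketch that lists every property whose name ends in ".hooks"; the SAMPLE string is a shortened, hypothetical stand-in for the real hive-site.xml (in practice you would read conf/hive-site.xml from disk):

```python
# Sketch: list all *.hooks properties found in a hive-site.xml document.
# SAMPLE is a hypothetical, shortened stand-in for the real file content.
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property>
    <name>hive.exec.pre.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox:9083</value>
  </property>
</configuration>"""

def find_hook_properties(xml_text):
    """Return {name: value} for every property whose name ends in '.hooks'."""
    root = ET.fromstring(xml_text)
    hooks = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        if name.endswith(".hooks"):
            hooks[name] = prop.findtext("value", default="")
    return hooks

print(find_hook_properties(SAMPLE))
# {'hive.exec.pre.hooks': 'org.apache.hadoop.hive.ql.hooks.ATSHook'}
```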

Now the question is: what is this hook, do we really need it, and can we unplug it? And if we cannot, can we make Spark's Thrift server see the jar with this hook on its classpath?
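To answer the classpath half of that question, one can check whether a given jar actually contains the hook class. A jar is just a zip archive, so Python's zipfile module is enough; the jar built below is a fake in-memory stand-in for a real Hive jar, used only for demonstration:

```python
# Sketch: check whether a jar contains the ATSHook class.
# A jar is an ordinary zip archive, so zipfile can inspect it.
# The jar built here is a fake stand-in, not a real Hive jar.
import io
import zipfile

ATS_HOOK_ENTRY = "org/apache/hadoop/hive/ql/hooks/ATSHook.class"

def jar_contains(jar_file, entry):
    """Return True if the zip/jar contains the given entry name."""
    with zipfile.ZipFile(jar_file) as jar:
        return entry in jar.namelist()

# Build an in-memory fake jar for demonstration.
fake_jar = io.BytesIO()
with zipfile.ZipFile(fake_jar, "w") as jar:
    jar.writestr(ATS_HOOK_ENTRY, b"fake class bytes")

print(jar_contains(fake_jar, ATS_HOOK_ENTRY))  # True
```

Against a real installation you would pass the path of a jar from Hive's lib directory instead of the in-memory archive.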

NICE! I am happy that I now know what this hook does, but the obvious next question is: HOW DID THIS HOOK GET INTO THE CONFIGURATION FILE HIVE-SITE.XML, IF AT THE TIME OF THE 0.13 RELEASE IT DID NOT EXIST???

The answer is this: when the folks at Hortonworks created the sandbox virtual machine, they configured Hive by hand instead of applying the standard installation procedures.

This is what is called "WELCOME TO OPEN SOURCE, GUYS!".

The solution: open the file conf/hive-site.xml and, in (!) three places, disable this block of XML:

<property>
  <name>hive.exec.failure.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>

either by wrapping it in XML comments or by removing it entirely.
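Instead of editing the file by hand, the same cleanup can be sketched in Python: parse the document and drop every property whose value mentions ATSHook. The SAMPLE string below is a shortened, hypothetical hive-site.xml; in practice you would read and rewrite conf/hive-site.xml:

```python
# Sketch: remove every property whose value mentions ATSHook from a
# hive-site.xml document. SAMPLE is a shortened, hypothetical stand-in.
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property>
    <name>hive.exec.failure.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox:9083</value>
  </property>
</configuration>"""

def strip_ats_hooks(xml_text):
    """Return the XML with all ATSHook properties removed."""
    root = ET.fromstring(xml_text)
    for prop in list(root.findall("property")):
        if "ATSHook" in prop.findtext("value", default=""):
            root.remove(prop)
    return ET.tostring(root, encoding="unicode")

cleaned = strip_ats_hooks(SAMPLE)
print("ATSHook" in cleaned)  # False
```

Note that this drops the offending properties rather than commenting them out; the effect on the Thrift server is the same.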

Then restart the Thrift server by running sbin/stop-thriftserver.sh followed by sbin/start-thriftserver.sh.