Performance improved a lot after setting scanner caching to 10000. But I still encounter a problem.
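For reference, the caching setting mentioned above can be applied from the Hive/Shark CLI before running the scan-heavy query; this is a sketch using the standard HBase client property (the exact property name honored by your Hive HBase handler version may differ):

```sql
-- Tell the HBase client to fetch 10000 rows per next() RPC instead of 1.
-- Trade-off: larger batches reduce network round-trips but increase
-- client/region-server memory per scanner.
SET hbase.client.scanner.caching=10000;
```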

When loading data into an HBase table via Hive, I got a NullPointerException:

java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:199)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:696)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:758)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:713)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:758)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:713)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:685)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serializeField(HBaseSerDe.java:648)
    at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:560)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
    at shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:73)
    at shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:72)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
    at shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:72)
    at shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:133)
    at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:138)
    at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:138)
    at spark.scheduler.ResultTask.run(ResultTask.scala:77)
    at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
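The failing frame (WritableIntObjectInspector.get at the top of HBaseSerDe.serialize) suggests the SerDe hit a NULL value in an int-typed column while serializing a row to HBase. A possible workaround is to filter or default NULLs in the INSERT's SELECT; this is only a sketch, and the table/column names below are placeholders, not from the original thread:

```sql
-- Hypothetical workaround: never hand the HBase SerDe a NULL primitive.
-- hbase_table, src, key, int_col are illustrative names.
INSERT OVERWRITE TABLE hbase_table
SELECT key,
       COALESCE(int_col, 0)   -- replace NULL ints with a default
FROM src
WHERE key IS NOT NULL;        -- a NULL row key would also fail
```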

Le 01/08/2013 21:00, lars hofhansl a écrit :
> Need to set scanner caching, otherwise each call to next will be a network RTT.
>
> ________________________________
> From: Hao Ren <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, August 1, 2013 7:45 AM
> Subject: Why HBase integration with Hive makes Hive slow
>
> Hi,
>
> I have a cluster (1 master + 3 slaves) running Hive, HBase, and Hadoop.
>
> In order to do some daily row-level update routine, we need to integrate
> HBase with Hive, but the performance is not good.
>
> E.g. There are 2 tables in Hive,
> hbase_table: an HBase table created via Hive

-- 
Hao Ren
ClaraVista
www.claravista.fr
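For readers unfamiliar with the setup the thread describes, an HBase-backed Hive table is created through the HBase storage handler; a minimal sketch (the table name, schema, and column mapping here are illustrative, not the poster's actual DDL):

```sql
-- Hive table whose storage is an HBase table; Hive's `key` column maps to
-- the HBase row key (:key) and `val` to column cf1:val.
CREATE TABLE hbase_table (key INT, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
```

Scans against such a table go through the HBase client one RPC at a time unless scanner caching is raised, which is why the reply above recommends setting it.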
