SQLContext and HiveContext Query Performance

Hi,
We are using SQLContext and HiveContext (spark-1.0.0-bin-cdh4) to query data stored in HDFS and are facing some performance issues. The Parquet file is 30MB with 150 columns, yet a simple select query with count(*) takes around 10 seconds to execute. Here is the code I am using:
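// sc is the SparkContext provided by spark-shell
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
// Load the Parquet data from HDFS and register it as table "pft"
val pf = sqlContext.parquetFile("hdfs://sandbox.cloudera.com:8020/user/cloudera/dtstoprq")
pf.registerAsTable("pft")
// Count rows per manufacturer, largest count first
sql("select manufacturer, count(*) as examcount from pft group by manufacturer order by examcount desc").collect()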
The performance does not change even after caching the table:
sqlContext.cacheTable("pft")
sql("select manufacturer, count(*) as examcount from pft group by manufacturer order by examcount desc").collect()
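To separate the cost of filling the cache from the cost of the query itself, the same query could be timed twice after cacheTable. This is only a minimal sketch: QueryTiming and timed are hypothetical helper names (not part of Spark), and it assumes that cacheTable only marks the table and that the first scan populates the cache, which is worth verifying for this Spark version.

```scala
// Hypothetical helper, not part of Spark: runs a block of code once and
// returns its result together with the elapsed wall-clock time in ms.
object QueryTiming {
  def timed[A](label: String)(body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body // the by-name block is evaluated exactly once, here
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(label + " took " + elapsedMs + " ms")
    (result, elapsedMs)
  }
}

// Intended use in the spark-shell session above (left as comments because
// it needs the running SparkContext and the registered table "pft"):
//
//   import QueryTiming._
//   val q = "select manufacturer, count(*) as examcount from pft " +
//           "group by manufacturer order by examcount desc"
//   sqlContext.cacheTable("pft")
//   timed("first run after cacheTable")(sql(q).collect())
//   timed("second run after cacheTable")(sql(q).collect())
```

If the second run is not noticeably faster than the first, the time is probably going somewhere other than reading the Parquet file, for example job scheduling or the shuffle for the group by.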

In addition, we have tried Shark 0.9.1 with Spark 0.9.0-incubating, and SharkServer2 using Hive JDBC from a custom app as well as Beeline, but in each case the query execution time seems to be around 10 seconds.
It would be great to get any pointers on improving the execution time, on any configuration changes required to query 50MB Parquet files, and on the best way to do it.

INFO logs from before and after the table is cached are attached.