Note: there is an updated version of this here with steps for Kylin 1.5.4+

Overview: Why Kylin on MapR?

Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, originally contributed from eBay Inc.

Query the Hive tables using SQL and get results in sub-seconds, via Rest API, ODBC, or JDBC. (From Kylin docs)

After hearing significant interest from our customers, we worked with the Kylin support team to find a working integration path. Releases after Kylin 1.5.1 should work out of the box with MapR. In the meantime, you can follow the steps here for the versions specified; they should also be adaptable to older releases.

Note: This article describes how to run Kylin on HBase. It does not cover using the HBase APIs to connect to MapR-DB.

Preparation Steps

Please set your $HCAT_HOME environment variable as shown, if it's not already set:

[mapr@ ~]$ echo $HCAT_HOME

[mapr@ ~]$ export HCAT_HOME=/opt/mapr/hive/hive-1.2/hcatalog/

[mapr@ ~]$ echo $HCAT_HOME

/opt/mapr/hive/hive-1.2/hcatalog/
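If you script this step, a minimal sketch such as the following sets the variable only when it is empty and confirms that the directory exists. The function name ensure_hcat_home is our own, and the default path is simply the one used above; adjust it for your Hive version.

```shell
# Set HCAT_HOME only when it is empty, then confirm the directory exists.
# The default path is the one used in this article (an assumption for your cluster).
ensure_hcat_home() {
    default_dir="${1:-/opt/mapr/hive/hive-1.2/hcatalog}"
    if [ -z "${HCAT_HOME:-}" ]; then
        export HCAT_HOME="$default_dir"
    fi
    if [ ! -d "$HCAT_HOME" ]; then
        echo "HCAT_HOME points to a missing directory: $HCAT_HOME" >&2
        return 1
    fi
    echo "HCAT_HOME=$HCAT_HOME"
}
```

Calling ensure_hcat_home with no argument uses the path shown above; pass a different directory as the first argument if your hcatalog lives elsewhere.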

Kylin Install Process

Update 8/5/16: Kylin 1.5.2+ should not require the patch noted below.

To begin, retrieve the Kylin 1.5.1 binary for HBase 1.1.3 and unpack it. The resulting directory must be owned by, and accessible to, a user with MapReduce job permissions (e.g. 'mapr').

Next, you need to set $KYLIN_HOME to point to the new directory, for example:

[mapr@ ~]$ export KYLIN_HOME=/home/mapr/apache-kylin-1.5.1-bin
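Before going further, it can help to confirm that the directory really is owned by the user you are logged in as and that the start script is executable. The following is a minimal sketch; check_kylin_home is a name we made up for illustration, and it checks only ownership and the presence of bin/kylin.sh, not MapReduce permissions.

```shell
# Sanity-check a Kylin install directory for the current user.
# Uses $KYLIN_HOME unless a directory is passed as the first argument.
check_kylin_home() {
    dir="${1:-${KYLIN_HOME:-}}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "KYLIN_HOME is unset or not a directory" >&2
        return 1
    fi
    # -O is true when the directory is owned by the effective user.
    if [ ! -O "$dir" ]; then
        echo "$dir is not owned by $(id -un)" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/kylin.sh" ]; then
        echo "$dir/bin/kylin.sh is missing or not executable" >&2
        return 1
    fi
    echo "Kylin install at $dir looks usable by $(id -un)"
}
```

Run it as the 'mapr' user (or whichever user will submit the jobs) after exporting KYLIN_HOME.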

Before running Kylin 1.5.1, we'll have to patch it. The patch was provided by Kylin support to account for path differences between Hive versions in 1.5.1 and older releases; it will not be necessary in later releases. Please see this "diff" to identify the changes that you will have to make to $KYLIN_HOME/bin/find-hive-dependency.sh:

echo "Couldn't locate hcatalog installation, please make sure it is installed and set HCAT_HOME to the path."

exit 1

--

We've also attached to this document a working copy of the complete find-hive-dependency.sh file with the correct changes. It should be relatively easy to adapt older versions using it, but we do not recommend replacing your file wholesale, because the script may differ between Kylin versions.

Starting Kylin for the First Time

To start Kylin, run the following:

[mapr@ ~]$ $KYLIN_HOME/bin/kylin.sh start

On the first start, it may take a few minutes to create the initial Hive and HBase tables. When it's done, visit the Kylin Web UI by replacing <host> in this web address with the hostname of the server you've installed Kylin on:
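Because the first start can take a while, you may prefer to poll for the UI instead of refreshing the browser. The sketch below assumes Kylin's default web port (7070) and the /kylin context path; if your install uses a different port, pass it as the second argument. Both function names are our own.

```shell
# Build the web UI address for a host, assuming the default port 7070.
kylin_ui_url() {
    host="$1"
    port="${2:-7070}"
    echo "http://${host}:${port}/kylin"
}

# Poll until the server returns any HTTP response, for roughly 5 minutes.
wait_for_kylin_ui() {
    url="$(kylin_ui_url "$@")"
    attempts=0
    while [ "$attempts" -lt 30 ]; do
        if curl -s -o /dev/null "$url"; then
            echo "Kylin web UI is answering at $url"
            return 0
        fi
        attempts=$((attempts + 1))
        sleep 10
    done
    echo "No answer from $url after about 5 minutes" >&2
    return 1
}
```

For example, wait_for_kylin_ui "$(hostname -f)" waits on the local node. A response only shows that the web server is up, so you should still log in to confirm that Kylin itself started cleanly.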

Building a Sample Cube

Once you've confirmed that you have access to the Kylin WebUI, you can load the provided sample data by running the following (taken from Kylin docs):

[mapr@ ~]$ $KYLIN_HOME/bin/sample.sh

KYLIN_HOME is set to /home/mapr/apache-kylin-1.5.1-bin

Going to create sample tables in hive...

...

Sample cube is created successfully in project 'learn_kylin'; Restart Kylin server or reload the metadata from web UI to see the change.

To restart Kylin, please run the following and then log into the WebUI again to continue:

[mapr@ ~]$ $KYLIN_HOME/bin/kylin.sh stop

[mapr@ ~]$ $KYLIN_HOME/bin/kylin.sh start

In the WebUI, select "learn_kylin" from the project drop-down list:

Select "Build" from the Actions menu for the kylin_sales_cube, and then set the end date to today to load the entire data set (10,000 records):

You can follow the progress of this build in the Monitor tab. When it reaches 100%, you can move on to running a sample query.

Queries are run from the Insight tab. Below is a test query with expected results that you can run:

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt
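Since Kylin also answers queries over its REST API, the same query can be sent from the shell. This is a minimal sketch under several assumptions: the default port 7070, the default ADMIN/KYLIN login, and the /kylin/api/query endpoint. The function names are our own, and the payload builder does not escape double quotes or backslashes, so it only suits simple statements like the one above.

```shell
# Build the JSON body for a query request.
# Note: the SQL text is inserted as-is, so it must not contain " or \ characters.
kylin_query_payload() {
    sql="$1"
    project="${2:-learn_kylin}"
    printf '{"sql":"%s","project":"%s","offset":0,"limit":100,"acceptPartial":false}\n' "$sql" "$project"
}

# Send the query to a Kylin host (assumed defaults: port 7070, ADMIN/KYLIN login).
run_kylin_query() {
    host="$1"
    sql="$2"
    curl -s -X POST -u ADMIN:KYLIN \
        -H 'Content-Type: application/json' \
        -d "$(kylin_query_payload "$sql")" \
        "http://${host}:7070/kylin/api/query"
}
```

For example, run_kylin_query <host> 'select part_dt, sum(price) as total_selled from kylin_sales group by part_dt order by part_dt' should return the result set as JSON once the cube build has finished.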

Possible Issues

Problems Querying the Resource Manager: error check status

If your build fails at some step in the process but you can see in the Resource Manager that this step/job completed, it's possible that Kylin isn't able to reach the Resource Manager to query the job status. The error in $KYLIN_HOME/logs/kylin.log will look something like this:

This problem is tracked in KYLIN-1319, which also supplies a build patch. Steps for a configuration-based workaround are provided here as well. To manually set how Kylin finds your Resource Manager, add the following to $KYLIN_HOME/conf/kylin.properties:
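If you would rather script the change than edit the file by hand, a small helper like the one below can add or replace a single key while keeping a backup. It is a generic sketch: set_kylin_property is a name we made up, and it knows nothing about which key you need.

```shell
# Add or replace one key=value line in a properties file, keeping a .bak copy.
# Dots in the key are treated as regex wildcards, which is harmless for this use.
set_kylin_property() {
    props="$1"
    key="$2"
    value="$3"
    cp "$props" "${props}.bak"
    # Rewrite the file without any previous setting of the key, then append the new one.
    grep -v "^${key}=" "${props}.bak" > "$props" || true
    printf '%s=%s\n' "$key" "$value" >> "$props"
}
```

As an assumption on our part, the key used for this workaround in Kylin 1.x is kylin.job.yarn.app.rest.check.status.url, with a value of the form http://<rm_host>:8088/ws/v1/cluster/apps/${job_id}?anonymous=true; please confirm both against KYLIN-1319 before relying on them. If you pass that value on the command line, keep it in single quotes so the shell does not expand ${job_id}, which Kylin fills in itself.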

Note: this is not recommended for High Availability situations. Please watch the noted JIRA for resolution.

Coprocessor Support: java.lang.UnsupportedOperationException: coprocessorService is not supported for MapR

If a query fails to run and you see an error similar to the one below, you are running Kylin on MapR-DB tables through the HBase API.

java.lang.RuntimeException: Error when visiting cubes by endpoint:

at org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$1.run(CubeHBaseEndpointRPC.java:324)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.UnsupportedOperationException: coprocessorService is not supported for MapR.

Unfortunately, MapR-DB does not support coprocessors, so this will not work. Fortunately, MapR supports HBase in standalone mode as well. So, first you'll have to remove the mappings that are set to map your Kylin tables to HBase (or any wildcard mappings) from /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml. You will need to be root or a sudoer to do this.
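To see whether any mappings are present before you edit the file, you can search for them first. The sketch below assumes the mappings are held in the hbase.table.namespace.mappings property, which is the name we know from MapR's table namespace mapping configuration; the function name is our own.

```shell
# Print the mapping property and the two lines after it, with line numbers.
# Returns non-zero when the property is not present in the file.
find_table_mappings() {
    site="${1:-/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml}"
    if grep -n -A 2 'hbase.table.namespace.mappings' "$site"; then
        return 0
    fi
    echo "No table namespace mappings found in $site" >&2
    return 1
}
```

It is worth taking a copy of core-site.xml before you remove anything, so the original mappings can be restored if other applications depend on them.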

Getting "java.lang.UnsupportedOperationException: coprocessorService is not supported for MapR". Do you mean we need to install HBase in standalone mode? How is it going to scale for big cubes? Or do we have to install HBase in cluster mode in parallel with MapR-DB?