Loading sample data from different file formats

You will use Apache Zeppelin to create tables in SAP Vora, developer edition, on CAL, and load sample data into them from files in different formats that are already stored in HDFS.

You will learn

You will learn how to load sample data from Parquet and ORC file formats.

SAP Vora supports loading data not only from CSV files, but also from the Hadoop-specific Parquet and ORC file formats.

For this tutorial, SAP Vora, developer edition, already has sample files preloaded into HDFS. You can list them by executing the following commands in the host’s operating system as user vora.

hdfs dfs -ls *.orc
hdfs dfs -ls *.parquet
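
If local files in the current directory happen to match *.orc or *.parquet, the local shell may expand the pattern before hdfs sees it. The sketch below is one way to avoid that by quoting the patterns. It assumes the sample files sit in the HDFS home directory of user vora (which is where a relative path resolves), and list_samples is a hypothetical helper name chosen only for illustration.

```shell
#!/bin/sh
# list_samples is a hypothetical helper name, not part of SAP Vora or Hadoop.
# It lists the sample files of each format in the current user's HDFS home directory.
list_samples() {
  for ext in orc parquet; do
    echo "== *.${ext} =="
    # The quotes stop the local shell from expanding the pattern,
    # so the glob is resolved by HDFS rather than against local files.
    hdfs dfs -ls "*.${ext}" || echo "no *.${ext} entries listed" >&2
  done
}
```

After defining the function, call list_samples as user vora to see one listing per file format.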

As with loading the sample CSV files, you will use Zeppelin with a predefined notebook here as well. To open the Zeppelin web UI, click Connect on your SAP Vora instance in CAL, and then pick the Open link for Application: Zeppelin.

Once Zeppelin opens in a new browser window, check that it shows Connected, and if so, click the 2_DataTypes notebook.

First, you will create a table SALES_P using the Apache Parquet file format.

Click the Run this paragraph (play) button, first on the CREATE TABLE statement and then on the SELECT statement. Note the format "parquet" option in the first statement.