Mastering Big Data with MapR and Syntelli

Used the right way, Big Data can give your business the competitive edge it needs to succeed. Unfortunately, it’s all too easy to get lost in the Big Data jungle. Before you know it, you’re stuck dealing with Hadoop infrastructure, software installation, and configuration issues that pop up at every step. When that happens, you can lose your way and never see the true benefits of your data.

Even when you succeed in creating an enterprise data lake that combines structured, semi-structured, and unstructured data, that data may not be available to end users for analytics and self-service reporting until it has been cleaned up and massaged.

The Internet of Things (IoT) adds to the challenge. To stay competitive, organizations must use data efficiently. IoT sensor data and social media data typically arrive in a complex JavaScript Object Notation (JSON) format, which business users cannot query directly with ANSI SQL. To make the data usable, IT has to run expensive Extract, Transform, and Load (ETL) cycles and maintain schemas. Any time a schema changes or a new attribute needs to be added, the full development cycle starts over.

Spotfire uses Apache Drill to provide analytics on top of unstructured data. Drill supports a variety of NoSQL databases and file systems, including:

HBase

MongoDB

MapR-DB

HDFS

MapR-FS

Amazon S3

Azure Blob Storage

Google Cloud Storage

Swift

NAS

Local files

A single query can join data from multiple data stores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.
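
As a rough illustration of such a cross-store join, the Python sketch below builds a Drill SQL statement and wraps it in the JSON body that Drill's REST endpoint (/query.json) accepts. The storage plugin names mongo and dfs are Drill's usual defaults, but the database, collection, directory path, and column names are hypothetical, as is the localhost address; treat it as a starting point under those assumptions rather than a tested integration.

```python
import json
import urllib.request

# Assumed names: `mongo` and `dfs` are Drill's default storage plugin names.
# The database (crm), collection, directory path, and columns are hypothetical.
JOIN_QUERY = """
SELECT u.name, COUNT(*) AS event_count
FROM mongo.crm.`user_profiles` AS u
JOIN dfs.`/data/event_logs` AS e
  ON u.user_id = e.user_id
GROUP BY u.name
""".strip()


def build_payload(sql: str) -> bytes:
    """Encode a SQL statement as the JSON body for Drill's REST API."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def run_query(sql: str, host: str = "http://localhost:8047") -> dict:
    """POST the query to a Drillbit's /query.json endpoint and parse the reply.

    The host is an assumption; point it at your own Drillbit.
    """
    request = urllib.request.Request(
        f"{host}/query.json",
        data=build_payload(sql),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling run_query(JOIN_QUERY) against a running Drillbit with both storage plugins enabled would submit the join; build_payload can be checked without any cluster.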

Spotfire and Apache Drill also offer:

Self-service raw data exploration: Using Spotfire, you can explore and analyze raw data sets of any complexity as they arrive on Hadoop. IT does not need to run expensive ETL cycles or maintain schemas; you can instantly join newly ingested data and find insights.

Insights on structured/semi-structured data: You can now use SQL to natively query and manipulate complex, semi-structured JSON data originating from NoSQL sources such as web and mobile applications and sensor-equipped Internet of Things (IoT) devices. Apache Drill allows instant flattening and native querying of complex nested data.
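
To make "flattening" concrete, here is a small Python sketch that imitates the idea on in-memory records: each element of a nested array becomes its own row, with the parent fields repeated. It is not how Drill implements the operation; in Drill the same effect comes from its FLATTEN function in SQL, for example SELECT device_id, FLATTEN(readings) FROM a JSON file. The field names and sample sensor records below are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List


def flatten(rows: Iterable[Dict[str, Any]], array_field: str) -> Iterator[Dict[str, Any]]:
    """Yield one output row per element of `array_field`, repeating the other fields."""
    for row in rows:
        parent = {key: value for key, value in row.items() if key != array_field}
        for element in row.get(array_field, []):
            # Copy the parent fields and replace the array with a single element.
            yield {**parent, array_field: element}


# Hypothetical nested IoT records: each device carries an array of readings.
sensor_rows: List[Dict[str, Any]] = [
    {"device_id": "sensor-1", "readings": [{"t": 0, "temp": 21.5}, {"t": 1, "temp": 21.7}]},
    {"device_id": "sensor-2", "readings": [{"t": 0, "temp": 19.0}]},
]

flat_rows = list(flatten(sensor_rows, "readings"))
for flat_row in flat_rows:
    print(flat_row["device_id"], flat_row["readings"]["temp"])
```

Once the data is in this one-row-per-reading shape, ordinary SQL-style filters and aggregates apply without a separate ETL step to define the schema.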