Splunk Spawns Hunk Hadoop Tool

Splunk brings its analytic know-how to the multi-structured data living in HDFS with Hunk, a new stand-alone system aimed at Hadoop users.

Hunk is the catchy code name for Splunk Analytics for Hadoop, a new beta product introduced by Splunk on Wednesday. As the formal name suggests, the product brings the company's machine data analytics capabilities to data residing in Hadoop.

Splunk made its name (and hundreds of millions of dollars in an IPO) gaining insight from machine data. It does so with an ad hoc query language expressly designed to make sense of the highly variable data streaming out of server log files, sensors and other machine data sources that bring complexity to data centers.

Splunk's existing customers -- more than 5,600 -- use the proprietary, high-scale back end of Splunk Enterprise to store data, but with many companies now dumping all their data into Hadoop clusters, it only made sense to bring Splunk's analytic capabilities to Hadoop.

"Companies are trying to extract value from Hadoop, but the work is quite low-level and technical, and it takes lots of services and highly specialized resources to do the work," Sanjay Mehta, Splunk's VP of product marketing, told InformationWeek. "Hunk gives them an easy way to interact with and get value out of that data."

Splunk caught on as a tool for IT departments to track operational problems in high-scale systems such as e-commerce sites. But customers like Expedia that initially used Splunk to keep Web sites up and running are also answering business-relevant questions, such as how many inquiries and searches they're getting, and whether their traffic is coming from unpaid search, advertisements or keyword buys.

Hunk makes sense out of massive data stores on Hadoop by first applying a Splunk Virtual Index that provides metadata. The index supports the same Splunk Search Processing Language used in the company's Splunk Enterprise product. Users can then explore data, detect patterns and anomalies, and drill down into terabyte- and petabyte-scale Hadoop clusters.

Users can also uncover correlations with structured data by using Splunk DB Connect to link data in relational databases to an analysis. Hunk also has reporting, data visualization and dashboarding tools, so you can turn valuable correlations and reports into always-on, production analyses.

Hunk offers query acceleration, stored statistics, scheduling and access-control features that aren't built into Hadoop. And while it's certainly possible to code analyses from scratch in Hadoop, Splunk says Hunk offers a shortcut around the hard work of inventing and coding each and every inquiry.

"Splunk is a command-based search language with more than 100 technical commands, and it's designed explicitly for this kind of data," said Mehta. "Whether you're a data architect, a data scientist or a data analyst, we make it easier to analyze data without having to work at the low level of MapReduce and HDFS."
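To give a sense of what "the low level of MapReduce" means, here is a minimal, single-machine Python sketch of the map, shuffle and reduce phases needed just to count HTTP status codes in raw Web-server logs. It is a generic illustration, not Splunk or Hadoop code: the sample log lines, the function names and the assumption that the status code is the second-to-last whitespace-separated field are all hypothetical. On a real cluster each phase would be distributed across nodes, which adds considerably more plumbing.

```python
from collections import defaultdict

# Hypothetical sample of Web-server log lines (Apache common log format).
SAMPLE_LOGS = [
    '10.0.0.1 - - [10/Jul/2013:10:00:01 -0700] "GET /search HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Jul/2013:10:00:02 -0700] "GET /hotel HTTP/1.1" 404 128',
    '10.0.0.3 - - [10/Jul/2013:10:00:03 -0700] "GET /search HTTP/1.1" 200 640',
    'malformed line without a status code',
]


def map_phase(lines):
    """Emit a (status_code, 1) pair for every line that parses."""
    for line in lines:
        fields = line.split()
        # Assumption: the status code is the second-to-last field.
        if len(fields) >= 2 and fields[-2].isdigit():
            yield fields[-2], 1


def shuffle_phase(pairs):
    """Group emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each status code."""
    return {key: sum(values) for key, values in groups.items()}


def count_status_codes(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(count_status_codes(SAMPLE_LOGS))  # {'200': 2, '404': 1}
```

Even this toy version needs three hand-written phases plus parsing logic; a search language typically collapses the same question into a single aggregation command (in SPL, something along the lines of `stats count by status`, assuming a `status` field has already been extracted), which is the shortcut Mehta is describing.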

Hunk will be priced and packaged separately from the company's standard Splunk Enterprise product. Pricing has yet to be established, as general release isn't expected until year end, but it's likely to be on a per-node basis.

Will companies still have a reason to buy Splunk Enterprise when they can use Hunk to exploit Hadoop as a general-purpose big-data repository? That decision will come down to an economic analysis of what it costs to do analyses with Splunk Enterprise versus what it costs to do them with Hunk on top of Hadoop, according to Gartner big data analyst Merv Adrian.

"These data streams aren't necessarily being put in HDFS today, and they may not go there in the future unless the customer has made a clear investment in Hadoop and they have the cluster set up and they want to throw these analyses in there, too," Adrian told InformationWeek.

With SQL options like Cloudera Impala and improvements to the Hive query interface also in the works, it's clear that the analytic possibilities on top of Hadoop are only going to get richer.
