Splunk Extends Analysis To NoSQL Databases

Splunk keeps rolling along, well ahead of an open-source threat that some thought might flatten it. The company last week sprinted ahead yet again, introducing advances to its Splunk Enterprise flagship product and Hunk platform for Hadoop, both of which are designed to "search, monitor, analyze, and visualize machine-generated big data."

Splunk is a successful commercial vendor thriving in a big data market that is otherwise dominated by open-source products, including Hadoop and various NoSQL databases. Splunk's "search" capabilities include algorithms for clickstream analytics, machine-data analysis, IT operational analytics, risk analysis, and analysis of customer-service usage and patterns of behavior.

The majority of Splunk's 7,000-plus customers use Splunk Enterprise, which gained several significant upgrades with last week's 6.1 release. Expanding on the high-availability clustering introduced in 6.0, the product added support for multisite clustering, ensuring continuous availability across geographically distributed deployments. Splunk's analysis algorithms also take advantage of multisite clustering through a feature called Search Affinity.

"If you're in, say, Europe, Search Affinity has the smarts to only go to the local instance, even if the data of interest originated in North America, so it's going to reduce latency, improve performance, and decrease network usage," said Sanjay Mehta, Splunk product marketing VP, in a phone interview with InformationWeek.
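To make the routing idea concrete, here is a minimal Python sketch of site-affinity replica selection. It assumes, hypothetically, that each chunk of indexed data (a "bucket") has copies on peers at more than one site, and that a search run from a given site should read the local copy when one exists and fall back to a remote copy otherwise. The `Replica` class, the `choose_peers` function, and the site and peer names are illustrative inventions, not Splunk's configuration or internals.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Replica:
    bucket_id: str  # hypothetical ID for one chunk of indexed data
    site: str       # site (data center) holding this copy
    peer: str       # hypothetical name of the indexer peer holding this copy


def choose_peers(replicas: Iterable[Replica], search_site: str) -> Dict[str, Replica]:
    """Pick one copy per bucket, preferring copies at the searcher's own site."""
    chosen: Dict[str, Replica] = {}
    for replica in replicas:
        current = chosen.get(replica.bucket_id)
        # Keep the first copy seen, but swap a remote choice for a local one.
        if current is None or (current.site != search_site and replica.site == search_site):
            chosen[replica.bucket_id] = replica
    return chosen


replicas = [
    Replica("b1", "us", "idx-us-1"),
    Replica("b1", "eu", "idx-eu-1"),  # b1 originated in the U.S. but is replicated to Europe
    Replica("b2", "us", "idx-us-2"),  # b2 has no European copy
]
for bucket, replica in sorted(choose_peers(replicas, "eu").items()):
    print(bucket, replica.peer)
```

Searching from `"eu"`, bucket `b1` is read from the European peer even though a U.S. copy exists, while `b2`, which has no European copy, still falls back to the U.S. peer.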

Designer improvements in Splunk Enterprise 6.1 make it easier to deliver advanced dashboards without coding.

Splunk is used mostly by IT types who can handle technical interfaces, but the 6.1 analytical interfaces have been simplified to help less technical users. A new dashboard editor eliminates the XML coding that was previously required to build advanced dashboards. Charting capabilities have also been improved, with pan-and-zoom controls and new chart types and overlays.

Splunk Enterprise 6.1 makes visualizations embeddable, so if you want to add updating charts or reports to Salesforce.com, NetSuite, or SharePoint sites, you can embed an object for broad business-user consumption.

Alerting capabilities have also been improved in 6.1, adding contextual insight into patterns of interest. Instead of just sending an alert that a website outage is imminent, for example, the product can also include insight into what's causing the condition so users can take appropriate actions.

"We've always been able to send alerts, but now we can also relay details on activities that are hitting extreme levels or that have crossed a certain ratio so you can immediately react and ensure that systems stay up and running," Mehta said.
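A ratio-style alert that carries context can be sketched in a few lines of Python. This is a hypothetical illustration, not Splunk's alerting configuration: the event fields (`host`, `status`), the 5% default threshold, and the `check_error_ratio` function are all assumptions. The point is that the alert reports not just that something is wrong but which hosts are driving the error rate.

```python
from collections import Counter
from typing import Dict, List, Optional


def check_error_ratio(events: List[Dict], threshold: float = 0.05) -> Optional[Dict]:
    """Return an alert with context if the share of 5xx responses exceeds threshold."""
    if not events:
        return None
    errors = [e for e in events if e["status"] >= 500]
    ratio = len(errors) / len(events)
    if ratio <= threshold:
        return None  # below threshold: no alert
    # Context for responders: which hosts account for most of the errors.
    top_hosts = Counter(e["host"] for e in errors).most_common(3)
    return {"ratio": ratio, "threshold": threshold, "top_hosts": top_hosts}


events = [{"host": "web1", "status": 200}] * 8 + [{"host": "web2", "status": 503}] * 2
print(check_error_ratio(events))
```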

Introduced last October, Hunk brings Splunk's analysis capabilities to the data in Hadoop clusters. It's still "early days" in terms of adoption, according to Mehta. Upgrades in Hunk 6.1 include the ability to cache search results. Hunk applies Splunk algorithms to data in the Hadoop Distributed File System (HDFS), but you don't have to wait for results, as you do with MapReduce, and you don't have to create schemas in order to query, as you do with Hive. Hunk automatically brings structure to the data and identifies the fields of interest, Splunk claims.
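The schema-on-read idea, finding fields in raw data at query time instead of defining a table up front, can be illustrated with a small Python sketch. It assumes log lines written as `key=value` pairs, which is only one of many possible formats, and the `extract_fields` and `search` functions are hypothetical. Hunk's actual field discovery is not described in the source and is not shown here.

```python
import re
from typing import Dict, Iterable, Iterator

# key=value pairs; a value is either a double-quoted string or a run of non-spaces.
KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def extract_fields(raw_event: str) -> Dict[str, str]:
    """Discover fields in one raw log line at query time, with no predefined schema."""
    fields = {}
    for match in KV_PATTERN.finditer(raw_event):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return fields


def search(raw_events: Iterable[str], **criteria: str) -> Iterator[Dict[str, str]]:
    """Yield the extracted fields of every raw line matching all criteria."""
    for line in raw_events:
        fields = extract_fields(line)
        if all(fields.get(k) == v for k, v in criteria.items()):
            yield fields


events = [
    '2014-05-01T10:00:00 host=web1 status=200 uri=/index.html',
    '2014-05-01T10:00:05 host=web2 status=500 msg="upstream timeout"',
]
for hit in search(events, status="500"):
    print(hit["host"], hit["msg"])
```

No table definition exists anywhere: the fields `host`, `status`, `uri`, and `msg` come into being only when a query parses the raw lines.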

With the caching added in 6.1, the results of frequently run analyses are returned much more quickly, Mehta said. The upgrade also lets Hunk analyze data beyond Hadoop, through an API and connectors for Apache Accumulo and NoSQL databases including Cassandra, MongoDB, and Neo4j.
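Result caching for repeated analyses might look roughly like the following Python sketch. It is an assumption-laden illustration, not Hunk's implementation: results are keyed by the query text plus its time range and kept for a fixed time-to-live (`ttl_seconds`, a hypothetical parameter) before the expensive scan runs again. The `clock` argument exists only so that expiry can be tested deterministically.

```python
import time
from typing import Callable, Dict, List, Tuple


class SearchResultCache:
    """Reuse results of repeated searches for ttl_seconds before re-running them."""

    def __init__(self, run_search: Callable[[str, int, int], List],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._run_search = run_search
        self._ttl = ttl_seconds
        self._clock = clock
        # (query, earliest, latest) -> (time stored, results)
        self._entries: Dict[Tuple[str, int, int], Tuple[float, List]] = {}

    def search(self, query: str, earliest: int, latest: int) -> List:
        key = (query, earliest, latest)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: skip the expensive scan
        results = self._run_search(query, earliest, latest)  # miss or expired entry
        self._entries[key] = (now, results)
        return results


calls = []


def slow_scan(query, earliest, latest):
    calls.append(query)  # stands in for an expensive scan over HDFS
    return [f"{query} over [{earliest}, {latest})"]


fake_now = [0.0]
cache = SearchResultCache(slow_scan, ttl_seconds=60, clock=lambda: fake_now[0])
cache.search("status=500", 0, 3600)
cache.search("status=500", 0, 3600)  # served from the cache
fake_now[0] = 61.0
cache.search("status=500", 0, 3600)  # entry expired, so the scan runs again
print(len(calls))
```

Keying on the time range as well as the query text keeps a search over a different window from being answered with stale, mismatched results.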

"The goal is to offer ubiquitous access to multiple platforms, with Hunk providing a virtual index that abstracts the analytics layer from where the data is stored," Mehta said.
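The virtual-index idea, one search interface over several stores, can be sketched in Python with a simple provider abstraction. The `DataProvider` protocol and the `InMemoryProvider` and `VirtualIndex` classes are hypothetical stand-ins. A real connector for HDFS, Cassandra, or MongoDB would take the place of `InMemoryProvider`, and nothing here reflects Hunk's actual API.

```python
from typing import Dict, Iterable, Iterator, List, Protocol


class DataProvider(Protocol):
    """Anything that can stream records; a real one would wrap HDFS, MongoDB, etc."""
    def scan(self) -> Iterator[Dict[str, str]]: ...


class InMemoryProvider:
    """Hypothetical stand-in for a storage-specific connector."""

    def __init__(self, records: Iterable[Dict[str, str]]):
        self._records = list(records)

    def scan(self) -> Iterator[Dict[str, str]]:
        return iter(self._records)


class VirtualIndex:
    """One search entry point over many providers, hiding where the data lives."""

    def __init__(self, providers: Dict[str, DataProvider]):
        self._providers = providers

    def search(self, **criteria: str) -> List[Dict[str, str]]:
        hits = []
        for source, provider in self._providers.items():
            for record in provider.scan():
                if all(record.get(k) == v for k, v in criteria.items()):
                    hits.append({**record, "source": source})  # tag the origin
        return hits


index = VirtualIndex({
    "hdfs": InMemoryProvider([{"user": "ann", "action": "login"}]),
    "mongodb": InMemoryProvider([{"user": "ann", "action": "purchase"},
                                 {"user": "bob", "action": "login"}]),
})
for hit in index.search(user="ann"):
    print(hit["source"], hit["action"])
```

The caller never names a storage system. Adding another store means adding a provider, not changing the search code.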

Yes, Hadoop and NoSQL vendors and their communities are working on deeper and easier ways to do analytics. But for now, Splunk's commercial platforms offer a head start in monitoring, measuring, and mining big data, and many customers seem more than willing to pay for that edge.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

I went splunking (a.k.a. "caving") back in college. The idea of Splunk is to provide an analytical carbide lamp to illuminate that big, dark machine data formerly hidden deep underground. Easier dashboarding and embeddable visualizations are highlights here, but doing analysis across multiple big-data platforms -- Hadoop, Cassandra, MongoDB -- is going to appeal to any shop already using this tool.