Machine data is one of the fastest-growing and most complex areas of Big Data; with the rise of IoT, it’s also one of the most valuable. SanDisk InfiniFlash System and Tegile IntelliFlash enable an Operational Intelligence Data Platform that delivers breakthrough performance, scale and TCO. The machine-generated data “5V” challenge is this: how do you optimize Volume, Velocity, Veracity, Variety and Value?

All IT applications, systems and technology infrastructure generate data every millisecond of every day. This machine data is one of the fastest growing and most complex areas of Big Data. It’s also one of the most valuable, containing a definitive record of all user transactions, customer behavior, sensor activity, machine behavior, security threats, fraudulent activity and more.

Making use of this data, however, presents real challenges. Traditional data analysis, management and monitoring solutions are not engineered for this high-volume, high-velocity and highly diverse data. Consider traditional information management systems, such as business intelligence and data warehouse tools. These systems are batch-oriented and designed for structured data with rigid schemas. IT management tools and security information and event management (SIEM) tools, on the other hand, provide a very narrow view of the underlying data and are hardwired for specific data types and sources. They also don’t provide historical context.

Finding a better way to sift, distill and understand the vast amounts of machine data can transform how IT organizations manage, secure and audit IT. A better way to use data can also provide valuable insights for the business on how to innovate and offer new services, as well as a view into trends and customer behavior.

Splunk as an Operational Intelligence Data Platform

Splunk started as a log-structured analysis system and has since evolved into a full-blown, machine-generated data processing platform. In fact, Splunk has been described as “Google for visual analytics.”

Splunk has quickly moved from Predictive Analytics for IT Operations to broader use cases, such as Security Information and Event Management (SIEM), Business Analytics with HUNK using virtual indexes, and now also the Industrial Internet and the Internet of Things (IoT).

Sizing the Opportunity of IoT

Although Splunk has established itself as a leader for other use cases, IoT is by far the largest market, and here are a few reasons why:

Cisco estimates that 50 billion devices and objects will be connected to the internet by 2020. Yet today, more than 99% of things in the physical world remain unconnected.

Gartner estimates that by 2020 IoT product and service suppliers will generate incremental revenue exceeding $300 billion, mostly in services.

Infrastructure, volume, bandwidth, security and battery life: all of these will change as IoT makes a significant impact on data centers.

More companies are making strategic moves into this space. Google made a long-term bet with the $3.2 billion acquisition of Nest, bringing Google into the IoT revolution and into our smart homes.

We at SanDisk are also working on innovation and technology that brings us closer to making IoT pervasive. We recently expanded our commitment to the connected device market with a strategic investment in Altair Semiconductor. In this blog, I will share more about our Splunk solution with Tegile.

Splunk Architecture for an Operational Intelligence Platform

If you are not familiar with Splunk, its tiered architecture is built from several building blocks that can be described as follows:

Search Head: used for Searching and Reporting

Indexers: used for Indexing and Search Services

Forwarders: used for Data Collection and Forwarding

Data Management

Indexer Cluster Master, Search Head Cluster Deployer

Distributed Management / Deployment Server

License Master, Distributed Management Console

The Search Heads allow querying of data sets either using Splunk SPL (Search Processing Language) or using one of the many applications from Splunk’s rich ecosystem.

The Indexers’ primary roles include:

1. Data Storage: processing and parsing at index-time as well as indexing

2. Data Management: rotation of data and data tiers (hot / warm / cold) and the aging and removal of data

Both Indexers and Search Heads can be deployed as clusters, either on-premises or in a geo-distributed configuration.

Splunk Enterprise can be deployed as a single instance or in a distributed deployment, and it has very broad Forwarder support, ranging from network devices to IoT devices. It also has a variety of connectors that allow indexing data from a number of structured sources (like Enterprise Data Warehouse systems and Operational Systems), as well as from HDFS and S3-based data lakes using HUNK and virtual indexes. In addition, an HTTP Event Collector supports DevOps and IoT data analysis.
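To make the HTTP Event Collector path a little more concrete, here is a minimal Python sketch of how an IoT device or gateway might package one reading for it. It assumes the collector's JSON event endpoint (`/services/collector/event`) and its `Authorization: Splunk <token>` header scheme. The host name, port, token and event fields are placeholders, and `build_hec_request` is a hypothetical helper rather than part of any Splunk SDK. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json", index=None):
    """Build (but do not send) a POST request for one HTTP Event Collector event.

    base_url, token and index are deployment-specific placeholders.
    """
    payload = {"event": event, "sourcetype": sourcetype}
    if index is not None:
        payload["index"] = index
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical sensor reading and collector address, for illustration only.
reading = {"device_id": "sensor-42", "temperature_c": 21.5}
request = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    reading,
)
print(request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

In a real deployment the token would come from the collector configuration on the Splunk side, and a device would typically batch readings and retry on failure rather than send them one at a time.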

InfiniFlash-Based Data Grids for Operational Intelligence Platforms

Operational Intelligence Platforms have significant system requirements for operation. The challenge of these new data-driven architectures is to deliver a scalable, resilient and distributed enterprise-grade platform. InfiniFlash-based data grids can deliver dramatic advantages for building Splunk data platforms, as they can support the following:

Faster Ingest

With InfiniFlash, you can capture millions of events per second without losing events. New data is available for analysis in the shortest timeframe, and you can easily scale.

Indexing: Any Performance You Need

The various formats and availabilities of flash solutions can easily be matched to Splunk tiered pipelines for hot, warm, cold and frozen indexes and for both sequential and random I/Os.

Flash can support high-throughput batch jobs and low-latency real-time queries while handling disparate data sources and bursty workloads, with the ability to store data in a schema-free way.

Ease of Scalability and Peace of Mind Reliability

The superior performance of flash means that IT requires far less hardware to deploy and manage. You can scale easily from terabytes to petabytes with rackscale architectures that require only minimal investment in infrastructure.

With InfiniFlash you can safely store multi-terabyte data pools for long periods and have predictable performance with a very low annual failure rate (AFR).

Splunk Flash Tiering with SanDisk Big Data Flash and Tegile IntelliFlash

Flash has many benefits over its spinning, magnetic-media counterparts. The best way to leverage flash is to deploy it with intelligent software that can take advantage of those benefits and maximize them for the use case.

Our partner Tegile’s patented IntelliFlash OS accelerates metadata handling, which provides key differentiators with its ability to ingest Splunk data into a bucket of memory that extends to a large read-write bucket on SSD. The cache pool is dynamically allocated in real time as data is written to and read from the Tegile array. Metadata and cache can also be increased non-disruptively to meet scaling performance needs. Once the storage pools are attached to the Splunk buckets, no further administration is required and Splunk handles the placement of data.

Splunk and Tegile IntelliFlash automatically place hot, warm and cold data on the appropriate storage tier within the same storage array and filesystem, based on the stage of the Splunk pipeline: Ingest, Search, Index, Query, and Visualize. This unique and converged architecture meets Splunk’s application SLAs and removes the need to move data over a network between separate storage arrays or filesystems.
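For readers who want to see where tiering meets configuration, the following Python sketch renders an `indexes.conf` stanza that points Splunk's hot/warm buckets (`homePath`) and cold buckets (`coldPath`) at different storage pools. `homePath`, `coldPath` and `thawedPath` are standard per-index settings. The index name and the mount points are hypothetical, and `render_index_stanza` is an illustrative helper, not a Splunk or Tegile tool. On a converged array, both paths may simply be volumes presented from the same system.

```python
def render_index_stanza(name, hot_warm_root, cold_root):
    """Render a minimal indexes.conf stanza for one index.

    hot_warm_root and cold_root are assumed mount points for the
    performance and capacity pools; adjust them to the real environment.
    """
    lines = [
        f"[{name}]",
        # Hot and warm buckets live under homePath.
        f"homePath = {hot_warm_root}/{name}/db",
        # Cold buckets live under coldPath.
        f"coldPath = {cold_root}/{name}/colddb",
        # thawedPath holds archived buckets that have been restored.
        f"thawedPath = {cold_root}/{name}/thaweddb",
    ]
    return "\n".join(lines)


# Hypothetical index name and mount points, for illustration only.
stanza = render_index_stanza("iot_sensors", "/mnt/flash_hot", "/mnt/flash_capacity")
print(stanza)
```

Once the paths are in place, Splunk itself decides when buckets roll from hot to warm to cold, which matches the point above that no further administration of data placement is required.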

Best-in-Class Flash Stack: SanDisk InfiniFlash and Tegile IntelliFlash HD

The SanDisk InfiniFlash system, which IDC defined as “Big Data Flash,” delivers massive capacity with extreme performance and breakthrough economics. A single InfiniFlash system features up to 512 terabytes (TB) of flash using a new form factor in a 3U enclosure. The solution was designed to take on the massive capacity requirements of Big Data, and deliver accelerated performance at unprecedented economics.

By deploying our joint solution, customers can accelerate search, index and reporting performance using a flash-based architecture. They can optimally handle both sequential and random I/O requirements during the Ingest, Search, Index and Query phases, onboard and analyze larger datasets, optimize resource utilization, and use vertical scaling to maximize the use of CPU power.

Conclusion

By dramatically reducing TCO and the hardware and energy footprint, SanDisk and Tegile make an operational intelligence data platform for machine-generated data and the Internet of Things an accessible reality for more organizations, helping them take advantage of new insights from machine-generated data to transform their organizations.

To learn more, read about SanDisk InfiniFlash on SanDisk.com and the Tegile IntelliFlash solution at Tegile.com. I welcome your questions in the comments section below.