EMC Unplugs Data Lake 2.0

EMC made a big splash with last month’s announcement that it was going to be acquired by Dell, and now it’s continuing its aquatic adventures with its third data-lake deep dive this year. Following earlier dives into storage (the Isilon HD400) and analytics (the Federation Business Data Lake), EMC has laid out its vision for Data Lake 2.0 along with three new offerings: IsilonSD Edge, Isilon OneFS.NEXT and Isilon CloudPools. These new products are driven by the explosive growth of unstructured data, which is expected to increase at least 500% over the next five years, said Sam Grocott, SVP, Marketing & Product Management, EMC.

More than 84% of the data being created is unstructured, and that is the segment Isilon and its 6,000 customers are concerned with, he said. EMC introduced the data lake concept 18 months ago, he told IT Trends & Analysis, with a focus on optimizing for multiple workloads within a single data center. Data Lake 2.0, by contrast, is a ‘global entity and connects edge locations beyond a single data center, including the cloud.’

The new products, due to be generally available in early 2016, are: IsilonSD Edge, a software-defined, file-based storage solution that customers can deploy on commodity hardware to extend the edge of the data lake in 36TB increments; Isilon OneFS.NEXT, the next version of the Isilon operating system, which provides enterprise-grade continuous service for the data center by expanding non-disruptive upgrade and rollback capabilities; and Isilon CloudPools, which extends unstructured data storage to public clouds (Amazon Web Services, Microsoft Azure and Virtustream), improving efficiency by using the cloud as an archive tier for cold data while keeping that data accessible for on-demand analytics.

Dealing with data closer to the edge and facilitating cloud access were key customer demands, said Grocott. Over the last five years, the share of enterprises with more than 100 branch locations has grown from 30% to 53%, and cloud growth is also exploding. “Our customers and partners have been asking for these capabilities for quite some time,” he said. It was also important, he added, to articulate “EMC’s vision of how the data lake needs to evolve to the cloud.”

The problem with Hadoop (and, by my extension at least, Big Data and the data lake) is that it’s easy to get started with but eventually becomes hard to monetize, control and operate according to enterprise requirements, blogged Nik Rouda, Senior Analyst, Enterprise Strategy Group. ‘I’ve been arguing lately that the new sexiest job in IT isn’t the data scientist, it’s the data steward.’ He also, tongue firmly planted in cheek, described the ‘one-pile method’ of dumping all data in one location (i.e., a data lake) so you know where everything you might ever need can be found.

Gartner is also somewhat leery of the data lake concept, calling it ‘murky at best.’ Despite the benefits data lakes provide, there are substantial risks, noted Gartner research director Nick Heudecker.

“There is always value to be found in data, but the question that has to be addressed is this — do we allow or even encourage one-off, independent analysis of information in silos or a data lake, bringing said data together, or do we formalize to a degree that effort, and try to sustain the value-generating skills we develop?” he said. “If the option is the former, it is quite likely that a data lake will appeal. If the decision tends toward the latter, it is beneficial to move beyond a data lake concept quite quickly in order to develop a more robust logical data warehouse strategy.”

Whether you dive into the data lake, pile on, or do something else entirely, you will have to deal with ever-increasing, frankly ridiculous, amounts of data. IDC sees the Big Data technology and services market growing at a compound annual growth rate (CAGR) of 23.1% over the 2014-2019 forecast period, with annual spending reaching $48.6 billion in 2019. Infrastructure (computing, networking, storage and other datacenter infrastructure such as security) will grow at a 21.7% CAGR; software (information management, discovery and analytics, and applications software) will grow at a 26.2% CAGR; and services (professional and support services for infrastructure and software) will grow at a 22.7% CAGR.
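To put IDC’s percentages in more concrete terms, the short sketch below does some back-of-the-envelope compound-growth arithmetic. It assumes, and IDC’s summary does not say this, that the quoted rates compound evenly over five annual periods from 2014 to 2019; the function names are illustrative only.

```python
# Back-of-the-envelope CAGR arithmetic for the IDC figures quoted above.
# Assumption (not stated by IDC): growth compounds at a constant annual
# rate over five periods, 2014 -> 2019.


def compound(start: float, cagr: float, years: int) -> float:
    """Value after `years` periods of constant growth at rate `cagr`."""
    return start * (1 + cagr) ** years


def implied_start(end: float, cagr: float, years: int) -> float:
    """Starting value implied by an end value and a constant CAGR."""
    return end / (1 + cagr) ** years


if __name__ == "__main__":
    # Total market: $48.6B in 2019 at a 23.1% CAGR (figures in $ billions).
    base_2014 = implied_start(48.6, 0.231, 5)
    print(f"Implied 2014 spending: ${base_2014:.1f}B")

    # Five-year growth multiple implied by each segment's CAGR.
    for segment, rate in [("infrastructure", 0.217),
                          ("software", 0.262),
                          ("services", 0.227)]:
        print(f"{segment}: x{compound(1.0, rate, 5):.2f} over five years")
```

Under that even-compounding assumption, the implied 2014 market works out to roughly $17.2 billion, and software’s 26.2% rate amounts to about a 3.2x increase over the forecast period.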

Grocott expects EMC to move forward aggressively on the data lake concept, its software-defined strategy and cloud focus. “This is a key move for us… to announce data lake 2.0.”