In-Memory Statistics for Hadoop Ups SAS Game

If you consider yourself among the new breed of data scientist, somebody great at manipulating data and at applying advanced analytics while living and breathing in the Hadoop ecosystem, then SAS has you in its sights.

Right about now, you may be wondering, What?! SAS is known more among the analytics and statistical traditionalists than the newbie data-science and Hadoop crowd. But the expansion of the targeted SAS user base naturally flows from all the work it's been doing to bring Hadoop and advanced analytics together, as evidenced most recently in the SAS In-Memory Statistics for Hadoop environment the company has been showing off at this week's Strata 2014 in Santa Clara, Calif.

In-Memory Statistics for Hadoop is an analytics programming environment for the Hadoop framework. As the name indicates, it takes advantage of in-memory technology. This is the same in-memory technology that comes out of SAS's work on high-performance analytics and powers products like Visual Analytics, an interactive and highly dynamic data-visualization tool that has been shown to power through a billion rows of data nearly instantaneously.

In-Memory Statistics for Hadoop moves analytical processing away from the "blocking and tackling" of old, where one procedure ends and the next begins. Rather, being able to do the statistical work at the speed allowed by in-memory technology means the ability to string all those processes together as a series of actions, said Mike Ames, director of data science and emerging technology at SAS, whom I talked with by phone yesterday.

"Since we don't have to flush data from memory, we can keep it there for the entire session and intermix the data management and the exploratory analysis with the statistical analysis," he added.
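For readers who think in code, the pattern Ames describes can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the SAS product or its API: the `InMemorySession` class, its method names, and the sample hotel-style data are all hypothetical and made up for this sketch. The point is that the data is loaded once and then a data management step, an exploratory step, and a statistical step all run against the same in-memory rows.

```python
import statistics


class InMemorySession:
    """Hypothetical stand-in for an in-memory analytics session (not the SAS API)."""

    def __init__(self, rows):
        # Load once; every later action reuses this list instead of rereading from disk.
        self.rows = list(rows)

    def where(self, predicate):
        # Data management step: filter the held rows and return self so actions chain.
        self.rows = [r for r in self.rows if predicate(r)]
        return self

    def summarize(self, column):
        # Exploratory step: simple descriptive statistics on one column.
        values = [r[column] for r in self.rows]
        return {"n": len(values), "mean": statistics.mean(values)}

    def fit_line(self, x, y):
        # Statistical step: ordinary least squares slope and intercept for y on x.
        xs = [r[x] for r in self.rows]
        ys = [r[y] for r in self.rows]
        mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
        sxx = sum((a - mean_x) ** 2 for a in xs)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        slope = sxy / sxx
        return slope, mean_y - slope * mean_x


# Made-up sample data for illustration only.
rows = [{"nights": n, "spend": 100.0 * n + 20.0} for n in range(1, 6)]

# Filter, explore, and model in one session without reloading the data.
session = InMemorySession(rows).where(lambda r: r["nights"] >= 2)
summary = session.summarize("spend")
slope, intercept = session.fit_line("nights", "spend")
```

In a real cluster the rows would be spread across many machines, which this single-process sketch does not attempt to show.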

Wayne Thompson, SAS chief data scientist, was on the call as well. He emphasized SAS's goal of providing a framework for supporting the complete analytical lifecycle -- from the data wrangling (or preparation), to the exploration and the modeling, and then through to deployment. "This provides the ability to not have to use something like MapReduce, but to be able to co-locate and compute across the cluster, never dropping data back down to disk," said Thompson, adding that Hadoop provides a "fire hydrant" of data to consume and feed into predictive models and prescriptive models.

Ames said he believes SAS is way ahead of the competition with its ability to support distributed computing and interactivity in this manner. "Today, you'd have to write most of the code yourself, but with this, you get pre-built libraries of statistical and machine learning methods."

All this built-in machine learning brings us back to those data scientists and Hadoopsters. As Thompson said, "We didn't develop this environment to target traditional inferential statistics."

In-Memory Statistics for Hadoop enables the iterative approach that data scientists thrive on. They can submit models and get models back in a continuous flow, engaging in work that, in a lot of cases, hadn't before been computationally feasible, Ames said. All of this is changing the way people work, shifting the focus from individual effort to data science teamwork.

"Wayne can build his models on the same set of data that I'm using and share them with me. We don't need to replicate effort," Ames said.

Of course, as SAS targets the new breed of Hadoop-inspired data scientist, it's not leaving its traditional users behind. Case in point, Thompson talked of one analytics director at a global hospitality company who stopped by at Strata to check out the product's support for hotel load analysis. Like others, "He sees Hadoop as a low-cost and powerful, flexible and scalable storage environment. To have the analytics co-located with that -- well, that gives us the ability to take existing customers like him to the next level."

With In-Memory Statistics for Hadoop, Ames said, analysts can build hundreds if not thousands of models in a concurrent run of the software. That sounds like a game-changer to me. What would you do with that kind of power? Jump in with your ideas below.

Beth Schultz, Editor in Chief

Beth Schultz has more than two decades of experience as an IT writer and editor. Most recently, she brought her expertise to bear writing thought-provoking editorial and marketing materials on a variety of technology topics for leading IT publications and industry players. Previously, she oversaw multimedia content development, writing and editing for special feature packages at Network World. In particular, she focused on advanced IT technology and its impact on business users and in so doing became a thought leader on the revolutionary changes remaking the corporate datacenter and enterprise IT architecture. Beth has a keen ability to identify business and technology trends, developing expertise through in-depth analysis and early adopter case studies. Over the years, she has earned more than a dozen national and regional editorial excellence awards for special issues from American Business Media, American Society of Business Press Editors, Folio.net, and others.

"Of course, as SAS targets the new breed of Hadoop-inspired data scientist, it's not leaving its traditional users behind."

SAS naturally is quick to adapt to the changing environment. Hooking up the Hadoopsters is crucial to SAS and should provide, as explained by SAS, some real benefits. Combining the SAS experience with open source solutions will probably end with a good marriage for all.