Big Data Analytics: A Spotlight Q&A with Mark Troester of SAS

BeyeNETWORK Spotlights focus on news, events and products in the business intelligence ecosystem that are poised to have a significant impact on the industry as a whole; on the enterprises that rely on business intelligence, analytics, performance management, data warehousing and/or data governance products to understand and act on the vital information that can be gleaned from their data; or on the providers of these mission-critical products.

Presented as a Q&A-style article, these interviews with leading voices in the industry, including software vendors, end users and independent consultants, are conducted by the BeyeNETWORK and present the behind-the-scenes view that you won’t read in press releases.

Mark, one of the major trends in data management today centers on big data. We are hearing various definitions of big data, but what is the SAS definition of big data?

Mark Troester: That's a great way to start. We have a fairly standard definition in terms of what the industry defines as big data and the factors involved in that relative to volume, velocity, and variety. Like other organizations, we see those three factors increasing, maybe at an exponential rate. What is different from a SAS perspective is that we don't really look at it as big data. It's not what data you have; it's really what you do with it. So we have extended the definition of “big data” to encompass big data analytics, and that brings a whole set of solutions and technologies to bear to allow customers to make better and faster decisions based on a larger data set. The other thing I'd note is that it's not just a Hadoop story. We have been working with big data basically for the last decade or so. We have very large implementations that have existed for almost 20 years, and we have a variety of technologies that we use versus just a Hadoop-based approach.

Let’s take that a little further. Obviously, SAS has long been a leading analytics company working with large unstructured databases and extremely large data sets. You mentioned Hadoop. We’re also seeing alternative databases such as columnar and NoSQL. How do those technologies change the landscape for SAS?

Mark Troester: SAS looks at that as yet another set of tools that we can actually apply to big data implementations. We are using those technologies to complement what we are already doing with high-performance computing, and our high-performance computing umbrella really encompasses our grid technologies so that we can distribute processing across a small or a very large grid; our in-database technologies where we can push processing whether it's a data preparation or analytics processing into the database engine so we can leverage the MPP capabilities of the large data warehouses; and then in-memory analytics where we can actually process a wide range of analytics directly within the memory footprint of the machine, which really drives a lot more performance and scale.

When we look at Hadoop and some of the other technologies that you mentioned, those are other technologies that we can bring to the solution. A lot of what we're seeing today with regard to Hadoop is that people are leveraging Hadoop to actually capture data that they weren’t able to capture before. So they’ve got Hadoop that's capturing Web stream data, maybe data from different devices. There are lots of different sources now, as you mentioned, that could be structured or unstructured. And then once they have the data in Hadoop, then they might use some type of business intelligence tool to actually start looking at the data in Hadoop and determine what data they can leverage in more of an analytical perspective. And then in many cases, they're pulling the relevant data out and putting it into their data warehouse.

We see that as the first step in a progression, and SAS provides the capabilities that are necessary to do data management and data preparation with Hadoop so you can bring data in and out of Hadoop into virtually any type of data source.

But what's more interesting is the next step. We will be providing capabilities that leverage MapReduce, which is a programmatic logic that can be used with Hadoop. So we're pushing processing for analytics into Hadoop, and that's where we think there'll be a huge leap in terms of the benefit that you can derive using Hadoop.
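As a rough illustration of the MapReduce idea mentioned above, the following is a minimal single-machine sketch in Python. It is not SAS code and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `page_view_counts`) and the sample click records are hypothetical, chosen only to show the three logical steps. In a real Hadoop job, the map and reduce steps would run in parallel across many nodes, and the framework would handle the grouping step.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for record in records:
        yield record["page"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def page_view_counts(records):
    """Run the three phases in sequence on an in-memory list."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical Web stream data of the kind described in the interview.
clicks = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/pricing"},
    {"user": "a", "page": "/pricing"},
    {"user": "c", "page": "/home"},
    {"user": "b", "page": "/home"},
]

print(page_view_counts(clicks))  # {'/home': 3, '/pricing': 2}
```

Because each map call looks at a single record and each reduce call looks at a single key, the work can be split across machines without the pieces needing to coordinate, which is what makes the model attractive for pushing processing to where the data lives.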

You mentioned MapReduce, and I immediately thought of Teradata and their acquisition of Aster Data. You recently introduced several appliances for analytics in partnership with Teradata and then also with EMC Greenplum. Why were these partnerships strategic or necessary for SAS?

Mark Troester: That's a good question. I talked about what we're doing with our high-performance computing umbrella and I mentioned grid, in-database and in-memory. The relationships that we have with Teradata and EMC Greenplum are vital in that they allow us to create and bring to market an appliance that is specifically created to drive the high-performance analytics on huge data sets. It's critical in terms of performance and scalability, and it makes it very easy to drop in an appliance versus trying to manage all of this on your own set of hardware. So there's a variety of benefits that are derived from this relationship.

From a business value perspective, we're seeing people are able to make better, informed decisions that really drive competitive advantage, and a lot of that is based on the processing speed that we can actually now apply to analytics. So in the past, where you may have had an analytics routine that took hours or days to run, with the combination of high-performance computing (HPC) in concert with what we're doing with the appliance vendors, we can shrink the window and in some cases do processing that took days in just seconds. And so, it's instrumental in terms of very large, high-volume, scalable solutions.

Well that's always good. I know most of our audience is very interested in shrinking that batch window or the ability to respond in seconds versus days so that's really great.

Mark Troester: The other thing I would add to that, Ron, is that not only are we shrinking the time window, but we're increasing the amount of data that you can actually factor into the analysis. So for example, we have situations where if you're doing a credit card swipe, the regular process for fraud detection would be to maybe check 10% of the transactions. What we're able to do with our transaction processing is basically score every transaction so the validation is much higher. We're able to score every transaction, and we're able to do it within a very small millisecond window because, as you can imagine, the transactional system when you're swiping your card can't wait 10 to 20 seconds to verify the fraud detection.

The other thing that we're seeing, though not in all cases, because certainly in some situations using a sample of your data can derive very good results, is that with this new capability you at least have the system capability to actually leverage an entire data set instead of using a sample of the data. We see situations where we have large retail partners that, in the past, did pricing optimization at a category level, such as determining what to do in terms of pricing for a category of shirts. Now they can actually do optimization at the SKU level, and they can do that on a nightly or weekly basis and be able to reprice their entire inventory versus only being able to do pricing optimization by category.

Well that should give most retailers a tremendous amount of lift and really increase their ROI with regard to merchandising.

Mark Troester: Absolutely.

I can only imagine what the return on investment is if you were only scoring 10% of the transactions that were being swiped and now can score 100%. I would think that there would be almost a tenfold reduction in credit card fraud just based on the statistical side of it.

Mark Troester: Yes, that definitely makes a big impact.

Mark, another big area that we're seeing mentioned every day on our websites and in the news, is the integration and analysis of social relationships and activities – this whole social networking movement that's going on. Is SAS going to be integrating the analysis of social relationships and activities as part of your business analytics strategy for your customers?

Mark Troester: It's clearly a hot topic now and one that's gaining a lot of attention. As many people know, we have a full set of business solutions that are basically prepackaged solutions that are delivered with a data model, with ETL flows, with prebuilt reporting and analytics capabilities. One of those solutions that we actually deliver as part of our customer intelligence products suite is social media and analytics capabilities. That suite has been a great complement for marketing optimization, campaign management, and other solutions that we provide. To your point, what we see is people are using our customer intelligence products suite to sift through mounds of data that's in the context of social media or social media plus other data that they have to do things like customer sentiment analysis. We're seeing a lot of that taking place. In some cases, depending on the data volume, then you're talking about big data being part of that as well.

But the other thing that we're seeing is the companies that can really tie that together with other forms of analytics are the ones that are seeing the most benefit. Obviously there are a lot of different analytics capabilities ranging from BI [business intelligence] and reporting, to text analytics, to predictive analytics, to forecasting, to optimization, and we're a very strong proponent of being able to pull that together. We have a solution set that pulls those things together. With that solution set, if you're doing social media analytics and you're doing customer sentiment based on things that you're seeing in Twitter, or Facebook, or whatever, once you’ve determined what's happening in terms of the trends relating to your customer sentiment, the output of that analytics should be input into your forecasting analytics. You don't think of your forecasting and optimization as separate processes. You weave everything together.

And the other thing that I would note on that angle is the importance of being able to take the results of your analytics and embed those into operational systems. Now we talked about the credit card swipe. That's a perfect example where the credit card process is really more of an operational or transactional system, but it's leveraging analytics – rich, deep, robust analytics – as an important part of what they're doing. That could be extended to what you would see in a call center where you have an agent on the phone and based on analytics that are being driven based on churn management or next best offer, up pops a script for the call agent that allows that agent to drive the conversation based on analytics. So we're excited to see the tying together of multiple analytical paradigms and then integrating the analytics within the operational systems. Those things coming together is really exciting for SAS.

Great, Mark. I recently read a survey that SAS sponsored with Bloomberg BusinessWeek Research Services to determine the state of business analytics in enterprises today. When I look at survey results, I always like to ask what was the most unexpected finding in that survey?

Mark Troester: That's a great question. I don't know that I have something that's the most unexpected, but the thing that I found interesting was that they explored the notion of what level of intuition versus analytics is the right level of split in terms of trying to make decisions. They found that it was kind of a 60/40 split where people are more heavily using intuition versus analytics. Obviously, we think that that should probably go the other way, and there is some research in there that shows the companies or organizations that are more advanced in terms of how they use analytics have a higher percentage allocated to analytics. I thought it was very interesting that this difference was identified as analytics become more pervasive and people move up the food chain from simple BI and reporting to predictive analytics and optimization forecasting.

I think part of what hurts the analytics industry is that there's not a common, consistent understanding or definition about analytics. We see this a lot when we talk to both IT and business. Both sides of the house indicate they understand analytics, but when you get into an in-depth conversation, a lot of times their understanding of analytics is limited to BI and reporting. I think that's all changing now, and SAS is doing a great job of being able to provide the broader picture of analytics across both business and IT. I think over time we'll see a broader acceptance. We'll see more proof in terms of how analytics can drive the decisions, and we'll see that split from intuition to analytics go 60/40 the other way.

Well Mark, when I look at that number I'm actually very impressed with 40% right now. Going back five, ten years, most of the surveys had what we call intuition or gut-level decision making as well over 80%. I think we've come a long way. I think now that we're going to be incorporating social media into analytics, I think we should definitely be able to get this over 50% where we're relying more on analytics than instinct.

In what industries will big data analytics be the most valuable? Were they identified in that survey as well?

Mark Troester: What we're finding is it really pushes across a lot of different industries. I think some people think of big data analytics in the context of large organizations, but that's not always the case. You could have a web-based startup that has mounds of data coming in from the website that could be big data. The other thing is big data is relative. Sometimes people want to define it in terms of number of terabytes, but what might be big data to one organization might not be big data to another. It's all about their capacity to be able to manage and leverage that data.

I would say that the explosion of Web data and social media data typically hits almost any industry. That is going to make it pervasive across industries. We do see more interest in big data analytics from some of the traditional industries like financial services, retail, telco or organizations where there are smart meters. In telco, for example, we've gone from meters that maybe collect information about what happens at a residence on a monthly basis to meters that are collecting things on a minute-by-minute basis. Obviously, that just explodes the amount of data. Another area is healthcare where they are trying to leverage data more than they have in the past, and that's another area we think big data analytics is going to have a big impact.

Great. Well Mark, it's been a pleasure talking with you today. It's great to learn how SAS is addressing the big data analytics needs of your customer base, and I think these are very exciting times. I think we have a lot to look forward to in the next five to ten years.

Ron is an independent analyst, consultant and editorial expert with extensive knowledge and experience in business intelligence, big data, analytics and data warehousing. Currently president of Powell Interactive Media, which specializes in consulting and podcast services, he is also Executive Producer of The World Transformed Fast Forward series. In 2004, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). He maintains an expert channel and blog on the BeyeNETWORK and may be contacted by email at rpowell@powellinteractivemedia.com.