Thursday, January 5, 2012

The Other BI: EMC Greenplum and Embedded Analytics

This blog post highlights a software company and technology that I view as potentially useful to organizations investing in business intelligence (BI) and analytics in the next few years. Note that, in my opinion, this company and solution are not typically “top of the mind” when we talk about BI today.

The Importance of the Greenplum Software Technology to BI

I am stretching a point when I say that EMC’s Greenplum is not “top of the mind” today. EMC has done an extensive and effective job of marketing Greenplum’s virtues in dealing with Big Data. However, what I am talking about here is embedded analytics – and there, neither Greenplum nor any other vendor solution is “top of the mind” with IT today.

More specifically, I am talking about middle-tier analytics, the area that most embedded analytics will aim for in the next three years. This is not massive-data-store, in-depth-analytics BI like the data warehouse; nor is it the “smart sensor,” small-form-factor analytics that will increasingly come to the fore with the arrival of the sensor-driven Web (e.g., analytics on your iPhone). No, I am talking about medium-sized data stores, moderately in-depth analytics, and in-enterprise BI applied at the level of the department, local office, loosely-coupled storage array, or server network. This analytics does best when it is embedded in other software or in firmware, and operates semi-automatically to pick up business-process flows and alert the business before they get out of whack, or offloads load balancing from a central server. Unlike systems management software, embedded analytics not only monitors and “fixes” but also analyzes what is going on, and reports this analysis either to the top-tier data warehouse or a specific set of software, end users, and/or administrators.
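To make the monitor-analyze-report idea concrete, here is a minimal sketch in Python. It is purely illustrative and has nothing to do with Greenplum's actual software: the class name `EmbeddedMonitor`, the rolling-window size, and the deviation threshold are all assumptions of mine. It watches one business-process metric, flags a reading that strays far from recent history, and produces a small summary that could be passed up to a data warehouse or an administrator.

```python
from collections import deque
from statistics import mean, pstdev


class EmbeddedMonitor:
    """Hypothetical middle-tier monitor: watches one metric, alerts, and summarizes."""

    def __init__(self, window=5, threshold=3.0):
        # Keep only recent history; a middle-tier component works on a modest data store.
        self.readings = deque(maxlen=window)
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.alerts = []

    def observe(self, value):
        """Record a reading; return True if it deviates sharply from recent history."""
        alert = False
        if len(self.readings) == self.readings.maxlen:
            baseline = mean(self.readings)
            spread = pstdev(self.readings)
            if spread > 0 and abs(value - baseline) / spread > self.threshold:
                alert = True
                self.alerts.append(value)
        self.readings.append(value)
        return alert

    def report(self):
        """Summary that could be forwarded to a top-tier warehouse or an administrator."""
        recent_mean = mean(self.readings) if self.readings else None
        return {"recent_mean": recent_mean, "alert_count": len(self.alerts)}


if __name__ == "__main__":
    monitor = EmbeddedMonitor(window=5, threshold=3.0)
    for reading in [10, 11, 9, 10, 10, 50]:
        if monitor.observe(reading):
            print("alert on reading", reading)
    print(monitor.report())
```

A real embedded component would sit on a database rather than an in-memory queue; the point of the sketch is only the division of labor between local alerting and upstream reporting.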

Up to now, the beginnings of embedded analytics have shown up in the systems management software of vendors like CA, but not as separable pieces usable by other distributed software. Increasingly, major vendors like IBM are talking about taking analytic software from their BI and analytics suites and applying it to organization operations across the board.

However, these efforts often involve databases retrofitted to BI in general and decision support in particular. What Greenplum represents is the obvious next step: applying a database designed from the ground up and optimized for querying and analytics. The point is that such databases will inevitably be better suited to embedded analytics than data management approaches intended to handle updates as well as queries and result massaging.

This is not to say that today’s analytics databases are the end point of embedded-analytics evolution. Because most if not all available analytics databases were designed for the top tier, they are too “heavyweight” for middle-tier embedding: they perform more slowly there, because they are tuned for much larger data stores. However, whether the next turn of the market crowns a slimmed-down top-tier database or a new ground-up-designed middle-tier analytics database as the winner, either one will do.

Over the next 2-3 years, it is reasonable for IT buyers to expect some of this technology to arrive on their doorsteps embedded in upgrades of existing solutions – but far from all of it. At some point in this period, separable analytics solutions will show up that will allow the user to go far beyond what a particular vendor is offering – if, of course, IT wants to.

Why would IT want to do this? Answer: to handle areas in which one-size-fits-all vendors are simply not moving fast enough. Take, for example, carbon accounting. Vendors have been very proactive in this area, but some of the market is moving faster still, towards monitoring that detects excess emissions as they happen, raises an alert, and connects with the carbon accounting software when necessary. Likewise, as health care providers grapple with government mandates and Electronic Health Records, they can see coming a day in which they will need to perform damage control on breaches of privacy; but today’s tools are much slower than they could be to detect such a problem. In either case, what is needed is customizable middle-tier embedded analytics that goes beyond what vendors are most likely to offer.

The primary organizational benefit of this technology, therefore, is deeper real-time understanding of in-enterprise problems that leads to better decision-making – a very cost-effective application of analytics’ general ability to improve gross margins. Embedded analytics via an analytics-adapted database may take longer to arrive than most of the Other BI that I talk about, but its advent and benefits are just as sure.

The Relevance of EMC to BI

While EMC has continued its tradition of “hands off the technology, add our markets” in the Greenplum acquisition, it has also continued another tradition: adding the technology where appropriate to its core storage software/firmware. That is, according to EMC (and I see no reason to doubt them), Greenplum technology is being put in storage controllers to offload querying from the server to the storage array. Obviously, that has a major positive implication for storage and large-BI performance. Less appreciated is the fact that this embedding of Greenplum requires that it “slim down” into a form that can operate not only on storage but also, in a middle-tier fashion, on loosely-coupled LANs serving local offices, departments, and so on. In other words, embedding on storage should mean that embedding on all other middle-tier form factors is within reach. And the acquisition of Greenplum also should mean that EMC is finally beginning to add database and BI smarts to its DNA, ensuring reasonable long-term service and support for its embedded-analytics solutions.

EMC’s market strength and apparent relative freedom from threat in the scale-out market mean that in the 2-3 year time frame I am talking about, and probably in the medium term as well, Greenplum is in no danger of going away. No, the real question for IT buyers of embedded analytics is whether EMC will have Greenplum take the next step, abstracting its slimmed-down form for embedded analytics on all vendor platforms. I can offer no guarantees of this, since it is not apparent that EMC has done such a thing before. All I can say is, if they do so, at least some sort of market will be there.

Potential Uses of Greenplum-Type Analytics for IT

It is time to point out that embedded-analytics technology is unusual in that vendors have relative freedom to delay delivering, say, multivendor or open-source middle-tier analytical databases, since it’s not high on IT wish lists. It could happen next week, or it could happen 3 years from now. So any IT acquisition of, and use of, this kind of embedded analytics will just have to wait until the vendors get around to it.

At that point, the obvious application is per-project – improving a specific business process or case-management implementation. More than other technologies, embedded analytics does not require full, integrated organizational implementation to be maximally effective. Rather, it does just fine applied to a task, a process, a function, a locality, or a local or strategic initiative. IT simply looks down the list of mission-critical projects and picks the one that benefits most from risk management or analytical automation.

The critical success factor in such projects is rapid implementation and upgrade, enabled by automation of the implementation/upgrade process, which gives strategic projects a head start. Right now, while most vendors do well at this, high-end vendors like EMC seem to be setting the pace. And so, choosing EMC Greenplum (assuming it fits) in all likelihood means a better chance of rapid implementation and a database better fitted to a broad range of embedded-analytics tasks – not to mention better ongoing support for tricky cases.

The Bottom Line for IT Buyers

The IT buyer should view embedded analytics as a technology that may take a while to materialize. However, when it does, Greenplum-type embedded analytics will deliver analytics-type benefits at least equal to the whiz-bang high-end analytics now being sold – although those benefits will arrive in smaller per-project chunks. And that, in turn, means that this technology is definitely worth the IT buyer’s ongoing attention.

More specifically, the IT buyer might consider a “pre-pre-short-list” type of approach. That would involve identifying solutions such as EMC Greenplum that may wind up as part of the embedded-analytics short list in the next 2 years, and steadily moving those products from the “pre-pre-short list” over to the “pre-short list” as their technology reaches the point of usefulness (that is, it can be applied by IT rather than being embedded in another vendor solution, and it’s optimized for middle-tier analytics). Today, I would say that it appears Greenplum is probably among the closest to that take-off point. So, put it on the pre-pre-short list, and get ready to put it on the short list. If everything goes right, and your CEO hits you with an urgent requirement that really demands embedded analytics, you will definitely be glad you had EMC’s Greenplum embedded analytics solution in your back pocket.

6 comments:

Another vendor focused on providing 'focused' data analytics is Vertica. Unlike Greenplum, Vertica is less storage dependent due to its best-of-breed compression technology, and provides some of the fastest query responses out there. When the EDW no longer gets to the right data in a timely fashion, Vertica can provide a 'data mart' solution to provide the speed and scalability required for any critical search.

Perfect timing, Marc. I'm writing a post on Vertica right now. I'm focusing on columnar technology there, and why, despite the fact that HP seems slow off the mark, there's still the same great long-term potential in columnar in general and Vertica in particular.

Wayne Kernochan
