Sunday, November 22, 2009

Business is all about placing bets and knowing if the odds are in your favor.

As I noted in my most recent Forrester report, business success depends on your company being able to visualize likely futures and take appropriate actions as soon as possible. You must be able to predict future scenarios well enough to prepare plans and deploy resources so that you can seize opportunities, neutralize threats, and mitigate risks.

Clearly, predictive analytics can play a pivotal role in the day-to-day operation of your business. It can help you focus strategy and continually tweak plans based on actual performance and likely future scenarios. And, as I noted in a recent Forrester blog post, the technology can sit at the core of your service-oriented architecture (SOA) strategy as you embed predictive logic deeply into data warehouses, business process management platforms, complex event processing streams, and operational applications.

The grand promise of predictive analytics—still largely unrealized in most companies—is that it will become ubiquitous, guiding all decisions, transactions, and applications. For the technology to rise to that challenge, organizations must move toward a comprehensive advanced analytics strategy that integrates data mining, content analytics, and in-database analytics. Already, we’ve sketched out a vision of “Service-Oriented Analytics,” under which you break down silos among data mining and content analytics initiatives and leverage these pooled resources across all business processes.

You may agree that this is the right vision but have doubts about whether there is a practical, incremental roadmap for taking your company in that direction. In fact, there is, and it starts with re-assessing the core of most companies’ predictive analytics capability: your data mining tools. As you plan your predictive analytics initiatives, you should avoid the traditional approach of focusing on tactical, bottom-up, project-specific requirements. You should also try not to shoehorn your requirements into the limited feature set of whatever modeling tool you currently happen to use.

To become a fully predictive enterprise, you will need to take both a top-down and bottom-up approach to your data mining initiatives. From the top-down, it’s all about building and integrating alternate models of how your business environment is likely to evolve internally and externally. In our recent report on advanced analytics, Boris Evelson, Leslie Owens, and I sketched out the many business processes that can be enriched by predictive analytics.

So how do you instrument your company to become more predictive? For starters, assess whether your analytics tools support the following capabilities for developing, validating, and deploying predictive models:

Model multiple business scenarios: You should be able to build complex models of multiple, linked business scenarios across different business, process, and subject-area domains, using such key features as strategy maps, ensemble modeling, and champion-challenger modeling.

Incorporate multiple information types into models: You should be able to develop models against multiple information types, including unstructured content and real-time event streams, while leveraging state-of-the-art algorithms in sentiment analysis and social network analysis.

Leverage multiple statistical algorithms and approaches in models: You should be able to develop models using the widest, most sophisticated range of statistical and mathematical algorithms and approaches, including regression, constraint-based optimization, neural networks, genetic algorithms, and support vector machines.

Apply multiple metrics of model quality and fitness: You should be able to score and validate model quality using multiple metrics and approaches, including quality scores, lift charts, goodness-of-fit charts, comparative model evaluation, and auto best-model selection.

Employ multiple variable discovery and assessment approaches: You should be able to build and validate models using various approaches for variable discovery, profiling, and selection, including decision trees, feature selection, clustering, association rules, affinity analysis, and outlier analysis.
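The champion-challenger idea above can be made concrete with a minimal sketch. Everything here is invented for illustration (the toy data, the two models, and the single fitness metric): fit an incumbent "champion" and a "challenger" on training data, score both on a holdout set, and promote whichever performs better.

```python
# Hypothetical champion-challenger sketch: compare an incumbent model
# against a challenger on held-out data and promote the better scorer.
# Data, models, and metric are all assumptions for illustration.

def fit_mean(train):
    """Champion: predict the historical mean, ignoring the input."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear(train):
    """Challenger: one-variable least-squares linear regression."""
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return lambda x: intercept + slope * x

def mse(model, holdout):
    """Model-fitness metric: mean squared error on unseen data."""
    return sum((model(x) - y) ** 2 for x, y in holdout) / len(holdout)

train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1)]
holdout = [(5, 9.8), (6, 12.1)]

champion, challenger = fit_mean(train), fit_linear(train)
scores = {"champion": mse(champion, holdout),
          "challenger": mse(challenger, holdout)}
winner = min(scores, key=scores.get)
print(winner)  # the linear challenger wins on this roughly linear data
```

In a production tool, the same loop would run over many candidate models, many metrics (lift, goodness-of-fit), and an auto-best-model selection policy rather than a single MSE comparison.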

How is this different from predictive analytics as usual? Traditionally, most predictive modeling specialists focus on the latter three capabilities: statistical algorithms and approaches, model quality and fitness, and variable discovery and assessment. Most models are built in narrowly scoped business or subject domains (such as customer analytics for marketing campaign management) and only against structured data sources (such as relational tables). Few predictive analytics projects have entailed modeling of multiple business scenarios across diverse domains, such as sales, marketing, customer service, manufacturing, and supply chain, though in the real world these business processes are often quite interconnected. Also, many data mining initiatives fail to incorporate information from unstructured sources, such as text in call-center logs, though this content may be as important as what comes from relational databases and other structured sources.

It’s very important to build multi-scenario predictive models against complex information sets, but becoming a fully predictive enterprise demands much more. To instrument your organization for maximum predictive power, you should also tool your advanced analytics to support the following capabilities:

DW-integrated data preparation: To speed up and standardize the most time-consuming predictive modeling project tasks, you should be able to leverage your existing data warehouse, extract-transform-load (ETL), data quality, and metadata tools to support a full range of data preparation features. These features include the ability to discover, acquire, capture, profile, sample, collect, collate, aggregate, deduplicate, transform, correct, augment, and load analytical data sets.
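A few of those data preparation steps can be sketched in miniature. The records, field names, and cleansing rules below are assumptions for illustration only; a real pipeline would run these same operations in ETL and data quality tooling:

```python
# Illustrative data-preparation pass over a toy analytical data set:
# deduplicate, correct/transform, and aggregate before modeling.
from collections import defaultdict

raw = [
    {"cust": "A", "region": "east ", "spend": "100"},
    {"cust": "A", "region": "east ", "spend": "100"},   # exact duplicate
    {"cust": "B", "region": "West",  "spend": "250"},
    {"cust": "C", "region": "west",  "spend": None},    # missing value
]

# Deduplicate on the full record.
seen, rows = set(), []
for r in raw:
    key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
    if key not in seen:
        seen.add(key)
        rows.append(r)

# Correct and transform: normalize casing/whitespace, impute missing spend.
for r in rows:
    r["region"] = r["region"].strip().lower()
    r["spend"] = float(r["spend"] or 0)

# Aggregate into a per-region analytical data set.
by_region = defaultdict(float)
for r in rows:
    by_region[r["region"]] += r["spend"]

print(dict(by_region))  # {'east': 100.0, 'west': 250.0}
```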

Deep application and middleware integration: To deliver models deeply into whatever heterogeneous SOA-enabled platform you happen to use, your predictive analytics tool should deploy on and/or integrate with a wide range of enterprise applications, middleware, operating platforms, and hardware substrate. You should be able to deploy models seamlessly into your data warehouse, business intelligence, online analytical processing, data integration, complex event processing, data quality, master data management, and business process management environments. And to play well in your SOA, your predictive modeling tools should support application programming interfaces, languages, tools, and approaches such as Web services, Java, C++, and Visual Studio, as well as emerging languages such as SQL-MapReduce and R.
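The model-interchange pattern behind that middleware integration (PMML export from the modeling tool, import and execution on the deployment platform) can be sketched as follows. I'm using JSON as a stand-in for PMML, and the document schema and linear model are invented for this sketch:

```python
# Sketch of model interchange: serialize a trained model's parameters to a
# portable document (JSON here, standing in for PMML), then rehydrate and
# score it in a different environment. Schema and model are assumptions.
import json

# "Export" from the modeling tool: a linear model as a declarative document.
exported = json.dumps({
    "model_type": "linear_regression",
    "intercept": 1.5,
    "coefficients": {"age": 0.2, "tenure": 0.8},
})

# "Import" on the deployment platform and score a record.
doc = json.loads(exported)

def score(record):
    """Evaluate the imported model against one input record."""
    return doc["intercept"] + sum(
        doc["coefficients"][name] * value for name, value in record.items()
    )

print(score({"age": 30, "tenure": 5}))  # 1.5 + 6.0 + 4.0 = 11.5
```

The design point is that the scoring side needs only the declarative document, not the modeling tool that produced it — which is exactly what makes deployment into heterogeneous SOA platforms feasible.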

Consistent cross-domain model governance: To avoid fostering an unmanageable glut of models, your predictive analytics solution should provide a wide range of tools, features, and interfaces for life-cycle governance of models created in diverse tools. At the very least, it should enable model check-in/check-out, change tracking, version control, and collaborative development and validation of models. It should also support a full range of tools, standards, and interfaces for importing and embedding models from other tools, as well as exporting and sharing models to other environments.
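To make the check-in/check-out and version-control requirement concrete, here is a minimal model-registry sketch. The class, the churn model, and its change notes are all invented for illustration; a real governance layer would add access control, metadata, and audit trails:

```python
# Minimal sketch of model life-cycle governance: a registry that tracks
# versions and change history as models are checked in. Purely illustrative.

class ModelRegistry:
    def __init__(self):
        self._versions = {}  # model name -> list of (version, artifact, note)

    def check_in(self, name, artifact, note):
        """Record a new version of a model and return its version number."""
        history = self._versions.setdefault(name, [])
        version = len(history) + 1
        history.append((version, artifact, note))
        return version

    def check_out(self, name, version=None):
        """Fetch a specific version, or the latest if none is given."""
        history = self._versions[name]
        _, artifact, _ = history[(version or len(history)) - 1]
        return artifact

    def changelog(self, name):
        """Return the change-tracking history for a model."""
        return [(v, note) for v, _, note in self._versions[name]]

reg = ModelRegistry()
reg.check_in("churn_model", {"coef": 0.4}, "initial fit")
reg.check_in("churn_model", {"coef": 0.55}, "refit on Q4 data")

print(reg.check_out("churn_model"))             # latest: {'coef': 0.55}
print(reg.check_out("churn_model", version=1))  # rollback: {'coef': 0.4}
```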

Flexible model deployment: To execute modeling functions (such as data preparation, regression, and scoring) on the widest range of data warehouses and other platforms, your tools should support in-database or embedded analytics. And to scale to the max, your predictive analytics tools should deploy models to massively parallel data warehouses, software-as-a-service environments, and cloud computing fabrics. Your advanced analytics tools should also support development of application logic in open frameworks—such as MapReduce and Hadoop—to enable convergence of data mining and content analytics in the cloud.
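The MapReduce pattern mentioned above is simple to sketch: each mapper scores its own data partition locally, and a reducer merges the partial results. The shards, the scoring threshold, and the tallies are invented for this sketch; a real Hadoop job would distribute the same logic across nodes:

```python
# Toy MapReduce-style flow for scaling model scoring across data partitions:
# each mapper scores one shard; the reducer merges partial tallies.
from functools import reduce

partitions = [
    [0.2, 0.9, 0.7],   # shard 1 of customer risk scores
    [0.1, 0.8],        # shard 2
    [0.95, 0.4, 0.6],  # shard 3
]

def mapper(shard, threshold=0.5):
    """Score one shard locally: count records flagged as high risk."""
    flagged = sum(1 for s in shard if s > threshold)
    return {"records": len(shard), "high_risk": flagged}

def reducer(a, b):
    """Merge partial results from two mappers."""
    return {k: a[k] + b[k] for k in a}

result = reduce(reducer, map(mapper, partitions))
print(result)  # {'records': 8, 'high_risk': 5}
```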

Rich interactive visualization: To deliver their precious payload—actionable intelligence—your advanced analytics tools should support interactive visualization of models, data, and results. Ideally, you should be able to visualize all of this in your preferred business intelligence tool, or in the predictive modeling vendor’s integrated visualization layer. Of course, you have every right to expect the full range of visualization techniques (histograms, box plots, heat maps, and so on) regardless of who provides the visualization layer.

As you can see, this goes well beyond data mining as usual. Forrester has a slightly different perspective on the development of the predictive analytics market than you’re likely to get from other sources. We see robust, flexible, SOA-enabled data mining tools as the centerpiece of advanced analytics for fully predictive enterprises. The competitive stakes are too great for businesses to take the traditional silo-mired approach when implementing this mission-critical technology.

JK2—We’re always trying to rebuild the brain. Now we’re ratcheting down our ambition: mimicking carnivore gray matter because we don’t have the computer power to do justice to our own wetware circuitry. I recommend we start by replicating the most primitive brains in the animal kingdom: insects (do they even have brains?). Why start there? Well, because they’re such an incredibly successful category of organisms...they might have something to teach us, if we can learn to think like them. Cats? They’re recently evolved camp followers of homo sapiens. From an evolutionary standpoint, they’ll teach us what we already know: if you protect the food supply from rodents, generally keep to yourself, and provide a passive object of comfort and companionship, you have a warm place in the human hearth—as long as humans themselves survive. Insects are something entirely different: they’ll survive whether or not we do, and they might contribute to our downfall. Keep your friends close, your enemies closer.

JK2—Let’s not imagine that everybody everywhere wants to spend every day experiencing the world through reports, dashboards, and the other visualization containers we associate with specialized business intelligence (BI) solutions. Most of us want all of these contextualizers, but embedded in all the apps and services we use. And let’s not imagine that everybody wants to see every scrap of information packaged in a BI-like experience, with prebuilt visualizations, context, and insights. So it’s not productive to view the world through purely BI-colored glasses. What I love most about the Web is the passing parade of people, situations, events, images, information, trends, and experiences—arbitrary, complex, confusing, sprawling, stimulating, open-ended. The masses are happy to derive their own meanings from these messes.

JK2—Will enterprises evolve toward hybrid BI environments hosted partly on-premises and partly in the SaaS/cloud? Will departments be allowed to mash up their own BI reports and dashboards on outsourced SaaS/cloud services, while the enterprise as a whole uses a premises-based platform? Won’t one approach crowd out the other over time as corporate IT looks to consolidate on a single platform? Either SaaS/cloud will become the dominant BI deployment approach for companies of all sizes, or the dominant approach for one segment, such as the midmarket. Or the dominant approach for deployment of one category of BI capability—such as predictive analytics against cloud-sourced data—while the core of BI is still deployed on premises-based platforms.

JK2—Nobody truly knows the future. Some of us have models that have proven quite good at predicting futures with a reasonable degree of confidence, based on observations. That’s what predictive modeling is all about. Where analytics is concerned, there has never been a “next big thing.” Instead, all the old things (and data mining is certainly an old established discipline) just keep evolving aggressive new marketing messages to justify customers’ continued loyalty.

JK2—The key gating factor on predictive analytics’ adoption has always been the specialized statistical and mathematical knowledge required to use these tools effectively. That constraint is beginning to ease, thanks to the development of more automated visual tooling for data discovery, exploration, preparation, and modeling. But this is still a math-geek-intensive discipline—much more than, say, core BI. Let’s be honest with ourselves. No true “next big thing” demands that you first go back for college-level training in statistics.

JK2—I’m a bit fatalistic about speaking to the press. Even when they quote me correctly, and place that quote in the right context in a well-written article, a misleading headline can screw it all up. Jeff did a good job on this one except for the headline. There’s no mention anywhere in this article of a free DW appliance (software plus hardware in a complete, no-charge package) being offered by any vendor. If there were, that would definitely be news.

RT @stheath:"SAS models can be scored in various DWs" JK--Yep, as can models created in other PA/DM tools--through PMML & other imp/exp. 2:32 PM Nov 18th from TweetDeck

JK2—In-database analytics is a capability that most DW/DBMS platforms support, as do most predictive analytics and data mining tools. It’s all about tools exporting models as PMML, or as native SAS code, or as Java archives, or any of various other approaches—and DW/DBMSs’ importing them and executing those models as user-defined functions (UDFs) or some other approach. Of course, vendors vary widely in the range of data mining functions—such as data preparation, regression analysis, and scoring—that can be done on which tools’ models by which DW/DBMS vendors’ platforms.
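The UDF pattern described here can be illustrated with SQLite from the Python standard library: register a scoring function with the database engine so that scoring runs next to the data, inside SQL. The table, the toy logistic-style model, and its coefficients are invented for this sketch; production platforms import PMML or native code instead of a hand-written function:

```python
# Illustration of the in-database analytics pattern: register a model's
# scoring function as a user-defined function (UDF) so the database
# executes it next to the data. Table and model are invented examples.
import math
import sqlite3

def churn_score(tenure, complaints):
    """Toy logistic-style scorer standing in for an exported model."""
    z = 0.5 - 0.1 * tenure + 0.6 * complaints
    return round(1 / (1 + math.exp(-z)), 3)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, complaints REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 24, 0), (2, 3, 4), (3, 12, 1)])

# Register the model as a UDF, then score entirely in SQL.
conn.create_function("churn_score", 2, churn_score)
rows = conn.execute(
    "SELECT id, churn_score(tenure, complaints) FROM customers ORDER BY 2 DESC"
).fetchall()
print(rows)  # highest-risk customer first
```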

Data miners seem to have the same love-hate relationship with their tools that old-fashioned miners have with their pickaxes and dynamite. 2:12 PM Nov 18th from TweetDeck

JK2—No matter how I score the various vendor tools on my forthcoming Forrester Wave for Predictive Analytics and Data Mining Solutions, I’m going to face a boatload of ire from devotees of the lower-scored tools. And even from users of the higher-scored tools, who will point out the myriad features their vendor has never gotten quite right—but which are not showstoppers that would cause these users to abandon the stat tools they’ve been using since their college days.

JK2—I’m more than happy covering the myriad not-quite-that-big things that loom large in the daily nitty-gritty of enterprise computing. That’s a more sustainable career than riding the wave of fast-rising bubble technologies that may be big next year but obsolete the year after.

PA/DM is a discrete product segment, but converging into BI, blurring into content analytics, fed from and deploying to CEP, embedded in SOA 1:04 PM Nov 18th from TweetDeck

JK2—There are many ways to skin the “self-service operational BI” cat, and almost every vendor in this arena is doing it by a blend of these and other approaches. Everybody’s trying to take this technology out of IT’s hands and put the users in the driver’s seat. Very little of it is rocket science. Most of it is well-established and well-understood, has stable usage and integration patterns, and can be automated to a greater degree than we like to admit.

Core of "analytics cloud" is a scale-out DW platform. MSFT SQL Azure may be there by 2011, but not in 2010's v1. Why not embed SS08R2 PDW? 7:35 AM Nov 18th from TweetDeck

JK2—Microsoft’s strategic error on cloud DW was building one stovepipe analytic database environment for Azure, and another one for SQL Server. They’ll spend several years converging them, and it won’t be pretty. And it won’t be in time to make much headway against Amazon, Google, IBM, Teradata, and others who are getting there first with more integrated cloud DBMS/DW solutions—in IBM and Teradata’s cases, with the same core database in premises-based and public clouds.

JK2—See tweet explaining this, earlier in this aweekstweets. Lots of DW/DBMSs can import models in native SAS code. Aster can even execute those models, without conversion, in a SAS executable runtime container in the new nCluster v4.

JK2—Milestone in the ordinary product-management sense that it’s one step closer to go-live for Microsoft. Will Azure represent an industry milestone in the maturity, sophistication, and adoption of cloud computing in corporate environments? 2010 will be the year we learn.

MS-DOS was originally "QDOS" (quick and dirty operating system--not making this up). Was that an omen? 9:41 PM Nov 17th from TweetDeck

JK2—Windows—with its schizoid “wait forever for your latest goddamn mouse-click to advance the cursor a millimeter on screen” GUI—has rarely been quick, and—with the blue screens of death, malware infestations, and general look-and-feel madness—has often been dirty.

JK2—In a commoditized market with a couple dozen competitors, re-startups had better have some awesomely innovative verge-of-commercialization technology in the labs to have a snowball’s chance.

Social network analysis: I'd like to see operational definitions of "groupthink," and content analytics that detect when it's emerging. 8:49 PM Nov 17th from TweetDeck

JK2—For example: Can social network analysis detect the outlines of the GOP agenda and presidential candidate shortlist for 2012 even though it’s 3 years from now? And can these algorithms outdo the human pundits in this regard? That’d be like a football coach having a mole in the opposition’s huddle.

JK2—Quite frankly, dreams are often a distraction from the main business of life. Dreams are just rapid eye movements and herky-jerky unconscious muscle spasms. Sell people quilts, comfy pillows, and firm mattresses.

JK2—Mostly just an internal private cloud at IBM. The commercialized cloud will be for IBM mainframe customers. I’m still waiting for an IBM smart analytics public and private cloud that will be virtualized across DB2 and Informix, and across all OS and hardware platforms. I don’t know yet where IBM is going with this, or whether in fact they plan to go that all-encompassing.

JK2—The point of this tweet was that SAS is the largest predictive analytics and data mining vendor by market share—hence many predictive models have been built with its tools—hence DW vendors that do in-database analytics should be able to integrate with and execute the full range of procedures, including scoring and regression, on SAS models—hence it’s good that Netezza has a SAS partnership. Netezza’s partnership with Fuzzy Logix, vendor of the DB Lytix in-db enabling tool, is important for Netezza in-database analytics on a wide range of third-party PA/DM tool vendors’ models. Whew—hard point to make without lots of detail and nuance. Thank goodness for aweekstweets (assuming anybody actually reads this).

JK2—See the full response a few tweets above, on SAS’s market share, in-database execution of SAS models, and Netezza’s SAS and Fuzzy Logix partnerships.

JK2—It’s funny that the headline writer put “fluff up” in there, as if these vendors’ announcements were insubstantial. They weren’t insubstantial, but they didn’t begin to address the pricing issues that will determine whether any of the new services are cost-effective for the mass market anytime soon. My hunch is that we’re due for a nasty price war among the cloud app/platform vendors in 2010-2012, with packaged software license revenues (watch out Microsoft!) taking a huge hit.

JK2—What’s Twitter? A cloud of colloquial noise. You can hide juicy tweetborne content in plain sight—that is, it won’t pass through the filters of many sophisticated text analytics and natural language processing (NLP) engines, because it’s essentially written in an ad hoc, arbitrary, you-and-your-friends-specific code language.

Teaching myself to stop worrying and love the calendar. Starting to call next year "twenty-ten" and back-fit "twenty-oh-nine" to this one. 11 minutes ago from TweetDeck

From itd.daily@it-director.com: "'It is wonderful to be here in the great state of Chicago.' Dan Quayle" JK--Huh? Why pick on him anymore? 6:56 AM Nov 19th from TweetDeck

About Me

James Kobielus is IBM's Big Data Evangelist. He is an industry veteran who spearheads IBM's thought leadership activities in big data, data science, enterprise data warehousing, advanced analytics, Hadoop, business intelligence, data management, and next-best-action technologies. He works with IBM's product management and marketing teams across the big data analytics portfolio. Prior to joining IBM, he was a leading industry analyst with firms including Forrester Research, Current Analysis, and Burton Group. He has spoken at such leading industry events as IBM Information On Demand, IBM Big Data Integration and Governance, Strata, Hadoop Summit, and Forrester Business Process Forum. He has published several business technology books and is a very popular provider of original commentary on blogs, podcasts, bylined business/technology press publications, and many social media channels.