Driving Analytic Value From New Data

One of the best ways to improve the power of your analytics is to include some totally new information. New information can enable huge leaps in the effectiveness, predictive power, and accuracy of your analytics. Most of the time, however, effort goes into incrementally improving results by using existing data and information more effectively. That isn’t so much because analytic professionals fail to realize that new data can be powerful; it’s because new data becomes available only occasionally. As soon as a new and different data source is available, you’ll be much better off shifting your focus to it immediately.

To me, this gets to the heart of why big data is so powerful and is getting so much attention. I believe that the volume, variety, and velocity aspects of big data, which dominate the discussion, are secondary. As I have discussed in prior blogs and articles, the most important ‘V’ associated with big data is value. The other V’s matter only in the presence of value. So what drives that value for big data? Keep reading.

The fact is that many big data sources contain information that was either not available in the past, or was available only to a much lesser extent through means requiring much more effort. For example, information from your web browsing activity is easy to capture and analyze today. In the past, the only way to get similar data was through very expensive research projects executed on a very small scale. In practice, the information just wasn’t available because it was too expensive.

Let’s fast forward to an analytic professional attempting to address a common business problem today, such as churn or next best offer. When the data sources available are fixed, most effort goes into trying new modeling methods, new variable definitions, and new ways to handle sparse or missing data. These efforts can result in increased power, but typically only provide small, incremental gains. In cases with a lot of money on the line, such gains aren’t anything to sneeze at. However, the fact is that the likelihood of blowing your last results out of the water is pretty low.

Now let’s imagine that the same analytic professional uses the exact same modeling methods, variable definitions, and data preparation today as he or she used yesterday. However, added into the analysis are new variables from a new data source that contains totally new information. Let’s assume, for example, that browsing history is now available to help identify customers’ next best offer. Because browsing history provides information on preferences and future purchase intent that isn’t available from traditional data sources, the analytic professional can achieve tremendous gains in analytic power. This holds even though the methods haven’t changed; only the data has.
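
The contrast between tuning existing data and adding new data can be sketched with a small simulation. This is a minimal, hypothetical example, assuming synthetic data, an ordinary least squares model, and in-sample R² as a stand-in for "analytic power"; none of the variables, coefficients, or results come from a real churn or next-best-offer project. Because the outcome is simulated to depend partly on a new source, the sketch illustrates the argument rather than proving it.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 5_000

# Hypothetical "traditional" data: three existing customer attributes.
x_old = rng.normal(size=(n, 3))
# Hypothetical new source (e.g., a browsing-derived intent score),
# simulated as independent of the traditional attributes.
x_new = rng.normal(size=n)
# Simulated outcome driven by both sources plus noise (an assumption, not real data).
y = x_old @ np.array([1.0, 0.5, -0.5]) + 2.0 * x_new + rng.normal(size=n)

# Option A: "tune" the existing data with new variable definitions (squared terms).
x_tuned = np.column_stack([x_old, x_old ** 2])
# Option B: keep the same linear model but add the new data source.
x_with_new = np.column_stack([x_old, x_new])

r2_old = r_squared(x_old, y)
r2_tuned = r_squared(x_tuned, y)
r2_new = r_squared(x_with_new, y)

print(f"existing data:               R^2 = {r2_old:.3f}")
print(f"existing data, tuned:        R^2 = {r2_tuned:.3f}")
print(f"same model + new data:       R^2 = {r2_new:.3f}")
```

On this simulated data, the squared-term "tuning" barely moves R², while adding the new variable to the unchanged linear model raises it substantially. Real data is rarely this clean, and a holdout sample would be the better yardstick in practice.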

My point is that for all the fuss about which analysis methods are best and how best to handle missing and dirty data, the really big gains come from finding new information to include. Think back to statistics 101 and the idea of Principal Components Analysis and orthogonal vectors. While dozens of variables may be available to an analysis, they often contain widely overlapping information. A new variable that carries substantially the same information as the variables already in hand won’t add much value. However, any time you can add variables that are completely or mostly distinct in the information they contain, there is the potential for a lot of value.
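
One way to check whether a candidate variable carries distinct information, in the spirit of the orthogonality idea above, is to regress it on the variables already in the analysis and see what share of its variance is left over. This is a sketch under stated assumptions: rather than a full Principal Components Analysis, it uses that closely related residual projection, it detects only linear overlap, and both the data and the helper name `novel_share` are hypothetical.

```python
import numpy as np


def novel_share(existing, candidate):
    """Hypothetical helper: fraction of a candidate variable's variance that is
    orthogonal to (not linearly explained by) the existing variables,
    i.e. 1 - R^2 of an OLS fit of the candidate on the existing variables."""
    X1 = np.column_stack([np.ones(len(candidate)), existing])
    coef, *_ = np.linalg.lstsq(X1, candidate, rcond=None)
    residual = candidate - X1 @ coef  # the component orthogonal to what is already known
    return residual.var() / candidate.var()


rng = np.random.default_rng(1)
n = 5_000
existing = rng.normal(size=(n, 4))  # hypothetical variables already in the model

# A "new" variable that is mostly a re-expression of existing ones.
overlapping = existing[:, 0] + existing[:, 1] + 0.1 * rng.normal(size=n)
# A variable from a genuinely different source, unrelated to the existing ones.
distinct = rng.normal(size=n)

share_overlapping = novel_share(existing, overlapping)
share_distinct = novel_share(existing, distinct)

print(f"overlapping candidate: {share_overlapping:.1%} new information")
print(f"distinct candidate:    {share_distinct:.1%} new information")
```

A candidate that scores near zero is largely redundant with what you already have; one that scores near 100% is the kind of orthogonal information the paragraph argues is worth pursuing. Nonlinear redundancy would need a different check.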

The action I recommend for readers is to constantly seek out new data sources. Instead of putting all your effort into tuning your existing modeling methods with existing data, focus effort on a new data source every chance you get. That’s where you’ll find the big gains. After you realize your initial gains from the new data you can go back to tuning, but I believe that makes sense only when you’ve exhausted your ability to include additional data sources.

This is the core of the value proposition for big data. Many organizations suddenly have multiple new, untested sets of data available for incorporation into their analytic processes. Used correctly, this data can provide a huge competitive advantage and a veritable gold mine of value. Don’t miss your chance to get ahead.

Let’s close with a thought experiment. Assume I offer you a world-class analytic professional with access to every tool available, but who is limited to using only existing data. Your other option is a solid but not world-class analytic professional with access to just standard tools. This person, however, will be allowed to incorporate some new data sources that appear to hold value.

I hope you’ll take the second option over the first. Ideally, of course, you’ll have a world-class analytic professional working with the new data, but the thought experiment illustrates the point. No matter how good an analytic professional is and how fancy the tools, the inherent value in new and different data will win in most cases.

Bill Franks is Chief Analytics Officer for Teradata, providing insight on trends in the analytics & big data space and helping clients understand how Teradata and its analytic partners can support their efforts. In addition, Bill is a faculty member of the International Institute for Analytics and the author of the books Taming The Big Data Tidal Wave (John Wiley & Sons, Inc., April, 2012) ...

Great post, Bill. Note that the 3Vs, as I first defined them over 13 years ago (ref: http://goo.gl/wH3qG), were meant only to define the challenges and opportunities of Big Data. Value is important, of course (along with a dozen other dimensions Gartner has identified), but it is not a defining characteristic of Big Data. That is, you can have Big Data but not be generating value from it. "Value" is also a vague, slippery word that's thrown around too casually: enterprise assets that are unutilized have probable value recorded on balance sheets, and deployed assets have realized value recorded in income statements. And "benefits" are often an unfortunate, insufficient proxy for actual value. Although accounting principles *still* do not allow information assets to be recognized, those assets meet all the criteria. Recognizing this and information's growing economic importance, I developed, and have been teaching for some years, information economics (infonomics), including information valuation methods (ref: http://en.wikipedia.org/wiki/Infonomics). Happy to connect with you on this.

Also, it's great that you point out the importance of "new data". This is one of the fallacies/limitations of Moneyball that people don't realize: the new statistics were developed from old measurements. New ways of measuring player performance (e.g., Sportvision's FIELDf/x system, which captures 2M data points per game), and similarly of measuring corporate, individual, process, and machine performance, are critical.