Saturday, February 02, 2013

Data mining is an important tool whose benefits have been demonstrated in diverse fields, among business, government and non-profit organizations. Its application areas continue to grow, especially given the ever-shrinking cost of gathering and organizing data. Yet, there are problems for which data mining is wholly unsuited as a solution.

To understand when data mining is not applicable, it will be helpful to define precisely when it is applicable. Data mining (inferential statistics, predictive analytics, etc.) requires data stored in a machine format of sufficient volume, quality and relevance so as to permit the construction of predictive models which assist in real-world decision making.

Most of our time as data miners is spent worrying over the quality of the data and the process of turning data into models, however it is important to realize the usual context of data mining. Most organizations can perform basic decision making competently, and they have done so for thousands of years. Whether the base decision process is human judgment, a simple set of rules or a spreadsheet, much performance potential is already realized before data mining is applied. Consultants' marketing notwithstanding, data mining typically inhabits the margin of performance, where it tries to bring an extra "edge".

So, if the above two paragraphs describe conditions conducive to data mining success, what sorts of real-world situations defy data mining? The most obvious would be problems featuring data that is too small, too narrow, too noisy or of too little relevance to allow effective modeling. Organizations which have not maintained good records, which still rely on non-computer procedures and those with too little history are good examples. Even within very large organizations which collect and store enormous databases, there may be no relevant data for the problem at hand (for instance, when a new line of business is being opened, or new products introduced). It is surprising how often business people expect to extract value from a situation when they have failed to invest in appropriate data gathering.

Another large area with minimal data mining potential is organizations whose basic business process is so fundamentally broken that the usual decision making procedures have failed to do the usual "heavy lifting". Any of us can easily recall experiences in retail establishments whose operation was so flawed that it was obvious that the profit potential was not nearly being exploited. Data mining cannot fine tune a process which is so far gone. No amount of quantitative analysis will fix unkept shelves, weak product offering or poor employee behavior.