Login

Using Data Mining for Business Intelligence

Business intelligence is information about a company’s past performance that is used to help predict the company’s future performance. It can reveal emerging trends from which the company might profit. Data mining allows users to sift through the enormous amount of information available in data warehouses; it is from this sifting process that business intelligence gems may be found. Jagadish Chatterjee explains the process, and describes how Microsoft SQL Server can help.

NOTE: This article is completely dedicated to the persons who requested an article on data mining, when my previous article “Planning a Data Warehouse” was published. This is a complete introductory article on how to extend our data warehouse’s implementation to benefit business intelligence using data Mining.

Data mining is the process of extracting hidden knowledge from large volumes of raw data. It can also be defined as the process of extracting hidden predictive information from large databases.

Data mining is not an “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends from analyzing past patterns in organizational data. Data mining is more intuitive, allowing for increased insight beyond data warehousing. An implementation of data mining in an organization will serve as a guide to uncovering inherent trends and tendencies in historical information. It will also allow for statistical predictions, groupings and classifications of data.

Most companies collect, refine and deduce massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they become part of the system. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to many different types of predictive questions.

Data mining software allows users to analyze large databases to solve business decision-making problems. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution, it is just a technology.

{mospagebreak title=When did data mining begin?}

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

Massive data collection

Powerful multiprocessor computers

Data mining algorithms

The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, make these technologies practical for current data warehouse environments.

Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions.

The Enterprise Data Warehouse as a Data Mining Source

An enterprise data warehouse is an excellent source for locating data to mine. Because of the nature of a data warehouse, most pertinent data that has been selected by analysts and business users should be located within the warehouse structure. In addition, this data is organized and stored for the explicit purpose of reporting. Through the data warehouse, further processing of OLAP data can occur. This processing can take the form of additional aggregations into multidimensional cubes (i.e., SQL Server 2000 Analysis Services Cubes) or undergo further segregation into organizational data marts.

The data mining process will utilize the data in the enterprise data warehouse, based on user selection and location of pertinent data, to test and validate a data mining model. It is important that the data be granular enough to analyze. Data that is characterized by significant aggregations beyond the original grain of the data will not produce significant results when used to create or test against a mining model.

An enterprise data warehouse is a prime source for data mining data because the data housed within the warehouse has already undergone significant data additions, modifications and cleansing based on business rules and processes. Refined Extraction Transformation and Loading (ETL) processes are required for reliable OLAP and enterprise data warehouse reporting. It is the ETL process which is responsible for cleansing bad data from the OLTP source, reclassifying or aggregating granular transactions from the operational system, and enriching the data with more readable and comprehensible data as opposed to the operational codes and abbreviations used in an OLTP system. Once the data has been sufficiently cleansed and refined, it is ripe for data mining.

Typical data warehousing implementations in organizations will allow users to ask and answer questions such as “How many sales were made, by territory, by sales person between the months of May and June in 1999?” Data mining will allow business decision makers to ask and answer questions, such as “Who is my core customer that purchases a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region and who would purchase them, given the sale of similar products in that region?”

{mospagebreak title=How does data mining work?}

The technique that is used to perform these feats in data mining is called modeling. Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don’t. For instance, if you were looking for a sunken Spanish galleon on the high seas the first thing you might do is research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda and that there are certain characteristics to the ocean currents, and certain routes that have likely been taken by the ship captains in that era. You note these similarities and build a model that includes the characteristics that are common to the locations of these sunken treasures. With these models in hand you sail off looking for treasure where your model indicates it most likely might be given a similar situation in the past. Hopefully, if you have made a good model, you find your treasure.

The process of creating the data mining model is directly dependent on the methodology used to feed the entire data mining process. In essence, the method used to make data available to be mined governs the process used to create the data model. If a solutions architect designed a specialized OLAP data cube in Analysis Services to serve as the primary source of data mining data, then an OLAP data mining model would be created, as opposed to a relational data mining model.

This act of model building is thus something that people have been doing for a long time, certainly before the advent of computers or data mining technology. What happens on computers, however, is not much different from the way people build models. Computers are loaded up with lots of information about a variety of situations where an answer is known, and then the data mining software on the computer must run through that data and distill the characteristics of the data that should go into the model. Once the model is built it can then be used in similar situations where you don’t know the answer.

In what areas is data mining profitable?

A wide range of companies have deployed successful applications of data mining. While early adopters of this technology have tended to be in information-intensive industries such as financial services and direct mail marketing, the technology is applicable to any company looking to leverage a large data warehouse to better manage their customer relationships. Two critical factors for success with data mining are: a large, well-integrated data warehouse, and a well-defined understanding of the business process within which data mining is to be applied (such as customer prospecting, retention, campaign management, and so on).

Each of these examples has clear common ground. They leverage the knowledge about customers implicit in a data warehouse to reduce costs and improve the value of customer relationships. These organizations can now focus their efforts on the most important (profitable) customers and prospects, and design targeted marketing strategies to best reach them.

{mospagebreak title=Is Microsoft SQL Server helpful for data mining?}

Yes, of course. SQL Server 2000 includes Analysis Services with data mining technology which examines data, in relational data warehouse or data mart star schemas, as well as SQL Server 2000 Analysis Services OLAP cubes to uncover areas of interest to business decision makers and other analysts. Housed within this new segment of Analysis Services are data mining algorithms which enable the creation and analysis of data mining models.

OLAP Data Mining with SQL Server allows architects to reuse the results from data mining and incorporate the information into an OLAP Cube dimension for further analysis. Users can now browse and investigate data while finding the answers to questions that only data mining can answer. PivotTable Services support data mining through emulating a similar interface to that of OLAP pivot tables. In essence, users can utilize a specially designed Excel spreadsheet to interface with data mining services and run prediction queries against a data mining model, just as they do for multidimensional analysis of OLAP data cubes.

In addition, the Decision Support Objects (DSO) library has been extended in order to accommodate direct programmatic access to the data mining functionality present within OLAP services, as well as continued programmatic access to the OLAP cube resources. DSO now includes the MiningModel object as well as other extended support objects. Coupled with the new OLE DB for Data Mining provider, organizations wishing to implement custom applications which provide predictive data mining data to meet business processes, or those organizations that wish to access data mining data for the purpose of creating an operational data mining application, can now do so using the DSO library.