Data Mining

Unearthing untold trends and exploring future outcomes

What is data mining?

Some of today’s greatest explorers aren't charting the stars or visiting far-flung locales. Instead, data miners are helping map the future by looking at the glut of enterprise information - and how that data can inform a smarter, more effective enterprise.

How? They use data mining and predictive analytics to discover previously unknown patterns and to predict future outcomes. The most significant and unexpected results might change the very course of an organization.

Data mining is a critical method for dealing with bigger and more complex data. Data mining uncovers patterns in a sample set of data and then looks for the same pattern across a much larger universe of data. A final step applies predictive modeling to forecast outcomes.

Common data mining techniques include:

Descriptive modeling. Descriptive models classify elements into groups based on patterns and relationships. These models are used to support the development of predictive models.
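As a minimal sketch of descriptive modeling, the Python below groups a handful of hypothetical customers into two segments with k-means clustering, one common way to classify elements by similarity. The customer figures, the `kmeans` helper and its evenly spaced starting centroids are assumptions made for illustration; the article does not prescribe a specific algorithm.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Group points into k clusters with plain Lloyd-style k-means."""
    # Assumption: seed centroids with k evenly spaced input points so runs repeat.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical customers: (annual spend in $1000s, store visits per month).
customers = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2), (8.0, 9.0), (8.3, 9.4), (7.9, 8.8)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

The resulting group labels could then serve as an input variable when a predictive model is built.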

Data mining in action

Data mining can be used to identify trends, patterns and relationships, while predictive analytics can be used to predict future outcomes.

Everyone has data to explore, and data mining methods can be applied to a variety of issues in any industry. Many use cases center on analyzing the data available on customers and prospective customers to maximize sales, marketing and support opportunities, including:

Profitability and lifetime value - discovers the drivers for future value and how to appeal to customers in the future.

| Application | What is Predicted | Resulting Business Decision |
| --- | --- | --- |
| Credit scoring (banking) | Creditworthiness of new and existing sets of customers. | How to assess and control risk within existing (or new) consumer portfolios. |
| Asset maintenance (utilities, manufacturing, oil and gas) | The real drivers of asset and equipment failure. | How to minimize operational disruptions and maintenance costs. |
| Health and condition management (health insurance) | Patients at risk of chronic, treatable/preventable illness. | How to reduce healthcare costs and satisfy patients. |
| Fraud management (government, insurance, banks) | Unknown fraud cases and future risks. | How to decrease fraud losses and lower false positives. |
| Drug discovery (life sciences) | Compounds that have desirable effects. | How to bring drugs to the marketplace quickly and effectively. |

We had one customer who was spending about five and a half hours building an attribution model. With high-performance data mining, they’re now building it in about three minutes. Plus, we were able to get a factor of about two times more lift, meaning millions of dollars for the customer in terms of return on investment.

Wayne Thompson
Analytics Product Manager, SAS
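Lift, the measure mentioned in the quote above, is commonly defined as the response rate among the records a model ranks highest divided by the overall response rate. The sketch below works through that arithmetic in Python; the `lift_at` helper, the scores and the outcomes are hypothetical values chosen for illustration, not figures from the customer described.

```python
def lift_at(scores, outcomes, fraction=0.1):
    """Response rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, round(len(ranked) * fraction))]
    top_rate = sum(outcome for _, outcome in top) / len(top)
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate

# Hypothetical model scores and actual responses (1 = responded) for 10 prospects.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = lift_at(scores, outcomes, fraction=0.2)
print(round(lift, 2))
```

A lift of 1.0 means the model targets no better than picking prospects at random, so doubling lift roughly doubles the responses captured from a campaign of the same size.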

Perspective: Kelley Blue Book

For years, Kelley Blue Book collected data for one specific goal: the annual publication of its blue book. Data was summarized and aggregated, algorithms applied, all on what amounted to a 12-month cycle.

As the company's kbb.com site evolved, its needs changed – and data mining played a large role in that change. "We needed to change the DNA of our company by moving from a traditional book publisher to an analytics powerhouse," said Dan Ingle, Vice President of Analytic Insights Technology at Kelley Blue Book. "Fact-based decisions have become our competitive strength. Whether or not to utilize analytics was no longer an option."

Kbb.com is now the most visited automotive website among new and used vehicle researchers, with more than 17 million monthly visits. The website automatically generates more than 27 million pricing reports each month.

According to Shawn Hushman, Vice President of Analytic Insights, SAS has played a big role in that success. "As needs arise, analytic experts contribute ideas and answer questions. They also quickly bring in SAS predictive analytics and data mining to develop, deploy and evaluate analytic models. We have slashed our time to results and enhanced our analytic insight."

Steps to successful data mining

Explorers use many different methods to uncover riches in their data. Typically, data mining is part of a broader analytic life cycle that includes data exploration, model building, model deployment and other steps. Defining the business problem is a critical step in the overall success of any data mining project. Once the business problem is defined, we recommend a five-step data mining process that works well for our customers:

Sample the data by creating a target data set large enough to contain the significant information.

Explore the data by searching for anticipated relationships, unanticipated trends and anomalies – to gain deeper understanding and ideas.

Modify the data by creating, selecting and transforming the variables to focus your model selection process.

Model the data by using analytical tools to search for a combination of data that reliably predicts a desired outcome.

Assess the data and models by evaluating the usefulness and reliability of the findings from the data mining process.
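The five steps above can be sketched end to end in a few lines of Python. Everything here is an assumption made for illustration: the synthetic customer records, the variable names, the planted churn rule (five or more support calls) and the one-rule model. It is a toy walk-through of the sequence, not the vendor's implementation.

```python
from statistics import mean

# Hypothetical customer records; the churn rule (5+ support calls) is planted
# so the sketch has a known pattern to recover.
population = [
    {"spend": 20 + (i * 7) % 50, "calls": i % 8, "churned": i % 8 >= 5}
    for i in range(300)
]

# 1. Sample: take a systematic sample of every third record.
sample = population[::3]

# 2. Explore: compare average support calls for churned vs. retained customers.
avg_calls = {
    flag: mean(r["calls"] for r in sample if r["churned"] == flag)
    for flag in (True, False)
}

# 3. Modify: derive a new candidate variable from the raw ones.
for r in sample:
    r["calls_per_dollar"] = r["calls"] / r["spend"]

# Hold back the last 30 sampled records for the assessment step.
train, holdout = sample[:70], sample[70:]

def accuracy(rows, variable, threshold):
    """Share of rows where 'value >= threshold' matches the churn flag."""
    return sum((r[variable] >= threshold) == r["churned"] for r in rows) / len(rows)

# 4. Model: search variables and thresholds for the best one-rule predictor.
candidates = [
    (var, r[var]) for var in ("spend", "calls", "calls_per_dollar") for r in train
]
best_variable, best_threshold = max(candidates, key=lambda c: accuracy(train, *c))

# 5. Assess: score the chosen rule on records the search never saw.
holdout_accuracy = accuracy(holdout, best_variable, best_threshold)
print(best_variable, best_threshold, holdout_accuracy)
```

Because the pattern is planted and noise-free, the rule search recovers it exactly; real data would call for larger samples, richer models and more careful validation.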

Best practices for data mining involve comparing different analytical techniques to determine which will produce the best model and therefore the best prediction. Some of these modeling techniques include decision trees, neural networks, gradient boosting, logistic regression, memory-based reasoning and rule induction. Data mining also plays a role in the growing field of machine learning. SAS Enterprise Miner is designed to handle these and many other techniques.
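Comparing techniques usually means fitting each one on the same training data and scoring it on the same held-out records. The Python sketch below does this for three simple stand-ins: a majority-class baseline, a one-level decision tree (a "stump") and a nearest-neighbor model as a basic form of memory-based reasoning. The synthetic grid data, the split and all function names are hypothetical, and the models are deliberately simplified versions of the techniques named above.

```python
from math import dist

# Hypothetical labeled records: (feature_1, feature_2) -> outcome,
# with a planted diagonal pattern.
data = [((x, y), x + y >= 10) for x in range(10) for y in range(10)]
train = [row for i, row in enumerate(data) if i % 4 != 0]
holdout = [row for i, row in enumerate(data) if i % 4 == 0]

def fit_majority(rows):
    """Baseline: always predict the most common training outcome."""
    majority = sum(label for _, label in rows) * 2 >= len(rows)
    return lambda point: majority

def fit_stump(rows):
    """One-level decision tree: best single 'feature >= threshold' split."""
    def score(split):
        feature, threshold = split
        return sum((p[feature] >= threshold) == label for p, label in rows)
    splits = [(f, p[f]) for f in (0, 1) for p, _ in rows]
    feature, threshold = max(splits, key=score)
    return lambda point: point[feature] >= threshold

def fit_nearest_neighbor(rows):
    """Memory-based reasoning: copy the outcome of the closest stored record."""
    return lambda point: min(rows, key=lambda row: dist(row[0], point))[1]

def holdout_accuracy(model):
    """Share of held-out records the fitted model predicts correctly."""
    return sum(model(p) == label for p, label in holdout) / len(holdout)

techniques = {
    "majority baseline": fit_majority,
    "decision stump": fit_stump,
    "nearest neighbor": fit_nearest_neighbor,
}
scores = {name: holdout_accuracy(fit(train)) for name, fit in techniques.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

On this synthetic grid the nearest-neighbor model scores highest because a single axis-aligned split cannot follow a diagonal boundary; on other data the ranking could easily differ, which is why the comparison is worth running.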