Data Mining Digs In

The underwriters at Farmers Insurance Group know a few things
about drivers. They know young people crash their automobiles more
than older drivers, and that sports cars are involved in more
accident claims than station wagons. But with 10 million auto
policies in its database, Farmers suspected there was a lot more to
be learned about its customers. It's easy to run statistics by a
driver's age or the type of vehicle, says Tom Boardman, assistant
actuary for personal lines pricing at Farmers in Los Angeles, but
you can't define a customer by any one characteristic. "It's
trickier than that," he says. "We wanted to see the interaction of
five or six variables."
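In its simplest form, looking at the interaction of several variables means computing a statistic such as the claim rate for every combination of attributes instead of for one attribute at a time. The following is a minimal sketch under stated assumptions: the records, their field layout, and the `claim_rates` helper are hypothetical, and a plain group-by like this is not meant to describe how IBM's DecisionEdge searches for patterns.

```python
from collections import defaultdict

# Hypothetical policy records:
# (vehicle_type, married, has_kids, second_car, filed_claim)
policies = [
    ("sports", False, False, False, True),
    ("sports", False, False, False, True),
    ("sports", False, False, False, False),
    ("sports", True, True, True, False),
    ("sports", True, True, True, False),
    ("sports", True, True, True, True),
    ("sports", True, True, True, False),
    ("sedan", True, True, False, False),
    ("sedan", False, False, False, True),
]

def claim_rates(records, key):
    """Return {segment: share of policies with a claim}, grouping by key(record)."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [claims, policies]
    for rec in records:
        segment = key(rec)
        counts[segment][0] += rec[-1]  # filed_claim is a bool, counted as 0 or 1
        counts[segment][1] += 1
    return {seg: claims / total for seg, (claims, total) in counts.items()}

sports = [p for p in policies if p[0] == "sports"]
overall = claim_rates(sports, key=lambda p: p[0])["sports"]
# Segment sports car owners by the combination of three attributes at once.
by_profile = claim_rates(sports, key=lambda p: (p[1], p[2], p[3]))

print(f"all sports car owners: {overall:.2f}")
for profile, rate in sorted(by_profile.items()):
    print(profile, f"{rate:.2f}")
```

On this toy data, sports car owners as a group claim at about 43 percent, while the married, with kids, second-car combination claims at 25 percent. With 200 fields per policy the number of possible combinations grows very quickly, which is what makes automated search attractive.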

Not an easy task, given that each Farmers policy has about 200
individual pieces of information tied to it, everything from the
type of car to the number of kids in the policyholder's household.
Enter data mining. With help from the specialists at IBM, Farmers
pulled 2 million policies from its database to run a pilot test.
Then Boardman and his crew stepped aside, letting IBM's
DecisionEdge software go to work, detecting interesting patterns
among the records on its own.

Early findings, including the fact that young people file more
claims than older drivers, left Farmers unimpressed. Then, Boardman
says, came the "good stuff." Exhibit A: sports car owners. Think of
one and you probably imagine a twentysomething single guy flaming
down the highway in his hot rod. In fact, there were plenty of just
those types in Farmers' database, but there was another, previously
unnoticed, niche of sports car enthusiasts: married boomers with a
couple of kids and a second family car, maybe a minivan, parked in
the driveway. Claim rates among these customers were much lower
than those of other sports car drivers, yet they were paying the same
surcharges. Armed with this information, Farmers relaxed its
underwriting rules and cut rates on certain sports cars for people
who fit the profile. Today the company is planning another data
mining expedition, this time to identify car insurance customers
who are good prospects for Farmers homeowner policies. "We're
sitting on a gold mine," Boardman says of the company's database.
"It can do much more than just send out the bills."

Farmers Insurance isn't the only company prowling in the mining
pits. From banks to e-commerce players, health care providers
to telecoms, many companies today are plunging into their databases
to determine who their best customers are and how to better market
products and services to them. Most are wading into their data,
hoping to substantiate hypotheses or hunches with hard numbers. But
others, like Farmers, are venturing into the realm of automated
discovery, relying on sophisticated, computer-driven methods that
use artificial intelligence (such as neural networks, association
rules, and genetic algorithms) to flesh out meaningful, actionable
information that can make a difference to the bottom line. "It's
the same algorithm, whether you're trying to predict volcanoes on
Venus or a customer's propensity to buy high heels," says Usama
Fayyad, senior researcher in the decision theory and adaptive
systems group of Microsoft Research.

In a new survey of corporations with data warehouses conducted
by consulting firm META Group, 54 percent of respondents said they
plan to purchase data mining and other knowledge discovery tools
this year. That's a 20 percent jump since 1996. But there's still a
long way to go: Only 8 percent of respondents currently use data
mining software. Meanwhile, two Internet powerhouses that have been
prescient in the ways of the Web have gobbled up entire companies
with data mining expertise. In April, Amazon announced it would buy
Alexa Internet, a three-year-old ad-supported Web navigation
service that tracks Internet users as they surf, provides
information about the pages they view, and suggests other sites
they might like to visit. Alexa now boasts 13 terabytes of
clickstream data (roughly equivalent to 722 million copies of this
18-kilobyte article) and the ability to detect patterns buried in
it. And in March, Yahoo purchased HyperParallel, a developer of
data mining software. The mission at both Amazon and Yahoo is
clear: to extract value from the bits and bytes of data collected on
Netizens every day. By 2002, companies will spend $113 billion to
analyze such customer data, including mining it, estimates Palo
Alto Management Group, a California-based research firm.
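The Alexa comparison holds if terabytes and kilobytes are read as decimal units (10^12 and 10^3 bytes):

```latex
\frac{13\ \text{TB}}{18\ \text{KB}}
  = \frac{13 \times 10^{12}\ \text{bytes}}{18 \times 10^{3}\ \text{bytes}}
  \approx 7.22 \times 10^{8}
```

With binary units (2^40 and 2^10 bytes) the same ratio comes to roughly 775 million copies.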

What's driving the growth? For one thing, the sheer amount of
data in the back room. According to the META Group, data warehouses
holding more than 1 terabyte will grow from 19 percent of all
installations to 30 percent, becoming the largest segment this
year. As a result, companies will be able to capture more customer
information and retain it longer.

Catalog retailer Fingerhut Corp. in Minnetonka, Minnesota, began
scrutinizing its database when its first customer placed an order
back in 1948. Today, the direct marketer puts out some 130
different catalogs and touts a 6-terabyte data warehouse with
information on more than 65 million customers. Hundreds of in-house
users from merchandising, marketing, and analytics query the
database daily and can crunch more than 3,000 variables on the
company's most active 12 million customers (those who've purchased
in the last four years). That list of variables includes everything
from specific product transactions to demographic data collected
from customer surveys and outside research vendors like The Polk
Company.

More than 300 predictive models are now used to scour the
terabytes at Fingerhut. One model predicts the likelihood of
someone responding to a targeted electronics catalog, while another
scores the chances of a customer returning her merchandise. One or
more models are run on each potential recipient of every catalog
that goes out the door, says Bill Flach, Fingerhut's director of
corporate research and analysis.
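The article does not say which techniques Fingerhut's models use. As an illustration only, a response model of the kind Flach describes is often a logistic regression that turns a customer's attributes into a probability, which is then used to decide who receives the catalog. The feature names, weights, and the 0.5 mailing cutoff below are all hypothetical.

```python
import math

# Hypothetical model: coefficients for predicting response to an electronics catalog.
WEIGHTS = {"recent_orders": 0.8, "electronics_share": 1.5, "months_since_last": -0.1}
BIAS = -2.0
MAIL_CUTOFF = 0.5  # hypothetical: mail anyone scoring at or above this probability

def response_probability(customer):
    """Logistic score: 1 / (1 + exp(-(BIAS + sum of weight * feature)))."""
    z = BIAS + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

customers = {
    "A": {"recent_orders": 3, "electronics_share": 0.6, "months_since_last": 1},
    "B": {"recent_orders": 0, "electronics_share": 0.0, "months_since_last": 24},
    "C": {"recent_orders": 1, "electronics_share": 0.9, "months_since_last": 6},
}

# Score every potential recipient, then keep those at or above the cutoff, best first.
scores = {cid: response_probability(c) for cid, c in customers.items()}
mail_list = sorted(
    (cid for cid, p in scores.items() if p >= MAIL_CUTOFF), key=scores.get, reverse=True
)
print({cid: round(p, 3) for cid, p in scores.items()}, mail_list)
```

Here only customer A, with a score of about 0.77, clears the cutoff. In practice a cutoff would more likely be set by weighing the cost of mailing a catalog against the expected margin on a response.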

Data mining has also led to the creation of new catalogs. In one
analysis, researchers found that customers who changed their
residence tripled their purchasing in the three months after their
move. Their product choices also followed a pattern, with furniture
and decorations topping the shopping list. That may seem like a
no-brainer, but to Fingerhut, it was a valuable nugget to
capitalize on. The company developed a new "mover's catalog" filled
with targeted products for this consumer segment. At the same time,
it saved money by not mailing other catalogs to these folks right
after they relocated. Flach declines to give numbers on the mover's
catalog, but says it's one of the most successful products from
its data mining efforts.
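A before-and-after comparison like the one behind the mover's catalog can be sketched for a single customer. This assumes order histories with dates and dollar amounts and a known move date; the three-month window matches the article, but the data, the calendar-month arithmetic, and the `mover_lift` helper are hypothetical and not Fingerhut's actual method.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (ignores the day of the month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def mover_lift(orders, move_date, window=3):
    """Ratio of spending in the `window` months after a move to the `window` months before."""
    before = after = 0.0
    for order_date, amount in orders:
        offset = months_between(move_date, order_date)
        if -window <= offset < 0:
            before += amount
        elif 0 <= offset < window:
            after += amount  # the month of the move counts as "after"
    return after / before if before else float("inf")

# Hypothetical order history for one customer: (order date, amount in dollars)
orders = [
    (date(1998, 2, 10), 40.0),
    (date(1998, 4, 3), 20.0),
    (date(1998, 5, 20), 90.0),
    (date(1998, 6, 15), 60.0),
    (date(1998, 7, 1), 30.0),
    (date(1998, 9, 9), 25.0),  # outside both windows, ignored
]
lift = mover_lift(orders, move_date=date(1998, 5, 1))
print(lift)
```

This customer spent $60 in the three months before the move and $180 in the three months after, a ratio of 3.0. A finding like Fingerhut's would require averaging over many movers and comparing against customers who did not move.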

Clustering customers into segments is a top business objective
in data mining, says Mark Brown, global product manager at SAS
Institute in Cary, North Carolina. Others include customer
retention, acquisition, and cross-selling opportunities. "You need
to get that customer-centric view into your data warehouse," he
says. "Many companies don't have a scope that's narrow enough when
they get started."
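Segmentation is commonly done with clustering algorithms. One widely used example is k-means, sketched below in plain Python. The article does not say which method SAS's tools apply, and the two features (orders per year and average order value) and the sample customers are hypothetical. Real customer data would normally be scaled first so that no single feature dominates the distance.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means: returns final centroids and the last cluster assignment."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct customers
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

# Hypothetical customers: [orders per year, average order value in dollars]
customers = [[1, 20], [2, 22], [1, 21], [10, 80], [11, 85], [12, 82]]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(c, 1) for c in centroid], members)
```

The two segments that emerge, occasional low-value buyers and frequent high-value buyers, are the kind of groups a marketer would then target with different offers.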

In other words, don't expect the analysts in IT to zoom in on
the business strategy. "Data mining projects that start on the IT
side are a recipe for disaster," says Gregory Piatetsky-Shapiro,
director and chief scientist of Boston-based Knowledge Stream
Partners, a data mining consulting firm for the financial services
industry. "The problems they're trying to solve may not be the
important ones."

That's not a concern at Eddie Bauer in Redmond, Washington. No
longer do the analysts in IT hold the magic keys to unlocking
hidden patterns in the retailer's database. Director of circulation
Kevin Hillstrom and others in the company's marketing department
now run queries right from their desktops to determine which
promotions to offer in their stores and which catalog to send to a
customer who hasn't bought in a while. Simplicity is key to end
users, says Microsoft's Fayyad. "The user wants a solution, they
want something that talks their language," he says.

For such a user-friendly desktop system to work, however, the
data has to be ready to be mined. Data preparation, analysts
contend, can eat up the most time, and create the most headaches, in
any mining project. SAS Institute's Brown estimates that in any
given analysis, companies spend 80 percent of their time just
readying the data.

Stephen Coles, assistant director of research and development at
American Century Investments, a financial-services firm in Kansas
City, Missouri, knows this firsthand. To start data mining, ACI
pulled 2 million customer records out of its 25-million-record master
file, each with roughly 800 variables attached. Then came a snag:
Not every record contained all 800 pieces of information. Some
lacked occupational data; others were missing valuable
credit card information. Throwing out these records before mining
the data would have eliminated too much of the base and skewed the
results, Coles says. To fix the problem, the company used SPSS
Missing Value Analysis to analyze patterns within the large data
sets and impute the missing values. Estimated time to fill in the
blanks? Thirty minutes. ACI's data mining has led to
better-targeted direct mail campaigns. One mailed to high-end
customers drew a 7 percent response rate, in part because the
company could identify the right recipients in its database.
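Imputation means estimating a missing value from the values that are present. A minimal sketch, assuming one numeric field (a hypothetical `balance`) is predicted from another (`age`) by a straight line fitted on the complete records. SPSS Missing Value Analysis offers its own, more sophisticated estimation methods; this illustrates the general idea, not that product.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def impute(records, target, predictor):
    """Return copies of records with missing (None) `target` values estimated from `predictor`."""
    complete = [r for r in records if r[target] is not None]
    a, b = fit_line([r[predictor] for r in complete], [r[target] for r in complete])
    return [
        dict(r) if r[target] is not None else {**r, target: a + b * r[predictor]}
        for r in records
    ]

# Hypothetical customer records; balance is in thousands of dollars.
records = [
    {"age": 30, "balance": 10.0},
    {"age": 40, "balance": 20.0},
    {"age": 50, "balance": 30.0},
    {"age": 45, "balance": None},  # missing value to be imputed
]
filled = impute(records, target="balance", predictor="age")
print(filled[-1])
```

The record with the missing balance is kept and filled in with 25.0, rather than being thrown out along with a quarter of this tiny sample.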

But when you have hundreds, or thousands, of demographic and other
variables to choose from, how do you know which ones will help you
understand customers better? Many analysts agree that past behavior
is more predictive than age, sex, and income, but that broad
definition doesn't narrow the number of fields very much. Isolating
just the right handful of variables for making forecasts is still a
work in progress, says Dr. Ashok Srivastava, chief technologist of
IBM's knowledge discovery consulting group. "Let's say you're
trying to predict whether the price of IBM stock is going to go up
or down in the next ten minutes," he says. "There are a huge number
of variables that you could use to make that forecast, including
recent news, past behavior of IBM stock, and the behavior of
similar stocks, to name a few. Data mining gives you the ability to
sift through thousands of potential variables to isolate a few key
variables that are highly predictive." In some cases, Srivastava
says, the most predictive variables may not even exist in the
database but need to be created using data that is already
available. "This is a significant research activity which I think
holds the key to making good predictions," he says.
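One simple way to sift candidate variables is to rank each by the strength of its correlation with the outcome and keep the top few. The sketch below does that and also adds a derived variable, spend per visit, which is not stored in the hypothetical table but is built from fields that are, echoing Srivastava's point. Correlation ranking is only one of many selection methods; it captures linear association only and says nothing about cause.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical customer table: stored fields plus the quantity to predict.
columns = {
    "visits": [1, 2, 4, 5, 2, 4],
    "spend": [10, 40, 40, 100, 10, 80],
    "age": [25, 60, 33, 47, 51, 38],
}
target = [10, 20, 10, 20, 5, 20]

# A derived variable that is not in the table but is built from fields that are.
columns["spend_per_visit"] = [s / v for s, v in zip(columns["spend"], columns["visits"])]

# Rank every candidate by absolute correlation with the target and keep the top two.
ranking = sorted(
    ((name, abs(pearson(values, target))) for name, values in columns.items()),
    key=lambda item: item[1],
    reverse=True,
)
selected = [name for name, _ in ranking[:2]]
print(ranking, selected)
```

On this toy data the derived `spend_per_visit` ranks first, ahead of every stored field.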

Still missing from data mining models is the element of time,
adds Srivastava. "There's a lot of emphasis on analyzing static
databases," he says. "But the real world has many dynamic aspects
to it, just like the stock market moves minute by minute. You need
to build models that characterize things that change in time." A
large bank, he suggests, has many outlets to reach customers: the
telephone, ATMs, the Internet, branches inside grocery stores, and
so on. The bank may run an advertising campaign to promote its
branches in grocery stores, but then discover a few months later
that business in the branches is still slow. Meanwhile, its
Internet site is booming. Srivastava believes that one day, methods
will be able to forecast such business shifts. If the bank can
predict that business in the grocery stores will shrivel up in a
few months, it can invest its ad dollars in promoting outlets with
more potential, like the Internet.
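Srivastava is describing methods that, by his account, are still to come. Purely to make the idea concrete, a deliberately naive version fits a straight-line trend to each channel's recent history and extrapolates it a few months ahead. The channel names and monthly figures are hypothetical, and a real forecast would have to handle seasonality, campaign effects, and uncertainty.

```python
def linear_trend(series):
    """Least-squares intercept and slope of values against months 0..n-1."""
    n = len(series)
    mean_t, mean_y = (n - 1) / 2, sum(series) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope

def forecast(series, months_ahead):
    """Extend the fitted straight line `months_ahead` months past the last observation."""
    intercept, slope = linear_trend(series)
    return intercept + slope * (len(series) - 1 + months_ahead)

# Hypothetical monthly transaction counts (thousands) per channel.
channels = {
    "grocery_branches": [50, 48, 45, 43, 40, 38],
    "internet": [20, 26, 31, 37, 42, 48],
}
outlook = {name: forecast(history, months_ahead=3) for name, history in channels.items()}
print({name: round(value, 1) for name, value in outlook.items()})
```

Extrapolated three months out, the hypothetical grocery-branch series falls to about 30 while the Internet series rises to about 64, the kind of signal that would argue for shifting ad dollars.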

Some companies hope data mining can help them anticipate
customer needs. Bank of America recently completed a
proof-of-concept project with IBM's data mining tools to delve into
its database of corporate accounts. "Each corporate relationship
can involve potentially millions of transactions," says Fritz
Offensend, vice president of financial engineering systems at Bank
of America. "There's probably no one in the bank who knows every
single deal we've cut with large clients. We want to figure out
what products they might need next, what deals we should start to
discuss in the coming months." In a pilot test, Bank of America
pulled together the records of thousands of corporate clients, each
with roughly 400 variables. Offensend declined to comment on the
trends and correlations uncovered in the study but says they
definitely proved the value of data mining. "The real payoff
will be when we're chalking up profits based on our insights," he
adds.

Still, not everyone is convinced that finding less-than-obvious
patterns in a database breeds business value. Skeptics love to tell
the story of a grocery chain that found a correlation between
purchases of diapers and beer. It was an interesting discovery that
probably would have gone undetected without data mining, but
ultimately it was deemed unactionable. For one thing, the company
decided it would be too expensive to move inventory around in its
stores. "Data mining produces too many answers that may not be
causal. How do you know which are actionable and which are not?"
asks Mike Duffy, director of analytics and database services at
Kraft Foods. "There is no judgment factor."
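The diapers-and-beer story is usually told as an association rule, diapers -> beer, scored by support (how often both appear together), confidence (how often beer appears given diapers), and lift (confidence relative to beer's overall frequency). The baskets below are hypothetical. A lift above 1 says the items co-occur more often than chance would predict; it says nothing about causation or about whether acting on the rule pays off, which is exactly Duffy's objection.

```python
def rule_stats(baskets, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Assumes the antecedent and consequent each appear in at least one basket.
    """
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent <= b)
    has_c = sum(1 for b in baskets if consequent <= b)
    has_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = has_both / n
    confidence = has_both / has_a
    lift = confidence / (has_c / n)
    return support, confidence, lift

# Hypothetical checkout baskets.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"milk", "wipes"},
]
support, confidence, lift = rule_stats(baskets, {"diapers"}, {"beer"})
print(support, confidence, lift)
```

Here diapers and beer appear together in 3 of 8 baskets, beer follows diapers 75 percent of the time, and the lift of 1.5 flags the pair as unusually common, but deciding whether to rearrange the shelves is still a business judgment.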

Of course, data mining requires a kind of explorer's
mentality: you never know what you're going to find or what changes
it might lead to in other areas of the business. Consider retail:
In 1974, a pack of Wrigley's gum was waved over a glass panel and
became the first product ever scanned at a supermarket checkout.
Thanks to frequent-shopper programs and ongoing advances in scanner
technology, grocery chains have been swimming in customer data for
several years. Trouble is, real-world applications are only now
catching up with the innovations. One analyst recalls a grocer who
threw out reams of customer data because the information was
outdated. The grocer could have used that data when it was fresh
off the scanner to offer targeted promotions to customers. Instead,
the information gathered dust in the back office. Today, a few
companies are truly leveraging that knowledge to boost business.
Dick's Supermarkets, an eight-store chain in Wisconsin, uses
transaction data culled from its loyalty-card program to
personalize shopping lists it mails to nearly 30,000 members. "Many
retailers don't have the infrastructure to mine their data," says
Steven Kingsbury, president of Promotion Decisions, a
Cincinnati-based company that analyzes coupon promotions based on
household scanner panel data. "There's a real opportunity for
manufacturers to help them build those techniques. That's going to
be part of the overall competitive situation."

But even its strongest advocates believe data mining is only
part of the solution when it comes to building customer
relationships. Another critical ingredient: people with solid
knowledge of the marketplace. "The effective competitors in the
future will be the ones who consolidate their analytical insights
with business judgment that's not captured in any database," says
Offensend from Bank of America. "You need to tap the knowledge of
the computer and the knowledge of the human."