The blog of data mining and Analytical CRM, by Antonios Chorianopoulos

Thursday, 29 December 2011

Designing the Mining Datamart

The success of a data mining project strongly depends on the breadth and
quality of the available data. That’s why data preparation is typically the
most time-consuming phase of the project.

Data mining applications should not be treated as one-off projects but rather
as continuous processes, integrated into the organization’s marketing strategy.
Data mining has to be ‘operationalized’. Derived results should be made
available to marketers to guide them in their everyday marketing activities.
They should also be loaded into the organization’s front-line systems to enable
‘personalized’ customer handling. This approach requires setting up
well-organized data mining procedures, designed to serve specific business
goals, instead of occasional attempts that aim only to cover sporadic needs.

In order to achieve this and become a ‘predictive enterprise’, an organization
should focus on the data to be mined. Since the goal is to turn data into
actionable knowledge, a vital step in this ‘mining quest’ is to build the
appropriate data infrastructure. Ad-hoc data extractions and queries that just
answer a particular business problem may soon end up as a huge mess of
unstructured information. The proposed approach is to design and build a
central mining datamart that will serve as the main data repository for the
majority of the data mining applications. All relevant information should be
taken into account in the datamart design. Useful information from all
available data sources, including internal sources such as transactional,
billing and operational systems, and external sources such as market surveys
and third-party lists, should be collected and consolidated in the datamart
framework. After all, this is the main idea of the datamart: to combine all
important blocks of information in a central repository that gives the
organization a complete view of each customer.
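
As a rough illustration of this consolidation idea, the sketch below combines three small source extracts into one record per customer. Everything in it is an assumption made for the example: the field names (`customer_id`, `monthly_bill`, `amount`, `satisfaction`), the sample values and the helper `build_customer_view` are hypothetical, and a real datamart would be built with database tooling rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems (illustrative values only).
billing = [
    {"customer_id": 1, "monthly_bill": 40.0},
    {"customer_id": 2, "monthly_bill": 25.5},
]
transactions = [
    {"customer_id": 1, "amount": 12.0},
    {"customer_id": 1, "amount": 8.0},
    {"customer_id": 2, "amount": 30.0},
]
survey = [
    {"customer_id": 2, "satisfaction": 4},
]


def build_customer_view(billing, transactions, survey):
    """Combine the source extracts into one record per customer."""
    view = defaultdict(lambda: {
        "monthly_bill": None,      # from the billing system
        "n_transactions": 0,       # aggregated from transactional data
        "total_amount": 0.0,
        "satisfaction": None,      # from an external market survey
    })
    for row in billing:
        view[row["customer_id"]]["monthly_bill"] = row["monthly_bill"]
    for row in transactions:
        record = view[row["customer_id"]]
        record["n_transactions"] += 1
        record["total_amount"] += row["amount"]
    for row in survey:
        view[row["customer_id"]]["satisfaction"] = row["satisfaction"]
    return dict(view)


customer_view = build_customer_view(billing, transactions, survey)
```

Customers missing from a source simply keep the default value for that attribute (here `None`), so gaps in coverage stay visible instead of being silently dropped.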

The mining datamart should:

• Integrate data from all relevant sources.

• Provide a complete view of the customer by including all attributes that
characterize each customer and his or her relationship with the organization.

• Contain pre-processed information, summarized at the minimum level of
interest, for instance at a product account or at a customer level. To
facilitate data preparation for mining purposes, preliminary aggregations and
calculations should be integrated in the building and updating process of the
datamart.

• Be updated on a regular and frequent basis to contain the current view of the
customer.

• Cover a sufficient time period (enough days or months, depending on the
specific situation) so that the relevant data can reveal stable and
non-volatile behavioural patterns.

• Contain current and historical data so that the view of the customer can be
examined at different moments in time. This is necessary since in many data
mining projects analysts have to examine historical data and analyze customers
before the occurrence of a specific event, for instance before purchasing an
additional product or before churning to the competition.

• Cover the requirements of the majority of the upcoming mining tasks, without
the need for additional implementations and interventions from IT. The datamart
cannot possibly cover all the needs that may arise in the future. After all,
there is always the possibility of extracting additional data from the original
data sources or of preparing the original data in a different way. Its purpose
is to provide fast access to commonly used data and to support the most
important and most common mining tasks. There is a fine line between
incorporating too much and too little information. Although there isn’t a rule
of thumb suitable for all situations, it is useful to keep in mind that raw
transactional / operational data may provide depth of information, but they
also slow down performance and complicate the data preparation procedure. On
the other hand, high-level aggregations may diminish the predictive power
hidden in detailed data. In conclusion, the datamart should be designed to be
as simple as possible, with the crucial mining operations in mind. Falling into
the trap of designing the ‘mother of all datamarts’ will most probably lead to
a complicated solution, no simpler than the raw transactional data it was
supposed to replace.
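
Two of the requirements above, pre-processed summaries and historical data that can be examined before a specific event, can be sketched together in a few lines. This is only a minimal illustration under stated assumptions: the monthly grain, the field names (`customer_id`, `date`, `amount`), the sample values and the helpers `build_monthly_snapshots` and `history_before` are all hypothetical choices made for the example, not a prescribed datamart design.

```python
# Hypothetical raw transactions (illustrative values only); dates are ISO strings.
raw_transactions = [
    {"customer_id": 1, "date": "2011-09-03", "amount": 10.0},
    {"customer_id": 1, "date": "2011-09-20", "amount": 5.0},
    {"customer_id": 1, "date": "2011-10-11", "amount": 7.5},
    {"customer_id": 1, "date": "2011-11-02", "amount": 1.0},
    {"customer_id": 2, "date": "2011-10-15", "amount": 20.0},
]


def build_monthly_snapshots(transactions):
    """Summarize raw transactions per customer and calendar month."""
    snapshots = {}
    for t in transactions:
        key = (t["customer_id"], t["date"][:7])  # month as "YYYY-MM"
        snap = snapshots.setdefault(
            key, {"n_transactions": 0, "total_amount": 0.0}
        )
        snap["n_transactions"] += 1
        snap["total_amount"] += t["amount"]
    return snapshots


def history_before(snapshots, customer_id, event_month):
    """Return a customer's monthly summaries strictly before event_month,
    oldest first. "YYYY-MM" strings sort chronologically."""
    rows = [
        (month, snap)
        for (cid, month), snap in snapshots.items()
        if cid == customer_id and month < event_month
    ]
    return sorted(rows, key=lambda row: row[0])


snapshots = build_monthly_snapshots(raw_transactions)
# View of customer 1 before a hypothetical event (e.g. churn) in November 2011.
history = history_before(snapshots, 1, "2011-11")
```

Keeping one summarized row per customer and month is one possible compromise between raw transactions and a single overall aggregate: it stays compact while still letting an analyst look at a customer as of any past month.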