Thursday, 12 December 2013

DATA MINING

Data mining, an
interdisciplinary subfield of computer science, is the computational process of
discovering patterns in large data sets using methods at the intersection
of artificial intelligence, machine learning, statistics, and database systems.
The overall goal of the data mining process is to extract information from a
data set and transform it into an understandable structure for further use.
Aside from the raw analysis step, it involves database and data management
aspects, data pre-processing, model and inference considerations,
interestingness metrics, complexity considerations, post-processing of
discovered structures, visualization, and online updating.

The term is a buzzword, and is
frequently misused to mean any form of large-scale data or information
processing (collection, extraction, warehousing, analysis, and statistics), but
it is also generalized to any kind of computer decision support system, including artificial
intelligence, machine learning, and business intelligence. In the proper use of
the word, the key term is discovery, commonly defined as "detecting
something new". Often the more general terms "(large-scale) data
analysis" or "analytics" (or, when referring to actual methods,
artificial intelligence and machine learning) are more appropriate.

The actual data mining task is
the automatic or semi-automatic analysis of large quantities of data to extract
previously unknown, interesting patterns such as groups of data records (cluster
analysis), unusual records (anomaly detection) and dependencies (association
rule mining). This usually involves using database techniques such as spatial
indices. These patterns can then be seen as a kind of summary of the input
data, and may be used in further analysis, such as machine learning and
predictive analytics. For example, the data mining step might identify
multiple groups in the data, which a decision support system can then use to
obtain more accurate prediction results. Data collection, data preparation,
and result interpretation and reporting are not part of the data mining step,
but they do belong to the overall knowledge discovery in databases (KDD)
process as additional steps.
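To make the three pattern types above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data is hypothetical, and the techniques are deliberately simple stand-ins (a one-dimensional k-means for cluster analysis, a z-score rule for anomaly detection, and pairwise support/confidence counting for association rules, which is a simplified version of what algorithms such as Apriori do). The function names `kmeans_1d`, `anomalies` and `pair_rules` are hypothetical, not from any library.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def kmeans_1d(values, k=2, iterations=10):
    """Cluster analysis: a toy 1-D k-means that assigns each value to its
    nearest centroid, then moves each centroid to the mean of its members."""
    # Seed centroids with evenly spaced values from the sorted data.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters


def anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` population
    standard deviations away from the mean (a simple z-score rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Association rules (pairs only): return (lhs, rhs, support, confidence)
    for rules "lhs -> rhs" that meet both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(kmeans_1d([1, 2, 3, 10, 11, 12]))       # [[1, 2, 3], [10, 11, 12]]
print(anomalies([10, 11, 9, 10, 12, 10, 50]))  # [50]
print(pair_rules(baskets))                     # [('butter', 'bread', 0.5, 1.0)]
```

Real data mining systems use more robust methods (multi-dimensional clustering, density-based outlier scores, full frequent-itemset mining) and, as noted above, index structures to make them scale; the sketch only shows what each kind of pattern looks like.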

The related terms data
dredging, data fishing, and data snooping refer to the use of
data mining methods to sample parts of a larger population data set that are
(or may be) too small for reliable statistical inferences to be made about the
validity of any patterns discovered. These methods can, however, be used to
create new hypotheses to test against the larger data populations.

Data mining uses historical data
to analyze the likely outcome of a particular problem or situation that may
arise. It typically analyzes data held in data warehouses, and that data may
come from every part of a business, from production to management. Managers
also use data mining to decide on marketing strategies for their products, and
to compare and contrast their position against competitors. Data mining turns
this data into real-time analysis that can be used to increase sales, promote
new products, or discontinue products that add no value to the company.