What is Data Mining? Basics and its Techniques.

The fourth industrial revolution will depend largely on data and connectivity. Analysis Services capable of building data mining solutions will play a key role here, for example by analyzing and predicting customer purchasing behavior so that potential buyers can be targeted. Data will become a new natural resource, and the process of extracting relevant information from unsorted data will assume immense importance. A proper understanding of the term data mining, its processes, and its applications therefore helps in forming a well-rounded view of this buzzword.

Data Mining Basics and its Techniques

Data mining, also known as Knowledge Discovery in Data (KDD), is about searching large stores of data to uncover patterns and trends that go beyond simple analysis. It is not a single-step solution but a multi-step process completed in stages. These include:

1] Data Gathering and Preparation

It starts with collecting the data and organizing it properly. Well-prepared data significantly improves the chances of finding useful information through data mining.

2] Model Building and Evaluation

The second step in the data mining process is the application of various modeling techniques, in which model parameters are calibrated to optimal values. The techniques employed depend largely on the analytic capabilities required to address a range of organizational needs and to arrive at a decision.

Let us examine some data mining techniques in brief. Most organizations combine two or more data mining techniques into a process that meets their business requirements.

Data Mining Techniques

Association – Association is one of the most widely known data mining techniques. It uncovers a pattern based on a relationship between items in the same transaction, which is why it is also known as the relation technique. Large retailers rely on this technique to research customers’ buying habits and preferences. For example, when tracking people’s buying habits, a retailer might find that a customer always buys cream when they buy chocolates, and therefore suggest cream the next time that customer buys chocolates.
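To make the chocolate-and-cream example concrete, here is a minimal sketch that measures how strongly two items go together. The basket data and the helper name `rule_stats` are made up for illustration; support and confidence are the two standard measures for an association rule, but real tools compute them over many item combinations at once.

```python
# Hypothetical shopping baskets; the items are invented for illustration.
transactions = [
    {"chocolate", "cream"},
    {"chocolate", "cream", "bread"},
    {"chocolate", "cream"},
    {"bread", "milk"},
    {"milk", "cream"},
]

def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(transactions)
    # Baskets that contain both items.
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    # Baskets that contain the antecedent at all.
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    support = both / total
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence

support, confidence = rule_stats(transactions, "chocolate", "cream")
print(f"chocolate -> cream: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with chocolate also contains cream, so the rule has a confidence of 1.0, which is the situation the retailer example describes.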

Classification – This data mining technique differs from the above in that it is based on machine learning and uses mathematical techniques such as linear programming, decision trees, and neural networks. In classification, companies build software that learns how to classify data items into groups. For instance, a company can define a classification task such as “given all records of employees who offered to resign from the company, predict which individuals are likely to resign from the company in future.” In such a scenario, the company labels the employee records with two groups, namely “leave” and “stay”. It can then use its data mining software to assign each employee to one of these predefined groups.
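The “leave” or “stay” scenario can be sketched in a few lines. The example below uses a k-nearest-neighbour vote rather than any of the techniques named above, simply because it is short enough to read at a glance. The employee features (years at the company, a satisfaction score) and all the numbers are assumptions made up for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled history: (years at company, satisfaction 1-10) -> outcome.
training = [
    ((1.0, 3.0), "leave"),
    ((2.0, 4.0), "leave"),
    ((1.5, 2.0), "leave"),
    ((6.0, 8.0), "stay"),
    ((7.0, 9.0), "stay"),
    ((5.0, 7.0), "stay"),
]

def classify(record, training, k=3):
    """Label a record by majority vote among its k nearest training records."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((1.2, 3.5), training))  # sits close to the "leave" records
print(classify((6.5, 8.5), training))  # sits close to the "stay" records
```

The important point is that the two groups exist before the software sees a new record; the classifier only decides which predefined group the record belongs to.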

Clustering – Objects exhibiting similar characteristics are automatically grouped together in a single cluster. Many such clusters are created as classes, and objects with similar characteristics are placed in them accordingly. To understand this better, consider book management in a library. The vast collection of books is fully cataloged, and items of the same type are listed together, which makes it easier to find a book of interest. Similarly, by using the clustering technique, we can keep books that share some similarity in one cluster and assign it a suitable name. A reader looking for a book on a particular subject then only has to go to that shelf instead of searching the entire library. Thus, the clustering technique defines the classes and puts objects in each class, while in classification, objects are assigned to predefined classes.
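As a rough illustration of the library analogy, the sketch below groups books with k-means, one common clustering algorithm (the text does not name a specific one). The two numeric features per book and the deterministic choice of starting centroids are assumptions made to keep the example small and repeatable.

```python
import math

# Hypothetical books described by two made-up numeric features,
# e.g. (share of technical vocabulary, share of dialogue).
books = [
    (0.90, 0.10), (0.85, 0.15), (0.80, 0.05),   # look like textbooks
    (0.10, 0.80), (0.15, 0.90), (0.05, 0.85),   # look like novels
]

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns one cluster index per point."""
    # Deterministic start for the sketch: pick centroids spread across the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

print(k_means(books, k=2))
```

No labels are supplied here: the algorithm itself decides which books belong together, and naming each resulting "shelf" is left to a person.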

Prediction – Prediction is a data mining technique that is often used in combination with other data mining techniques. It involves analyzing trends, classification, pattern matching, and relations. By analyzing past events or instances in their proper sequence, one can predict a future event with reasonable confidence. For instance, prediction analysis can be used in sales to predict future profit if sales is chosen as the independent variable and profit as the variable dependent on sales. Then, based on historical sales and profit data, one can fit a regression curve and use it for profit prediction.
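The sales-and-profit example can be sketched with an ordinary least squares fit. This assumes the simplest possible regression curve, a straight line, and the figures are invented for illustration rather than taken from any real data set.

```python
# Hypothetical historical figures (in thousands); not real data.
sales = [10.0, 20.0, 30.0, 40.0, 50.0]
profit = [2.0, 4.5, 6.0, 8.5, 10.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sales, profit)
predicted = slope * 60.0 + intercept  # extrapolate to a sales figure not seen before
print(f"profit = {slope:.3f} * sales + {intercept:.3f}")
print(f"predicted profit at sales=60: {predicted:.2f}")
```

Here sales is the independent variable and profit the dependent one, as in the description above; a real analysis would also check how well the line fits before trusting the prediction.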

Decision trees – In a decision tree, we start with a simple question that has multiple answers. Each answer leads to a further question that helps classify or identify the data so that it can be categorized, or so that a prediction can be made based on each answer. For example, consider a decision tree for deciding whether or not to play a cricket ODI. Starting at the root node, if the weather forecast predicts rain, we should avoid the match for the day. Alternatively, if the weather forecast is clear, we should play the match.
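The cricket example is small enough to write out as a data structure. The sketch below stores the tree as nested dictionaries and walks it from the root; the key names `question` and `branches` and the function `decide` are illustrative choices, not part of any particular data mining tool.

```python
# The tree from the text as nested dictionaries: an inner node asks a question,
# a leaf (a plain string) holds the decision.
tree = {
    "question": "forecast",
    "branches": {
        "rain": "avoid the match",
        "clear": "play the match",
    },
}

def decide(node, facts):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        answer = facts[node["question"]]
        node = node["branches"][answer]
    return node

print(decide(tree, {"forecast": "rain"}))
print(decide(tree, {"forecast": "clear"}))
```

A deeper tree would simply place another dictionary where a leaf string is now, so that an answer leads to a further question instead of a final decision.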

Data mining is at the heart of analytics efforts across a variety of industries and disciplines, such as communications, insurance, education, manufacturing, banking, and retail. Therefore, having correct information about it is essential before applying the different techniques.

The author Hemant Saxena is a post-graduate in bio-technology and has an immense interest in following Windows, Office and other technology developments. Quiet by nature, he is an avid Lacrosse player.