Customer Classification And Prediction Based On Data Mining Technique
Ms. Neethu Baby 1, Mrs. Priyanka L.T 2
1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore
2 Assistant Professor, CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore

Abstract
Due to the high competition in the business field, it is essential to consider the customer relationship management of the enterprise. This work analyses a massive volume of customer data and classifies customers based on their behaviour in order to make predictions about them. The classifier predicts, for each customer, the class that has the highest posterior probability. Commercial banks accumulate valuable customer information, which can be used to identify customers and provide decision support. Data preprocessing techniques such as data cleaning and data reduction are applied for data preparation, and dates are converted into a numerical form. A data model is generated from the history of the customers in the bank. The sample data are then classified using the naive Bayesian classification algorithm and placed into the appropriate class according to the posterior probability; from this probability, the percentage of loan-sanction risk for each customer can be predicted.

Keywords: CRM, Data Cleaning, Data Preprocessing, Data Reduction, Naive Bayesian Classification

I. INTRODUCTION
Due to the high competition in the business field, it is essential to consider the customer relationship management of the enterprise. This work analyses a massive volume of customer data and classifies customers based on their behaviour in order to make predictions. Customer relationship management is mainly used in sales forecasting and in the banking sector. Data mining provides the technology to analyse massive volumes of data and detect hidden patterns, converting raw data into valuable information.
Data mining has attracted a great deal of attention in the information industry and in society as a whole in recent years, due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. Data mining is a step in the knowledge discovery process that applies particular data mining algorithms. It is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills and "external" factors such as economic indicators, competition, and customer demographics, and to determine the impact of these factors on sales, customer satisfaction, and corporate profits.

Data mining consists of five major elements:
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyse the data with application software.
5. Present the data in a useful format, such as a graph or table.

Data mining is the extraction of required data or information from large databases. The key idea here is to use data mining techniques to classify customer data according to posterior probability; the data mining concept is used to perform the classification and the prediction of loan risk.
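The first, second, and fifth of these elements can be illustrated with a minimal sketch in pure Python. The transaction records, field names, and amounts below are invented for illustration only:

```python
# Minimal extract-transform-load sketch (hypothetical records and field names).
raw_transactions = [
    {"customer": "A", "amount": "120.50", "region": "north"},
    {"customer": "B", "amount": "75.00",  "region": "south"},
    {"customer": "A", "amount": "39.90",  "region": "north"},
]

# Transform: convert the amount strings into numeric values.
cleaned = [dict(t, amount=float(t["amount"])) for t in raw_transactions]

# Store/manage: aggregate into a per-customer summary, a simple stand-in
# for a multidimensional store.
summary = {}
for t in cleaned:
    summary[t["customer"]] = summary.get(t["customer"], 0.0) + t["amount"]

# Present: print the summary as a small table.
for customer, total in sorted(summary.items()):
    print(f"{customer}\t{total:.2f}")
```

In a real system the extraction step would read from transaction databases and the storage step would write to a warehouse, but the flow of the five elements is the same.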
Data mining commonly involves four classes of tasks:
Clustering: the task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
Classification: the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify a message as legitimate or spam. Common algorithms include decision tree learning, nearest neighbour, naive Bayesian classification, neural networks, and support vector machines.
Regression: the task of finding a function which models the data with the least error.

Association rule learning: the task of searching for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Additional data cleaning can be performed to detect and remove redundancies that may still occur in the results obtained after data integration.

II. RELATED WORK
First, an account must be created for each customer in the bank, where customers enter their personal details, income details, insurance details, loan details, and information about their accounts in other banks. Validation and authentication of the customers are done manually by the bank. These details are stored in the database for further access and for making decisions. After creating the account for each customer, the customer details must be prepared for data mining. Before mining, processes such as data preparation and data cleaning are carried out. In the data preparation module, the details collected from the customers are converted into a format that is suitable for data mining; preprocessing techniques such as data cleaning and data reduction are applied for this conversion. In data preparation, only the required fields are selected from each table for mining. The needed fields are then combined into a common table, and all continuous data are converted into numerical data. In data cleaning, noisy data are removed from the common table. The data cleaning procedure cleans the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.
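Two of the cleaning and preparation steps described above, filling in missing values and converting dates into a numerical form, can be sketched in pure Python. The customer records, field names, and reference date below are hypothetical:

```python
from datetime import date

# Hypothetical customer records: one missing income value and a date field.
records = [
    {"income": 600000, "opened": date(2010, 5, 1)},
    {"income": None,   "opened": date(2012, 8, 15)},
    {"income": 400000, "opened": date(2011, 1, 20)},
]

# Fill the missing income with the mean of the known values
# (one common strategy; a class-wise mean or mode could also be used).
known = [r["income"] for r in records if r["income"] is not None]
mean_income = sum(known) / len(known)
for r in records:
    if r["income"] is None:
        r["income"] = mean_income

# Convert dates into numerical form: days elapsed since a fixed reference date.
epoch = date(2010, 1, 1)
for r in records:
    r["opened"] = (r["opened"] - epoch).days

print(records[1])
```

After this step every field in the common table is numeric, which is what the later classification stage expects.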
If users believe that the data are dirty, they will not trust the results of any data mining process applied to those data. Furthermore, dirty data can cause confusion for the mining procedure, resulting in unreliable output. Although most mining routines have some procedures for dealing with incomplete or noisy data, these are not always robust; instead, they may concentrate on avoiding overfitting the data to the function being modeled. Therefore, before performing data mining, the data should be run through data cleaning routines. In addition to data cleaning, steps must be taken to avoid redundancies during data integration. Typically, data cleaning and data integration are performed as a preprocessing step when preparing for data mining.

Fig. 1. Block Diagram

The customer data may contain certain attributes that take large values. If such attributes are left unnormalized, they must be normalized. Furthermore, it is useful for analysis to obtain aggregate information. Data transformation operations such as normalization and aggregation are additional preprocessing procedures that contribute to the success of the mining process. Data reduction produces a reduced representation of the data set that is much smaller in volume yet produces the same analytical results. Methods used for data reduction include data aggregation, attribute subset selection, dimensionality reduction, and numerosity reduction. In data aggregation, a data cube corresponding to the data is constructed. Attribute subset selection removes irrelevant attributes from the table through correlation analysis. Dimensionality reduction makes use of encoding schemes such as minimum-length encoding or wavelet encoding. In numerosity reduction, the data are replaced with alternative, smaller representations such as clusters or parametric models.
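Attribute subset selection through correlation analysis, as mentioned above, can be sketched as follows: compute the Pearson correlation of each attribute with the target and drop attributes whose correlation is near zero. The table, attribute names, and threshold below are invented for illustration (the `random_id` column is deliberately irrelevant):

```python
import math

# Hypothetical numeric customer table: (income, age, random_id) per row,
# with a 0/1 loan-repaid target as the last column.
rows = [
    (2.0, 30, 17, 0),
    (4.0, 35, 42, 0),
    (6.0, 40, 8,  1),
    (8.0, 45, 99, 1),
    (10.0, 50, 3, 1),
]
names = ["income", "age", "random_id"]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

target = [r[-1] for r in rows]
selected = []
for i, name in enumerate(names):
    column = [r[i] for r in rows]
    # Keep attributes whose correlation with the target clears a threshold.
    if abs(pearson(column, target)) >= 0.5:
        selected.append(name)

print(selected)
```

Here `income` and `age` correlate strongly with the target and survive, while the irrelevant `random_id` column is discarded, shrinking the table before mining.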

In a bank it is necessary to analyze the customer data in order to learn which loan applicants are safe and which are risky for the bank. The process of analyzing the data is known as data classification: a model, or classifier, is constructed to predict categorical labels such as "safe" or "risky" for the loan application data. Suppose there are m classes, C1, C2, ..., Cm. Given a tuple X to be classified, the classifier predicts that X belongs to the class having the highest posterior probability conditioned on X. That is, the naive Bayesian classifier predicts that tuple X belongs to the class Ci if and only if

P(Ci | X) > P(Cj | X) for 1 <= j <= m, j != i.

III. CLASSIFICATION AND PREDICTION
Data classification is a two-step process. The first step is the learning process, in which the training data are analyzed by a classification algorithm; here, the class label attribute is the loan decision. The second step is classification, in which test data are used to estimate the accuracy of the classification algorithm. If the accuracy is acceptable, the algorithm can be applied to the classification of new data tuples. Data prediction is a two-step process similar to data classification; however, for prediction we drop the terminology of a "class label attribute", because the attribute for which values are being predicted is continuous-valued (ordered) rather than categorical (discrete-valued and unordered). This attribute can be referred to simply as the predicted attribute. The accuracy of a predictor is estimated by computing an error based on the difference between the predicted value and the actual known value of y for each of the test tuples X.

A. Naive Bayesian Classifier Algorithm
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes' theorem.
Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption, called class conditional independence, is made to simplify the computations involved and, in this sense, is considered "naive". Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes; they can also be used for classification. The algorithm works as follows. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes A1, A2, ..., An, respectively. We need to maximize the value of P(Ci | X). The class Ci for which P(Ci | X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,

P(Ci | X) = P(X | Ci) P(Ci) / P(X).

As P(X) is constant for all classes, only P(X | Ci) P(Ci) need be maximized. If the class prior probabilities are not known, it is commonly assumed that the classes are equally likely, that is, P(C1) = P(C2) = ... = P(Cm), and we would therefore maximize P(X | Ci); otherwise, we maximize P(X | Ci) P(Ci). Given data sets with many attributes, it would be extremely computationally expensive to compute P(X | Ci). To reduce this computation, the naive assumption of class conditional independence is made: the values of the attributes are presumed to be conditionally independent of one another, given the class label of the tuple, so that

P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ... P(xn | Ci),

and the probabilities P(x1 | Ci), P(x2 | Ci), ..., P(xn | Ci) can easily be estimated from the training tuples. In the example that follows, only a few fields are considered; in practice many fields would be considered to predict the loan sanction. Let C1 correspond to the class loan sanction = yes and C2 to the class loan sanction = no.
The test tuple to be classified is X = (acctype = sb, age = 40, tax = yes, customer type = staff, qualification = u.g, income = 6 lakhs).
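The classification of such a tuple can be sketched as a short pure-Python naive Bayesian classifier. The training records below are invented for illustration (the paper does not list its training data), and only three of the fields from X are used:

```python
from collections import Counter

# Hypothetical training tuples; loan_sanction is the class label.
training = [
    {"acctype": "sb", "tax": "yes", "type": "staff",    "loan_sanction": "yes"},
    {"acctype": "sb", "tax": "yes", "type": "customer", "loan_sanction": "yes"},
    {"acctype": "ca", "tax": "no",  "type": "customer", "loan_sanction": "no"},
    {"acctype": "sb", "tax": "no",  "type": "staff",    "loan_sanction": "yes"},
    {"acctype": "ca", "tax": "no",  "type": "customer", "loan_sanction": "no"},
]

def naive_bayes(training, x, label="loan_sanction"):
    """Return (class, score) maximizing P(X|Ci)P(Ci) under the
    class-conditional-independence assumption."""
    class_counts = Counter(t[label] for t in training)
    n = len(training)
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / n                        # prior P(Ci)
        members = [t for t in training if t[label] == c]
        for attr, value in x.items():
            matches = sum(1 for t in members if t[attr] == value)
            score *= matches / count             # P(xk | Ci), attributes independent
            # Note: a zero count zeroes the product; Laplace smoothing
            # is the usual remedy and is omitted here for brevity.
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score

x = {"acctype": "sb", "tax": "yes", "type": "staff"}
cls, score = naive_bayes(training, x)
print(cls, score)
```

With this toy data, the "yes" class maximizes P(X|Ci)P(Ci), so the loan would be sanctioned; the score itself (relative to the other class) indicates the associated risk level.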
