Tag Archives: pre-processing

Before building a predictive model in R, the following assumptions should be taken care of. Assumption 1: The parameters of the linear regression model must be numeric and linear in nature. If the parameters are non-numeric, for example categorical, use one-hot encoding (Python) or dummy encoding (R) to convert them to numeric. Assumption…
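As a rough illustration of the encoding step mentioned in this excerpt, here is a minimal hand-rolled sketch in plain Python. The function name `one_hot` and the `drop_first` flag are hypothetical and used for illustration only; in practice a library routine such as `pandas.get_dummies` would normally be used instead. With `drop_first=True` the first level is dropped as a reference category, which is the usual idea behind dummy encoding.

```python
def one_hot(values, drop_first=False):
    """Turn a list of categorical labels into 0/1 indicator columns.

    Hypothetical helper for illustration; not taken from the original post.
    """
    # Sort the distinct labels so the column order is deterministic.
    levels = sorted(set(values))
    if drop_first:
        # Drop the first level so it acts as the reference category.
        levels = levels[1:]
    # One column per remaining level: 1 where the row matches, else 0.
    return {level: [1 if v == level else 0 for v in values] for level in levels}


colours = ["red", "green", "red", "blue"]
encoded = one_hot(colours)
dummy = one_hot(colours, drop_first=True)
```

Here `encoded` has one indicator column per colour, while `dummy` omits the column for the first level in sorted order.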

There are usually several data preprocessing steps required before applying any machine learning algorithm to data. These are required by the nature of the available data and algorithms. Listed below are a few common instances where data preprocessing is required. Recall that in this context, attributes are variables (columns in the data spreadsheet) and each row in this column is a…

“Gini index measures the extent to which the distribution of income or consumption expenditure among individuals or households within an economy deviates from a perfectly equal distribution. Thus a Gini index of 0 represents perfect equality, while an index of 100 implies perfect inequality.”
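To make the 0 to 100 scale concrete, the following is a minimal sketch that computes a Gini index from a list of incomes using the standard rank-weighted formula for sorted values. The function name `gini_index` is hypothetical, and this is one common sample formula rather than necessarily the method used in the post. Note that for a finite sample of n values, this formula tops out at 100·(n−1)/n, so 100 is approached only as n grows.

```python
def gini_index(incomes):
    """Gini index on a 0-100 scale for a list of non-negative incomes.

    Hypothetical helper for illustration. Uses
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    where x_i are the incomes sorted ascending and i runs from 1 to n.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    # Weight each income by its rank in the sorted order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 100.0 * (2.0 * weighted / (n * total) - (n + 1.0) / n)


equal = gini_index([5, 5, 5, 5])       # everyone earns the same
unequal = gini_index([0, 0, 0, 10])    # one person holds all the income
```

With four people, the equal case gives 0 and the fully concentrated case gives 75, the maximum this formula can reach for n = 4.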

In clustering, a researcher or analyst faces two major questions. First, does the given dataset have any clustering tendency? Second, how can one determine the optimal number of clusters in a dataset and validate the clustered results? In this post, I have attempted to answer these questions using R.

Theoretical concepts are essentially the building blocks of a bigger picture. In this post I have covered the fundamental building blocks of data clustering that any data scientist would need to begin a data mining process. Although it's not complete, it should get you started.

Today, I will discuss how to configure variables and the Excel connection manager in BIDS 2008. The same can be applied to BIDS 2012 and later. I spent over a week trying to find a method by which I could read multiple files into a database. Initially, I began by trying to read…

Today, I will discuss and elaborate on data processing in Weka 3.6 (it’s the same in version 3.7 too). This post is the second part in the series “Data pre-processing with Weka”. If you have not seen my earlier post, please read that first. Continuing further, assuming that you have cleaned…