Check out my compact and minimal book “Practical Machine Learning with R and Python: Third edition – Machine Learning in stereo”, available on Amazon in paperback ($12.99) and Kindle ($8.99) versions. My book includes implementations of key ML algorithms and the associated measures and metrics. The book is ideal for anybody who is familiar with the concepts and would like a quick reference to the different ML algorithms that can be applied to problems, and how to select the best model. Pick up your copy today!!

While coding in R and Python I found that some aspects were more convenient in one language and some in the other. For example, plotting the fit is straightforward in R, while computing the R squared, splitting into train and test sets etc. are readily available in Python. In any case, these minor conveniences can easily be implemented in either language.

The R squared in R is computed as follows
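As a point of comparison, here is a minimal sketch of the same computation in Python. The helper name and the sample values are illustrative, not taken from the post's code:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# A perfect prediction gives an R squared of 1
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
```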

Note: You can download this R Markdown file and the associated data sets from Github at MachineLearning-RandPythonNote 1: This post was created as an R Markdown file in RStudio which has a cool feature of including R and Python snippets. The plot of matplotlib needs a workaround but otherwise this is a real cool feature of RStudio!

1.1a Univariate Regression – R code

Here a simple linear regression line is fitted between a single input feature and the target variable.
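The same univariate fit can be sketched in Python with an ordinary least squares fit via numpy; the data below is synthetic and purely illustrative (the post itself works with real data sets):

```python
import numpy as np

# Synthetic single feature and target (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # a perfectly linear target

# Fit y = slope * x + intercept by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # ~2.0, ~1.0
```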

1.2a Multivariate Regression – R code

# Read crimes data
crimesDF <- read.csv("crimes.csv", stringsAsFactors=FALSE)
# Remove the 1st 7 columns which do not impact output
crimesDF1 <- crimesDF[, 7:length(crimesDF)]
# Convert all to numeric
crimesDF2 <- sapply(crimesDF1, as.numeric)
# Check for NAs
a <- is.na(crimesDF2)
# Set to 0 as an imputation
crimesDF2[a] <- 0
# Create as a dataframe
crimesDF2 <- as.data.frame(crimesDF2)
# Create a train/test split
train_idx <- trainTestSplit(crimesDF2, trainPercent=75, seed=5)
train <- crimesDF2[train_idx, ]
test <- crimesDF2[-train_idx, ]
# Fit a multivariate regression model between ViolentCrimesPerPop and all other features
fit <- lm(ViolentCrimesPerPop ~ ., data=train)
# Compute and print R squared
rsquared <- Rsquared(fit, test, test$ViolentCrimesPerPop)
sprintf("R-squared for multi-variate regression (crimes.csv) is : %f", rsquared)
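The multivariate fit above can be sketched in Python as an ordinary least squares problem. This uses synthetic data in place of crimes.csv and numpy's least squares solver, so it is an illustration of the technique rather than a line-for-line translation of the post's code:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic stand-in for the crimes features (illustrative only)
X = rng.normal(size=(100, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 3.0  # exactly linear target with intercept 3.0

# Add an intercept column and solve the least squares problem
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coef)  # ~[3.0, 1.5, -2.0, 0.5]
```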

1.4 K Nearest Neighbors

The code below implements KNN regression in both R and Python, for different numbers of neighbors. The R squared is computed in each case. This is repeated after performing feature scaling, and it can be seen that the model fit is much better after feature scaling. Normalization refers to rescaling each feature to the range [0, 1]: x' = (x - min(x)) / (max(x) - min(x))

Another technique that is used is Standardization, which rescales each feature to zero mean and unit standard deviation: x' = (x - mean(x)) / sd(x)
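The two scaling techniques can be written in a few lines of Python. The helper names below are my own, and the sample vector is illustrative:

```python
import numpy as np

def normalize(x):
    """Min-max normalization: rescales values to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: rescales values to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

x = np.array([10.0, 20.0, 30.0, 40.0])
print(normalize(x))    # ranges from 0 to 1
print(standardize(x))  # mean 0, standard deviation 1
```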

1.4a K Nearest Neighbors Regression – R (Unnormalized)

library(dplyr)
library(FNN)
library(ggplot2)
df2 <- df1 %>% select(cylinder, displacement, horsepower, weight, acceleration, year, mpg)
df3 <- df2[complete.cases(df2), ]
# Split train and test
train_idx <- trainTestSplit(df3, trainPercent=75, seed=5)
train <- df3[train_idx, ]
test <- df3[-train_idx, ]
# Select the feature variables
train.X = train[, 1:6]
# Set the target for training
train.Y = train[, 7]
# Do the same for the test set
test.X = test[, 1:6]
test.Y = test[, 7]
rsquared <- NULL
# Create a list of neighbors
neighbors <- c(1, 2, 4, 8, 10, 14)
for (i in seq_along(neighbors)) {
    # Perform a KNN regression fit
    knn = knn.reg(train.X, test.X, train.Y, k=neighbors[i])
    # Compute R squared
    rsquared[i] = knnRSquared(knn$pred, test.Y)
}
# Make a dataframe for plotting
df <- data.frame(neighbors, Rsquared=rsquared)
# Plot the number of neighbors vs the R squared
ggplot(df, aes(x=neighbors, y=Rsquared)) + geom_point() + geom_line(color="blue") +
    xlab("Number of neighbors") + ylab("R squared") +
    ggtitle("KNN regression - R squared vs Number of Neighbors (Unnormalized)")
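For comparison, the core of KNN regression (predicting each test point as the mean target of its k nearest training points) can be sketched in a few lines of plain Python with numpy. This is a from-scratch illustration with toy data, not a reproduction of the post's library-based code:

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k):
    """Predict each test point as the mean target of its k nearest training points."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    preds = []
    for x in np.asarray(test_X, dtype=float):
        # Euclidean distances from this test point to every training point
        dist = np.sqrt(((train_X - x) ** 2).sum(axis=1))
        # Indices of the k nearest training points
        nearest = np.argsort(dist)[:k]
        preds.append(train_y[nearest].mean())
    return np.array(preds)

train_X = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 2.0, 3.0]
# Nearest 2 train points to 1.1 are at 1.0 and 2.0, so the prediction is 1.5
print(knn_predict(train_X, train_y, [[1.1]], k=2))  # [1.5]
```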
