
Good Background For Machine Learning in R

You can just dive into R. Go for it.

In my opinion, though, you will get a lot more out of it if you have some background.

R is an advanced platform and you can get a lot out of it as a beginner. But, if you have a little machine learning and a little programming as a foundation, R will become a superpower for building accurate predictive models very quickly.

General Suggestions

Here are some suggestions for getting the most out of getting started with machine learning in R. I think these are reasonable for a modern developer interested in machine learning.

A developer who knows how to program. This helps because it won’t be a big deal to pick up the syntax of R, which at times can be a little odd. It is also helpful to know how to whip up scripts or script-lets (mini scripts) to do this or that task. R is a programming language after all.

Interested in predictive modeling machine learning. Machine learning is a big field that covers a variety of interesting algorithms. Predictive modeling is a subset that is only concerned with building models that make predictions on new data, not with explaining the relationships between data or learning from data in general. Predictive modeling is where R really shines as a platform for machine learning.

Familiar with machine learning basics. You understand machine learning as an induction problem where all algorithms are really just trying to estimate an underlying mapping function from an input space to an output space. All predictive machine learning makes sense through this lens, as do strategies of searching for good and best machine learning algorithms, algorithm parameters and data transforms.
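To make the "mapping function" idea concrete, here is a minimal sketch in base R. The function true_f, the noise level and the sample size are all assumptions made up for illustration; in a real project the true mapping is unknown and only the examples are available.

```r
# Sketch: learning as estimating an unknown mapping function from inputs to outputs.
set.seed(7)                               # make the example repeatable

# Hypothetical "true" mapping (an assumption; never visible in a real project).
true_f <- function(x) 3 + 2 * x

# Observed examples: inputs plus noisy outputs.
x <- runif(100, min = 0, max = 10)
y <- true_f(x) + rnorm(100, sd = 1)
train <- data.frame(x = x, y = y)

# Induction step: estimate the mapping from the examples alone.
fit <- lm(y ~ x, data = train)
print(coef(fit))                          # estimates should land near 3 and 2

# Apply the estimated mapping to inputs that were not in the training data.
new_data <- data.frame(x = c(2.5, 7.5))
predictions <- predict(fit, newdata = new_data)
print(predictions)
```

Swapping lm() for another algorithm changes how the mapping is estimated, but not the shape of the problem.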

Specific Suggestions

The approach I lay out in the next section also makes some assumptions about your background.

You are not an absolute beginner in machine learning. You could be, and the approach may work for you, but you will get a lot more out of it if you have some additional suggested background.

You want to use a top-down approach to studying machine learning. This is the approach I teach: rather than starting with theory and principles and eventually touching on practical machine learning if there is time, you start with the goal of working through a project end-to-end and research details as you need them in order to deliver better results.

You are familiar with the steps in a predictive modeling machine learning project. Specifically:

You are at least familiar with some machine learning algorithms. Or you may know how to pick them up quickly, for example using the algorithm description template method. I think learning the details of how and why machine learning algorithms work is a separate task from learning how to use those algorithms on a machine learning platform like R. They are often conflated in books and courses to the detriment of learning.

You can learn more about how to learn any machine learning algorithm using the template method here:

How To Learn Machine Learning in R

This section lays out a process that you can use to get started with building machine learning predictive models on the R platform.

It is divided into two parts:

Map the tasks of a machine learning project onto the R platform.

Work through predictive modeling projects using standard datasets.

1. Map Machine Learning Tasks Onto R

You need to know how to do the specific tasks of a machine learning project on the R platform. Once you know how to complete a discrete task using the platform and get a result reliably, you can do it again and again on project after project.

This process is straightforward:

List out all of the discrete tasks of a predictive modeling machine learning project.

Create recipes to complete the task reliably that you can copy-paste as a starting point on future projects.

Add to and maintain the recipes as your understanding of the platform and machine learning improves.

Predictive Modeling Tasks

Below is a minimum list of predictive modeling tasks you may want to map onto the R platform and create recipes for. This list is not complete, but it does cover the broad strokes of the platform:

Overview of R syntax

Prepare Data: Loading Data, Working With Data, Data Summarization, Data Visualization, Data Cleaning, Feature Selection, Data Transforms

Evaluate Algorithms: Resampling Methods, Evaluation Metrics, Spot-Check Algorithms, Model Selection

Improve Results: Algorithm Tuning, Ensemble Methods

Present Results: Finalize Model, Make New Predictions

You will notice the first task is an overview of R syntax. As a developer, you need to know the basics of the language before you can do anything, such as assignment, data structures, flow control and creating and calling functions.
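As a rough idea of what a syntax overview recipe might contain, the sketch below touches each of those basics using base R only. The variable and function names are made up for illustration.

```r
# Assignment: "<-" is the conventional operator.
count <- 3
name <- "iris"

# Data structures: vectors, lists and data frames.
values <- c(1.5, 2.5, 4.0)                # numeric vector
record <- list(id = 1, label = "a")       # list with named elements
df <- data.frame(x = 1:3, y = values)     # table of equal-length columns

# Indexing is 1-based.
first_value <- values[1]
y_column <- df$y

# Flow control: a for loop with an if condition.
total <- 0
for (v in values) {
  if (v > 2) {
    total <- total + v
  }
}

# Creating and calling a function.
mean_of_squares <- function(x) {
  mean(x^2)
}
result <- mean_of_squares(values)

print(total)
print(result)
```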

Library of Standalone Recipes

I recommend creating recipes that are standalone. That means that each recipe is a complete program that has everything it needs to achieve the task and produce an output. This means that you can copy it directly into a future predictive modeling project.
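For example, a standalone recipe for the Resampling Methods task might look something like the sketch below. It uses only base R and the built-in mtcars dataset; the choice of dataset, model formula and number of folds are assumptions for illustration, not a recommendation.

```r
# Recipe: estimate model skill with k-fold cross-validation (base R only).
set.seed(7)                               # repeatable fold assignment

data(mtcars)                              # built-in dataset with 32 rows
k <- 5                                    # number of folds (illustrative choice)

# Assign each row to one of k folds at random.
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

rmse_per_fold <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]
  test <- mtcars[folds == i, ]
  # Fit on the other k-1 folds, then evaluate on the held-out fold.
  fit <- lm(mpg ~ wt + hp, data = train)
  predictions <- predict(fit, newdata = test)
  rmse_per_fold[i] <- sqrt(mean((test$mpg - predictions)^2))
}

# Average the per-fold scores into one estimate of skill.
cv_rmse <- mean(rmse_per_fold)
print(rmse_per_fold)
print(cv_rmse)
```

Because the script loads its own data and prints its own result, it can be dropped into a new project and adapted by changing the dataset and the model line.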

You can store the recipes in a directory or on GitHub.

2. Small Predictive Modeling Projects

Recipes for common predictive modeling tasks with machine learning are not enough.

Again, this is where most books and courses stop. They leave it to you to piece together the recipes into end-to-end projects.

You need to piece the recipes together into end-to-end projects. This will teach and show you how to actually deliver a result using the platform. I recommend only using small, well-understood machine learning datasets from the UCI Machine Learning Repository.

These datasets are available for free as CSV downloads, and most are available directly in R by loading third-party libraries. These datasets are excellent for practicing because:

They are small, meaning they fit into memory and algorithms can model them in reasonable time.

They are well behaved, meaning you often don’t need to do a lot of feature engineering to get a good result.

They are standard, meaning that many people have used them before and you can get ideas of good algorithms to try and good results you should expect.

I recommend at least three projects:

Hello World Project (iris flowers). This is a quick pass through the project steps without much tuning or optimizing on a dataset that is widely used as the hello world of machine learning.

Regression end-to-end. Work through each step of the process with a regression problem (e.g. the Boston housing dataset).
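A quick pass through the hello world project could be sketched roughly as follows. This assumes the MASS package is installed (it ships with standard R distributions) and uses linear discriminant analysis purely as one convenient algorithm; the 80/20 split is also an illustrative choice.

```r
library(MASS)                             # assumed available; provides lda()
set.seed(7)                               # repeatable split

# 1. Load the data (built into R).
data(iris)

# 2. Summarize the data.
print(dim(iris))
print(table(iris$Species))

# 3. Hold back 20% of the rows as a validation set.
validation_index <- sample(nrow(iris), size = round(0.2 * nrow(iris)))
validation <- iris[validation_index, ]
training <- iris[-validation_index, ]

# 4. Fit a model on the training rows.
fit <- lda(Species ~ ., data = training)

# 5. Evaluate on the held-back rows.
predicted <- predict(fit, newdata = validation)$class
accuracy <- mean(predicted == validation$Species)
print(table(predicted, actual = validation$Species))
print(accuracy)
```

Each numbered step is a place where one of your recipes slots in, and the later projects expand each step with more algorithms, resampling and tuning.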

Add and Maintain Recipes

Machine learning with R does not stop at working through a few small standard datasets. You need to take on more and different challenges.

Standard Datasets: You could practice on additional standard datasets from the UCI Machine Learning repository, overcoming the challenges of different problem types.

Competition Datasets: You could try working through some more challenging datasets, such as those from past Kaggle competitions or those from past KDDCup challenges.

Your Own Projects: Ideally, you need to start working through your own projects.

All the while you will be dipping into help, adapting your scripts and learning how to get more out of machine learning on R.

It is important that you fold this knowledge back into your catalog of machine learning recipes. This will let you leverage this knowledge quickly on new projects and contribute greatly to your skill and speed at developing predictive models.

Your Outcomes From This Process

You could work through this process in one weekend. By the end of that weekend, you will have the recipes and project templates that you can use to start modeling your own problems using machine learning in R.

You will go from a developer who is interested in machine learning on R to a developer who has the resources and capability to work through a new dataset end-to-end using R and develop a predictive model to be presented and deployed.

Specifically, you will know:

How to achieve the subtasks of a predictive modeling problem in R.

How to learn new and different subtasks in R.

How to get help with R.

How to work through a small to medium sized dataset end-to-end.

How to deliver a model that can make predictions on new unseen data.
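As an illustration of that last outcome, here is a minimal sketch of finalizing a model and using it on new data with base R. The file location, the model formula and the new input values are all hypothetical.

```r
data(mtcars)                              # built-in dataset

# Finalize: fit the chosen model on all available training data.
final_model <- lm(mpg ~ wt + hp, data = mtcars)

# Save the model to disk so it can be reused without retraining.
model_path <- file.path(tempdir(), "final_model.rds")   # hypothetical location
saveRDS(final_model, model_path)

# Later, possibly in another R session: load the model and predict.
loaded_model <- readRDS(model_path)
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 180))  # made-up inputs
new_predictions <- predict(loaded_model, newdata = new_cars)
print(new_predictions)
```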

From here you can start to dive into the specifics of the functions, techniques and algorithms used with the goal of learning how to use them better in order to deliver more accurate predictive models, more reliably in less time.

Summary

In this post you discovered a step-by-step process that you can use to study and get started with machine learning in R.

The three high-level steps of the process are:

Map the steps of a predictive modeling process onto the R platform with recipes that you can reuse.

Work through small standard machine learning datasets to piece the recipes together into projects.

Work through more and different datasets, ideally your own, and add to your library of recipes.

You also discovered the philosophy behind the process and the reasons why this process is the best process for you.

Next Step

Do you want to get started in machine learning with R?

Download and install R right now.

Use the process outlined above, limit yourself to one weekend and go as far as you can.

Report back. Leave a comment. I would love to hear how you went.

Do you have a question about this process? Leave a comment, I’ll do my best to answer it.