In one of my previous posts, Machine learning resources for .NET developers, I introduced numl.net, a machine learning library for .NET created by Seth Juarez. You can find the library here and Seth's blog here. When I began researching the library, I quickly learned that one of Seth's goals in writing numl.net was to abstract away the complexities that stop many software developers from trying their hand at machine learning. In my opinion, he has done a wonderful job of accomplishing this goal!
Tutorial
I've put together a small tutorial to show you just how easy it is to use numl.net to make predictions. This tutorial uses supervised learning, by way of a decision tree, to perform predictions. I will use the famous Iris data set, which contains measurements of three different species of Iris flower along with the species of each sample. Before we get into code, let's look at some basic terminology.
With numl.net, you create a POCO (plain old CLR object) that is used both for training and for predictions. Some of its properties hold known values (features), which are used to predict the value of another property whose value is unknown (the label). numl.net makes identifying features and labels easy: you simply mark your properties with the [Feature] attribute or the [Label] attribute (there is also a [StringLabel] attribute). Here is the Iris class that we will use in this tutorial.
using numl.Model;

namespace NumlDemo
{
    /// <summary>
    /// Represents an Iris in the well-known Iris classification dataset (Fisher, 1936).
    /// Each feature property will be used for training as well as prediction. The label
    /// property is the value to be predicted. In this case, it's which type of Iris we are dealing with.
    /// </summary>
    public class Iris
    {
        //Sepal length in centimeters
        [Feature]
        public double SepalLength { get; set; }

        //Sepal width in centimeters
        [Feature]
        public double SepalWidth { get; set; }

        //Petal length in centimeters
        [Feature]
        public double PetalLength { get; set; }

        //Petal width in centimeters
        [Feature]
        public double PetalWidth { get; set; }

        //-- Iris Setosa
        //-- Iris Versicolour
        //-- Iris Virginica
        public enum IrisTypes
        {
            IrisSetosa,
            IrisVersicolour,
            IrisVirginica
        }

        [Label]
        public IrisTypes IrisClass { get; set; } //This is the label or value that we wish to predict based on the supplied features
    }
}
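numl.net reads these attributes for you, and its internals aren't covered in this post. To illustrate the general idea of attribute-based discovery, here is a self-contained sketch using plain .NET reflection. DemoFeatureAttribute, DemoLabelAttribute, DemoIris, and AttributeScanner are hypothetical names defined only for this example; they are not numl.net types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for [Feature]/[Label], defined here so the sketch
// compiles without numl.net. These are NOT numl.net's own attribute types.
[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoFeatureAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoLabelAttribute : Attribute { }

// A cut-down Iris POCO marked with the hypothetical attributes.
public class DemoIris
{
    [DemoFeature] public double SepalLength { get; set; }
    [DemoFeature] public double SepalWidth { get; set; }
    [DemoFeature] public double PetalLength { get; set; }
    [DemoFeature] public double PetalWidth { get; set; }
    [DemoLabel] public string Species { get; set; } = "";
}

public static class AttributeScanner
{
    // Returns the names of T's public properties that carry attribute TAttr.
    // Reflection does not guarantee property order, so callers should not rely on it.
    public static List<string> PropertiesWith<T, TAttr>() where TAttr : Attribute =>
        typeof(T).GetProperties()
                 .Where(p => p.GetCustomAttribute<TAttr>() != null)
                 .Select(p => p.Name)
                 .ToList();
}
```

A library can use a scan like this once per type to learn which properties to read as inputs and which one to fill in as the prediction.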
As you can see, Iris is a simple POCO that defines four features and one label. The Iris training data can be found here. Here is a sample of the data in the file.
5.1,3.5,1.4,0.2,Iris-setosa
6.3,2.5,4.9,1.5,Iris-versicolor
6.0,3.0,4.8,1.8,Iris-virginica
The first four values are doubles representing the features (sepal length, sepal width, petal length, and petal width, in centimeters). The final value is the name of the Iris class. It is the label that we will predict, and our parser maps it to the IrisTypes enum.
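Because the file stores the class as text, something has to translate names like Iris-setosa into IrisTypes values. One option is a lookup table that rejects unexpected names instead of guessing. This is a minimal sketch: IrisLabels is a hypothetical helper, and the enum is redeclared at top level so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Mirrors Iris.IrisTypes from the tutorial, redeclared so this sketch stands alone.
public enum IrisTypes { IrisSetosa, IrisVersicolour, IrisVirginica }

public static class IrisLabels
{
    // Class names as they appear in the data file, mapped to enum values.
    private static readonly Dictionary<string, IrisTypes> ByName =
        new Dictionary<string, IrisTypes>(StringComparer.Ordinal)
        {
            ["Iris-setosa"] = IrisTypes.IrisSetosa,
            ["Iris-versicolor"] = IrisTypes.IrisVersicolour,
            ["Iris-virginica"] = IrisTypes.IrisVirginica,
        };

    // Trims surrounding whitespace, then looks the name up.
    // Throws on an unknown name so a typo or unexpected class is noticed immediately.
    public static IrisTypes Parse(string name)
    {
        if (ByName.TryGetValue(name.Trim(), out var type))
            return type;
        throw new FormatException($"Unknown Iris class name: '{name}'");
    }
}
```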
Now that we have the Iris class, we need a method that parses the training data file and fills a static List<Iris> collection. Here is the code:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

namespace NumlDemo
{
    /// <summary>
    /// Provides the services to parse the training data files
    /// </summary>
    public static class IrisDataParserService
    {
        //provides the training data to create the predictive model
        public static List<Iris> TrainingIrisData { get; set; }

        /// <summary>
        /// Reads the trainingDataFile and populates the TrainingIrisData list
        /// </summary>
        /// <param name="trainingDataFile">File full of Iris data</param>
        public static void LoadIrisTrainingData(string trainingDataFile)
        {
            //if we don't have a training data file
            if (string.IsNullOrEmpty(trainingDataFile))
                throw new ArgumentNullException("trainingDataFile");

            //if the file doesn't exist on the file system
            if (!File.Exists(trainingDataFile))
                throw new FileNotFoundException("Training data file not found.", trainingDataFile);

            //initialize the training data list on first use
            if (TrainingIrisData == null)
                TrainingIrisData = new List<Iris>();

            //read the file one line at a time
            foreach (var fileLineContents in File.ReadLines(trainingDataFile))
            {
                //split the current line into an array of values; skip blank or malformed lines
                var irisValues = fileLineContents.Split(',');
                if (irisValues.Length != 5)
                    continue;

                //the file always uses '.' as the decimal separator, so parse with the invariant culture
                var currentIris = new Iris
                {
                    SepalLength = double.Parse(irisValues[0], CultureInfo.InvariantCulture),
                    SepalWidth = double.Parse(irisValues[1], CultureInfo.InvariantCulture),
                    PetalLength = double.Parse(irisValues[2], CultureInfo.InvariantCulture),
                    PetalWidth = double.Parse(irisValues[3], CultureInfo.InvariantCulture)
                };

                //map the class name in the file to the IrisTypes enum
                switch (irisValues[4].Trim())
                {
                    case "Iris-setosa":
                        currentIris.IrisClass = Iris.IrisTypes.IrisSetosa;
                        break;
                    case "Iris-versicolor":
                        currentIris.IrisClass = Iris.IrisTypes.IrisVersicolour;
                        break;
                    case "Iris-virginica":
                        currentIris.IrisClass = Iris.IrisTypes.IrisVirginica;
                        break;
                    default:
                        throw new InvalidDataException("Unknown Iris class: " + irisValues[4]);
                }

                TrainingIrisData.Add(currentIris);
            }
        }
    }
}
This code is pretty standard. We simply read each line in the file, split the values out into an array, and populate a List<Iris> collection of Iris objects based on the data found in the file.
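Moving the per-line work into a pure function makes the parsing testable without a file on disk. This is a sketch under stated assumptions: IrisRow and IrisLineParser are hypothetical names, not part of the tutorial or of numl.net, and the class name is kept as text rather than mapped to the enum.

```csharp
using System;
using System.Globalization;

// Hypothetical record type for this sketch; the tutorial's Iris class plays this role.
public sealed class IrisRow
{
    public double[] Features { get; }
    public string ClassName { get; }

    public IrisRow(double[] features, string className)
    {
        Features = features;
        ClassName = className;
    }
}

public static class IrisLineParser
{
    // Parses one "sepalLength,sepalWidth,petalLength,petalWidth,class" line.
    // Returns false for blank or malformed lines instead of storing zeros.
    public static bool TryParseLine(string line, out IrisRow row)
    {
        row = null;
        if (string.IsNullOrWhiteSpace(line)) return false;

        var parts = line.Split(',');
        if (parts.Length != 5) return false;

        var features = new double[4];
        for (int i = 0; i < 4; i++)
        {
            // InvariantCulture: the file uses '.' as the decimal separator,
            // whatever the machine's regional settings are.
            if (!double.TryParse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture, out features[i]))
                return false;
        }

        row = new IrisRow(features, parts[4].Trim());
        return true;
    }
}
```

A loader can then be a thin loop over File.ReadLines that keeps only the lines for which TryParseLine returns true.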
Now the magic
Using the numl.net library, we need only three classes to make a prediction from the Iris data set. We start with a Descriptor, which describes the features and label of the class we will learn from and predict. Next, we instantiate a DecisionTreeGenerator, passing the descriptor to its constructor. Finally, we create our prediction model by calling the Generate method of the DecisionTreeGenerator and passing it the training data (an IEnumerable<Iris>). The Generate method returns a model that we can use to make predictions.
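This post doesn't cover how DecisionTreeGenerator chooses its splits, and the following is not numl.net's implementation. As a rough illustration of what generating a decision tree involves, here is a sketch of the core step a tree learner repeats: choosing a threshold on one feature that best separates the labels. Scoring splits with Gini impurity is an assumption made for this example; numl.net may use a different criterion. SplitSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Gini impurity of a set of labels: 1 - sum over classes of p(class)^2.
    // 0 means every label is the same; larger values mean a more mixed set.
    public static double Gini(IReadOnlyList<string> labels)
    {
        if (labels.Count == 0) return 0.0;
        return 1.0 - labels.GroupBy(l => l)
                           .Select(g => (double)g.Count() / labels.Count)
                           .Sum(p => p * p);
    }

    // Tries thresholds halfway between consecutive distinct values of one feature and
    // returns the one whose two sides have the lowest size-weighted Gini impurity.
    public static (double Threshold, double Impurity) BestSplit(double[] values, string[] labels)
    {
        if (values.Length != labels.Length)
            throw new ArgumentException("values and labels must have the same length");

        var sorted = values.Distinct().OrderBy(v => v).ToArray();
        var best = (Threshold: double.NaN, Impurity: double.PositiveInfinity);

        for (int i = 0; i + 1 < sorted.Length; i++)
        {
            double t = (sorted[i] + sorted[i + 1]) / 2.0;
            var left = new List<string>();
            var right = new List<string>();
            for (int j = 0; j < values.Length; j++)
                (values[j] <= t ? left : right).Add(labels[j]);

            double weighted = (left.Count * Gini(left) + right.Count * Gini(right)) / values.Length;
            if (weighted < best.Impurity)
                best = (t, weighted);
        }
        return best;
    }
}
```

A full tree learner would run BestSplit over every feature, keep the best split, and recurse on each side until the groups are pure or too small.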
Here is the complete program:
using numl;
using numl.Model;
using numl.Supervised;
using System;

namespace NumlDemo
{
    class Program
    {
        public static void Main(string[] args)
        {
            //get the descriptor that describes the features and label from the Iris training objects
            var irisDescriptor = Descriptor.Create<Iris>();

            //create a decision tree generator and teach it about the Iris descriptor
            var decisionTreeGenerator = new DecisionTreeGenerator(irisDescriptor);

            //load the training data
            IrisDataParserService.LoadIrisTrainingData(@"D:\Development\machinelearning\Iris Dataset\bezdekIris.data");

            //create a model based on our training data using the decision tree generator
            var decisionTreeModel = decisionTreeGenerator.Generate(IrisDataParserService.TrainingIrisData);

            //create an iris that should be an Iris Setosa
            var irisSetosa = new Iris
            {
                SepalLength = 5.1,
                SepalWidth = 3.5,
                PetalLength = 1.4,
                PetalWidth = 0.2
            };

            //create an iris that should be an Iris Versicolor
            var irisVersiColor = new Iris
            {
                SepalLength = 6.1,
                SepalWidth = 2.8,
                PetalLength = 4.0,
                PetalWidth = 1.3
            };

            //create an iris that should be an Iris Virginica
            var irisVirginica = new Iris
            {
                SepalLength = 7.7,
                SepalWidth = 2.8,
                PetalLength = 6.7,
                PetalWidth = 2.0
            };

            var irisSetosaClass = decisionTreeModel.Predict<Iris>(irisSetosa);
            var irisVersiColorClass = decisionTreeModel.Predict<Iris>(irisVersiColor);
            var irisVirginicaClass = decisionTreeModel.Predict<Iris>(irisVirginica);

            Console.WriteLine("The Iris Setosa was predicted as {0}",
                irisSetosaClass.IrisClass.ToString());
            Console.WriteLine("The Iris Versicolor was predicted as {0}",
                irisVersiColorClass.IrisClass.ToString());
            Console.WriteLine("The Iris Virginica was predicted as {0}",
                irisVirginicaClass.IrisClass.ToString());

            Console.ReadKey();
        }
    }
}
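And that's all there is to it. As you can see, numl.net lets you train a model and make predictions through a few simple abstractions, without writing any of the underlying math yourself. Keep in mind that the three sample irises closely resemble rows in the training data (the Setosa sample is identical to the first line of the file shown earlier), so correct predictions here show that the model works end to end, not how accurately it classifies flowers it has never seen.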
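To estimate accuracy on flowers the model has not seen, a common approach is to hold some rows back: train on the rest, predict the held-out rows, and count how many labels come back right. Here is a generic sketch of the split and the accuracy calculation using only the .NET base library. Holdout is a hypothetical helper, and the seed and test fraction are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Holdout
{
    // Shuffles a copy of the data with a fixed seed (Fisher-Yates), then splits it:
    // the first testFraction of the shuffled rows become Test, the rest become Train.
    public static (List<T> Train, List<T> Test) Split<T>(IEnumerable<T> data, double testFraction, int seed)
    {
        var items = data.ToList();
        var rng = new Random(seed);
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            var tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }

        int testCount = (int)Math.Round(items.Count * testFraction);
        return (items.Skip(testCount).ToList(), items.Take(testCount).ToList());
    }

    // Fraction of positions where the predicted label equals the actual label.
    public static double Accuracy<TLabel>(IReadOnlyList<TLabel> predicted, IReadOnlyList<TLabel> actual)
    {
        if (predicted.Count != actual.Count || actual.Count == 0)
            throw new ArgumentException("Need two non-empty lists of equal length.");

        int correct = 0;
        for (int i = 0; i < actual.Count; i++)
            if (EqualityComparer<TLabel>.Default.Equals(predicted[i], actual[i]))
                correct++;
        return (double)correct / actual.Count;
    }
}
```

With numl.net, you might pass the Train list to Generate and call Predict<Iris> on each Test item, reading IrisClass from the result. Record each test item's actual IrisClass before predicting, in case Predict writes the predicted label back into the object you pass it.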
I hope this has piqued your interest in the numl.net library for machine learning in .NET.
Feel free to post any questions or opinions.
Thanks for reading!
Buddy James

About the author

My name is Buddy James. I'm a Microsoft Certified Solutions Developer from the Nashville, TN area. I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband. I enjoy working with design patterns, data mining, C#, WPF, Silverlight, WinRT, XAML, ASP.NET, Python, CouchDB, RavenDB, Hadoop, Android (MonoDroid), iOS (MonoTouch), and Machine Learning. I love technology and I love to develop software, collect data, analyze the data, and learn from the data. When I'm not coding, I'm determined to make a difference in the world by using data and machine learning techniques. (Follow me at @budbjames.)