Wisconsin Breast Cancer Classification Benchmark

One of the most promising and applicable uses of Machine Learning is medical diagnostics. For instance, many of the thousands of people who die each year in the US from breast cancer could potentially have been saved if they'd had better access to cheap, intelligent diagnostic tools. This article contains a tutorial showing you how this type of diagnostic tool could be built with the Knowm API.

The Dataset

The Wisconsin Breast Cancer dataset is a canonical classification benchmark for training and testing a machine learning classification tool. The task is to predict whether a particular patient has a malignant or benign tumor from 9 attributes: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.

Here are a few examples from the original raw data CSV file, where each line is formatted as: id, [feature vector], class label (2 = benign, 4 = malignant):

```shell
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
1017122,8,10,10,8,7,10,9,7,1,4
1018099,1,1,1,1,2,10,3,1,1,2
1018561,2,1,2,1,2,1,3,1,1,2
1033078,2,1,1,1,2,1,1,1,5,2
```
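The last field of each line carries the class label. As a quick sanity check on the format, a raw line can be parsed by hand; the helper below is hypothetical and is not part of the Datasets project:

```java
// Parse one raw CSV line from the Wisconsin Breast Cancer dataset.
// Layout: sample id, nine integer features (1-10), class label (2 or 4).
public class RawLineParser {

  public static String labelOf(String csvLine) {
    String[] fields = csvLine.split(",");
    int cellClass = Integer.parseInt(fields[fields.length - 1]);
    return cellClass == 4 ? "malignant" : "benign";
  }

  public static void main(String[] args) {
    System.out.println(labelOf("1000025,5,1,1,1,2,1,3,1,1,2")); // benign
    System.out.println(labelOf("1017122,8,10,10,8,7,10,9,7,1,4")); // malignant
  }
}
```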

Just like in the previous Census Income tutorial, we will be using the open source Java Datasets project to access the raw data, as it provides an extremely convenient way to query the data in the form of POJOs (Plain Ol' Java Objects). Each BreastCancer object contains the relevant information along with the necessary getters and setters.

```java
public class BreastCancer {

  private int id;
  private int sampleCodeNumber;
  private int clumpThickness;
  private int uniformityOfCellSize;
  private int uniformityOfCellShape;
  private int marginalAdhesion;
  private int singleEpithelialCellSize;
  private int bareNuclei;
  private int blandChromatin;
  private int normalNucleoli;
  private int mitoses;
  private int cellClass;

  ...
  // getters and setters
  ...
}
```

If you haven't already, you can access this classifier example by signing up for the Knowm Developer Community and downloading the Java code. If you'd rather not, you can still follow along to see how it's done.

Building the Classifier with the Knowm API

Just like before, we will be building a classifier using the Knowm API's LinearClassifier class, which means everything we write will benefit from drastic improvements in speed and efficiency once ported to a physical neuromorphic chip. Let's get to it.

Open up the BreastCancerApp class:

```java
org.knowm.knowmj.classifier.breastcancer.BreastCancerApp
```

We can see this class extends ClassifierApp. In the previous tutorial we discussed how this wrapper defines a classifier by abstracting out much of the functionality common to many classification problems.

Training

Let’s first check out how we implemented our learn() method:

```java
@Override
public void learn(int epoch) {

  Set<String> truthLabels = new HashSet<>();
  for (int i = 0; i < BreastCancerDAO.getTrainTestSplit(); i++) {

    BreastCancer breastCancer = BreastCancerDAO.selectSingle(i);
    int[] spikes = encoder.encode(breastCancer);

    truthLabels.clear();
    truthLabels.add(breastCancer.getCellClass() == 4 ? "malignant" : "benign");
    classify(spikes, truthLabels, false);
  }
}
```

This method is pretty straightforward. We loop through all of the training data and spike encode each record like this:

```java
int[] spikes = encoder.encode(breastCancer);
```

We then construct the training label set:

```java
truthLabels.clear();
truthLabels.add(breastCancer.getCellClass() == 4 ? "malignant" : "benign");
```

and finally pass the spikes, labels and an evaluation flag to the classify method:

```java
classify(spikes, truthLabels, false);
```

Passing false for the evaluation flag tells the parent ClassifierApp class not to evaluate the output of the classifier, i.e. not to compute the recall, precision, etc.
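In other words, during training we only want the synapses to adapt, not statistics to accumulate. A rough sketch of what such a flag typically gates is shown below; this is a hypothetical illustration, not the actual ClassifierApp source:

```java
import java.util.Set;

// Toy illustration of an evaluation flag: statistics are only
// accumulated when evaluate is true, as during the test phase.
public class EvalFlagSketch {

  int correct = 0;
  int total = 0;

  void classify(Set<String> predicted, Set<String> truthLabels, boolean evaluate) {

    // (weight adaptation toward truthLabels would happen here)
    if (evaluate) { // skipped during learn(), where the flag is false
      total++;
      if (predicted.equals(truthLabels)) {
        correct++;
      }
    }
  }
}
```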

Encoding

The encoding method is an important aspect of all of our classifiers. As you will learn while using the Knowm API, the choice of encoding has a drastic effect on performance. For this tutorial we will be using a special encoder we've created called BreastCancerSpikeEncoder, which we specify in the getNewEncoder() method:

```java
@Override
public Encoder getNewEncoder() {

  return new BreastCancerSpikeEncoder(); // Full adaptive encoder.
}
```

You can find the encoder class here:

```java
org.knowm.knowmj.classifier.breastcancer.BreastCancerSpikeEncoder
```

This encoder has two fields: an array of A2D_Integer encoders and a SpikeStreamJoiner. The A2D encoder is a spatially adaptive encoder; you can learn more about it by reading our article Understanding the A2D Encoder:

```java
private final A2D_Integer[] encoders;
private final SpikeStreamJoiner joiner;
```

A look at the encode method reveals that it is actually pretty simple: we just clear the SpikeStreamJoiner and then load it up with the spikes from each encoder in our array, where each encoder is assigned to a specific field of the BreastCancer class.
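The encode method itself is not reproduced here, but the clear-then-join logic described above can be sketched as follows. The stand-in methods (encodeField, the offsetting loop) are assumptions for illustration; the real A2D_Integer and SpikeStreamJoiner APIs may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of per-field spike encoding followed by joining: each field's
// encoder emits spike channel indices, and the joiner offsets them
// into one shared spike space.
public class EncodeSketch {

  // Stand-in for an A2D_Integer encoder with a fixed channel count.
  static int[] encodeField(int value, int numChannels) {
    // Toy scheme: activate the single channel for this value's bin (values are 1-10).
    int bin = Math.min(numChannels - 1, (value - 1) * numChannels / 10);
    return new int[] { bin };
  }

  // Stand-in for the SpikeStreamJoiner: clear, then concatenate each
  // field's spikes with a running channel offset.
  static int[] encode(int[] fieldValues, int channelsPerField) {
    List<Integer> joined = new ArrayList<>(); // "clearing the joiner"
    int offset = 0;
    for (int value : fieldValues) {
      for (int spike : encodeField(value, channelsPerField)) {
        joined.add(offset + spike);
      }
      offset += channelsPerField;
    }
    return joined.stream().mapToInt(Integer::intValue).toArray();
  }
}
```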

Great! We've built all the necessary parts of our classifier; now let's test it.

Testing Phase and Primary Performance

Before we run BreastCancerApp, let's point out a few things. First, we specify the kT-RAM core type in the getCoreType() method:

```java
@Override
public KTRAMd.DigitalCoreType getCoreType() {

  return DigitalCoreType.NIBBLE;
}
```

as well as the synaptic initialization in the getSynapticInitType() method:

```java
@Override
public SynapticInitType getSynapticInitType() {

  return SynapticInitType.LOW_NOISELESS;
}
```

You may want to experiment with these methods to see how changing them affects the performance of our classifier. One of the key features of the Knowm API is the notion of interchangeable core types. Interchangeable cores are our bridge between the digital kT-RAM emulators of today and the physical kT-RAM of tomorrow, letting us simulate each memristor with different degrees of accuracy. With the NIBBLE core enabled and 1 epoch of training, you should see results like this:

Knowm API Breast Cancer Performance

Looking at the performance chart, we can see that the classifier performed perfectly at a sufficiently low confidence threshold, meaning every test patient was diagnosed correctly. This is a very promising result for our classifier and for machine-assisted medical diagnostics in general.

The different evaluation metrics were previously explained on the page Primary and Secondary Performance Metrics. As expected, the metrics change as the confidence threshold is varied. Looking at the results, we attained our best result with the memristor core type, but that is not always the case: sometimes the non-continuous precision of our cores will increase accuracy, and sometimes it will not.
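To make the threshold dependence concrete, here is a minimal, self-contained precision/recall computation over toy confidence scores (assumed data, not this classifier's actual output):

```java
// Precision and recall over toy (score, isMalignant) pairs:
// predict "malignant" whenever the confidence score >= threshold.
public class ThresholdMetrics {

  static double[] precisionRecall(double[] scores, boolean[] malignant, double threshold) {
    int tp = 0, fp = 0, fn = 0;
    for (int i = 0; i < scores.length; i++) {
      boolean predicted = scores[i] >= threshold;
      if (predicted && malignant[i]) tp++;        // correctly flagged malignant
      else if (predicted && !malignant[i]) fp++;  // benign flagged malignant
      else if (!predicted && malignant[i]) fn++;  // malignant missed
    }
    double precision = tp + fp == 0 ? 1.0 : (double) tp / (tp + fp);
    double recall = tp + fn == 0 ? 1.0 : (double) tp / (tp + fn);
    return new double[] { precision, recall };
  }
}
```

Lowering the threshold trades precision for recall: more true malignancies are caught, at the cost of more benign cases flagged.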

ROC Curves

Looking at the ROC curves for both the benign and malignant labels, we see that the equal error rate or crossover error rate (EER or CER), the rate at which acceptance and rejection errors are equal, is 0.0. This is a rare example of a perfect ROC curve!
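An EER of 0.0 means some threshold produces no acceptance errors and no rejection errors at all, which is only possible when the two classes' scores are perfectly separated. A toy check of that property (assumed scores, not this benchmark's raw output):

```java
// For perfectly separated scores there exists a threshold with zero
// false acceptances and zero false rejections, i.e. EER = 0.
public class EerSketch {

  static boolean isErrorFree(double[] scores, boolean[] positive, double threshold) {
    for (int i = 0; i < scores.length; i++) {
      boolean predicted = scores[i] >= threshold;
      if (predicted != positive[i]) return false; // an acceptance or rejection error
    }
    return true; // no errors of either kind at this threshold
  }
}
```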

Secondary Metrics

As stated in Primary and Secondary Performance Metrics, most machine learning benchmark studies only report primary performance metrics. Here, we also report the secondary metrics when run on a 2015 MacBook Pro Retina. Wattage is a rough estimate acquired from the iStat Menus app.

| Measurement        | Value                 |
|--------------------|-----------------------|
| Energy Consumption | 15.1 Watts            |
| Speed              | 1 Second              |
| Volume             | 600 cubic centimeters |

Conclusion

In this article we stepped through the Wisconsin Breast Cancer dataset, the kT-RAM linear classifier, spike encoding, and primary and secondary performance metrics. We saw that the Knowm API is very well suited to this task. If you have any comments or questions, please leave them in the comment section below!