Building a Linear Classifier with the Knowm API

A Classifier is a Machine Learning tool that places incoming examples into categories. On an abstract level it is the process of making a decision, and is thus a fundamentally important task of intelligent systems. As we shall see, we will use it to perform the tasks in the following tutorials. Classifiers are also being used right now in real-world applications that mine for data on the web, filter spam, buy and sell stocks, detect fraud and more. This is because at the root of all these tasks there is a decision, e.g. “Is this asset a buy or a sell?” or “Is this email spam or not?”, which requires choosing from a number of predefined categories. Answering these questions is the job of a classifier.

There are a wide variety of classifiers, and they work in many ways. We will be focusing on the Linear Classifier because (1) it is the bread and butter of Machine Learning and (2) the concept is simple enough to begin our crash course with.

A Linear Classifier is a Classifier that makes its decisions by building a number of lines which split our inputs into categories; hence the name “linear.” As we shall see, on kT-RAM these lines (or decision boundaries) are encoded by the conductivities of the synapses of individual AHaH nodes. Because of this, an AHaH node will output a positive value when a spike pattern falls on one side of the line and a negative value when it falls on the other.
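As a toy illustration of that last point (this is a generic sketch of a linear decision boundary, not the Knowm API), consider a single unit whose output is a weighted sum of its two inputs: the output is positive on one side of the line w1*x1 + w2*x2 + b = 0 and negative on the other.

```java
public class DecisionBoundarySketch {

  // Output of a single linear unit with illustrative weights.
  // The decision boundary is the line x1 - x2 = 0, i.e. x1 = x2.
  static double output(double x1, double x2) {
    double w1 = 1.0, w2 = -1.0, b = 0.0;
    return w1 * x1 + w2 * x2 + b;
  }

  public static void main(String[] args) {
    System.out.println(output(2.0, 1.0) > 0); // one side of the line: prints "true"
    System.out.println(output(1.0, 2.0) > 0); // the other side: prints "false"
  }
}
```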

The Linear Classifier we’ve built can be found at:

```java
org.knowm.knowmj.module.classifier.LinearClassifier
```

But first, let’s describe generally what is going on here before diving under the hood and looking at the code.

To perform classification on spike patterns, the Linear Classifier allocates one AHaH node to each category. The size of the spike patterns determines the size of each AHaH node. For example, if each example were encoded into a spike pattern of size 100 and we were categorizing each of those patterns into one of five pre-defined categories, then we would need to create five AHaH nodes, each with 100 kT-Synapses, for a total of 500 synapses.
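As a quick sanity check on the sizing arithmetic above (the variable names here are illustrative, not from the Knowm API):

```java
public class SizingExample {
  public static void main(String[] args) {
    int spikePatternSize = 100; // kT-Synapses per AHaH node
    int numCategories = 5;      // one AHaH node per category
    int totalSynapses = numCategories * spikePatternSize;
    System.out.println(totalSynapses); // prints 500
  }
}
```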

During training, we load each spike pattern from our training set onto each AHaH node and execute an FF read operation. This gives us a voltage reading at the root of each AHaH node. The highest voltage across AHaH nodes gives us our label (and an ordering of likely labels). Once we have that output, we can look at the desired output (the true output) and apply an operation which changes our synapses so that next time around the node outputs will better match the desired outputs. Over time, each AHaH node will begin to output high voltages when the inputs fall into its category and low voltages when they don’t. It’s that simple!
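To make the read-and-pick-the-winner step concrete, here is a minimal software analogy (a plain-Java simulation with made-up names and numbers, not the kT-RAM hardware or the Knowm API): each node is modeled as an array of synaptic weights, an FF read sums the weights at the active spike indices, and the predicted label is the node with the highest output.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassifyStepSketch {

  // Toy model of an FF read: the node's "voltage" is the sum of its
  // synaptic weights at the active spike indices.
  static double ffRead(double[] weights, int[] spikes) {
    double y = 0;
    for (int s : spikes) y += weights[s];
    return y;
  }

  // The predicted label is the node with the highest output.
  static String classify(Map<String, double[]> nodes, int[] spikes) {
    String best = null;
    double bestY = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, double[]> e : nodes.entrySet()) {
      double y = ffRead(e.getValue(), spikes);
      if (y > bestY) { bestY = y; best = e.getKey(); }
    }
    return best;
  }

  public static void main(String[] args) {
    Map<String, double[]> nodes = new HashMap<>();
    nodes.put("cat", new double[] {0.8, -0.2, 0.1}); // illustrative weights
    nodes.put("dog", new double[] {-0.5, 0.6, 0.3});
    int[] spikes = {0};                          // only input 0 is active
    System.out.println(classify(nodes, spikes)); // prints "cat"
  }
}
```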

To give you a sense of what this looks like, if we were to plot the decision boundaries for a two-dimensional problem, we might find our AHaH nodes build decision boundaries like those in the pictures below.

The first thing we notice is that the input arguments are a spike pattern and a set of truth labels, given as strings. Because this classifier is intended as a generic multi-label classifier, we must check to ensure that we have allocated AHaH nodes to the labels:

```java
createNewAHaHNodes(truthLabels);
```

Next, we get the set of AHaH nodes that we are using to perform this classification. We may have other AHaH nodes on this kT-RAM chip that are not being used.

```java
// node activations and link learning ==>
Set activatedAHaHNodeIDs = getActivatedNodes(spikes, truthLabels);
```

Once we have this set, we iterate over each AHaH node which had previously been associated with specific labels. We first set the spike pattern and perform an FF read operation.

The node output y represents the confidence, for a particular AHaH node, that the current spike pattern is associated with the node’s label. To learn this mapping, we perform the following conditional instructions depending on the node output and the supervised labels:

Depending on the condition, we perform an RH, RL or RF instruction. If there are no labels, then we execute the RF instruction (this is unsupervised learning, which we haven’t covered yet).

If we do have training labels, then only three conditions concern us. First, if the AHaH node represents some label X, and the label X is given as a truth label, then we provide the RH instruction. The RH instruction drives the node output higher, i.e. it increases that AHaH node’s output the next time these same spikes are set.

```java
if (truthLabels.contains(activatedAHaHNodeID)) {
  kTRAM.execute(activatedAHaHNodeID, Instruction.RH, Instruction.XX);
}
```

If the label X representing our current AHaH node is not given as a truth label, then we will need to decrease the likelihood of this node outputting a high value when these spikes are set. There are two situations to consider in this case.

First, if the node output was positive, meaning that it believes the current spike pattern was an example of its label, but the truth label was not present, we perform the RL instruction. In other words, if the AHaH node was a false-positive, and only if it was a false-positive, we drive the node output lower.

```java
else if (y > 0) { // mistake: false-positive
  kTRAM.execute(activatedAHaHNodeID, Instruction.RL, Instruction.XX);
}
```

Finally, if the node output was negative and the truth label was not present (a true-negative), we perform an RF instruction. This specific combination of instructions has been shown to produce a very well-behaved linear classifier.

```java
else {
  kTRAM.execute(activatedAHaHNodeID, Instruction.RF, Instruction.XX);
}
```
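Putting the three conditions together, here is a minimal software analogy of the supervised update loop (a toy simulation with an invented step size; RH and RL are modeled as simple weight nudges and the RF branch is left as a no-op, so none of this is the actual kT-RAM behavior or the Knowm API):

```java
public class AhahUpdateSketch {

  // Toy FF read: sum the node's weights at the active spike indices.
  static double read(double[] weights, int[] spikes) {
    double y = 0;
    for (int s : spikes) y += weights[s];
    return y;
  }

  // Apply the RH/RL/RF logic described above with a made-up step size.
  static void update(double[] weights, int[] spikes, double y, boolean isTruthLabel) {
    double step = 0.1; // illustrative learning rate, not a kT-RAM parameter
    if (isTruthLabel) {
      for (int s : spikes) weights[s] += step; // RH: drive output higher
    } else if (y > 0) {
      for (int s : spikes) weights[s] -= step; // RL: false-positive, drive lower
    }
    // else: true-negative -> RF (anti-Hebbian relaxation), a no-op in this toy model
  }

  public static void main(String[] args) {
    double[] node = new double[4]; // 4 synapses, all initially zero
    int[] spikes = {0, 2};         // active inputs
    for (int epoch = 0; epoch < 5; epoch++) {
      double y = read(node, spikes);
      update(node, spikes, y, true); // this node's label is the truth label
    }
    System.out.println(read(node, spikes) > 0); // prints "true"
  }
}
```

After a few RH updates the node responds positively to its own spike pattern, which is the behavior the three instructions above are designed to converge on.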

Demonstrations

We’ve only begun to scratch the surface so far, but the results look very promising. Our classifier benchmark scores for the Breast Cancer Wisconsin (Original), Census Income, MNIST Handwritten Digits, and Reuters-21578 data sets, along with results from other published studies using their respective classification methods, will be shown later in this article series. Our results compare well to published benchmarks, consistently matching or exceeding the performance of SVMs and other algorithms. This is surprising given the simplicity of the approach, which amounts to simple sparse spike encoding followed by classification with independent AHaH nodes. We do not inflate the training set; our results are achievable with only one online training epoch. Both training and testing complete on a standard desktop computer processor in a few minutes to less than an hour, depending on the resolution of the spike encoding.

Results to date indicate that the AHaH classifier is an efficient incremental optimal linear classifier. The AHaH classifier displays a range of desirable classifier characteristics hinting that it may be an ideal general classifier capable of handling a wide range of classification applications. The classifier can learn online in a feed-forward manner. This is important for large data sets and applications that require constant adaptation such as prediction, anomaly detection and motor control. The classifier can associate an unlimited number of labels to a pattern, where the addition of a label is simply the addition of another AHaH node. By allowing the classifier to process unlabeled data it can improve over time. This has practical implications in any situation where substantial quantities of unlabeled data exist. Through the use of spike encoders, the classifier can handle mixed data types such as discrete or continuous numbers and strings. The classifier tolerates missing values, noise, and irrelevant attributes and is computationally efficient. The most significant advantage, however, is that the circuit can be mapped to physically adaptive hardware. Optimal incremental classification can now become a hardware resource.