At Knowm, we are building a new and exciting type of computer processor to accelerate machine learning (ML) and artificial intelligence applications. The goal of Thermodynamic-RAM (kT-RAM) is to move general ML operations, traditionally deployed to CPUs and GPUs, onto a physically adaptive analog processor based on memristors, which unites memory and processing. If you haven’t heard yet, we call this new way of computing “AHaH Computing”, which stands for Anti-Hebbian and Hebbian Computing, and it provides a universal computing framework for in-memory reconfigurable logic, memory, and ML. While we showed long ago that AHaH Computing is capable of solving problems across many domains of ML, we only recently figured out how to use the kT-RAM instruction set and low-precision/noisy memristors to build supervised and unsupervised compositional (deep) ML systems. Our method does not require the backpropagation of error algorithm (Backprop) and is readily attainable with realistic analog hardware, including but not limited to memristors. This blog post and the research behind it are motivated by the need to compare our new approach apples-to-apples with existing deep learning approaches, looking at both primary metrics (accuracy, error, etc.) and secondary metrics (power, time, size).

Problems with Deep Neural Networks

Today’s deep learning models are neural networks, multiple layers of parameterized differentiable nonlinear modules that can be trained by back propagation of error.

ML Pioneer Geoffrey Hinton Says We Need Another Approach

In 1986, Geoffrey Hinton co-authored a paper that, three decades later, is central to the explosion of artificial intelligence. But Hinton says his breakthrough method should be dispensed with, and a new path to AI found.

Speaking with Axios on the sidelines of an AI conference in Toronto on Wednesday, Hinton, a professor emeritus at the University of Toronto and a Google researcher, said he is now “deeply suspicious” of back-propagation, the workhorse method that underlies most of the advances we are seeing in the AI field today, including the capacity to sort through photos and talk to Siri. “My view is throw it all away and start again,” he said.

The bottom line: Other scientists at the conference said back-propagation still has a core role in AI’s future. But Hinton said that, to push materially ahead, entirely new methods will probably have to be invented. “Max Planck said, ‘Science progresses one funeral at a time.’ The future depends on some graduate student who is deeply suspicious of everything I have said.”

How it works: In back propagation, labels or “weights” are used to represent a photo or voice within a brain-like neural layer. The weights are then adjusted and readjusted, layer by layer, until the network can perform an intelligent function with the fewest possible errors.

But Hinton suggested that, to get to where neural networks are able to become intelligent on their own, what is known as “unsupervised learning,” “I suspect that means getting rid of back-propagation.”
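To make the layer-by-layer weight adjustment concrete, here is a minimal pure-Python sketch of back-propagation on a tiny two-layer network. It is only an illustration: the network size, the sigmoid activation, the squared-error loss, the learning rate and the starting weights are all assumptions chosen for the example, not values taken from any of the frameworks discussed in this post.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass: inputs -> hidden layer -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """Apply one gradient-descent update in place; return the loss before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    loss = 0.5 * (output - target) ** 2

    # Error signal at the output (chain rule through the output sigmoid).
    delta_out = (output - target) * output * (1.0 - output)

    # Propagate the error signal back to each hidden unit,
    # using the output weights as they were during the forward pass.
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]

    # Adjust the weights, layer by layer.
    for j in range(len(w_out)):
        w_out[j] -= lr * delta_out * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * delta_hidden[j] * x[i]
    return loss


# Illustrative starting weights and a single training example (assumed values).
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]
losses = [backprop_step([1.0, 0.0], 1.0, w_hidden, w_out) for _ in range(50)]
print(losses[0], losses[-1])  # the loss shrinks as the weights are readjusted
```

Each call repeats the same cycle: run the input forward, measure the error at the output, pass that error backward through the layers, and nudge every weight against its gradient.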

MNIST on Many Frameworks Comparison

After broadly reviewing all frameworks we narrowed our focus down to a short list including:

Neural Networks and Deep Learning

TensorFlow

DL4J

PyTorch

CNTK

Caffe2

Torch7

To get a rough feeling for the various frameworks that we are interested in leveraging for our own deep learning framework, we decided to get to know each framework on our short list by running the MNIST benchmark. We chose this benchmark because it’s one of the very first benchmarks that most people run as an intro to machine learning, the “hello world” of machine learning, and there are many tutorials and plenty of help available. We will take a look at the primary and secondary performance metrics, take additional notes along the way and rate each framework. We will run them on a MacBook Pro, and also on a Linux system with a GPU.

Neural Networks and Deep Learning by Michael Nielsen

This book does a wonderful job of teaching the concepts of neural networks and the back propagation of error algorithm. In later chapters it goes into deep learning. For most of the chapters there is code that you can look at and run. We used an updated version of the source code, adapted for Python 3.

Run

By default, the 3rd network, network3.py, is run. This network is the convolutional deep neural network described in the book. If you need to run other networks, you’ll have to uncomment/comment the correct sections in test.py.

Deeplearning4j (DL4J)

Deeplearning4j is a domain-specific language to configure deep neural networks, which are made of multiple layers. Everything starts with a MultiLayerConfiguration, which organizes those layers and their hyperparameters. Hyperparameters are variables that determine how a neural network learns. They include how many times to update the weights of the model, how to initialize those weights, which activation function to attach to the nodes, which optimization algorithm to use, and how fast the model should learn.
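The hyperparameters listed above can be pictured as a single configuration record. The sketch below is written in Python purely for illustration; the class name, field names and example values are hypothetical and are not Deeplearning4j identifiers or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Hypothetical container; field names are illustrative, not DL4J API."""
    iterations: int        # how many times to update the weights
    weight_init: str       # how to initialize the weights
    activation: str        # activation function attached to the nodes
    optimizer: str         # which optimization algorithm to use
    learning_rate: float   # how fast the model should learn

    def validate(self):
        if self.iterations < 1:
            raise ValueError("iterations must be at least 1")
        if self.learning_rate <= 0.0:
            raise ValueError("learning_rate must be positive")
        return self


# Example values are assumptions chosen only to show the shape of a configuration.
config = Hyperparameters(
    iterations=10,
    weight_init="xavier",
    activation="relu",
    optimizer="sgd",
    learning_rate=0.01,
).validate()
print(config)
```

Keeping these settings together in one immutable record mirrors the idea of a single configuration object from which the network is then built.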

TensorFlow (99.31%)

TensorFlow is an open source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.
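The data flow graph idea can be sketched in a few lines of plain Python. This is not TensorFlow code and uses none of its API; the `Node` and `Placeholder` classes and the three operations are hypothetical names invented for the illustration, and plain lists stand in for tensors.

```python
class Node:
    """A graph node: an operation applied to the values arriving on its input edges."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # Pull values along the incoming edges, then apply this node's operation.
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)


class Placeholder(Node):
    """A source node whose value is supplied when the graph is run."""

    def __init__(self, name):
        super().__init__(op=None)
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


def multiply(a, b):
    return [ai * bi for ai, bi in zip(a, b)]


def reduce_sum(a):
    return sum(a)


def add(a, b):
    return a + b


# Describe the graph for y = sum(w * x) + b without computing anything yet.
x = Placeholder("x")
w = Placeholder("w")
b = Placeholder("b")
y = Node(add, Node(reduce_sum, Node(multiply, w, x)), b)

# Run the graph by feeding concrete values into the placeholders.
result = y.evaluate({"x": [1.0, 2.0, 3.0], "w": [0.5, -1.0, 2.0], "b": 0.25})
print(result)  # 4.75
```

Separating the description of the graph from its execution is what allows a real framework to place the same graph on different devices without changes to the model code.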

Microsoft CNTK (?)

CNTK, the Microsoft Cognitive Toolkit, is a framework for deep learning. A Computational Network defines the function to be learned as a directed graph where each leaf node consists of an input value or parameter, and each non-leaf node represents a matrix or tensor operation upon its children. The beauty of CNTK is that once a computational network has been described, all the computation required to learn the network parameters is taken care of automatically. There is no need to derive gradients analytically or to code the interactions between variables for backpropagation.
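The idea that gradients fall out automatically once the network is described can be illustrated with a tiny reverse-mode automatic differentiation sketch. This is plain Python, not CNTK code; the `Value` class and `backward` function are hypothetical names, and only scalar addition and multiplication are supported.

```python
class Value:
    """A node in a computational network that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # child nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))


def backward(output):
    """Fill in .grad for every node that the output depends on."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the leaves, applying the chain rule.
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += local * node.grad


# Leaves are inputs or parameters; non-leaf nodes are operations on their children.
w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
loss = y * y
backward(loss)
print(loss.data, w.grad, x.grad, b.grad)  # 49.0 28.0 42.0 14.0
```

Nothing in the example derives a gradient by hand: each operation records its local derivative, and the backward walk combines them.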

Suffer Score comment: Git clone took forever. It turns out you cannot run CNTK on a Mac. There is a workaround that involves running a Linux container using Docker. The installation instructions for Linux are not straightforward. The pip command is supposed to be pip3 for Python 3.

Why does it need to be so complicated? In the end it didn’t work, so I’ve given up.

Torch (??)

At the heart of Torch are the popular neural network and optimization libraries which are simple to use, while having maximum flexibility in implementing complex neural network topologies. You can build arbitrary graphs of neural networks, and parallelize them over CPUs and GPUs in an efficient manner.

Results

Caffe2 (??)

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

After at least one hour of googling, I was unable to find a tutorial or coherent instructions on how to install Caffe2 and run a CNN MNIST demo. The best I could find was installation instructions and a separate tutorial with lots of code but no instructions on how to download or run it.

Results

| Time | Accuracy | Epochs | Suffer Score |
| --- | --- | --- | --- |
| ?? | ?? % | 40 | 5/5 |

PyTorch (99.09%)

PyTorch is a Python-based scientific computing package targeted at two sets of audiences: 1) a replacement for NumPy that uses the power of GPUs, and 2) a deep learning research platform that provides maximum flexibility and speed.

Results

Suffer Score comment: The absolute least effort of all the frameworks.

    Train Epoch: 58 [48000/60000 (80%)] Loss: 0.022173
    Train Epoch: 58 [48640/60000 (81%)] Loss: 0.141494
    Train Epoch: 58 [49280/60000 (82%)] Loss: 0.078569
    Train Epoch: 58 [49920/60000 (83%)] Loss: 0.162332
    Train Epoch: 58 [50560/60000 (84%)] Loss: 0.081903
    Train Epoch: 58 [51200/60000 (85%)] Loss: 0.129946
    Train Epoch: 58 [51840/60000 (86%)] Loss: 0.138492
    Train Epoch: 58 [52480/60000 (87%)] Loss: 0.158638
    Train Epoch: 58 [53120/60000 (88%)] Loss: 0.098830
    Train Epoch: 58 [53760/60000 (90%)] Loss: 0.079310
    Train Epoch: 58 [54400/60000 (91%)] Loss: 0.049244
    Train Epoch: 58 [55040/60000 (92%)] Loss: 0.045119
    Train Epoch: 58 [55680/60000 (93%)] Loss: 0.064007
    Train Epoch: 58 [56320/60000 (94%)] Loss: 0.107020
    Train Epoch: 58 [56960/60000 (95%)] Loss: 0.048211
    Train Epoch: 58 [57600/60000 (96%)] Loss: 0.099237
    Train Epoch: 58 [58240/60000 (97%)] Loss: 0.037267
    Train Epoch: 58 [58880/60000 (98%)] Loss: 0.090165
    Train Epoch: 58 [59520/60000 (99%)] Loss: 0.093787

    Test set: Average loss: 0.0286, Accuracy: 9909/10000 (99%)
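The log rounds the test accuracy to a whole percent, so the 99.09% figure quoted for PyTorch has to be recovered from the raw counts in the final line. A small sketch of that arithmetic follows; the helper name `parse_test_summary` is hypothetical, and the regular expression assumes the line layout shown in the log above.

```python
import re


def parse_test_summary(line):
    """Extract (average_loss, correct, total) from a test-set summary line."""
    match = re.search(
        r"Average loss:\s*([0-9.]+),\s*Accuracy:\s*(\d+)/(\d+)", line
    )
    if match is None:
        raise ValueError("not a test-set summary line")
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


line = "Test set: Average loss: 0.0286, Accuracy: 9909/10000 (99%)"
avg_loss, correct, total = parse_test_summary(line)
accuracy = 100.0 * correct / total
print(f"{accuracy:.2f}%")  # 99.09%
```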

MNIST Experiment Summary

Below is a summary of the short list MNIST experiments including time to run, accuracy and suffer score.

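| Framework | Time | Accuracy | Epochs | Suffer Score |
| --- | --- | --- | --- | --- |
| neuralnetworksanddeeplearning | 2 Hours | 99.13 % | 59 | 2/5 |
| TensorFlow | 20 Minutes | 99.31 % | 20 | 2/5 |
| DL4J | 30 Minutes | 98.42 % | 58 | 2/5 |
| PyTorch | 30 Minutes | 99.09 % | 58 | 1/5 |
| CNTK | n/a | n/a | n/a | 5/5 |
| Caffe2 | n/a | n/a | n/a | 5/5 |
| Torch7 | n/a | n/a | n/a | 5/5 |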

After at least an hour of trying, I completely gave up on CNTK, Caffe2 and Torch7. I’m sure other people with more experience in the technologies related to those frameworks could have got them running more easily than I did, but this experiment is from my perspective as a relative beginner with deep learning frameworks and limited background in Python, Lua, etc. My success or lack thereof with each framework reflects not only the code, but also the documentation, cross-platform compatibility and the availability of beginner tutorials to follow for MNIST. PyTorch turned out to be the absolute simplest to run, working right out of the box.

The model accuracies were, as expected, more or less the same. Future SWaP comparisons will probably be done against TensorFlow, DL4J and PyTorch.