Well, the title is a bit of a misnomer — I'm really going to discuss the roots of neural networks. Today, most of the things we characterize as "artificial intelligence" are based around neural networks, so I'm going to let this one slide.

Neural networks and similar technologies now show up in a wide range of systems, from cars to financial software, but they all come from pretty humble beginnings. Much of what you see today is actually built on deep neural networks, made possible by advances circa 2010 in how these kinds of systems are trained. Before then, training deep networks was very time-consuming and tended to create brittle networks that generalized poorly: they were fantastic at identifying the things they were trained on, but not so great at identifying similar things outside the lab.

All of these networks have their roots in a simple neural model: the perceptron.

Frank Rosenblatt first designed the perceptron in hardware at Cornell University in 1957. It was a bit oversold at the time. Rosenblatt and the Navy (which funded the work) claimed that it would enable future general intelligence. We don't have that yet (and may never), but today, perceptron-inspired systems are driving cars, recognizing people in pictures, and translating languages. So I guess they may not have been that far off.

The perceptron is pretty simple, as you'd expect:
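Here is a minimal sketch of that definition in Python, assuming the usual textbook formulation: multiply each input by a weight, add the results together with a bias, and output 1 if the total is positive and 0 otherwise. The function name, the weights, and the AND example are illustrative choices, not taken from Rosenblatt's original design.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


# Hand-picked (hypothetical) weights that make the neuron act like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5

# Evaluate the neuron on (0,0), (0,1), (1,0), (1,1), in that order.
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

Only the input (1, 1) pushes the weighted sum above zero, so only that case fires.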

So this is a model of a single perceptron neuron in a neural network. Most networks will have more than a single neuron, of course. But that can obscure the biggest problem with this neuronal model: It can only classify linearly separable classes. So what does this mean?

Well, take a look at the math in the definition of a perceptron. The weighted sum at its core only uses multiplication by fixed weights and addition. Both of these are linear operations. Think of how you defined a linear function when you first started algebra and calculus. It was something like:
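The form most people meet first is presumably the slope-intercept equation of a line. It is written out here next to the perceptron's weighted sum for comparison; the symbols ($m$, $b$, $w_i$, $x_i$, $z$) are assumed notation, since the original formulas are not reproduced in this text.

```latex
% A line in slope-intercept form: one input x, slope m, intercept b.
y = m x + b

% The perceptron's weighted sum: n inputs x_i, weights w_i, bias b.
z = \sum_{i=1}^{n} w_i x_i + b
```

With a single input ($n = 1$), the second equation reduces to the first, with the weight playing the role of the slope and the bias playing the role of the intercept.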

Compare that to the perceptron definition. Pretty similar.

So why is this a problem?

Well, all of the operations we perform in a traditional perceptron maintain linearity. Stack as many layers of weighted sums as you like, and the result is still a single linear transformation of the input, because a linear function of a linear function is itself linear. This means that you can't classify things that are nonlinearly separated. Ever. It doesn't matter how deep your network is, how complex it is, or how many neurons you have. If the classes aren't linearly separable, you won't be able to do much with them.
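To make the layering point concrete, here is a small sketch in plain Python. It assumes layers that compute only weighted sums plus biases, with nothing nonlinear in between, and shows that two such layers give exactly the same outputs as one combined layer. The weights are arbitrary illustrative values and the helper names are hypothetical.

```python
def linear_layer(weights, biases, inputs):
    """Compute W x + b using plain lists, with no activation function."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Two stacked layers: 2 inputs -> 2 hidden values -> 1 output.
W1, b1 = [[2.0, -1.0], [0.5, 3.0]], [1.0, -2.0]
W2, b2 = [[1.0, 4.0]], [0.5]


def two_layers(x):
    return linear_layer(W2, b2, linear_layer(W1, b1, x))


# The single equivalent layer: W = W2 * W1 and b = W2 * b1 + b2.
W = [[sum(W2[0][k] * W1[k][j] for k in range(2)) for j in range(2)]]
b = [sum(W2[0][k] * b1[k] for k in range(2)) + b2[0]]


def one_layer(x):
    return linear_layer(W, b, x)


# Both versions print identical outputs for every test point.
for x in ([0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]):
    print(x, two_layers(x), one_layer(x))
```

The same collapse works for any number of layers, which is why extra depth adds nothing here.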

Needless to say, this limits the perceptron to solving fairly simple problems. If the problem is appropriately linear, though, the perceptron works like a champ. Otherwise? Well, you need to look at other options, ones that introduce nonlinear transformations. Thankfully, there are plenty of those today.
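XOR is the classic example of a problem that is not linearly separable. The sketch below assumes the same threshold-style perceptron described earlier. It searches a coarse grid of weights without finding any that reproduce XOR from the raw inputs, then shows that a single nonlinear feature (the product of the two inputs) makes the problem solvable. The grid, the feature, and the weights are illustrative choices, and this is only one of many ways to introduce nonlinearity.

```python
from itertools import product


def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def solves_xor(weights, bias, features):
    """Check whether the perceptron matches XOR on all four input pairs."""
    return all(perceptron(features(x), weights, bias) == y for x, y in XOR.items())


def raw(x):
    """Use the two inputs unchanged."""
    return [x[0], x[1]]


def with_product(x):
    """Append one nonlinear feature: the product of the two inputs."""
    return [x[0], x[1], x[0] * x[1]]


# Try every weight and bias from -4.0 to 4.0 in steps of 0.5 on the raw inputs.
grid = [i * 0.5 for i in range(-8, 9)]
raw_solutions = [(w1, w2, b) for w1, w2, b in product(grid, repeat=3)
                 if solves_xor([w1, w2], b, raw)]
print(len(raw_solutions))  # 0: no setting on the grid works

# With the extra feature, hand-picked weights are enough.
print(solves_xor([1.0, 1.0, -2.0], -0.5, with_product))  # True
```

The grid search is a demonstration rather than a proof, but the result holds for any weights: the four XOR constraints contradict one another when only a weighted sum of the two raw inputs is available.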
