Neural networks are especially appropriate to learn patterns and remember shapes. Perceptrons are very basic but yet very powerful neural networks types. Their structure is basically an array of weighted values that is recalculated and balanced iteratively. They can implement activation layers or functions to modify the output within a certain range or list of values.

In order to create the neural network we are going to use Keras, one of the most popular Python libraries. The code is as follows:

The first thing to do is to import the elements that we will use. We will not use aliases for the purpose of clarity:

In the previous code we have imported the numpy and pandas libraries to manage the data structures and perform operations with matrices. The two scikit-learn modules will be used to scale the data and to prepare the test and train data sets.

The matplotlib package will be used to render the graphs.

From Keras, the Sequential model is loaded, it is the structure the Artificial Neural Network model will be built upon. Three types of layers will be used:

Dense: Those are the basic layers made with weighted neurons that form the perceptron. An entire perceptron could be built with these type of layers.

Activation: Activation functions transform the output data from other layers.

Dropout: This is a special type of layer used to avoid over-fitting by leaving out of the learning process a number of neuron.

The predictor variable is saved in variable X and the dependent variable in y. The two variables have values that differ several orders of magnitude; and the neural networks work better with values next to zero. For those two reasons the variables are scaled to remove their original magnitudes and put them within the same magnitude. Their values are proportionally transformed within 0 and 1.

The data is divided into two sets. One will be used to train the neural network, using 60% of all the samples; and the other will contain 40% of the data, that will be used to test if the model works well with out-of-the-sample data.

Now we are going to define the neural network. It will consist in an input layer to receive the data, several intermediate layers, to process the weights, and a final output layer to return the prediction (regression) results.

The objective is that the network learns from the train data and finally can reproduce the original function with only 60% of the data. It could be less, it could be more; I have chosen 60% randomly. In order to verify that the network has learnt the function, we will ask it to predict which response should return the test data that was not used to create the model.

Now let’s think about the neural network topology. If we study the chart, there are three areas that differ considerably. Those are the left tail, up to the 440 mark, a peak between the 440 and 465 marks approximately, and the second tail on the right, from the 465 mark on. For this reason we will use three neuron intermediate layers, so that the first one learns any of these areas, the second one other area, and the third one the final residuals that should correspond to the third area. We will have therefore 3 layers in our network plus one input and one output layer too. The basic layer structure of the neural network should be similar to this, a sequence of layers, from left to right with this topology:

INPUT LAYER(2) > [HIDDEN(i)] > [HIDDEN(j)] > [HIDDEN(k)] > OUTPUT(1)

An input layer that accepts two values X and y, a first intermediate layer that has i neurons, a second hidden layer that has j neurons, an intermediate layer that has k neurons, and finally, an output layer that returns the regression result for each sample X, y.

Now the model is trained by iterating 256 times through all the train data, taking each time two sampless.

In order to graphically see the accuracy of the model, now we apply the regression model to new data that has not been used to create the model. We will also plot the predicted values versus the actual values.

For 2K columns, I would suggest you first reduce the number of features or group them into components such as with Principal Component Analysis. Study which is the prediction loss after removing the 1000 less significative features, for instance. What happens after grouping the features into 10 principal components? Does Kernel PCA improve the model?. So, perform extensive feature engineering first.

There are other frameworks that come with pre-trained or pre-configured models such as Caffe on http://caffe.berkeleyvision.org/. You may also take advantage of it.

This was great. I’ve been looking for some example of using Keras for nonlinear regression and how to shape the model architecture. I have a question though – what are the ‘model.add(Activation(“linear”))’ lines for? You already have layers that use “activation=’relu'”, so why include additional Activation layers?

I am also confused why you set the activation in the add(Dense… and then in an additional model.add;
“` model.add(Dense(64, activation=’relu’))
model.add(Activation(“linear”)) “`
Won’t this just overwrite the activation=’relu’?