Deep learning in SAS Enterprise Miner

I have a dataset that represents features extracted from videos, and I have been reading about deep learning. It is a very hot topic in machine learning.

I need to know how SAS implements or supports these techniques in SAS/STAT or SAS Enterprise Miner. HPNEURAL has the capability to define up to 10 hidden layers and to set the number of hidden neurons. Do these options allow me to build a deep learning model?

What are the differences between learning in a deep network and learning in HPNEURAL?

Is your data encoded video, like an MPEG? If so, you will need to use something besides SAS to decode your video into pixel intensity values. I suggest OpenCV. Once your data is in a standard tabular format containing numerical columns (probably with pixels as columns and frames as rows), you can read it into SAS easily using PROC IMPORT or a DATA step. Also, remember to standardize the inputs before training a neural network.
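As a rough illustration of what "standardize" means here, the sketch below rescales each column to mean 0 and standard deviation 1 in plain Python. The function name `standardize_columns` and the use of the sample standard deviation are assumptions made for this example; inside SAS you would do the equivalent with SAS's own tools rather than with this code.

```python
import statistics

def standardize_columns(rows):
    """Return a copy of `rows` with every column rescaled to mean 0, sd 1.

    Assumption for this sketch: the sample standard deviation (n - 1) is used,
    and constant columns are mapped to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Three "frames" (rows) with two "pixel" columns; the second column is constant.
frames = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
standardized = standardize_columns(frames)
print(standardized)
```

The constant-column guard matters for pixel data, where border pixels often have the same value in every frame.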

If you are training a neural network with more than two layers, I would suggest using the FREEZE and THAW statements in PROC NEURAL to conduct layer-wise pretraining, and then training all the layers together again. In current releases, HPNEURAL does not provide protection against vanishing or exploding gradients for deep networks - two layers should be fine with HPNEURAL. I would suggest testing a large two-layer network (many hidden units per layer) trained with HPNEURAL against a deeper network trained with PROC NEURAL. I would expect HPNEURAL to be faster than PROC NEURAL, even using PROC NEURAL's multithreading capabilities.

Now for PROC NEURAL ... which is more complicated. PROC NEURAL allows for layer-wise pretraining and can help you avoid one of the most common pitfalls in training deep neural networks: vanishing/exploding gradients.

What are vanishing/exploding gradients? Prior to deep learning, neural networks were typically initialized using random numbers. Neural networks generally use the gradient of the network's error w.r.t. the network's parameters to adjust the parameters to better values in each training iteration. In back propagation, evaluating this gradient involves the chain rule, and you must multiply each layer's parameters and gradients together across all the layers. This is a lot of multiplication, especially for networks with more than 2 layers. If most of the weights across many layers are less than 1 in magnitude and they are multiplied many times, then eventually the gradient just vanishes into a machine-zero and training stops. If most of the parameters across many layers are greater than 1 in magnitude and they are multiplied many times, then eventually the gradient explodes into a huge number and the training process becomes intractable.
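To put numbers on this, here is a toy Python calculation. It assumes, purely for illustration, that every layer contributes the same multiplicative factor to the gradient; real networks are messier, but the compounding effect is the same. The helper name `gradient_scale` is made up for this example.

```python
def gradient_scale(per_layer_factor, n_layers):
    """Multiply the same per-layer factor n_layers times, as the chain rule would."""
    scale = 1.0
    for _ in range(n_layers):
        scale *= per_layer_factor
    return scale

for depth in (2, 10, 50):
    shrinking = gradient_scale(0.5, depth)   # factors below 1: gradient vanishes
    growing = gradient_scale(1.5, depth)     # factors above 1: gradient explodes
    print(f"{depth:>2} layers: 0.5^n = {shrinking:.3e}, 1.5^n = {growing:.3e}")
```

With two layers the scales are still 0.25 and 2.25, which is why shallow networks rarely suffer from this. At 50 layers the same factors give roughly 1e-15 and 1e8.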

PROC NEURAL provides a mechanism to avoid vanishing/exploding gradients in deep networks, by training only one layer of the network at a time. Once all the layers have been initialized through this pre-training process to values that are more suitable for the data, you can usually train the deep network using gradient descent techniques without the problem of vanishing/exploding gradients. It looks like this, roughly:
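As an illustration of the idea only, the following is a toy sketch in plain Python; it is not PROC NEURAL syntax and not the code from the post. It assumes each layer is pretrained as a small autoencoder (tanh encoder, linear decoder) on the output of the layers below it, which then stay fixed, much as FREEZE would hold them. All function names, sizes, and learning settings are hypothetical, and the final step of thawing everything and training all layers together is left out.

```python
import math
import random

def forward_layer(weights, x):
    """Apply one tanh layer; each row of `weights` feeds one hidden unit."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def pretrain_layer(data, n_hidden, epochs=100, lr=0.05, seed=0):
    """Train a tanh encoder and linear decoder to reconstruct `data`.

    Only this layer's weights are updated. Returns the encoder weights.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            h = forward_layer(enc, x)
            recon = [sum(d * hj for d, hj in zip(row, h)) for row in dec]
            err = [r - xi for r, xi in zip(recon, x)]
            # Backpropagate the squared reconstruction error to the hidden units.
            dh = [sum(err[i] * dec[i][j] for i in range(n_in)) * (1.0 - h[j] ** 2)
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):
                for k in range(n_in):
                    enc[j][k] -= lr * dh[j] * x[k]
    return enc

def pretrain_stack(data, hidden_sizes):
    """Greedy layer-wise pretraining: train one layer, fix it, move up."""
    trained = []
    current = data
    for n_hidden in hidden_sizes:
        enc = pretrain_layer(current, n_hidden)
        trained.append(enc)
        # Layers already trained are not touched again; they only transform the data.
        current = [forward_layer(enc, x) for x in current]
    return trained

rng = random.Random(42)
toy_data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
stack = pretrain_stack(toy_data, [3, 2])
print([(len(w), len(w[0])) for w in stack])  # [(3, 4), (2, 3)]
```

The weights in `stack` would then serve as the starting point for ordinary gradient training of the whole network, in place of purely random initial values.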

Please be aware that recent advances in deep learning are hot topics at SAS R&D too, and we are hoping to provide much more functionality for deep learning in coming releases ... but - as always - no promises. Enterprise-grade scientific software takes time.

Re: Deep learning in SAS Enterprise Miner

About the dataset: I applied feature extraction (a detector and a feature descriptor) so that each video is represented as a feature vector of length 14700, and the dataset became tabular with 600 observations (600 videos). The dataset is 600 x 14700.

Thanks again for your cooperation and solution. I hope that SAS releases a deep learning node in upcoming releases.

Re: Deep learning in SAS Enterprise Miner

I'm trying to run a similar program via Enterprise Guide (7.1) on an EG Server (9.3), and I can never get CPU utilization to go over 25%. I'm licensed for four cores on the EG Server. I've tried editing sasv9.cfg and using CPUCOUNT=4 and THREADS=YES. I have Enterprise Miner 13.

Re: Deep learning in SAS Enterprise Miner

Is <500 features a hard limit for PROC NEURAL? I'm currently working with a set that has 730 features. I'm using your https://github.com/sassoftware/enlighten-deep code to duplicate Hinton and Salakhutdinov's work on dimensionality reduction, using a 730-365-100-2 autoencoder.

Re: Deep learning in SAS Enterprise Miner

No - not a hard limit at all, but a bigger problem will take longer, and at some point you may run out of resources during training if the training set is too big.

To give you some idea - I was able to roughly replicate the paper you referenced using a 300-100-2-100-300 autoencoder built with PROC NEURAL, in about 6 hours using 12 cores on a server with 128 GB of RAM. More/fewer cores + more/less memory = less/more time.