L2-SVM is differentiable and imposes a larger (quadratic rather than linear) penalty on points that violate the margin.
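To make that concrete, here are the two losses side by side in standard hinge-loss notation, for an input x with label y in {−1, +1} and weight vector w:

$$\ell_{L1} = \max(0,\, 1 - y\,w^\top x) \qquad \ell_{L2} = \max(0,\, 1 - y\,w^\top x)^2$$

Squaring removes the kink at the margin boundary, so the L2 loss has a continuous derivative everywhere (handy for backpropagation), and points that violate the margin badly get penalized quadratically rather than linearly.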

If you want to dig deeper into the topic, that paper is probably a good bet.

All of these deep neural networks ultimately spit out a final feature vector representation of the input, which must then be classified (if classification is the task at hand). This is generally done with a simple linear classifier. The general impression I'm getting from these papers is that training that classifier with the L2-SVM objective outperforms alternatives like the L1-SVM or softmax regression.
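As a minimal sketch of that final step, here's what training the linear classifier with the L2-SVM objective might look like in Python with scikit-learn (Coates' original demo, linked below, is in MATLAB; the feature matrix and labels here are random stand-ins for whatever features your network actually produces):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-ins for the feature vectors produced by a trained network:
# one row per input, one column per learned feature.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256))   # hypothetical feature matrix
labels = rng.integers(0, 10, size=1000)   # hypothetical class labels (e.g., CIFAR-10's 10 classes)

# loss='squared_hinge' is the L2-SVM objective: it squares the hinge
# loss, penalizing margin violations quadratically.
clf = LinearSVC(loss="squared_hinge", C=1.0)
clf.fit(features, labels)

predictions = clf.predict(features)
print("training accuracy:", np.mean(predictions == labels))
```

Conveniently, swapping between the two objectives is a one-argument change here: loss='hinge' gives the L1-SVM, loss='squared_hinge' the L2-SVM.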

If you’re looking for some example MATLAB code, Adam Coates provides the code for his original CIFAR-10 benchmark implementation here:

http://www.cs.stanford.edu/~acoates/papers/kmeans_demo.tgz

His code uses the L2-SVM objective to train the output classifier.