Beginner's Guide for Caffe2DML users

Caffe2DML is an experimental API that converts a Caffe specification to DML.
It is designed to fit well into the mllearn framework and hence supports NumPy arrays, Pandas DataFrames, and PySpark DataFrames.

Training LeNet

To create a Caffe2DML object, one needs to create a solver and network file that conforms
to the Caffe specification.
In this example, we will train LeNet, a simple convolutional neural network proposed by Yann LeCun in 1998.
It has two convolution/pooling layers followed by a fully connected layer.
Similar to Caffe, the network has been modified to add dropout.
For more details, please see http://yann.lecun.com/exdb/lenet/.
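The exact layer sizes live in the network's .proto file, which is not reproduced here. Purely as an illustration (assuming a common LeNet-style configuration of 5x5 convolutions with padding 2 and 2x2 max pooling with stride 2, followed by 64 feature maps into the fully connected layer — these values are assumptions, not read from the actual network file), the spatial dimensions on a 28x28 MNIST image can be traced like this:

```python
def conv2d_out(size, kernel, pad, stride):
    # Standard output-size formula for convolution/pooling.
    return (size + 2 * pad - kernel) // stride + 1

size = 28                          # MNIST images are 28x28
size = conv2d_out(size, 5, 2, 1)   # conv1: 5x5 kernel, pad 2 -> 28
size = conv2d_out(size, 2, 0, 2)   # pool1: 2x2, stride 2 -> 14
size = conv2d_out(size, 5, 2, 1)   # conv2: 5x5 kernel, pad 2 -> 14
size = conv2d_out(size, 2, 0, 2)   # pool2: 2x2, stride 2 -> 7
flattened = 64 * size * size       # 64 feature maps -> inputs to the FC layer
```

With these (assumed) hyperparameters, the fully connected layer receives 64 * 7 * 7 = 3136 inputs per image.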

The solver specification tells Caffe2DML which configuration to use when generating the training DML script.
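The actual solver file is not reproduced in this section. As an illustration only, a minimal solver file in Caffe's SolverParameter text format might look like the following (the file name lenet_solver.proto, the referenced network file name, and all values here are example assumptions, not the exact configuration used by this guide):

```
# Illustrative lenet_solver.proto (field names follow Caffe's SolverParameter format)
net: "lenet.proto"      # the network definition file
base_lr: 0.01           # starting learning rate
momentum: 0.9
weight_decay: 5e-4
lr_policy: "exp"        # exponential learning-rate decay
gamma: 0.95
display: 100            # report training loss every 100 iterations
test_iter: 10
test_interval: 500
max_iter: 2000          # total number of training iterations
```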

To train the above LeNet model, we use the MNIST dataset.
The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST).
The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split.
In this example, we use the mlxtend package to load the MNIST dataset into Python NumPy arrays, but you are free to download it directly from http://yann.lecun.com/exdb/mnist/.

```bash
pip install mlxtend
```

We first split the MNIST dataset into train and test.

```python
from mlxtend.data import mnist_data
import numpy as np
from sklearn.utils import shuffle

# Download the MNIST dataset
X, y = mnist_data()
X, y = shuffle(X, y)

# Split the data into training and test
n_samples = len(X)
X_train = X[:int(.9 * n_samples)]
y_train = y[:int(.9 * n_samples)]
X_test = X[int(.9 * n_samples):]
y_test = y[int(.9 * n_samples):]
```

Finally, we use the training and test datasets to perform training and prediction using a scikit-learn-like API.

```python
from systemml.mllearn import Caffe2DML

# Create the model from the solver file created earlier
# (the file name is illustrative); MNIST images are 1 x 28 x 28.
lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))

# Since Caffe2DML is a mllearn API, it allows for scikit-learn like method for training.
lenet.fit(X_train, y_train)

# Either perform prediction:
lenet.predict(X_test)
# or scoring:
lenet.score(X_test, y_test)
```

Unlike Caffe, where the default train and test algorithm is minibatch, you can specify the
algorithm using the parameters train_algo and test_algo (valid values are: minibatch, allreduce_parallel_batches,
and allreduce). Here are some common settings:
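These names refer to Caffe2DML's internal distribution strategies, which are not shown here. Purely to illustrate the conceptual difference between sequential minibatch updates and averaging gradients across batches processed in parallel, here is a toy NumPy sketch (the linear model, the gradient function, and all values are invented for illustration and are not Caffe2DML internals):

```python
import numpy as np

def gradient(w, X_batch, y_batch):
    # Gradient of mean squared error for a linear model y = X @ w
    # (a stand-in for the network's backward pass).
    return 2 * X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr = 0.1

# "minibatch"-style: process one batch at a time, updating after each batch.
for i in range(0, len(X), 16):
    w -= lr * gradient(w, X[i:i+16], y[i:i+16])

# "allreduce_parallel_batches"-style: compute gradients for several batches
# (conceptually in parallel), average them, then apply a single update.
grads = [gradient(w, X[i:i+16], y[i:i+16]) for i in range(0, len(X), 16)]
w -= lr * np.mean(grads, axis=0)
```

The minibatch variant makes one update per batch, while the parallel-batches variant trades update frequency for parallelism by averaging gradients before a single update.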