Now that we have established our pipeline, it's time to enjoy the fruits of our labor.
This mainly comes in the form of easy experimentation and automatic reproducibility.

When we experiment and change the parameters of our pipeline (e.g. changing hyperparameters, modifying preprocessing, or even switching to a new dataset),
DVC automagically knows what has changed and re-runs only the relevant stages of the pipeline, reusing the cached outputs of unchanged stages to save time.

Reproducibility means that after making your changes and telling DVC to recalculate the pipeline, you can dvc push the resulting
files to a shared remote. That way, when someone else (or you yourself, 3 months from now) checks out the experiment branch,
they immediately get all the necessary context to reproduce your results. Remember, even failed experiments can be a useful
source of information!
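
As a sketch, that flow could look like this (the branch name and commit message here are just examples, not taken from the tutorial):

git commit -am "Switch featurization to a CNN"
git push origin my-experiment
dvc push    # upload the DVC-tracked data, model, and metrics to the shared remote

# Later, on another machine (or 3 months from now):
git checkout my-experiment
dvc pull    # fetch the exact file versions that match this commit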

But enough theory. Let's start changing our code. Suppose we edit only the code of the featurization stage, featurization.py.
Logically, that means we'll need to re-run the featurization stage, followed by model training and evaluation.

DVC checks all .dvc files (i.e. stages) to see what has changed. Since the import stages didn't change, they won't be re-run.
The featurization stage did change, so DVC will run it again, together with the training and evaluation stages that depend on it.
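
In practice, this is handled by a single command (a minimal sketch; depending on your DVC version, you may need to pass the final stage file as an explicit target):

dvc status    # optional: list the stages whose dependencies have changed
dvc repro     # re-run only the changed stages and everything downstream of them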

We'd like to start fresh from our original pipeline (before the introduction of PCA). For that, we need to run two commands:

git checkout -b CNN master
dvc checkout

The first command is a regular Git checkout, which creates a new branch called CNN based on master.

After checking out the Git files, our DVC-tracked files (data, model, and metrics) still refer to the PCA branch.
The second command takes care of that: dvc checkout looks for the appropriate hashes in our cache folder and retrieves the matching versions into the working copy.

Automate dvc checkout

If you want DVC to automatically checkout whenever you switch Git branches, use the handy dvc install command, which sets up Git hooks to do this for you.
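
For example:

dvc install    # installs Git hooks; after this, every git checkout also triggers dvc checkout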

To verify that we are indeed working on a copy of the last master branch commit, you can perform a quick check.
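
For example (a sketch; the exact verification commands may differ from the original ones):

git log -1    # the newest commit should match the tip of master
dvc status    # should report no changes to DVC-tracked files or pipeline stages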

The new model is a neural network consisting of two convolutional layers and two fully connected layers.
After each convolutional layer we apply a ReLU activation function, followed by 2d pooling.
The tensor is then reshaped to fit the shape of the fully connected layers.
It then passes through the first fully connected layer, followed by another ReLU, and finally through the last fully connected layer.
We apply a log_softmax, which results in a tensor holding an estimate for each class given the current input.
We will later take the maximum of these estimates and use it as the classification result.
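
For reference, here is a minimal PyTorch sketch of an architecture matching this description. The class name, channel counts, kernel sizes, the choice of max pooling, and the 28x28 single-channel input with 10 classes (as in MNIST) are all assumptions for illustration, not necessarily the exact values used in the tutorial's code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    """A sketch of the described architecture (layer sizes are assumed, MNIST-like)."""
    def __init__(self):
        super().__init__()
        # Two convolutional layers (channel counts chosen for illustration)
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # Two fully connected layers; two rounds of 2x2 pooling turn 28x28 into 7x7
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)  # 10 output classes, as in MNIST

    def forward(self, x):
        # Conv -> ReLU -> 2d pooling (max pooling assumed), twice
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Reshape the tensor to fit the fully connected layers
        x = x.view(x.size(0), -1)
        # First fully connected layer followed by another ReLU
        x = F.relu(self.fc1(x))
        # Last fully connected layer, then log_softmax for per-class estimates
        return F.log_softmax(self.fc2(x), dim=1)

# Example: estimates for a batch containing one 28x28 grayscale image
model = ConvNet()
scores = model(torch.zeros(1, 1, 28, 28))
prediction = scores.argmax(dim=1)  # the maximum estimate is the classification result

Taking the maximum over the class dimension of the output, as in the last line, is exactly the "take the maximum of all these estimates" step described above.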

Next, let's modify the code for our model training.
The complete code can be found at this link.

The new code contains a lot of changes, so here is the entire train_model.py after the modifications: