I had to use the functional API in order to fine-tune the model, as shown below. Is there a better way?

from keras.models import Model
from keras.layers import Dense
from keras.optimizers import Adam

model.layers.pop()  # drop the existing 1000-way ImageNet softmax
for layer in model.layers:
    layer.trainable = False  # freeze all the pretrained layers
# recover the output from the (new) last layer and use it as input to a new Dense layer
last = model.layers[-1].output
x = Dense(train_batches.num_class, activation="softmax")(last)
model = Model(model.input, x)
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

If you use the Keras 2 VGG, you can exclude the top layers by setting include_top=False.
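For reference, a minimal sketch of what that looks like (assuming Keras 2's keras.applications module; num_classes here is a placeholder for your own class count):

from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# Load only the convolutional base; the FC head is omitted entirely.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pretrained convolutional layers

# num_classes is a placeholder for your own class count.
x = Flatten()(base.output)
x = Dense(num_classes, activation='softmax')(x)
model = Model(base.input, x)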

The problem with this is that you're no longer fine-tuning the FC layers, but training them from scratch. If you don't have plenty of data, this results in a significant accuracy loss (and an increase in training time).

@jeremy, thank you for pointing this out. That's the case for the BN version, but weights='imagenet' loads the weights from a local cache if available, and otherwise downloads them from the Keras site by default. So I think it should be fine.


I just checked the Keras code, and with include_top=False the pretrained weights aren't loaded into any FC layers (those layers simply don't exist in that configuration). So you'll need to set it to True and then split the model afterwards, as we've done in the lessons.
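For anyone following along, a rough sketch of that load-then-split approach (assuming Keras 2's VGG16; num_classes is a placeholder for your own class count):

from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Model

# include_top=True so the pretrained FC weights are actually loaded.
full = VGG16(weights='imagenet', include_top=True)
for layer in full.layers:
    layer.trainable = False  # freeze everything that was pretrained

# Split just below the final 1000-way softmax (layers[-2] is fc2 in VGG16)
# and attach a new head sized for your own classes.
x = Dense(num_classes, activation='softmax')(full.layers[-2].output)
model = Model(full.input, x)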

I believe the approach I outlined above is correct. It uses the functional API to do the fine-tuning, as in the snippet at the top of this thread.


The Keras VGG implementations are built using the functional API, so I'm assuming this, or a very similar approach, is needed to fine-tune.

Looks right to me, although it's likely to be much faster to precompute the penultimate layer's output and then train just the linear classifier at the end. The approach you've shown still has to do the forward pass through the whole network on every epoch.
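Something like this, perhaps (a sketch assuming Keras 2; train_batches is the directory iterator from the posts above, and it would need to be created with shuffle=False so the cached features and labels stay aligned):

import numpy as np
from keras.models import Model, Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.utils import to_categorical

# Take the penultimate layer (fc2 in VGG16) of the model built above.
feature_model = Model(model.input, model.layers[-2].output)

# One expensive forward pass over the data, done once and cached.
steps = int(np.ceil(train_batches.samples / float(train_batches.batch_size)))
features = feature_model.predict_generator(train_batches, steps)
labels = to_categorical(train_batches.classes)

# From here on, each epoch only trains the cheap linear classifier.
clf = Sequential([Dense(train_batches.num_class, activation='softmax',
                        input_shape=features.shape[1:])])
clf.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
clf.fit(features, labels, epochs=8, batch_size=64)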

Converting convolution kernels from Theano to TensorFlow and vice versa:

If you want to load pretrained weights that include convolutions (layers Convolution2D or Convolution1D), be mindful of this: Theano and TensorFlow implement convolution in different ways (TensorFlow actually implements correlation, much like Caffe), and thus convolution kernels trained with Theano (resp. TensorFlow) need to be converted before being used with TensorFlow (resp. Theano). Here's how: github.com
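The usual recipe looks like this (a sketch assuming Keras 2 naming, where the layers are Conv1D/Conv2D/Conv3D and the kernels live in layer.kernel; under Keras 1 you'd match Convolution1D/Convolution2D and use layer.W instead):

from keras import backend as K
from keras.utils.conv_utils import convert_kernel

# Flip the spatial dimensions of every convolution kernel so that
# weights trained under one backend can be used under the other.
for layer in model.layers:
    if layer.__class__.__name__ in ('Conv1D', 'Conv2D', 'Conv3D'):
        w = K.get_value(layer.kernel)
        K.set_value(layer.kernel, convert_kernel(w))
model.save_weights('converted_weights.h5')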