I'm doing the homework assignment, and I'm getting scores for the Kaggle competition above 15 (not good). I've spent some time debugging this. At first, I realized that the ids in the file were incorrect due to how get_batches iterates through the directory of test images, but even after correcting for that, I'm still in the 15s.

As I dug around, I noticed that there were lots of 1.0 probabilities in my results file. I thought to myself, "that doesn't make sense -- the chances of getting a 1.0 on the validation data should be low, let alone on the test data."

But sure enough, when I run a prediction on a small set of both the test data and the validation data, I get tons of 1.0 probabilities.

Jeremy mentioned in one of the lessons that the classifier net used for that example tends to produce overconfident results. One way to avoid it is to increase the temperature of the softmax, if you are using one (cf. https://www.cs.toronto.edu/~hinton/absps/distillation.pdf). More simply, just np.clip() the probabilities to the range [0.05, 0.95]. The cross-entropy loss function hits you hard if you are maximally confident but the prediction turns out to be wrong.
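To illustrate why clipping helps, here is a minimal sketch. The labels and predictions below are made-up numbers, not anyone's actual submission; the point is that a single maximally confident wrong answer makes the unclipped log loss blow up (log(0)), while clipping caps the penalty:

```python
import numpy as np

def log_loss(y_true, y_pred):
    # Binary cross-entropy averaged over samples (the Kaggle metric).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 1.0])
preds  = np.array([1.0, 0.0, 1.0, 0.0])  # overconfident; last one is wrong

# Unclipped, the wrong 0.0 prediction contributes -log(0) = infinity.
clipped = np.clip(preds, 0.05, 0.95)
print(round(log_loss(y_true, clipped), 4))  # → 0.7874
```

The three correct predictions now cost a small -log(0.95) each, and the one wrong prediction costs -log(0.05) instead of infinity.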

Error when running get_batches on test images

I'm running into errors when trying to run get_batches on my test directory containing unlabeled dog and cat images. I'm working on step 11 of the homework for lesson 1. The structure of /test1 is simple: no subdirectories, just the set of unlabeled images.

Looking at the error messages, it seems like the error occurs in image.pyc, in the _flow_index method, which has the function signature:

_flow_index(self, N, batch_size, shuffle, seed)

I'm having a hard time seeing why N is getting the value 0. Has anyone else run into this problem?

Edit: this issue doesn't happen for /train, which does have two subdirectories. My suspicion is that the issue has to do with the directory structure, but I'm not sure.
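That suspicion matches how Keras works: flow_from_directory (which get_batches wraps) infers classes from subdirectories, so a flat test folder yields zero samples, which is why N ends up 0. A common workaround is to move the test images into a single dummy subdirectory. A minimal sketch, assuming the flat layout described above (the directory and subdirectory names are illustrative):

```python
import os
import shutil

def move_into_subdir(test_dir, subdir_name="unknown"):
    # Create a single dummy class directory, e.g. test1/unknown/,
    # and move every loose image file into it so that
    # flow_from_directory sees one class containing all the images.
    sub = os.path.join(test_dir, subdir_name)
    os.makedirs(sub, exist_ok=True)
    for name in os.listdir(test_dir):
        path = os.path.join(test_dir, name)
        if os.path.isfile(path):
            shutil.move(path, os.path.join(sub, name))
```

After running this once on /test1, get_batches should report the full image count instead of 0.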

Thanks a lot Mikkel, your post made me realize that the error message was misleading. It turns out the download of the models was interrupted the first time I ran the notebook, which left the file corrupted. If somebody else runs into the same problem, the solution is to clear the Keras cache (rm ~/.keras/models/*) and then re-run the code. It will download the file again.

Use the new vgg16.py script from github and also remove the cache for any previously saved models.

Thank you very much for that, Jose! I want to add some color here so anyone else seeing similar scores can learn from my mistakes. While it's true that bounding the probabilities does improve performance, my real problem was a typo. Notice in my screenshot above how none of the probabilities were close to 0? That should have been my first clue. I had simply typo'd the Python code that took the output from vgg.predict.

With the typo, my score was 15.41851. With the typo fixed, my score dropped to 0.20384. Using Jose's clipping technique, I got all the way down to 0.10064!