Who's working on the fish competition? Rachel and I have both started looking at it. Let's talk about the competition here. I'll start with a question: my test set result is quite a bit worse than my validation set result. I see on the kaggle forums that others have seen this too. Any ideas why? I haven't had a chance to look into it yet - I'm guessing that the test set is somehow different to the training set...

I just tried clipping my maximum probabilities to 0.7 and that moved me from about 70th to 12th - so it seems that taking account of the very different test set is important! That's a simple dense model on VGG's convolutional layers, without even any data augmentation or pseudo-labeling yet.
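For anyone who wants to try the clipping trick, here's a minimal sketch in numpy. The 0.7 cap and the 4-class demo array are just illustrations (the fisheries competition has 8 classes); the idea is that log loss punishes confident mistakes very heavily, so capping confidence helps when the test set differs from training:

```python
import numpy as np

def clip_predictions(preds, max_p=0.7):
    """Clip each class probability into [(1 - max_p)/(n-1), max_p]
    so no class is ever predicted with near-certainty."""
    n_classes = preds.shape[1]
    min_p = (1.0 - max_p) / (n_classes - 1)
    return np.clip(preds, min_p, max_p)

# Demo: an overconfident wrong prediction, true class at index 0
preds = np.array([[0.001, 0.99, 0.003, 0.006]])
clipped = clip_predictions(preds)

def log_loss_row(p, true_idx):
    return -np.log(p[true_idx])

print(log_loss_row(preds[0], 0))    # ~6.9 - heavily punished
print(log_loss_row(clipped[0], 0))  # ~2.3 - much milder
```

Note that the clipped rows no longer sum exactly to 1; Kaggle's log loss renormalizes rows, so in practice this usually doesn't matter, but you can renormalize yourself if you prefer.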

My submission score (currently 1.15881) was also much worse than my val_loss of 0.1609. So far, I've only fine-tuned Keras' VGG16 model with the Nadam optimizer and a sufficiently low learning rate. I haven't yet applied dropout, ensembling, or pseudo-labels. Nor have I handled the class imbalance issue. Data augmentation, even with tiny increments, did not improve val_loss for me. A few other things I noticed:

I again tried Keras' ResNet50 on this Kaggle competition but could not tame it enough to converge to a respectable validation score, despite trying lower and lower learning rates and other optimizers. VGG16 seems to give respectable results relatively quickly. I've not tried other architectures like Inception, however.

Despite training on various low learning rates with pre-calculated inputs into VGG16's fully connected layers, I actually got better results training on the entire model (while still freezing the base convolutional layers). Does anyone know why that could happen?
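One common explanation is that pre-computed conv features lock in one fixed activation per image, so every epoch sees identical inputs, whereas end-to-end training (even with the conv base frozen) can still vary what the frozen layers see, e.g. via augmentation. Here's a minimal tf.keras sketch of the frozen-base setup (the thread predates tf.keras, but the idea is the same; the head sizes are assumptions, and in practice you'd pass `weights='imagenet'` rather than `None`, which is used here only to keep the sketch self-contained):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.models import Model

# Load VGG16 without its classifier head and freeze the conv base.
# (Use weights='imagenet' in practice; None avoids a download here.)
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# Attach a fresh head; only these layers receive gradient updates.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(8, activation='softmax')(x)  # 8 fish classes

model = Model(base.input, out)
model.compile(optimizer='nadam', loss='categorical_crossentropy')
```

With the base frozen, the trainable parameter count is the same as training a dense model on pre-computed features, but each batch flows through the full graph, so augmented images produce fresh conv activations every epoch.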

Since this competition has images with significantly higher resolution than the 224x224 inputs of the pre-trained ImageNet architectures we're familiar with (e.g., VGG16, ResNet), I can't help but wonder whether we should prepend our model with a convolutional layer that accepts a large image size (e.g. 2048x2048) and outputs 224x224 feature maps to VGG16. Is that a worthy approach? I can't find a definitive answer on how pre-trained ImageNet architectures can be used with higher-res images.
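Here's a hypothetical sketch of that kind of front-end in tf.keras. All the sizes are assumptions: I've used 2016x2016 rather than 2048x2048 because 2016 = 9 * 224, so a stride-9 window divides evenly into a 224x224 output:

```python
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

# Hypothetical learnable front-end: a stride-9 conv reduces a
# 2016x2016x3 input to 224x224x3, which could then be fed into a
# pre-trained 224x224 network. Output size: (2016 - 9)/9 + 1 = 224.
inp = Input(shape=(2016, 2016, 3))
small = Conv2D(3, kernel_size=9, strides=9, padding='valid')(inp)
front_end = Model(inp, small)
print(front_end.output_shape)  # (None, 224, 224, 3)
```

A fixed alternative would be average pooling (or just resizing during preprocessing), which throws away the same information but has nothing to learn; the conv version can in principle learn which high-frequency detail to preserve, at the cost of a very large activation map in memory.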

In my search for attention models, I came across Google DeepMind's paper on a relatively new type of layer -- the Spatial Transformer layer (https://arxiv.org/pdf/1506.02025v3.pdf). I found someone's Keras implementation on GitHub and successfully ran their sample notebook using it on cluttered MNIST data. I was amazed to see it auto-focus on the correct part of the image! It's been published for over a year, and I don't know why this type of layer hasn't become standard yet.

I know of lots of people who have tried to use it, but no one who's successfully used it on their own real-world data. I'm still excited about the idea - so it would be great if you could try it on something like the fisheries competition (where I think that focusing on the fish is important).

I've tried the Keras implementation (https://github.com/EderSantana/seya) with both the fish image data set and the mammography dataset, but it didn't work out. I used a single spatial transformer layer as the first layer before the rest of the CNN, with an input image size of 4096x4096, and lowered my batch size due to the increased memory usage. Training time was very slow (as might be expected), and I didn't even get a validation accuracy above zero after 2 epochs on the mammography data set. Is there a code sample for a more established attention model that I should be using instead? R-CNNs perhaps?