I have experimented with a number of things, and when something improves the final val loss I am happy to see it, but I am no smarter than before as to why it helped.

A big challenge for me with this dataset is the inability to look at an image and recognize what type it is (unlike dogs vs cats, for example). So I can't look at the examples the model got wrong and figure out how to make it better. All my experimenting has really been 'brute-force search' rather than anything more thoughtful.

Precompute VGG16-BN features for all the data (initial_train, initial_valid, additional_train, additional_valid, test) with image size 448x448. Be sure to remove useless and corrupted images (check the forums for lists of them).
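
Roughly, the precomputation looks like this. It's only a sketch: the paths, batch size, and the use of plain VGG16 from keras.applications (instead of the batch-norm variant) are placeholders.

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# Convolutional part only; the dense head is trained separately on these features.
base = VGG16(include_top=False, input_shape=(448, 448, 3))
gen = ImageDataGenerator(preprocessing_function=preprocess_input)

def precompute(img_dir, out_file, batch_size=16):
    # img_dir is assumed to contain the usual class subfolders
    batches = gen.flow_from_directory(img_dir, target_size=(448, 448),
                                      class_mode=None, shuffle=False,
                                      batch_size=batch_size)
    steps = int(np.ceil(batches.samples / float(batch_size)))
    feats = base.predict_generator(batches, steps=steps)
    np.save(out_file, feats)

# Hypothetical directory layout -- adjust to wherever the cleaned images live.
for split in ['initial_train', 'initial_valid',
              'additional_train', 'additional_valid', 'test']:
    precompute('data/' + split, 'features/' + split + '_conv.npy')
```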

Use these features as input to a simple head: MaxPool - (Dense(4096) - Dropout(0.6) - BatchNorm) x 2 - Dense(3, softmax). Train this for 3 epochs on the entire data.
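
In Keras the head and the 3-epoch training look roughly like this (the ReLU activations, Adam optimizer, and batch size are my own choices here, and train_feats / train_labels are assumed to be the precomputed features and one-hot labels):

```python
from keras.models import Sequential
from keras.layers import MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization

def dense_head(feature_shape):
    # MaxPool - (Dense(4096) - Dropout(0.6) - BatchNorm) x 2 - Dense(3, softmax)
    return Sequential([
        MaxPooling2D(input_shape=feature_shape),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.6),
        BatchNormalization(),
        Dense(4096, activation='relu'),
        Dropout(0.6),
        BatchNormalization(),
        Dense(3, activation='softmax'),
    ])

model = dense_head(train_feats.shape[1:])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_feats, train_labels, epochs=3, batch_size=64,
          validation_data=(valid_feats, valid_labels))
```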

Build 5 such models and average their predictions.
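
In other words, something like this, reusing dense_head from above (test_feats being the precomputed test features):

```python
import numpy as np

ensemble_preds = []
for i in range(5):
    m = dense_head(train_feats.shape[1:])
    m.compile(optimizer='adam', loss='categorical_crossentropy')
    m.fit(train_feats, train_labels, epochs=3, batch_size=64)
    ensemble_preds.append(m.predict(test_feats))

# Average the softmax outputs of the 5 runs for the submission.
avg_preds = np.mean(ensemble_preds, axis=0)
```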

I tried data augmentation, but it doesn't seem to converge as fast as the non-augmented approach. (Maybe I need to experiment with it more.)
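
By augmentation I mean the usual random transforms on the raw images, along these lines (the ranges are just what I would try, not tuned values). Note that with augmentation the conv features can't be precomputed once, so every epoch has to run the full VGG forward pass.

```python
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import preprocess_input

aug_gen = ImageDataGenerator(preprocessing_function=preprocess_input,
                             rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)

# Augmented batches of raw images; these would be fed through the full
# VGG + dense-head model rather than through precomputed features.
aug_batches = aug_gen.flow_from_directory('data/initial_train',
                                          target_size=(448, 448),
                                          batch_size=16)
```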

Hi @rteja1113, thanks for sharing! I just did one submission and ranked 600+ out of the 661 teams :flushed: There is a lot more for me to try. The ensemble method you used seems very promising.

I just realized that, when I sum up the predicted probabilities row-wise, the sum is not always exactly 1.0000. Sometimes it is 0.999XXX and sometimes 1.000XXX. I was wondering why this happens. Do you have a guess as to how much this would affect the LB score?

Hi @shushi2000, the reason you are seeing that is that floating-point operations are not exact in computers. For example, 0.1 + 0.2 may give you 0.30000000000000004 rather than exactly 0.3. Everyone is limited by this problem, so I don't think it affects the LB much.
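
You can see it directly in Python, and if the not-exactly-1 rows bother you, you can always renormalize them before writing the submission:

```python
import numpy as np

print(0.1 + 0.2)   # 0.30000000000000004, not exactly 0.3

# Optional: force each row of the prediction matrix to sum to exactly 1.
# avg_preds here is the averaged ensemble output from earlier.
avg_preds = avg_preds / avg_preds.sum(axis=1, keepdims=True)
```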

I also trained the same network on ResNet features. I averaged all the models and now I'm able to get around 0.76471. EDIT: Forgot to mention that I did a clipping of 0.15.
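
By clipping I just mean bounding the predicted probabilities away from 0 before submitting, something like this in numpy (a sketch: with 3 classes, a lower bound of 0.15 pairs with an upper bound of 1 - 2 * 0.15 = 0.70 if you want the rows to still sum to 1):

```python
import numpy as np

# avg_preds: the averaged predictions being submitted.
clipped = np.clip(avg_preds, 0.15, 1 - 2 * 0.15)
```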

Fascinating! Again, thank you very much! I am wondering how different the average of 5 sets of predictions from VGG is from the average from ResNet. Just curious. Can we make this generalization: the more sets of predictions we use, the better the LB result will be?