I tried reducing the learning rate, switching the optimizer from SGD to Adam, and using different parameter initializers. None of these solved the problem, so I realized that finding the root cause would take some real debugging. I began by printing the value of ‘loss’, then the values of ‘loss_location’ and ‘loss_confidence’. Finally, I noticed that ‘loss_location’ was the first to become ‘nan’, because one of the terms in the equation below (from the paper) evaluates to ‘nan’:

‘loss_location’ from paper ‘SSD: Single Shot MultiBox Detector’
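For reference, the localization loss in the SSD paper is a smooth-L1 loss between the predicted box l and the ground-truth box g encoded relative to the default (prior) box d:

```latex
L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}}
  x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}} \qquad
\hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}

\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}} \qquad
\hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}
```

The log terms are the dangerous ones: if the encoded width or height ratio is ever negative, the log evaluates to nan, and that nan propagates into the loss.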

After checking the implementation in ‘layers/box_utils.py’:


def encode(matched, priors, variances):
    ...
    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors, 4]
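To see how a wrong annotation format leads to the nan, here is a minimal pure-Python sketch. The box coordinates are made-up example values; `math.log` raises on negative input, so the nan is produced explicitly the way `torch.log` would produce it:

```python
import math

# Hypothetical annotation in (xmin, ymin, width, height) format.
box = (60.0, 27.0, 40.0, 50.0)

# If the pipeline misreads it as (xmin, ymin, xmax, ymax), the width
# computed by matched[:, 2:] - matched[:, :2] becomes xmax - xmin:
w = box[2] - box[0]  # 40.0 - 60.0 = -20.0, a negative "width"

# torch.log of a negative element yields nan (math.log would raise,
# so we mimic torch's behaviour here):
g_w = math.log(w) if w > 0 else float('nan')

print(w)    # -20.0
print(g_w)  # nan
```

One nan element in the encoded target is enough: smooth-L1 against a nan target is nan, and the sum over positive matches makes the whole ‘loss_location’ nan.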

I realized that (matched[:, 2:] - matched[:, :2]) was producing negative values, something that had never happened before switching to the CUB-200 dataset.

Now it was time to carefully check the data pipeline for the CUB-200-2011 dataset. I reviewed the bounding box file line by line and found that its format is not (Xmin, Ymin, Xmax, Ymax) but (Xmin, Ymin, Width, Height)! Let’s show the images for an incorrect bounding box and a correct one:
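The fix is then a one-line conversion when loading the annotations. A minimal sketch, assuming the boxes arrive as plain floats (the function name is my own, not from the repo):

```python
def xywh_to_xyxy(xmin, ymin, w, h):
    """Convert a (xmin, ymin, width, height) box, as stored by
    CUB-200-2011, to the (xmin, ymin, xmax, ymax) corner format
    that SSD's encode() expects."""
    return xmin, ymin, xmin + w, ymin + h
```

For example, `xywh_to_xyxy(60, 27, 40, 50)` returns `(60, 27, 100, 77)`, so `matched[:, 2:] - matched[:, :2]` recovers the positive width and height, and `torch.log` never sees a negative value.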