
What should I infer from the regularization loss going up? That the model is not able to capture the underlying information with the current embedding size, or that the scale of the regularization loss is too small compared to the base loss?

1 Answer

There is no problem with your regularization loss going up.

The cost function of your model is a weighted sum of the regularization loss and the base loss, so during training the model tries to minimize both together, but eventually it reaches a point where it has to minimize one at the expense of the other. The fact that it prefers the base loss means that your regularization hyper-parameter is not too big (after all, your ultimate goal is to reduce the base loss).
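
To make the weighted sum concrete, here is a minimal sketch (assuming PyTorch; `lambda_reg` is a hypothetical name for your regularization hyper-parameter):

```python
import torch

def total_loss(model, base_loss, lambda_reg=1e-4):
    # L2 regularization term: sum of squared weights over all parameters.
    reg_loss = sum(p.pow(2).sum() for p in model.parameters())
    # The optimizer minimizes this weighted sum; with a small lambda_reg the
    # base loss dominates once the two terms start pulling in opposite directions.
    return base_loss + lambda_reg * reg_loss
```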

In general terms: the goal of the regularization loss is to simplify the model during training. Penalizing large weights forces the model to grow only the important weights, not all of them, so some weights converge to very small, insignificant values, effectively reducing the number of weights in the model. A smaller model means less overfitting and better generalization.
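
A toy illustration of that shrinking effect (plain Python, made-up numbers): under an L2 penalty, a weight with no useful gradient signal decays toward zero, while a weight the data supports settles at a large value despite the penalty.

```python
lr, lam = 0.1, 0.5
w_unused, w_important = 1.0, 1.0
for _ in range(200):
    w_unused -= lr * (0.0 + lam * w_unused)         # only the penalty acts on it
    w_important -= lr * (-1.0 + lam * w_important)  # data gradient keeps it large
print(f"{w_unused:.4f}  {w_important:.4f}")         # ~0.0000 vs ~2.0000
```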

In your case: during training, your model starts by reducing the regularization loss (because it is an easy task), but at some point it starts to notice crucial features it wants to emphasize, so it begins to increase the weights of those features. This results in a higher regularization loss but a smaller total loss (a good thing: it is learning while still under control).
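
If you want to verify this on your own run, here is a sketch of a training loop (again assuming PyTorch; `loader` and `criterion` are placeholders for your data loader and base loss) that logs the two terms separately:

```python
def train_epoch(model, loader, optimizer, criterion, lambda_reg=1e-4):
    base_sum, reg_sum = 0.0, 0.0
    for x, y in loader:
        optimizer.zero_grad()
        base = criterion(model(x), y)
        reg = sum(p.pow(2).sum() for p in model.parameters())
        (base + lambda_reg * reg).backward()
        optimizer.step()
        base_sum += base.item()
        reg_sum += reg.item()
    # A rising reg_sum alongside a falling base_sum is the healthy pattern
    # described above: the important weights are growing, under control.
    print(f"base={base_sum:.4f}  reg={reg_sum:.4f}")
```

As long as the total (base plus weighted regularization) keeps decreasing, a rising regularization term by itself is not a warning sign.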