As the title says… I've been training a Transformer with the default config for 3 days (5M sentences, a single RTX 2080 Ti, 170K steps so far). Over the past day, BLEU has barely improved and only fluctuates by around 0.10. Should I consider this convergence? Is there any chance of overfitting? BLEU is already good at 52.64, and the model's actual output quality is also very good.
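One way to make "converged" concrete is a patience-style plateau check on the validation BLEU recorded at each evaluation checkpoint. The sketch below is a minimal illustration under assumed settings. The function name `has_plateaued`, the `patience` window, and the `min_delta` threshold are all hypothetical choices for this example, not part of any toolkit's config. Overfitting is a separate signal: it typically shows up as validation BLEU (or validation loss) getting worse while training loss keeps falling, not as a flat validation curve.

```python
from typing import Sequence


def has_plateaued(bleu_history: Sequence[float],
                  patience: int = 5,
                  min_delta: float = 0.2) -> bool:
    """Hypothetical plateau check on validation BLEU.

    Returns True if the best score in the last `patience` evaluations
    beat the best score seen before them by less than `min_delta`.
    `patience` and `min_delta` are illustrative defaults, not tuned values.
    """
    if len(bleu_history) <= patience:
        return False  # not enough evaluations yet to judge a plateau
    best_before = max(bleu_history[:-patience])
    best_recent = max(bleu_history[-patience:])
    return best_recent - best_before < min_delta


# Made-up BLEU values, one per evaluation checkpoint, for illustration only.
history = [51.9, 52.3, 52.5, 52.6, 52.55, 52.64, 52.58, 52.62]
print(has_plateaued(history))  # True: the recent best improves on the earlier best by only ~0.14
```

Lowering `min_delta` or lengthening `patience` makes the check more conservative, so training continues longer before the curve is treated as flat.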