Hello!
I have a question about translation (en-de). I need to translate a huge number of sentences, more than 100k. Translating wmt_14 and wmt_17 works fine. However, with a larger dataset I start running into problems with GPU memory consumption.

Initial consumption was around 1.5-2 GB with a batch size of 16, but after 3-4k sentences it grows to around 5 GB, and after 10k it is a bit more than 8 GB. As I understand it, GPU memory consumption can fluctuate with the length of the sentences in a batch, but what I see is a constant increase. I had the idea of splitting the dataset into chunks of 3k sentences and translating them one at a time, but that is not very convenient. (I'm translating Europarl.)
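For reference, here is a rough sketch of the chunking workaround I mentioned. `translate_batch` is a hypothetical stand-in for whatever model call does the actual translation. The chunk size of 3k and batch size of 16 are just the numbers from above. This only shows the splitting logic. It is not a fix for the memory growth itself.

```python
from typing import Callable, Iterator, List, Sequence


def iter_slices(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items, in order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_chunks(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],  # hypothetical model call
    chunk_size: int = 3000,
    batch_size: int = 16,
) -> List[str]:
    """Translate `sentences` chunk by chunk, batch by batch, preserving order."""
    outputs: List[str] = []
    for chunk in iter_slices(sentences, chunk_size):
        for batch in iter_slices(chunk, batch_size):
            outputs.extend(translate_batch(list(batch)))
        # Assumption: any per-chunk cleanup (e.g. releasing cached GPU memory)
        # would go here, between chunks.
    return outputs


if __name__ == "__main__":
    demo = [f"sentence {i}" for i in range(10)]
    # Dummy "translator" that just upper-cases each sentence.
    result = translate_in_chunks(demo, lambda b: [s.upper() for s in b],
                                 chunk_size=4, batch_size=3)
    print(result)
```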
Do you know what the problem could be or how to fix it?