There is a case in __sk_mem_schedule() where an allocation is beyond
the maximum, yet we are allowed to proceed. It happens under the
following condition:

sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case, meaning
that at some point later it will try to revert it. Since this is never
communicated to the underlying res_counter code, there is an imbalance
in the res_counter uncharge operation.
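
To make the imbalance concrete, here is a minimal userspace sketch. It
is not the kernel's res_counter code: struct demo_counter,
demo_charge() and demo_uncharge() are hypothetical names used only for
illustration. The counter refuses a charge that exceeds its limit, but
the caller proceeds anyway and later uncharges the full amount.

```c
#include <stddef.h>

/* Hypothetical toy counter, standing in for a limit-enforcing counter. */
struct demo_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Refuse charges beyond the limit; usage is left untouched on failure. */
static int demo_charge(struct demo_counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/*
 * Uncharge val, clamping at zero. Returns how many units were
 * uncharged without ever having been charged (the imbalance).
 */
static unsigned long demo_uncharge(struct demo_counter *c, unsigned long val)
{
	unsigned long excess = 0;

	if (val > c->usage) {
		excess = val - c->usage;
		val = c->usage;
	}
	c->usage -= val;
	return excess;
}
```

If a caller ignores the failure of demo_charge() and keeps the memory,
its later demo_uncharge() calls release more than was ever recorded,
which is the mismatch described above.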

I see two ways of fixing this:

1) storing the information about those allocations somewhere in memcg,
   and then deducting from that first, before we start draining the
   res_counter,

2) providing a slightly different allocation function for the
   res_counter, that matches the original behavior of the network
   code more closely.

I decided to go for #2 here, believing it to be more elegant, since #1
would require us to do essentially the same thing, but in a more
obscure way.
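
As a rough illustration of the idea behind #2, the sketch below shows
a charge function that always records the allocation and only reports
whether the limit was exceeded. This is a userspace toy under stated
assumptions, not the actual patch: struct nofail_counter,
nofail_charge() and nofail_uncharge() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical toy counter; usage is allowed to exceed limit. */
struct nofail_counter {
	unsigned long usage;
	unsigned long limit;
};

/*
 * Always account for val. Returns 0 if the new usage is within the
 * limit, -1 if it is over the limit. The caller decides whether to
 * proceed, but the counter stays in sync either way.
 */
static int nofail_charge(struct nofail_counter *c, unsigned long val)
{
	int ret = (c->usage + val > c->limit) ? -1 : 0;

	c->usage += val;
	return ret;
}

/* Every uncharge now corresponds to a charge that was recorded. */
static void nofail_uncharge(struct nofail_counter *c, unsigned long val)
{
	c->usage -= val;
}
```

With this shape, a caller that chooses to proceed despite being over
the limit can later uncharge the same amount without unbalancing the
counter.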