On Sat, 3 Jun 2000 kuznet@xxxxxxxxxxxxx wrote:
> Probably, receiver delays each ACK by 500msec. I have no idea
> why it does this, because it is apparently illegal behaviour.
> Look:
Yes, it looks that way and I got the acks mixed up.
> for these 5 seconds. Probably, your sender selected illegal sndbuf,
> which is much less than receiver window.
The application in question was ttcp, and your guess is quite close to the
truth. I should have known the cause, since we have been hit by the exact
same thing before (albeit on the receive side).
tcp_snd_test: tail: 1, packets_in_flight: 95, snd_cwnd: 95
end_seq: -685618378, snd_una: -685641578, snd_wnd: 32488
This is debugging output at the time the delay begins and looks perfectly
reasonable.
tcp_snd_test: tail: 1, packets_in_flight: 75, snd_cwnd: 116
end_seq: -685618190, snd_una: -685636454, snd_wnd: 32488
And this is after the delay; in between, no calls were made to tcp_snd_test.
So the real culprit is this test:
if (sock_wspace(sk) >= tcp_min_write_space(sk) &&
(sock = sk->socket) != NULL) {
The key here is that the mss was 256. sock_wspace() is:
amt = sk->sndbuf - atomic_read(&sk->wmem_alloc);
wmem_alloc contains the sum of the skb->truesize fields. If we did not have
the overhead from struct sk_buff and alignment, this test would be roughly:
65536-32767 >= 32767
when the sender is put to sleep, so the two numbers are almost equal.
However, with an MTU of 296 (as given to pppd) it is:
tcp_new_space: wspace: 10435, write_space: 27550
tcp_new_space: wspace: 11015, write_space: 27260
tcp_new_space: wspace: 11595, write_space: 26970
tcp_new_space: wspace: 12175, write_space: 26680
... acks flow in ...
tcp_new_space: wspace: 19715, write_space: 22910
tcp_new_space: wspace: 20295, write_space: 22620
tcp_new_space: wspace: 20875, write_space: 22330
tcp_new_space: wspace: 21455, write_space: 22040
Hmm, 11015-10435=580; that would make sense if there were two skbs
allocated for each segment. Oh, I see: skb_clone() in tcp_send_skb(), right?
The disparity between this test and the available send window is the
cause of the bursts. It also explains why the over-scheduling masked
this behaviour. The following patch changes wmem_alloc to include only
the actual data, and it seems to work. This is a hackish approach at
best, though.
diff -urN --exclude=*~ linux-2.4.0-test1-ac6.bak/net/ipv4/tcp.c linux-2.4.0-test1-ac6/net/ipv4/tcp.c
--- linux-2.4.0-test1-ac6.bak/net/ipv4/tcp.c	Mon Apr 24 23:59:57 2000
+++ linux-2.4.0-test1-ac6/net/ipv4/tcp.c	Mon Jun  5 18:48:59 2000
@@ -960,6 +960,7 @@
 			skb = alloc_skb(tmp, GFP_KERNEL);
 			if (skb == NULL)
 				goto do_oom;
+			skb->truesize = copy;
 			skb_set_owner_w(skb, sk);
 		} else {
 			/* If we didn't get any memory, we need to sleep. */
Our second problem with this disparity is on the receive side. The scenario
is essentially the same, but with an unreliable link (read: wireless) which
drops packets. When a packet is dropped, the receiver keeps building an
out-of-order queue, which grows to the limit of the receive buffer quite
quickly. The sender, however, keeps sending more because of the difference
between the advertised window and the actual allocated space. This triggers
tcp_input.c:prune_queue(), which purges the whole out-of-order queue to
free up space, thus killing TCP performance quite effectively.
The fix in our internal use is similar, applied to rmem_alloc. I do think
both of these situations are quite valid; I am not so sure about the
correct fix, though.