Using SFTP from the client we're only seeing about 40Kbps of throughput with large (>2GB) files. We've performed server to 'other local server' tests and see around 500Mbps through the local switches (Cat 6509s); we're going to do the same on the client side, but that's a day or so away.

What other testing methods would you use to prove to the link providers that the problem is theirs?

4 Answers

Tuning an Elephant:
This could require tuning, though as pQd says that's probably not the issue here. This sort of link is known as a "Long Fat Pipe", or elephant (see RFC 1072). Because this is a fat gigabit pipe going over a distance (distance really means time/latency in this case), the TCP receive window needs to be large (see TCP/IP Illustrated Volume 1, TCP Extensions section, for pictures).

To figure out what the receiving window needs to be, you calculate the bandwidth delay product:

Bandwidth * Delay = Product

If there is 10 ms of latency, this calculator estimates you want a receive window of about 1.2 MBytes. We can do the calculation ourselves with the above formula (shell $(( )) arithmetic is integer-only, so bc handles the floating point):

echo "(1000000000 * 0.010) / 8" | bc
1250000

So once you've figured out the larger problem, you might want to run a packet dump to confirm that TCP window scaling (the TCP extension that allows windows larger than 64 KBytes) is actually being negotiated, and tune from there.
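
A quick way to check is to look at the SYN/SYN-ACK exchange for the wscale option (the interface name below is an assumption for your setup). The effective window is the advertised 16-bit window field shifted left by the negotiated scale:

```shell
# Capture just the handshake on the WAN-facing interface (name assumed):
#   tcpdump -ni eth0 'tcp[tcpflags] & tcp-syn != 0'
# Both the SYN and the SYN-ACK must carry "wscale N"; if either side
# omits it, scaling is off and the window is capped at 65535 bytes.

# Effective window = raw window field << wscale, e.g. field 501, wscale 7:
echo $(( 501 << 7 ))   # -> 64128 bytes
```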

Window Bound:
If the problem is that you are bound by the window size with no scaling, then regardless of the pipe size I would expect roughly the following results with about 200ms of latency:

Throughput = Receive Window / Round Trip Time

So:

echo "65536 / 0.2" | bc
327680 #Bytes/second

In order to get the results you are seeing you would just need to solve for latency, which would be:

RTT = RWIN/Throughput

So (for 40 kBytes/s):

echo "scale=2; 65536 / 40000" | bc
1.63 #Seconds of latency

(Please check my math - and of course these numbers don't include any protocol/header overhead)
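
The three formulas above can be wrapped into tiny helper functions. Since shell $(( )) arithmetic is integer-only, this sketch works in milliseconds (the function names are my own, not from any tool):

```shell
# Bandwidth-delay product: bandwidth (bits/s) and RTT (ms) -> window in bytes
bdp_bytes() { echo $(( ($1 * $2) / 8000 )); }

# Window-bound throughput: window (bytes) and RTT (ms) -> bytes/s
window_throughput() { echo $(( ($1 * 1000) / $2 )); }

# Implied RTT: window (bytes) and observed throughput (bytes/s) -> ms
implied_rtt_ms() { echo $(( ($1 * 1000) / $2 )); }

bdp_bytes 1000000000 10      # 1 Gbps at 10 ms    -> 1250000
window_throughput 65536 200  # 64 KB at 200 ms    -> 327680
implied_rtt_ms 65536 40000   # at 40 kBytes/s     -> 1638 (~1.6 s)
```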

You know I felt a bit guilty for temporarily 'overtaking' you on rep the other week, and it's because of how damn good your answers are - and BOOM! you even use a shell to do your maths, not the 1.5MB Mac Calculator.app I do! :) Thank you.
– Chopper3, Apr 16 '10 at 13:14


You have good answers too, and I like that I have someone I'm close to in rep - it enhances the game a little :-) A quick Google query reminds me that you have answered my questions as well: serverfault.com/questions/107263/… . I just really appreciate the active users trying to make this community 'happen'. But thank you for the compliment!
– Kyle Brandt, Apr 16 '10 at 13:20

Me too, there's nothing I like more than knowing we've helped someone who felt they were on their own with a frustrating problem - apart from cheese of course. That said, I do hate it when we get badly formed questions; did you hear my question on SO podcast 82? Got a free SF t-shirt out of it too!
– Chopper3, Apr 16 '10 at 13:23

I listen to most of the podcasts but missed that one, will go back and check it out (probably this weekend).
– Kyle Brandt, Apr 16 '10 at 13:26

40kbps is very low - to the point that i would suspect faulty media converters or a duplex mismatch [but you have gigabit, so there is no place for half duplex!] etc. there must be packet losses or very high jitter involved.

iperf is the first tool that comes to my mind to measure available throughput. run this on one side:

iperf -s

and on the other:

iperf -t 60 -c 10.11.12.13

then you can swap client/server roles, use -d for duplex, etc. run mtr between both machines before the start of the test and see what latency / packet losses you have on the unused link, and how they change during the data transfer.

you would like to see very small jitter and no packet losses until the link is saturated at 90-something percent of its capacity.

We know that the link is made up of 6 1000-base-zx links so there's bound to be latency introduced by all that repeating, but even so I'm as surprised as you are at how low it is - great tip on the iperf thing by the way, I'd totally forgotten it existed!
– Chopper3, Apr 16 '10 at 11:03

do you know how this 1Gb/s link is being provisioned? are you bridging or routing over this link? what is your SLA for the link? could you be being shaped by your link provider?

if you're only getting 40kb/s then there is a serious problem. are you sure that it's not a 1Mb/s link rather than a 1Gb/s link? you'll probably find that the speed of the link is not what you think it is :-)
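
For a sense of scale, here is what those two rates mean for one of the 2GB test files (a rough integer-seconds sketch):

```shell
# time to move 2,000,000,000 bytes at 40 kBytes/s vs. a saturated 1Gb/s link
echo $(( 2000000000 / 40000 ))             # -> 50000 s (~14 hours)
echo $(( 2000000000 / (1000000000 / 8) ))  # -> 16 s at line rate
```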

Thanks for your answer, it's a dedicated multi-segment bridged single-mode fibre link, there's no shaping at all involved as it's just L2 all the way - oh and I do so hope it's not a 1Mbps link, not with the money it's costing :)
– Chopper3, Apr 16 '10 at 11:08


if you're bridging to your LAN, i.e. no routing anywhere, then network broadcasts will be wasting link capacity - true, on a 1Gb/s link it will be a small fraction, but a misbehaving network service could flatten the link. I presume these bridges are out of your control. These switches may be overloaded, or incurring very high latency. High latency means low bandwidth.
– The Unix Janitor, Apr 16 '10 at 11:17

@user37899 - high latency does not have to mean low bandwidth, but it requires tuning... anyway - how much latency can you get over 200 miles? if things are ok, no more than 3-10ms. arp [or other] broadcasts on a gigabit link are probably a very small fraction of the whole available capacity.
– pQd, Apr 16 '10 at 11:35
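
pQd's 3-10ms figure is consistent with raw propagation delay: light in fibre travels at roughly 200,000 km/s, and 200 miles is about 322 km. A sketch in microseconds (since shell arithmetic is integer-only):

```shell
# one-way propagation over ~322 km of fibre at ~200000 km/s,
# i.e. 200 km per millisecond, expressed in microseconds:
echo $(( (322 * 1000) / 200 ))   # -> 1610 us (~1.6 ms one way, ~3.2 ms RTT)
```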


If you have network broadcasts occurring at such a level as to affect the performance of the link, then I suspect you would have had internal performance problems long before this new line came in and would have noticed as much.
– joeqwerty, Apr 16 '10 at 11:38

@pQd i was actually talking about a broadcast storm.
– The Unix Janitor, Apr 16 '10 at 11:39