Many of the links that connect NY and AMS are often saturated, which means that a transfer over them (e.g., moving 300 GB at 1 MB/s) takes far longer than our connections should allow.

I ran into this problem about 3 years ago, when I was a real newbie to coding and Linux, and I ended up with the workaround that I've posted at the bottom of this post. However, it's dirty and I don't like it. The script doesn't work as-is, since it was written for a very specific environment, but it gives you the idea.

My question is: do you know of any better alternatives for transferring files across the ocean quickly?

I really don't know what you're talking about: not only can I buy entirely uncontended multi-gigabit links from any major European city to NYC, but it's estimated that there's between 8 and 12 times more available bandwidth between Europe and the US than there is demand. If you have financial constraints then please let us know what they are.
– Chopper3 Sep 29 '12 at 19:21

I'm not in that business; I simply rent servers here and there, and they all come with these restrictions, no matter how much I pay. Single-threaded connections are slow.
– cedivad Sep 29 '12 at 20:10

Based on that last comment I am going to assume that you are looking for a way to bypass the bandwidth restrictions imposed on your hosting accounts. How do you imagine opening multiple streams is going to help you? Such restrictions apply to the total bandwidth, not bandwidth per connection.
– John Gardeniers Sep 30 '12 at 4:01

3 Answers

Assuming that your network links are not saturated (contrary to what you state in the question), you should tune your link to deal with the (comparatively) high bandwidth-delay product, as Andrew mentioned. (The articles referenced at that link include some info on what to tweak, when, and why.)

If in fact your network links ARE saturated (moving the maximum amount of data they can), the only solution is to add more bandwidth: either more fiber trunks between the two sites, paying another carrier for transit to offload some of the peak-period traffic, or, if you're using "dedicated" links, paying for a higher CIR or adding more circuits to the loop.

How can you tell the difference?
Well, if starting more streams gets you more speed, you haven't saturated your link. You're probably being hit by the relatively long round-trip time from the US to Europe (compared with the round-trip time on a local network).
(There's a point of diminishing returns here as the overhead for more TCP connections will eventually cause other bottlenecks to show up.)

If adding more streams provides no net increase in speed (two streams each run at half the speed of a single stream), your link is saturated, and you need to add bandwidth to improve performance.
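If you do go the parallel-stream route, the idea is simple enough to sketch in a few lines. The following is an illustrative Python sketch, assuming an HTTP server that honours Range requests; the URL, file size, and stream count are made up for the example, not taken from the question:

```python
# Sketch: split one HTTP download into N parallel byte-range streams,
# so no single TCP connection has to fill the whole bandwidth-delay
# product on its own. Assumes the server supports Range requests.
import concurrent.futures
import urllib.request


def fetch_range(url, start, end):
    """Fetch bytes [start, end] of the resource (inclusive range)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()


def parallel_download(url, total_size, streams=4):
    """Download the resource in `streams` concurrent chunks and reassemble."""
    chunk = total_size // streams
    ranges = [
        (i * chunk, total_size - 1 if i == streams - 1 else (i + 1) * chunk - 1)
        for i in range(streams)
    ]
    parts = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=streams) as pool:
        for start, data in pool.map(lambda r: fetch_range(url, *r), ranges):
            parts[start] = data
    return b"".join(parts[s] for s in sorted(parts))
```

Tools like lftp (pget -n) and aria2 do essentially this, with resume and error handling on top.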

Other stuff to consider

You should seek to minimize the data being pushed over the pipe, using rsync or similar protocols where appropriate (rsync works best with small-ish change sets against large-ish collections of data).
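The core idea rsync exploits (only send what changed) can be illustrated with a toy sketch. This is not rsync's actual rolling-checksum delta algorithm, just the whole-file selection step, and the directory layout is hypothetical:

```python
# Toy sketch of rsync's file-selection idea: hash files on both sides and
# transfer only those that are new or whose digests differ. Real rsync
# additionally does block-level delta transfer within changed files.
import hashlib
import os


def digest(path):
    """SHA-256 of a file, read in chunks to keep memory bounded."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()


def files_to_transfer(src_dir, dst_dir):
    """Return relative paths that are new or changed on the source side."""
    changed = []
    for root, _dirs, names in os.walk(src_dir):
        for name in names:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            if not os.path.exists(dst) or digest(src) != digest(dst):
                changed.append(rel)
    return sorted(changed)
```

In practice you'd just run something like `rsync -az --partial src/ host:dst/`; the sketch only shows why re-sending an unchanged 300 GB collection is unnecessary.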

Never underestimate the bandwidth of a FedEx overnight package with a couple of hard disks in it. Especially for initial syncs.

I would check the TCP/IP tuning options: window scaling, retransmission behaviour, the routing table, and ICMP handling. If all of that is working correctly, and the OS networking stack is not Windows XP, CentOS 5, or anything older than Vista, you should be fine without multi-threaded connections. At best they would gain you around 20%, and at the cost of fragmenting the files on disk, which can slow things down even further.
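As a rough illustration, on Linux the relevant knobs look something like the following. The values are examples only, not tuned recommendations (check them against your kernel's documentation); on FreeBSD the equivalents live under net.inet.tcp and kern.ipc instead:

```shell
# Illustrative Linux sysctl settings for a long fat network.
sysctl -w net.ipv4.tcp_window_scaling=1             # allow windows > 64 KB (RFC 1323)
sysctl -w net.core.rmem_max=16777216                # max receive buffer (16 MB)
sysctl -w net.core.wmem_max=16777216                # max send buffer (16 MB)
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"   # min/default/max receive buffer
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"   # min/default/max send buffer
```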

I'm using FreeBSD over SSDs. Tons of SSDs. I don't have any disk bottleneck. If I set up the server in the EU I don't have any problems at all. Should I try to tune the TCP/IP stack even in this case? I've never done it. Thank you.
– cedivad Sep 30 '12 at 8:15

A high bandwidth-delay product is an important problem case in the design of protocols such as Transmission Control Protocol (TCP) in respect of TCP tuning, because the protocol can only achieve optimum throughput if a sender sends a sufficiently large quantity of data before being required to stop and wait until a confirming message is received from the receiver, acknowledging successful receipt of that data. If the quantity of data sent is insufficient compared with the bandwidth-delay product, then the link is not being kept busy and the protocol is operating below peak efficiency for the link.
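To put numbers on that, the window a single TCP stream needs to keep a link full is exactly the bandwidth-delay product. A quick back-of-the-envelope sketch, where the bandwidth and RTT figures are illustrative, not measured:

```python
# Bandwidth-delay product: bytes "in flight" needed to keep a link busy.
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product in bytes."""
    return bandwidth_bits_per_s * rtt_s / 8


# e.g. a 1 Gbit/s path with a ~90 ms NY-to-Amsterdam round trip:
window = bdp_bytes(1e9, 0.090)  # roughly 11 MB of window per stream
```

For comparison, the classic 64 KB window you get without window scaling caps a single stream at roughly 64 KB / 0.09 s, about 5.8 Mbit/s, no matter how fat the pipe is.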

That's the basic theory, but there are additional factors. Depending on your OS and TCP tuning options you might have large windows in play (large windows make it go faster), but then again some ISPs use "TCP window manipulation" as a shaping and congestion-control tool (i.e. a box in the middle knows some link is congested and tries to quench the TCP source by editing the ACK packets, to convince the source that the receiver's window is small), so your large windows might not really be in play even when you think you have them switched on.

There's one more thing that could be happening: as packet queues pile up in a congested router, it can start randomly dropping packets out of the queue (see Cisco Weighted Random Early Detection, or WRED for short). The guy using only one TCP stream tends to back off more rapidly than the guy using a bunch of parallel TCP streams, so by using multiple parallel streams you can grab a bigger "share" of the bandwidth on that congested queue (at the expense of others who abstain from this technique).

There's a fun tool called "tcptrace" which gives you visibility into what's going on, presuming you can capture packets at either end. Unfortunately you need to work with "xplot", which is a bit of a horrible program, but you can live with it.