Open Source Fast File Transfers

A number of open source projects try to tackle accelerated file transfer via UDP. Some are more mature than others, and they use different technologies to solve the same problem: moving large amounts of data over a WAN. This article should give the reader enough information to compare the different solutions and gauge whether an open source project could be used instead of purchasing a commercial solution like FileCatalyst.

Some commercial managed file transfer solutions that claim to offer UDP acceleration have integrated one of these open source projects into their core file transfer technology. These solutions inherit the strengths, but also the weaknesses, of the open source project. FileCatalyst has developed its own UDP-based protocol, written from the ground up, and does not include any code from open source UDP technology.

We will review four open source projects: three that use UDP and one that only optimizes TCP parameters to provide fast file transfer.

A problem common to all four solutions is the lack of a graphical user interface. Two provide only bare-bones sender and receiver APIs (meaning that the end user has to compile the code), while the other two come with only a command line interface (CLI). Another common problem is poor support for firewall traversal. While this is not an issue for internal transfers, most organizations are interested in sending files over the WAN, which will almost certainly have at least one firewall somewhere on the route. None of these solutions fares well in the worst network conditions, where packet loss, bandwidth, or latency is very high. Finally, the congestion control in the UDP projects lacks the flexibility to adapt to ever-changing network conditions during the data transfer.

Below is a quick reference table comparing the four products. Following the table are reviews of each of the products tested.

| Feature | UDT | Tsunami | UFTP | GridFTP |
| --- | --- | --- | --- | --- |
| Multi-threaded | no | no | no | yes |
| Protocol overhead | 10% | 20% | ~10% | 6-8% (same as TCP) |
| Encryption | no | no | yes | yes |
| C++ source code | yes | yes | yes | yes |
| Java source code | partial | no | no | no |
| Command line | no | no | yes | yes |
| Binaries | no (source code only) | no (source code only) | yes (CLI only) | yes (CLI only) |
| UDP-based point-to-point | yes | yes | yes | no |
| Firewall friendly | partial (no auto-detection) | no | partial (no auto-detection) | no |
| GUI client | no | no | no | no |
| Server with secure user accounts | no | no | no | yes |
| Congestion control | yes (UDP blast mode preferred) | yes (limited) | yes (congestion control file has to be specified before the transfer starts) | yes (using TCP) |
| Automatic retry and resume | no | no | no (manual resume yes) | yes |
| Jumbo packets | yes | no | yes (up to 8800 bytes) | yes |
| IPv6 support | yes | no | no | yes |
| Support for any packet loss | no | no | no | yes |
| Support for low bandwidth, high packet loss (i.e. satellite) | no | no | no | no |
| Optimized for medium bandwidth (<155 Mbps), high latency | yes | yes | yes | yes |
| Optimized for high bandwidth (500 Mbps or more), high latency | no | no | no | no |
| Memory footprint | medium | medium | medium | high (grows with each concurrent stream) |
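
To make the protocol overhead figures concrete, the following minimal Python sketch converts an overhead percentage into a usable payload rate and a best-case transfer time. The overhead values come from the table above. The 100 Mbps link, the 10 GB file, the midpoint used for GridFTP's 6-8% range, and the function names are all assumptions chosen for illustration. The sketch ignores packet loss and latency, so real transfers will be slower.

```python
# Overhead fractions taken from the comparison table. GridFTP's 6-8% range is
# represented by its midpoint, which is an assumption made for this sketch.
PROTOCOL_OVERHEAD = {
    "UDT": 0.10,
    "Tsunami": 0.20,
    "UFTP": 0.10,
    "GridFTP": 0.07,
}


def effective_rate_mbps(link_mbps: float, overhead: float) -> float:
    """Return the payload rate left after protocol overhead is subtracted."""
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return link_mbps * (1.0 - overhead)


def transfer_time_seconds(file_gb: float, link_mbps: float, overhead: float) -> float:
    """Best-case time to move file_gb gigabytes (10^9 bytes each)."""
    bits = file_gb * 8e9
    return bits / (effective_rate_mbps(link_mbps, overhead) * 1e6)


if __name__ == "__main__":
    # Hypothetical scenario: a 10 GB file over a 100 Mbps link.
    for name, overhead in PROTOCOL_OVERHEAD.items():
        rate = effective_rate_mbps(100, overhead)
        minutes = transfer_time_seconds(10, 100, overhead) / 60
        print(f"{name:8s} {rate:5.1f} Mbps payload, {minutes:5.1f} min for 10 GB")
```

With a 20% overhead, a 100 Mbps link carries at most 80 Mbps of payload, so the hypothetical 10 GB file needs at least 1000 seconds instead of 800.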

1. UDT (UDP-based Data Transfer)

No installer and no binaries are available; both client and server have to be built from source.

This is only a bare-bones source code implementation of the sender and receiver; all the functionality around user authentication, reporting, monitoring and file management has to be implemented by the programmer.

This project could only be used where two back-office servers send files to each other with no firewalls in between and without any user interaction.

Core:

no multi-threading, meaning that only a single CPU core does the work of receiving, processing, decrypting, decompressing, and writing to disk. This may also limit the number of concurrent connections that can be serviced at once

poor performance on high packet loss, low bandwidth links. The default configuration is very sensitive to packet loss; in fact, a single dropped packet could cause a transfer to fail

2. Tsunami UDP Protocol

This open source project has seen no development in two years (unchanged since May 2010).

Functionality:

must be built from source (no binaries)

This is only a source code implementation of the sender and receiver; all the functionality around user authentication, reporting, monitoring and file management must be implemented by the programmer.

Core:

Only C++ source code

20% protocol overhead (for example, a 100 Mbps link will only be able to send payload at 80 Mbps)

no jumbo packet support

no multi-threading, meaning that only a single CPU core does the work of receiving, processing, decrypting, decompressing, and writing to disk. This may also limit the number of concurrent connections that can be serviced at once

Not optimized for very high bandwidth (100 Mbps or more)

Not optimized for low bandwidth, high packet loss links (i.e. satellite)

no graphical client interface for point-to-point transfers

no support for firewall traversal

no resume and retry (although it could be built by the programmer)
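
Since resume and retry would have to be built by the programmer, here is a minimal Python sketch of what such a wrapper could look like. It is not part of Tsunami or any other project reviewed here: `send_with_resume`, `TransferError`, and the in-memory `FlakyReceiver` stand-in are hypothetical names, and the `send_chunk(offset, payload)` callback is an assumed interface for whatever sender the underlying project exposes.

```python
import time
from typing import Callable


class TransferError(Exception):
    """Raised by the (hypothetical) transport when a chunk fails to send."""


def send_with_resume(
    data: bytes,
    send_chunk: Callable[[int, bytes], None],
    chunk_size: int = 64 * 1024,
    max_retries: int = 5,
    backoff_seconds: float = 0.0,
) -> int:
    """Send data chunk by chunk, resuming from the last confirmed offset.

    Returns the total number of retries that were needed. Re-raises
    TransferError once a single chunk has failed more than max_retries times.
    """
    offset = 0
    retries = 0
    failures_in_a_row = 0
    while offset < len(data):
        payload = data[offset:offset + chunk_size]
        try:
            send_chunk(offset, payload)
        except TransferError:
            failures_in_a_row += 1
            if failures_in_a_row > max_retries:
                raise
            retries += 1
            time.sleep(backoff_seconds * failures_in_a_row)
            continue  # retry the same offset instead of starting over
        offset += len(payload)
        failures_in_a_row = 0
    return retries


class FlakyReceiver:
    """In-memory stand-in for a remote receiver that fails on chosen calls."""

    def __init__(self, fail_on_calls):
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0
        self.buffer = bytearray()

    def send_chunk(self, offset: int, payload: bytes) -> None:
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise TransferError(f"simulated failure at offset {offset}")
        # A real receiver would seek to offset; here chunks arrive in order.
        assert offset == len(self.buffer)
        self.buffer.extend(payload)


if __name__ == "__main__":
    receiver = FlakyReceiver(fail_on_calls={2, 3})
    blob = bytes(range(256)) * 10
    used = send_with_resume(blob, receiver.send_chunk, chunk_size=1000)
    print(f"delivered {len(receiver.buffer)} bytes after {used} retries")
```

The key design choice is that only the failed chunk is resent, so a failure late in a large transfer does not force the whole file to be sent again. A production version would also need to persist the offset across process restarts and verify the received data, both of which are left out here.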

3. UFTP

UFTP is a UDP-based file transfer protocol and the name of a tool that implements that protocol. UFTP is designed for particularly efficient file transfers under scenarios where the file is to be broadcast/multicast or the transfer occurs over a wireless link (such as satellite). However, in low-error, high-bandwidth or high-latency scenarios, it can outperform TCP-based protocols such as FTP by 100% or more. (source: Wikipedia)

The UFTP protocol was based on the Starburst MFTP protocol (source: Wikipedia)

Functionality:

Comes with command line tools only

No firewall auto-detection, meaning that UDP is always forced; there is no fallback to TCP/HTTP

Congestion control can only be enabled ahead of the transfer via a pre-populated config file

no user account management on the server

Core:

Protocol designed predominantly for multicast; point-to-point file transfer is not the core of the technology

Poor performance in high packet loss environments (satellite or wireless)

no multi-threading, meaning that only a single CPU core does the work of receiving, processing, decrypting, decompressing, and writing to disk. This may also limit the number of concurrent connections that can be serviced at once

not optimized for high bandwidth (500 Mbps or more)

no graphical client interface for point-to-point transfers

4. GridFTP

The underlying TCP connection in FTP has numerous settings such as window size and buffer size. GridFTP allows automatic (or manual) negotiation of these settings to provide optimal transfer speeds and reliability (settings are likely to need to be different for best performance with large files and for large groups of files). (source: Wikipedia)

Although GridFTP is not UDP based, it can be used to solve the problem of poor TCP performance with FTP.

GridFTP requires a much larger framework called Globus, and is defined under the organisation of the Global Grid Forum.

For optimized transfers, multiple nodes or TCP streams must be used

Optimized transfer of a single large file with a single stream between two nodes is not possible

Command line client interface only (no GUI)

The TCP buffer size and block size must be known before the transfer begins (-tcp-bs and -tcp-buffer-size)

The server and client must be part of a much larger network of Globus nodes

Not optimized for very high bandwidth (500 Mbps or more)

Not optimized for low bandwidth, high packet loss links (i.e. satellite)
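
Because the TCP buffer size has to be chosen before the transfer starts, it helps to see where a sensible value comes from. The sketch below uses the standard bandwidth-delay product rule of thumb; it is not GridFTP's own negotiation logic. The 155 Mbps link matches the "medium bandwidth" figure in the table, while the 100 ms round-trip time, the 64 KiB window, and the function names are assumptions for illustration.

```python
import math


def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return round(bandwidth_mbps * 1e6 * rtt_ms / 1e3 / 8)


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e6


def streams_needed(bandwidth_mbps: float, rtt_ms: float, window_bytes: int) -> int:
    """Parallel streams required when each stream is capped at window_bytes."""
    return max(1, math.ceil(bdp_bytes(bandwidth_mbps, rtt_ms) / window_bytes))


if __name__ == "__main__":
    link_mbps, rtt_ms, window = 155, 100, 64 * 1024  # assumed scenario
    print("buffer needed for one stream:", bdp_bytes(link_mbps, rtt_ms), "bytes")
    print("one 64 KiB stream tops out at",
          round(window_limited_mbps(window, rtt_ms), 2), "Mbps")
    print("streams needed at 64 KiB each:",
          streams_needed(link_mbps, rtt_ms, window))
```

Under these assumptions a single stream needs a buffer of roughly 1.9 MB to fill the link, while a stream limited to a 64 KiB window reaches only about 5 Mbps. That gap is why a TCP-based tool has to rely on either large, pre-sized buffers or many parallel streams.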

Conclusions

So… Which of these solutions is the most viable?

UDT seems to be pulling ahead for now. But none of these projects is a viable replacement for an enterprise solution, because they lack the functionality and ease of use of commercial applications. GridFTP could be used if the organization plans to use Globus and develop a file transfer workflow based on the CLI. A commercial solution such as FileCatalyst addresses each of the weak points, including flexible congestion control, firewall friendliness, GUI client apps, and automatic resume/retry, which provides real cost savings and an efficiency boost compared to piecing together a custom solution using a bare-bones API.