A Resilient Overlay Network (RON) is an architecture that allows
distributed Internet applications to detect and recover from path
outages and periods of degraded performance within several seconds,
improving over today's wide-area routing protocols that take at least
several minutes to recover. A RON is an application-layer overlay on
top of the existing Internet routing substrate. The RON nodes monitor
the functioning and quality of the Internet paths among themselves,
and use this information to decide whether to route packets directly
over the Internet or by way of other RON nodes, optimizing
application-specific routing metrics.
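The route decision described above, between the direct Internet path and a detour through another RON node, can be illustrated with a minimal sketch. It assumes each node already holds a table of measured per-link values and considers only paths with at most one intermediate node. The names `choose_route`, `combine`, and `outage_value` are hypothetical and are not taken from the RON implementation. Combining loss as $1-(1-p_1)(1-p_2)$ assumes the two hops lose packets independently. This is our simplification, not necessarily how RON composes its metrics.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def outage_value(metric: str) -> float:
    # Hypothetical convention: an unmeasured or failed link counts as
    # infinite latency or total loss, so it is never preferred.
    return math.inf if metric == "latency" else 1.0


def combine(metric: str, first: float, second: float) -> float:
    """Compose the metric of two consecutive overlay links."""
    if metric == "latency":
        return first + second  # latencies add along the path
    if metric == "loss":
        # Assumes independent losses on the two hops.
        return 1.0 - (1.0 - first) * (1.0 - second)
    raise ValueError(f"unknown metric: {metric!r}")


def choose_route(
    src: str,
    dst: str,
    nodes: Sequence[str],
    measured: Dict[Pair, float],
    metric: str = "latency",
) -> Tuple[List[str], float]:
    """Return the direct path or the best one-intermediate path (lower is better)."""
    if metric not in ("latency", "loss"):
        raise ValueError(f"unknown metric: {metric!r}")
    down = outage_value(metric)
    best_path, best_cost = [src, dst], measured.get((src, dst), down)
    for hop in nodes:
        if hop in (src, dst):
            continue
        cost = combine(metric, measured.get((src, hop), down),
                       measured.get((hop, dst), down))
        if cost < best_cost:  # strict: ties keep the direct Internet path
            best_path, best_cost = [src, hop, dst], cost
    return best_path, best_cost


# Usage with made-up measurements (milliseconds).
nodes = ["A", "B", "C", "D"]
latency = {("A", "B"): 100.0, ("A", "C"): 20.0, ("C", "B"): 30.0,
           ("A", "D"): 50.0, ("D", "B"): 60.0}
print(choose_route("A", "B", nodes, latency))  # (['A', 'C', 'B'], 50.0)
```

Passing `metric="loss"` with a table in which the direct link has loss `1.0` makes the same function route around the outage. That is how a per-application metric can change the chosen path without changing the forwarding logic.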

Results from two sets of measurements of a working RON deployed at
sites scattered across the Internet demonstrate the benefits of our
architecture. For instance, over a 64-hour sampling period in March
2001 across a twelve-node RON, there were 32 significant outages on
the 132 measured paths, each lasting over thirty minutes. RON's
routing mechanism was able to detect, recover from, and route around
{\em all} of them in less than twenty seconds on average, showing
that its methods for fault detection and recovery work well at
discovering alternate paths in the Internet. Furthermore, RON was able
to improve the loss rate, latency, or throughput perceived by data
transfers; for example, about 5\% of the transfers doubled their TCP
throughput, and 5\% saw their loss probability reduced by 0.05. We
found that forwarding packets via at most one intermediate RON node is
sufficient to overcome faults and improve performance in most cases.
These improvements, particularly in the area of fault detection and
recovery, demonstrate the benefits of moving some of the control over
routing into the hands of end-systems.
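The figure of 132 measured paths is consistent with treating every ordered pair of nodes as its own path: $12 \times 11 = 132$. With at most one intermediate node, each of those paths has $12 - 2 = 10$ candidate detours. The sketch below counts the paths and looks for a one-hop detour around a set of failed links. It assumes a full mesh with a known set of down links. The names `directed_paths` and `one_hop_detour` and the outage set are hypothetical illustrations, not data or code from the measurements.

```python
from itertools import permutations
from typing import List, Optional, Set, Tuple

Path = Tuple[str, str]


def directed_paths(nodes: List[str]) -> List[Path]:
    # Each ordered (src, dst) pair with src != dst is a separate path,
    # so N nodes yield N * (N - 1) paths.
    return list(permutations(nodes, 2))


def one_hop_detour(path: Path, nodes: List[str], down: Set[Path]) -> Optional[str]:
    """Return the first intermediate node whose two legs are both up, or None."""
    src, dst = path
    for hop in nodes:  # N - 2 candidates once src and dst are skipped
        if hop in (src, dst):
            continue
        if (src, hop) not in down and (hop, dst) not in down:
            return hop
    return None


nodes = [f"n{i}" for i in range(12)]
print(len(directed_paths(nodes)))  # 132

# Hypothetical outages: n0 has lost its direct links to n1 and n2.
down = {("n0", "n1"), ("n0", "n2")}
for p in sorted(down):
    print(p, "->", one_hop_detour(p, nodes, down))
```

A detour exists whenever some third node still reaches both endpoints. This gives one intuition for why a single intermediate hop can suffice in most cases.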