In a recent paper we have proposed FT-TCP: an architecture that allows a replicated service to survive crashes without breaking its TCP connections. FT-TCP is attractive in principle because it does not require modifications to the TCP protocol and does not affect any of the software running on the clients; however, its practicality for real-world applications remains to be proven. In this paper, we report on our experience in engineering FT-TCP for two such applications: the Samba file server and a multimedia streaming server from Apple. We compare two implementations of FT-TCP, one based on primary-backup and another based on message logging, focusing on scalability, failover time, and application transparency. Our experiments suggest that FT-TCP is a practicable approach for replicating TCP/IP-based services that incurs low overhead on throughput, scales well as the number of clients increases, and allows recovery of the service in near-optimal time.

...the client TCP stack to migrate a failed connection to a backup. A similar approach was adopted by [12], but it requires the server application to be aware of the replication. The system described in [7] enables transparent reconnection in Windows NT without changing the TCP stack by wrapping the socket standard library routines. This system was designed to support process migration, but can be used ...

Process migration has been used to perform specialized tasks, such as load sharing and checkpointing and restarting long-running applications. Implementation typically consists of modifications to existing applications and the creation of specialized support systems, which limit the applicability of the methodology. Off-the-shelf applications have not benefited from process migration technologies, mainly due to the lack of an effective generalized methodology and facility. The benefits of process migration include mobility, checkpointing, relocation, scheduling, and on-the-fly maintenance. This paper shows how regular, shrink-wrapped applications can be migrated.

... Network: Network components are not considered movable or migratable without the aid of some form of middleware or higher-level communication state management. Even using the approach pioneered in [5], more work is required and thus more complexity is introduced. Operating System: The degree of homogeneity between the environments is crucial. Even though there is a general compatibility between th...

Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on multiprocessor systems for fault tolerance. Multiple instances of the application are simultaneously executed in separate virtualized environments called Containers. Containers facilitate state replication between the application instances by resolving the resource conflicts and providing a uniform view of the underlying operating system across all clones. The virtualization layer that creates the container abstraction actively monitors the primary instance of the application and synchronizes its state with that of the clones by transferring the necessary information to enforce identical state among them. In particular, we address the replication of relevant operating system state, such as network state to preserve network connections across failures, and the state that results from nondeterministic interleaved accesses to shared memory in SMP systems. We have implemented RR’s state replication mechanisms in the Linux operating system by making novel use of existing features on the Intel and PowerPC architectures.
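
The record-and-replay idea in the abstract above can be illustrated with a minimal sketch. This is not the RR implementation, which works in a kernel-level virtualization layer; it only shows the general principle that a primary logs the result of each nondeterministic event and a clone consumes the log instead of querying the source again. The names Recorder, Replayer, application, and run_primary_and_clone are hypothetical and chosen for illustration.

```python
import random
import time


class Recorder:
    """Primary side (hypothetical): run each nondeterministic call and log its result."""

    def __init__(self):
        self.log = []

    def nondet(self, name, fn):
        value = fn()
        self.log.append((name, value))
        return value


class Replayer:
    """Clone side (hypothetical): return logged results instead of calling the source."""

    def __init__(self, log):
        self._events = iter(log)

    def nondet(self, name, fn):
        logged_name, value = next(self._events)
        if logged_name != name:
            # The clone asked for a different event than the primary recorded.
            raise RuntimeError(f"replay diverged: expected {logged_name}, got {name}")
        return value


def application(env):
    """Illustrative application whose state depends on nondeterministic inputs."""
    seed = env.nondet("random", lambda: random.randint(0, 10**6))
    stamp = env.nondet("clock", time.time)
    return (seed * 31 + int(stamp)) % 1000003


def run_primary_and_clone():
    """Run the application on a primary, then replay its log on a clone."""
    primary = Recorder()
    primary_state = application(primary)
    clone_state = application(Replayer(primary.log))
    return primary_state, clone_state
```

Because the clone never consults the random number generator or the clock itself, it should reach the same state as the primary as long as the application is otherwise deterministic. Interleaved shared-memory accesses on SMP systems, which the abstract singles out, are not covered by this sketch.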

... addressed as an independent problem. Some of the previous approaches [20, 21] to connection recovery are non-transparent to the application as they require using a non-generic socket library. Others [22, 23, 24] are non-transparent to the external client. FT-TCP [14] is a kernel-based connection recovery system that virtualizes TCP’s interface with the application and the rest of the protocol stack below it,...

This article describes an architecture that allows a replicated service to survive crashes without breaking its TCP connections. Our approach does not require modifications to the TCP protocol, to the operating system on the server, or to any of the software running on the clients. Furthermore, it runs on commodity hardware. We compare two implementations of this architecture (one based on primary/backup replication and another based on message logging) focusing on scalability, failover time, and application transparency. We evaluate three types of services: a file server, a Web server, and a multimedia streaming server. Our experiments suggest that the approach incurs low overhead on throughput, scales well as the number of clients increases, and allows recovery of the service in near-optimal time.
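
As a rough illustration of the message-logging variant mentioned in the abstract above, the sketch below logs every client request before delivering it to the service and, after a simulated crash, rebuilds the service state by replaying the log into a fresh instance. It is a simplified sketch under stated assumptions, not the FT-TCP implementation: it assumes the service is deterministic, it operates on whole requests rather than TCP segments, and CounterService and LoggingWrapper are hypothetical names.

```python
class CounterService:
    """Hypothetical deterministic service: replies with a running total."""

    def __init__(self):
        self.total = 0

    def handle(self, request: bytes) -> bytes:
        self.total += int(request.decode())
        return str(self.total).encode()


class LoggingWrapper:
    """Sits between clients and the service; keeps an input log for recovery."""

    def __init__(self, factory):
        self._factory = factory      # creates a fresh service instance
        self._service = factory()
        self._input_log = []

    def deliver(self, request: bytes) -> bytes:
        # Log the input before the service sees it, so it survives a crash.
        self._input_log.append(request)
        return self._service.handle(request)

    def crash_and_recover(self) -> int:
        """Simulate a crash: restart the service and replay all logged input."""
        self._service = self._factory()
        replayed = 0
        for request in self._input_log:
            # Replies are discarded here: the client already received them.
            self._service.handle(request)
            replayed += 1
        return replayed
```

A short usage example: after two requests and a recovery, the restarted service continues from the same total.

```python
wrapper = LoggingWrapper(CounterService)
wrapper.deliver(b"5")
wrapper.deliver(b"7")
wrapper.crash_and_recover()
print(wrapper.deliver(b"1"))  # b'13'
```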

...ed in that it can migrate individual connections (not just whole processes) but it does require the server application to participate in the transfer of application state. The system by Nasika et al. [26] enables transparent reconnection in Windows NT without changing the TCP stack by wrapping the socket standard library routines. This system was primarily designed to support process migration, but in...

Software aging causes software programs to fail over time. Rejuvenation of the software is a preemptive methodology developed to reduce failure, which reduces the need for complex methods to identify and fix problems after a failure has occurred. It does not eliminate the need for managing failure; it simply moves the bulk of the processing to a more controllable and simpler pre-failure state.

... more specific module level controls, an Enhanced Minimal State Migration methodology is used. The enhancements include handling of files and managing active network connections. Nasika and Heballalu [6,7] provided the pioneering work and Zhang, Khambatti and Dasgupta [8] extended the findings for the Minimal State Model methodology. Given an application, a certain minimal set of state elements combin...

approach to integrate the ubiquitous desktop with an underlying distributed system. The approach involves unobtrusive modification of functionality of existing systems by decoupling the application process from the operating system. Using this, we are building an integrated distributed computing platform. Process migration is an underlying mechanism that is key to enabling the Computing Communities framework. We migrate regular, general-purpose computations, running shrink-wrapped binaries. By augmenting the decoupling of applications with techniques that checkpoint and restore the state of a process, we are able to migrate a process within our distributed environment. In this paper we outline our experiences in migrating Win32 processes running on Windows 2000.

...e [11]. This increases cost of development and has therefore not been popular amongst researchers. Second, the void of applications for a new platform, also called the application development barrier [12], makes such platforms unattractive. Finally, legacy applications need to be rewritten in order to use the features of the new platform [11]. Again this leads to the increase in the cost of development...

The emerging computational grid infrastructure will provide users with access to orders of magnitude more computing power than they currently have available. These resources will be heterogeneous in type and implementation and independently controlled and administered. Users of grid resources will need mechanisms to account for these variations; such mechanisms must avoid high performance overhead while providing resource owners strong safety assurances. We propose using virtual machines, a classic idea from the 1960s and 1970s, as this mechanism. In this work, we briefly explore some of the challenges that grid computing users will be faced with. We compare several different architectures for building virtual machines, and compare performance of several standard benchmarks under these different virtual machines. We examine some unique opportunities the additional layer of abstraction provides, such as checkpointing, split execution, and heterogeneous process migration, all with unmodified user executables running on unmodified operating systems. We conclude by examining future work, and propose integrating this work with

...ombined with a Cygwin port of the rest of the Linux kernel to produce a Windows version of User-Mode-Linux. 5. RELATED WORK This work appears to be quite similar to that of Nasika, Boyd, and Dasgupta [24][3][4] at Arizona State University. From the published work it does not appear that they have a production system at this point. Their work also uses virtualization to provide new services to applicat...