Debugging networked applications

Application failures caused by network issues are among the most difficult to diagnose and debug. A failure may stem from in-network state or from state maintained by a remote end-host, both of which are invisible to the application host. For instance, data may be dropped because of MTU issues, NAT devices and firewalls can block connections, default IPv6 options can cause IPv4 applications to fail, and default buffer size settings can cause UDP datagrams to be dropped. Such failures are challenging for developers and administrators to understand and fix. Numerous fault diagnosis tools have been developed, but few of them are applicable to large applications whose source code is not available. Without source code, administrators often resort to probing tools such as ping and traceroute, which can help diagnose reachability problems but cannot diagnose application-level issues.

The NetCheck tool

NetCheck is a tool that determines the cause of a failure in a networked application. In contrast with most prior approaches, NetCheck requires no application- or network-specific knowledge to perform its diagnoses, and no modification to the application or the infrastructure. NetCheck treats an application as a black box and requires only a set of system call (syscall) invocation traces from the relevant end-hosts. These traces can be collected at runtime with standard black-box tracing tools, such as strace. To perform its diagnosis, NetCheck derives a global ordering of the input syscalls by simulating them against a network model. The same model is used to identify syscalls that deviate from expected network semantics. These deviations are then mapped to a diagnosis by a set of heuristics.
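
The idea of ordering per-host traces by simulating them against a model can be illustrated with a deliberately small sketch. This is not NetCheck's implementation: the function name order_traces, the (syscall, nbytes) event format, and the model itself (a single one-way byte stream in which a recv may only consume bytes that an earlier send produced) are all assumptions made for illustration. Real traces would first have to be parsed from tracer output, and a realistic model would cover many more syscalls and socket states.

```python
from collections import deque


def order_traces(traces):
    """Derive a global order for per-host syscall traces (toy model).

    traces: dict mapping a host name to its list of (syscall, nbytes)
    events, in the order that host issued them.
    Hypothetical model: one one-way byte stream. A 'send' adds bytes in
    flight; a 'recv' is only accepted if that many bytes are in flight.
    Returns (ordering, deviations).
    """
    queues = {host: deque(events) for host, events in traces.items()}
    in_flight = 0     # bytes sent but not yet received
    ordering = []     # accepted events as (host, syscall, nbytes)
    deviations = []   # events the model cannot explain

    while any(queues.values()):
        progressed = False
        for host in sorted(queues):
            queue = queues[host]
            if not queue:
                continue
            syscall, nbytes = queue[0]
            if syscall == "send":
                in_flight += nbytes
            elif syscall == "recv" and nbytes <= in_flight:
                in_flight -= nbytes
            else:
                # The head event must wait for another host's event.
                continue
            ordering.append((host, syscall, nbytes))
            queue.popleft()
            progressed = True
        if not progressed:
            # No head event is acceptable to the model: flag one as a
            # deviation and continue with the remaining events.
            host = next(h for h in sorted(queues) if queues[h])
            deviations.append((host, *queues[host].popleft()))
    return ordering, deviations


if __name__ == "__main__":
    example = {
        "A": [("send", 10)],
        "B": [("recv", 10), ("recv", 5)],
    }
    ordering, deviations = order_traces(example)
    print("ordering:", ordering)
    print("deviations:", deviations)
```

In this example, host B's second recv asks for bytes that host A never sent, so the toy model cannot place it in the ordering and reports it as a deviation. Turning such deviations into a diagnosis is the job of the heuristics described above, which this sketch does not attempt.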