Practical Network Tomography

In this thesis, we investigate methods for the practical and accurate localization of Internet performance problems. The methods we propose belong to the field of network loss tomography, that is, they infer the loss characteristics of links from end-to-end measurements. The existing versions of the problem of network loss tomography are ill-posed, hence, tomographic algorithms that attempt to solve them resort to making various assumptions, and as these assumptions do not usually hold in practice, the information provided by the algorithms might be inaccurate. We argue, therefore, for tomographic algorithms that work under weak, realistic assumptions. We first propose an algorithm that infers the loss rates of network links from end-to-end measurements. Inspired by previous work, we design an algorithm that gains initial information about the network by computing the variances of links' loss rates and by using these variances as an indication of the congestion level of links, i.e., the more congested the link, the higher the variance of its loss rate. Its novelty lies in the way it uses this information – to identify and characterize the maximum set of links whose loss rates can be accurately inferred from end-to-end measurements. We show that our algorithm performs significantly better than the existing alternatives, and that this advantage increases with the number of congested links in the network. Furthermore, we validate its performance by using an "Internet tomographer" that runs on a real testbed. Second, we show that it is feasible to perform network loss tomography in the presence of "link correlations," i.e., when the losses that occur on one link might depend on the losses that occur on other links in the network. More precisely, we formally derive the necessary and sufficient condition under which the probability that each set of links is congested is statistically identifiable from end-to-end measurements even in the presence of link correlations. In doing so, we challenge one of the popular assumptions in network loss tomography, specifically, the assumption that all links are independent. The model we propose assumes we know which links are most likely to be correlated, but it does not assume any knowledge about the nature or the degree of their correlations. In practice, we consider that all links in the same local area network or the same administrative domain are potentially correlated, because they could be sharing physical links, network equipment, or even management processes. Finally, we design a practical algorithm that solves "Congestion Probability Inference" even in the presence of link correlations, i.e., it infers the probability that each set of links is congested even when the losses that occur on one link might depend on the losses that occur on other links in the network. We model Congestion Probability Inference as a system of linear equations where each equation corresponds to a set of paths. Because it is infeasible to consider an equation for each set of paths in the network, our algorithm finds the maximum number of linearly independent equations by selecting particular sets of paths based on our theoretical results. On the one hand, the information provided by our algorithm is less than that provided by the existing alternatives that infer either the loss rates or the congestion statuses of links, i.e., we only learn how often each set of links is congested, as opposed to how many packets were lost at each link, or to which particular links were congested when. On the other hand, this information is more useful in practice because our algorithm works under assumptions weaker than those required by the existing alternatives, and we experimentally show that it is accurate under challenging network conditions such as non-stationary network dynamics and sparse topologies.