Traffic source identification aims to overcome obfuscation techniques that hide traffic sources to evade detection. Common obfuscation techniques include IP address spoofing, encryption combined with proxying, and even unifying packet sizes. On the one hand, traffic source identification provides the technical means to conduct web access surveillance, and thus to combat crime, even when the traffic is obfuscated. On the other hand, an adversary may exploit traffic source identification to intrude on user privacy by profiling user interests.
We lay out a framework for traffic source identification, in which we investigate the general approaches and factors in designing a traffic source identification scheme with respect to different traffic models and the analyst's capabilities.
Guided by the framework, we examine three traffic source identification applications, namely, tracing back DDoS attackers, passively fingerprinting websites over proxied and encrypted VPN or SSH channels, and actively fingerprinting websites over Tor.
In the analysis of identifying DDoS attackers, we find that, given knowledge of the network topology, it is unnecessary to construct packet marks with sophisticated structures. Based on this observation, we design a new probabilistic packet marking scheme that significantly improves the traceback accuracy over previous schemes by increasing the randomness in the collection of packet marks and hence the amount of information they transmit.
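To make the marking idea concrete, the following is a minimal sketch of classic node-sampling probabilistic packet marking, not the scheme designed in this work. The router names, the marking probability `p`, and the helpers `mark_packet` and `collect_marks` are assumptions chosen for illustration.

```python
import random
from collections import Counter

def mark_packet(path, p, rng):
    """Forward one packet along `path` (attacker side first).

    Each router overwrites the single mark field with its own
    identity with probability p. Returns the mark seen by the victim,
    or None if no router marked the packet.
    """
    mark = None
    for router in path:
        if rng.random() < p:
            mark = router  # a later router overwrites an earlier mark
    return mark

def collect_marks(path, p, n_packets, seed=0):
    """Simulate n_packets and count how often each router's mark arrives."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_packets):
        mark = mark_packet(path, p, rng)
        if mark is not None:
            counts[mark] += 1
    return counts

# Hypothetical three-hop attack path; R1 is nearest the attacker.
counts = collect_marks(["R1", "R2", "R3"], p=0.5, n_packets=5000)
# Routers closer to the victim survive overwriting more often, so
# sorting by frequency suggests the order of routers on the path.
ordered_from_victim = [r for r, _ in counts.most_common()]
```

In this simplified setting the victim needs no structured mark at all: given the topology, the relative frequencies of plain router identities already indicate where each router sits on the path.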
We develop a passive website fingerprinting scheme applicable to TLS and SSH tunnels. Previous website fingerprinting schemes have demonstrated good identification accuracy using only side-channel features related to packet sizes. Yet these schemes are rendered ineffective under traffic morphing, which modifies the packet size distribution of a source website to mimic some target website. However, we show that traffic morphing has a severe limitation: it cannot handle packet ordering while simultaneously satisfying the low bandwidth overhead constraint. Hence we develop a website fingerprinting scheme that makes use of packet ordering information in addition to packet sizes. Our scheme both improves website fingerprinting accuracy and withstands traffic morphing.
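The role of packet ordering can be illustrated with a small sketch. It assumes, for illustration only, that a trace is a list of signed packet sizes (negative for upstream, positive for downstream) and that traces are compared by edit distance; the site names, traces, and helper functions are hypothetical and do not reproduce the scheme developed here.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance between two sequences of packet sizes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def same_size_histogram(a, b):
    """True if two traces have identical packet size distributions."""
    return Counter(a) == Counter(b)

def identify(trace, profiles):
    """Return the profiled site whose trace is closest in edit distance."""
    return min(profiles, key=lambda site: edit_distance(trace, profiles[site]))

# Two hypothetical sites with the same sizes in a different order.
profiles = {
    "site_a": [-300, 1500, 1500, -300, 600],
    "site_b": [-300, 600, -300, 1500, 1500],
}
observed = [-300, 1500, 1500, -300, 640]  # noisy visit to site_a
guess = identify(observed, profiles)
```

The two profiles are indistinguishable by size distribution alone, which is the view that traffic morphing manipulates, yet an order-aware distance still separates them.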
Extending the passive website fingerprinting model, we propose an active website fingerprinting model that can be applied to essentially any low-latency, encrypted, and proxied communication channel, including TLS or SSH tunnels and Tor. By injecting delay between object requests to isolate the download of data for each object, our model is able to recover web object sizes as website fingerprint features. The scheme we develop following the active model obtains high identification accuracy and drastically reduces the anonymity provided by Tor.
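As a rough sketch of why injected delay helps, suppose the analyst sees downstream packets as `(timestamp, size)` pairs and that the injected delay leaves an idle gap between consecutive object downloads. The trace format, the `gap` threshold, and the function `object_sizes` are assumptions for illustration, not the implementation used in this work.

```python
def object_sizes(packets, gap):
    """Estimate per-object sizes from a downstream packet trace.

    packets: list of (timestamp_seconds, size_bytes), sorted by time.
    gap: idle time in seconds that is taken to separate two objects.
    Returns one byte total per burst of packets.
    """
    sizes = []
    last_t = None
    for t, size in packets:
        if last_t is None or t - last_t > gap:
            sizes.append(0)  # a long pause starts a new object
        sizes[-1] += size
        last_t = t
    return sizes

# Hypothetical trace with three downloads separated by injected delay.
trace = [(0.00, 1500), (0.01, 1500), (0.02, 400),
         (0.50, 1500), (0.51, 200),
         (1.00, 900)]
estimated = object_sizes(trace, gap=0.2)
```

Without the delay the three downloads would interleave and only their total would be observable; with it, each burst maps to one object, and the list of sizes can serve as a fingerprint feature.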
Through our study, we find that protecting user privacy involves a tradeoff between communication anonymity and overheads such as bandwidth, delay, and sometimes even computation and storage. Currently, the most reliable countermeasures against traffic source identification are packet padding and adding dummy traffic. How aggressively these countermeasures are applied, and how much overhead one is willing to accept, determine the effectiveness of the anonymity protection.
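The bandwidth side of this tradeoff can be shown with a small calculation. The sketch below assumes a simple padding policy in which every packet is padded up to the next multiple of a bucket size; the policy, the packet sizes, and the function names are illustrative assumptions rather than measurements from this study.

```python
import math

def pad_to_bucket(size, bucket):
    """Pad a packet size up to the next multiple of `bucket` bytes."""
    return math.ceil(size / bucket) * bucket

def bandwidth_overhead(sizes, bucket):
    """Fraction of extra bytes sent when every packet is padded."""
    original = sum(sizes)
    padded = sum(pad_to_bucket(s, bucket) for s in sizes)
    return (padded - original) / original

sizes = [300, 1500, 600]  # hypothetical packet sizes in bytes
aggressive = bandwidth_overhead(sizes, bucket=1500)  # all packets look alike
mild = bandwidth_overhead(sizes, bucket=500)         # sizes still partly visible
```

For these example sizes, padding everything to 1500 bytes hides packet sizes completely at an overhead of 87.5 percent, while 500-byte buckets cost 25 percent but leave three distinguishable size classes.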