Abstract

The TOR network aims to provide anonymity to Internet users who do not want to reveal their identity when browsing specific content. TOR is composed of a set of routers that apply several layers of encryption to packets, thus hiding not only the IP address of the user but also the relationship between the user and the server, as well as the content being accessed. Although TOR can provide a high degree of anonymity, it is vulnerable to specific attacks, such as Website Fingerprinting (WF). By means of WF, an attacker can, under some circumstances, guess the web page accessed by a given user. The main goal of this work is to analyze the robustness of TOR against such attacks. To this end, an attacker capable of eavesdropping on the packets sent and received by a given user has been emulated. The attack relies on the hypothesis that the attacker knows in advance the set of web pages the user may access, i.e., it is performed under a closed-world assumption. To perform the attack, it is necessary to create a dataset containing a set of traces of the packets sent and received when downloading each of the web pages. The dataset must be properly formatted, and a Machine Learning algorithm is then applied to classify each trace according to the web page it corresponds to. An exhaustive set of tests has been run using three algorithms: KNN, SVM and Random Forest. The parameters of each algorithm have been tuned to find the setting that yields the highest precision, or success rate.
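
As an illustration of the closed-world classification step described above, the following minimal sketch classifies synthetic traces with a hand-written k-nearest-neighbours vote. It is not the implementation used in this work. The trace representation (a list of packet directions, +1 outgoing and -1 incoming), the three features, the page labels and their traffic profiles, and the helper names `synthetic_trace`, `extract_features` and `knn_predict` are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def synthetic_trace(rng, n_packets, incoming_ratio):
    """Generate a hypothetical trace: +1 = outgoing packet, -1 = incoming packet."""
    n = max(1, int(rng.gauss(n_packets, n_packets * 0.05)))
    return [-1 if rng.random() < incoming_ratio else 1 for _ in range(n)]


def extract_features(trace):
    """Map a variable-length trace to a fixed-length feature vector."""
    outgoing = sum(1 for p in trace if p > 0)
    incoming = sum(1 for p in trace if p < 0)
    return [len(trace), outgoing, incoming]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Closed world: the attacker knows the full set of candidate pages in advance.
# The (mean packet count, incoming ratio) profiles below are invented.
rng = random.Random(0)
profiles = {"page_a": (100, 0.8), "page_b": (300, 0.9), "page_c": (600, 0.7)}

dataset = [
    (extract_features(synthetic_trace(rng, *params)), page)
    for page, params in profiles.items()
    for _ in range(20)
]
rng.shuffle(dataset)

# Hold out a quarter of the traces to estimate the success rate.
split = int(0.75 * len(dataset))
train, test = dataset[:split], dataset[split:]

correct = sum(knn_predict(train, feats, k=3) == label for feats, label in test)
accuracy = correct / len(test)
print(f"closed-world accuracy: {accuracy:.2f}")
```

The synthetic pages are deliberately easy to tell apart; real traces overlap far more, which is why the choice of features and the tuning of parameters such as `k` matter in practice.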