Title

Author

Date of Award

5-2014

Document Type

Thesis

Degree Name

Master of Science (MS)

Legacy Department

Computer Engineering

Advisor

Brooks, Richard R

Committee Member

Wang, Kuang-Ching

Committee Member

Hoover, Adam

Abstract

Over the past few years, there has been an increase in the development and improvement of circumvention tools like Tor and Psiphon. These tools provide an environment for citizens of oppressive regimes to access websites freely without fear of identification. They aid democracy activists and journalists in West Africa in using the Internet securely. We developed a similar circumvention tool. This tool circumvents DNS and IP address blocking and filtering by leveraging technologies developed by criminal botnet enterprises. To improve and maintain the circumvention tool we developed, it is important to quantify the number of users and their countries of origin. System statistics are used to give feedback to the US State Department, which funded this project. We need to show that target users are taking advantage of the developed system. Because the system provides anonymity to users as well as bypassing DNS and IP filtering, and its users have a high demand for privacy, we must not collect sensitive user information. We therefore develop statistics that aim not to compromise user anonymity. Two probabilistic data structures are introduced, evaluated, improved upon, and used to keep system statistics without compromising user privacy. The first data structure is the negative survey. Using the negative survey, we can keep an aggregate count of user countries of origin without knowing the country of origin of any individual session, by asking each user to report a country that they do not belong to. The negative survey allows us to calculate how many accesses there have been from each country while keeping only non-sensitive user information. The second data structure is a probabilistic counting algorithm which, without keeping a list of already encountered data such as IP addresses, estimates the number of distinct elements in a large collection of data. We use hash values to obtain the number of unique users of the system.
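The negative survey described above can be illustrated with a minimal sketch. It assumes that each user reports one country, chosen uniformly at random, from all countries other than their own; the function names and the three-country list are hypothetical and are not taken from the thesis. With N reports over k countries, of which Y_i name country i, the expected value of Y_i is (N - X_i)/(k - 1), so the true count X_i can be estimated as N - (k - 1) Y_i.

```python
import random
from collections import Counter


def negative_response(true_country, countries, rng):
    """Return a uniformly random country other than the user's own."""
    return rng.choice([c for c in countries if c != true_country])


def reconstruct(negative_counts, countries):
    """Estimate true per-country counts from the negative reports.

    Uses X_i = N - (k - 1) * Y_i, which follows from
    E[Y_i] = (N - X_i) / (k - 1) under uniform negative reporting.
    """
    total = sum(negative_counts[c] for c in countries)
    k = len(countries)
    return {c: total - (k - 1) * negative_counts[c] for c in countries}


def simulate(true_counts, seed=0):
    """Simulate one survey round; true_counts maps country -> user count."""
    rng = random.Random(seed)
    countries = list(true_counts)
    reports = Counter(
        negative_response(country, countries, rng)
        for country, n in true_counts.items()
        for _ in range(n)
    )
    return reconstruct(reports, countries)


if __name__ == "__main__":
    # Hypothetical population; the server only ever sees the negative reports.
    print(simulate({"A": 6000, "B": 3000, "C": 1000}))
```

The estimates always sum to the total number of reports, but an individual estimate can be noisy, or even negative for a country with few users, and the noise grows with the number of countries.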
This algorithm is based on statistical observations made on the bits of hashed values of records. Our records contain the hash values of the users' SSL certificates. For each hash, we store the position of its least significant set bit in a register. From the position of the lowest bit in the register that is not set, we get a good estimate of the number of system users. We contribute to this technique by considering when the number of hash collisions will affect the estimate, and we use this number to give a better estimate. This also allows us to decide online the proper register size to maintain.
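The counting scheme described above resembles classic Flajolet-Martin probabilistic counting, which can be sketched as follows. This is a sketch of the basic single-register technique only, not the collision-aware estimator contributed by the thesis. The use of SHA-256, the 32-bit register width, and the function names are assumptions made for illustration; the constant 0.77351 is the standard Flajolet-Martin correction factor.

```python
import hashlib

PHI = 0.77351  # standard Flajolet-Martin correction factor


def lowest_set_bit(value, width):
    """0-based position of the least significant 1-bit; width if value is 0."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1


def lowest_unset_bit(bitmap):
    """0-based position of the least significant 0-bit in the bitmap."""
    pos = 0
    while bitmap & (1 << pos):
        pos += 1
    return pos


def estimate_distinct(items, width=32):
    """Estimate the number of distinct byte strings in items.

    Only a width-bit register is kept; the items themselves are not stored.
    SHA-256 is an assumed hash function, chosen for illustration.
    """
    bitmap = 0
    mask = (1 << width) - 1
    for item in items:
        digest = hashlib.sha256(item).digest()
        h = int.from_bytes(digest[:8], "big") & mask
        r = lowest_set_bit(h, width)
        if r < width:
            bitmap |= 1 << r  # record the position of the lowest set bit
    return (2 ** lowest_unset_bit(bitmap)) / PHI


if __name__ == "__main__":
    # Hypothetical certificate fingerprints, each seen three times.
    certs = [f"cert-{i}".encode() for i in range(1000)] * 3
    print(estimate_distinct(certs))
```

Repeated items set the same register bit, so duplicates do not change the estimate. A single register gives only a coarse estimate, roughly to within a power of two; practical variants average over many registers to reduce the variance.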