An infrastructure for automated web privacy measurement has three components: simulating users, recording observations (response metadata, cookies, behavior of scripts, etc.), and analysis. We set out to build a platform that can automate the first two components and can ease the researcher’s analysis task. We sought to make OpenWPM general, modular, and scalable enough to support essentially any privacy measurement.

OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection, including a proxy, a Firefox extension, and access to Flash cookies. See the GitHub repository for more information on the instrumentation and the data collected.

FTC PrivacyCon. Washington, DC

Publications

Steven Englehardt, Jeffrey Han, and Arvind Narayanan

To appear at “Privacy Enhancing Technologies Symposium (PETS) 2018”.
We show that the simple act of viewing emails contains privacy pitfalls for the unwary. We assembled a corpus of commercial mailing-list emails, and find a network of hundreds of third parties that track email recipients via methods such as embedded pixels. About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders, and further leaks occur if the recipient clicks links in emails. Mail servers and clients may employ a variety of defenses, but we analyze 16 servers and clients and find that they are far from comprehensive. We propose, prototype, and evaluate a new defense, namely stripping tracking tags from emails based on enhanced versions of existing web tracking protection lists.
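
To make the notion of an email-address leak concrete, here is a minimal sketch, not the authors' detection pipeline. It assumes we already have the list of URLs requested when an email is rendered, and it flags any request that carries the recipient's address either in plaintext or as an MD5, SHA-1, or SHA-256 hex digest. The function names, the choice of encodings, and the example URLs are all illustrative assumptions.

```python
import hashlib
from urllib.parse import unquote


def email_tokens(email):
    """Return the lowercased address plus a few common hashed forms of it."""
    normalized = email.strip().lower()
    raw = normalized.encode("utf-8")
    return {
        "plaintext": normalized,
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email, request_urls):
    """Return (url, encoding) pairs for every request that carries the address."""
    tokens = email_tokens(email)
    leaks = []
    for url in request_urls:
        # Undo percent-encoding so "alice%40example.com" matches the plaintext form.
        decoded = unquote(url).lower()
        for encoding, token in tokens.items():
            if token in decoded:
                leaks.append((url, encoding))
    return leaks


# Hypothetical requests observed while rendering one email.
observed = [
    "https://tracker.example/pixel.gif?e=alice%40example.com",
    "https://cdn.example/logo.png",
    "https://ads.example/sync?h=" + hashlib.md5(b"alice@example.com").hexdigest(),
]
leaks = find_leaks("alice@example.com", observed)
```

A real measurement would also have to consider other encodings and nested combinations of them, and would need to separate third-party requests from requests to the sender's own domain; this sketch does neither.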

We show how third-party web trackers can deanonymize users of cryptocurrencies. We present two distinct but complementary attacks. On most shopping websites, third-party trackers receive information about user purchases for purposes of advertising and analytics. We show that, if the user pays using a cryptocurrency, trackers typically possess enough information about the purchase to uniquely identify the transaction on the blockchain, link it to the user’s cookie, and further to the user’s real identity. Our second attack shows that if the tracker is able to link two purchases of the same user to the blockchain in this manner, it can identify the user’s entire cluster of addresses and transactions on the blockchain, even if the user employs blockchain anonymity techniques such as CoinJoin. The attacks are passive and hence can be retroactively applied to past purchases. We discuss several mitigations, but none are perfect.

Lukasz Olejnik, Steven Englehardt, and Arvind Narayanan

Appeared at: “International Workshop on Privacy Engineering (IWPE) 2017”
We systematically analyze the W3C Battery Status API to help inform future privacy assessments. We begin by reviewing its evolution: the initial specification, which only cursorily addressed privacy, the discovery of surprising privacy vulnerabilities as well as actual misuse in the wild, followed by the removal of the API from major browser engines, an unprecedented move. Next, we analyze web measurement data from late 2016 and confirm that the majority of scripts used the API for fingerprinting. Finally, we draw lessons from this affair and make recommendations for improving privacy engineering of web standards.

April 14, 2017

The Future of Ad Blocking: Analytical Framework and New Techniques

Grant Storey, Dillon Reisman, Arvind Narayanan, Jonathan Mayer

We present a systematic study of ad blocking — and the associated “arms race” — as a security problem. We propose multiple new ad blocking techniques and evaluate them using prototype implementations we have built. Contrary to widespread assumption, we argue that users / ad blockers hold the upper hand.

Arvind Narayanan and Dillon Reisman

To appear in “Transparent data mining for Big and Small Data”. Editors: Tania Cerquitelli, Daniele Quercia, Frank Pasquale. Springer. 2017.
This book chapter presents an overview of the goals, design, and findings of the WebTAP project. We recommend that readers new to the project begin here. The chapter concludes with recommendations for public policy and regulation of privacy.

Jessica Su, Ansh Shukla, Sharad Goel, Arvind Narayanan

We show — theoretically, via simulation, and through experiments on real user data — that de-identified web browsing histories can be linked to social media profiles using only publicly available data. This is possible because each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. We recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them.
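
The intuition that the links in a person's feed form a distinctive set can be illustrated with a toy matcher. The sketch below is a deliberate simplification and not the method used in the study: it scores each candidate profile by the Jaccard similarity between the links in an anonymous browsing history and the links in that candidate's feed, then picks the highest score. The function name, data layout, and example data are all hypothetical.

```python
def best_match(history_links, candidate_feeds):
    """Return (best_candidate, scores) using Jaccard similarity of link sets.

    history_links:   iterable of URLs from a de-identified browsing history
    candidate_feeds: dict mapping candidate profile name -> iterable of feed URLs
    """
    history = set(history_links)

    def jaccard(feed_links):
        feed = set(feed_links)
        union = history | feed
        if not union:
            return 0.0
        return len(history & feed) / len(union)

    scores = {name: jaccard(feed) for name, feed in candidate_feeds.items()}
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Hypothetical data: one anonymous history and two candidate profiles.
history = ["news.example/a", "blog.example/b", "shop.example/c"]
feeds = {
    "alice": ["news.example/a", "blog.example/b", "shop.example/c", "video.example/d"],
    "bob": ["news.example/a", "sports.example/x", "music.example/y"],
}
winner, scores = best_match(history, feeds)
```

In this toy example the history overlaps far more with one feed than the other. At realistic scale, a plain overlap score would need to account for popular links that appear in many feeds.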

Steven Englehardt and Arvind Narayanan

We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites (“cookie syncing”). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild.

Peter Zimmerman

We show that “malvertisers” can circumvent ad review processes, execute arbitrary JavaScript, or phish credentials from large numbers of users. Through the purchase of 415,343 ad impressions, we detect 984 instances of page alteration or script injection within 16 countries.

We investigate the ability of a passive network observer to leverage third-party HTTP tracking cookies for global surveillance. Using simulated browsing profiles from several locations around the world, we cluster network traffic by transitively linking shared unique cookies and estimate that for typical users 62-73% of websites with embedded trackers are located in a single connected component. Furthermore, almost half of the most popular webpages will leak a logged-in user’s real-world identity to an eavesdropper in unencrypted traffic.
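
The clustering step, transitively linking page visits that share a unique cookie value, amounts to finding connected components. A minimal sketch follows, assuming the observer has already extracted (visit, cookie value) pairs from unencrypted traffic. The union-find structure, function name, and example data are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def cluster_visits(observations):
    """Group visits into clusters linked, directly or transitively, by shared cookies.

    observations: iterable of (visit_id, cookie_value) pairs.
    Returns a list of sets of visit ids, largest cluster first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_visit_with_cookie = {}
    for visit, cookie in observations:
        find(visit)  # register the visit even if its cookie is never seen again
        if cookie in first_visit_with_cookie:
            union(first_visit_with_cookie[cookie], visit)
        else:
            first_visit_with_cookie[cookie] = visit

    clusters = defaultdict(set)
    for visit in list(parent):
        clusters[find(visit)].add(visit)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical observations: b.example shares one cookie with a.example and
# another with c.example, so all three end up in the same cluster.
observations = [
    ("a.example", "id=1"),
    ("b.example", "id=1"),
    ("b.example", "uid=9"),
    ("c.example", "uid=9"),
    ("d.example", "x=5"),
]
clusters = cluster_visits(observations)
largest_fraction = len(clusters[0]) / sum(len(c) for c in clusters)
```

Here `largest_fraction` is the share of visits that fall in the largest connected component, the same kind of quantity the abstract reports for websites with embedded trackers.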

Michael Kranch, Joseph Bonneau

We have conducted the first in-depth empirical study of two important new web security features, strict transport security (HSTS) and public-key pinning. Our findings highlight that the web platform and modern websites are large and complicated enough to make even conceptually simple security upgrades challenging to deploy in practice.
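
As a small illustration of what an HSTS deployment check involves, the sketch below parses a `Strict-Transport-Security` response header into its directives. It is not the tooling used in the study. The `max-age` and `includeSubDomains` directives come from the HSTS specification, and `preload` is the token used by browser preload lists. The function names and returned dictionary layout are assumptions made for this example.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small policy dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        directive = directive.strip()
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        value = value.strip().strip('"')  # max-age may be given as a quoted string
        if name == "max-age" and value.isdigit():
            policy["max_age"] = int(value)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def is_effective(policy):
    """A policy without max-age is invalid, and max-age=0 tells browsers to drop it."""
    return policy["max_age"] is not None and policy["max_age"] > 0


good = parse_hsts("max-age=31536000; includeSubDomains; preload")
broken = parse_hsts("includeSubDomains")  # missing the required max-age directive
```

A header that omits `max-age`, like the second example, is the kind of small configuration mistake such a check would surface.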

This collaboration between researchers at KU Leuven and Princeton is the first large-scale study of three advanced web tracking mechanisms – canvas fingerprinting, evercookies, and the use of “cookie syncing” in conjunction with evercookies.

[Manuscript]

We identify 32 web privacy measurement studies, cast them as instances of a generic experimental framework, and perform a thorough methodological analysis. We analyze design and implementation alternatives and make recommendations based on considerations of experimental rigor and engineering feasibility. We present a flexible, modular web privacy measurement platform that supports any experiment that fits the framework. It is also highly scalable and avoids many common pitfalls. Finally, as a case study of our methods and infrastructure, we measure the “filter bubble”, i.e., the extent of personalization based on a user’s history, by crawling approximately 300,000 pages across nine news sites and present evidence that this personalization effect has been greatly overstated in the popular press.

Join Us

Academic researchers, developers, public advocates, and others with expertise in online privacy can all help us make progress toward providing the public with accurate web privacy information and best practices.

If you are interested in working with us on these issues, please email Arvind Narayanan.