Publicly posted URLs may contain a wealth of information about the identities and activities of the users who share them. URLs often use query strings (i.e., key-value pairs appended to the URL path) to pass session parameters and form data. While often benign and necessary to render the web page, query strings sometimes contain tracking mechanisms, user names, email addresses, and other information that users may not wish to reveal publicly. In isolation this is not particularly problematic, but the growth of Web 2.0 platforms such as social networks and micro-blogging services means that URLs (often copy-pasted from web browsers) are increasingly being broadcast publicly.

To study the privacy ramifications of URL sharing, this paper presents a measurement study of 892 million user-submitted URLs, many disseminated in (semi-)public forums. Within the corpus we find a trove of personal information, including 1.7 million email addresses. In the most egregious examples, query strings contain plaintext usernames and passwords for administrative and sensitive accounts. Data leakage is identified via both key-driven and value-driven analysis, using manual inspection and automatic detection logic. Additionally, we analyze the "click-through" rates of sensitive URLs, examine geographical and mobile behavior patterns, and measure the broader statistical properties of key-value pairs. Finally, we argue that this study motivates a "CleanURL" service that can "scrub" URLs of privacy-violating content.
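
As an illustration only, the following minimal Python sketch shows how key-driven and value-driven checks could be combined to scrub a query string. It is not the detection logic or the CleanURL design used in this study: the key list `SENSITIVE_KEYS`, the pattern `EMAIL_RE`, and the function names `is_sensitive` and `scrub_url` are hypothetical and chosen for the example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical key names for the key-driven check (an assumption, not the paper's list).
SENSITIVE_KEYS = {"email", "user", "username", "password", "pass", "pwd"}

# Hypothetical, deliberately loose pattern for the value-driven check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_sensitive(key, value):
    """Flag a pair if its key is on the list or its value looks like an email address."""
    return key.lower() in SENSITIVE_KEYS or bool(EMAIL_RE.match(value))


def scrub_url(url):
    """Return the URL with flagged key-value pairs removed from its query string."""
    parts = urlsplit(url)
    # parse_qsl percent-decodes keys and values, so "%40" is seen as "@".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not is_sensitive(k, v)]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(scrub_url("http://example.com/page?id=42&email=alice%40example.com&ref=bob%40example.org"))
```

In this sketch the `email` pair is removed by the key-driven check and the `ref` pair by the value-driven check, while `id=42` is kept. Because some query parameters are necessary to render a page, a practical service would presumably also need to confirm that the scrubbed URL still resolves to the same content; that step is omitted here.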