Gone In Six Characters: Short URLs Considered Harmful for Cloud Services

[This is a guest post by Vitaly Shmatikov, professor at Cornell Tech and once upon a time my adviser at the University of Texas at Austin. — Arvind Narayanan.]

TL;DR: short URLs produced by bit.ly, goo.gl, and similar services are so short that they can be scanned by brute force. Our scan discovered a large number of Microsoft OneDrive accounts with private documents. Many of these accounts are unlocked and allow anyone to inject malware that will be automatically downloaded to users’ devices. We also discovered many driving directions that reveal sensitive information for identifiable individuals, including their visits to specialized medical facilities, prisons, and adult establishments.

URL shorteners such as bit.ly and goo.gl perform a straightforward task: they turn long URLs into short ones, consisting of a domain name followed by a 5-, 6-, or 7-character token. This simple convenience feature turns out to have an unintended consequence. The tokens are so short that the entire set of URLs can be scanned by brute force. The actual, long URLs are thus effectively public and can be discovered by anyone with a little patience and a few machines at her disposal.
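
To make the brute-force claim concrete, here is a rough back-of-the-envelope sketch. It assumes tokens are drawn from a 62-character alphabet (a-z, A-Z, 0-9), which this post does not state, and the request rate is a purely hypothetical figure chosen for illustration rather than a measurement from the study.

```python
ALPHABET_SIZE = 62  # assumption: a-z, A-Z, 0-9; not stated in the post


def keyspace(token_length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct tokens of the given length."""
    return alphabet_size ** token_length


def days_to_scan(token_length, requests_per_second):
    """Days needed to try every token once at a constant request rate."""
    seconds = keyspace(token_length) / requests_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    HYPOTHETICAL_RATE = 10_000  # requests per second, illustrative only
    for length in (5, 6, 7):
        print(
            f"{length}-character tokens: {keyspace(length):,} possibilities, "
            f"about {days_to_scan(length, HYPOTHETICAL_RATE):,.1f} days "
            f"at {HYPOTHETICAL_RATE:,} requests/s"
        )
```

Under these assumptions each extra character multiplies the work by 62, which is why token length matters so much more than obscurity of the domain.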

Today, we are releasing our study, 18 months in the making, of what URL shortening means for the security and privacy of cloud services. We did not perform a comprehensive scan of all short URLs (as our analysis shows, such a scan would have been within the capabilities of a more powerful adversary), but we sampled enough to discover interesting information and draw important conclusions. Our study focused on two cloud services that directly integrate URL shortening: Microsoft OneDrive cloud storage (formerly known as SkyDrive) and Google Maps. In both cases, whenever a user wants to share a link to a document, folder, or map with another user, the service offers to generate a short URL – which, as we show, unintentionally makes the original URL public.

OneDrive.

OneDrive generates short URLs for documents and folders using the 1drv.ms domain. This is a “branded short domain” operated by Bitly and uses the same tokens as bit.ly. Therefore, any scan of bit.ly short URLs automatically discovers 1drv.ms URLs. In our sample scan of 100,000,000 bit.ly URLs with randomly chosen 6-character tokens, 42% resolved to actual URLs. Of those, 19,524 URLs lead to OneDrive/SkyDrive files and folders, most of them live. But this is just the beginning.
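
As a rough illustration of what the sample implies, the hit rate can be extrapolated to the whole 6-character token space. This is only a sketch: it assumes a 62-character alphabet (not stated in this post) and treats the sample as independent uniform draws, and the resulting number is an order-of-magnitude estimate, not a figure reported by the study.

```python
import math


def extrapolate(hits, sample_size, keyspace_size):
    """Scale a sampled hit count up to the full keyspace.

    Returns (estimated_total, standard_error), treating each sampled
    token as an independent Bernoulli trial.
    """
    p = hits / sample_size
    stderr = math.sqrt(p * (1 - p) / sample_size)
    return p * keyspace_size, stderr * keyspace_size


if __name__ == "__main__":
    KEYSPACE_6 = 62 ** 6  # assumption: 62-character alphabet
    estimate, error = extrapolate(19_524, 100_000_000, KEYSPACE_6)
    print(f"Estimated OneDrive/SkyDrive URLs among 6-character tokens: "
          f"{estimate:,.0f} (standard error about {error:,.0f})")
```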

OneDrive URLs have predictable structure. From the URL to a single shared document (“seed”), one can construct the root URL and automatically traverse the account, discovering all files and folders shared under the same capability as the seed document or without a capability. For example, suppose you obtain a short URL such as http://1drv.ms/1xNOWV7 which resolves to https://onedrive.live.com/?cid=48…48&id=48…48!115&ithint=folder,xlsx&authkey=!A..q4. First, parse the URL and extract the cid and authkey parameters. Then, construct the root URL for the account as https://onedrive.live.com/?cid=48…48&authkey=!A...q4. From the root URL, it is easy to automatically discover URLs of other shared files and folders in the account (note: the following traversal methodology no longer works as of March 2016). To find individual files, parse the HTML code of the page and look for <a> elements with href attributes containing &app=, &v=, /download.aspx?, or /survey?. To find other folders, look for links that start with https://onedrive.live.com/ and contain the account’s cid.
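
The URL manipulation described above can be sketched with Python's standard library. This is an illustration of the parsing steps only, not the tooling used in the study; the function names are made up for this example, and the cid and authkey values are obvious placeholders, since the real ones are elided in this post. As noted, the traversal itself stopped working in March 2016.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Substrings that mark links to individual files, as listed in the text.
FILE_MARKERS = ("&app=", "&v=", "/download.aspx?", "/survey?")


def build_root_url(resolved_url):
    """Keep only cid (and authkey, if present) to form the account root URL."""
    parts = urlsplit(resolved_url)
    params = parse_qs(parts.query)
    root_query = {"cid": params["cid"][0]}
    if "authkey" in params:  # documents shared without a capability have none
        root_query["authkey"] = params["authkey"][0]
    # safe="!" keeps the leading "!" of the authkey unescaped.
    query = urlencode(root_query, safe="!")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def classify_link(href, cid):
    """Label an href as 'file', 'folder', or 'other' using the rules above."""
    if any(marker in href for marker in FILE_MARKERS):
        return "file"
    if href.startswith("https://onedrive.live.com/") and cid in href:
        return "folder"
    return "other"


if __name__ == "__main__":
    # Placeholder values only; not a real account.
    seed = ("https://onedrive.live.com/?cid=PLACEHOLDERCID"
            "&id=PLACEHOLDERCID!115&ithint=folder,xlsx&authkey=!PLACEHOLDERKEY")
    print(build_root_url(seed))
```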

The traversal-augmented scan yielded URLs to 227,276 publicly accessible OneDrive documents, including tens of thousands of PDF and Word files, spreadsheets, media files, and executable binaries. A similar scan of 100,000,000 random 7-character bit.ly tokens yielded URLs to 1,105,146 publicly accessible OneDrive documents. We did not download their contents, but just from the metadata it is obvious that many of them contain private or sensitive information.

Around 7% of the OneDrive folders discovered in this fashion allow writing. This means that anyone who randomly scans bit.ly URLs will find thousands of unlocked OneDrive folders and can modify existing files in them or upload arbitrary content, potentially including malware. Microsoft’s virus scanning for OneDrive accounts is trivial to evade (for example, it fails to discover even the test EICAR virus if the attacker goes to the trouble of compressing it). Furthermore, OneDrive “synchronizes” account contents across the user’s OneDrive clients. Therefore, the injected malware will be automatically downloaded to all of the user’s machines and devices running OneDrive.

Google Maps.

Before September 2015, short goo.gl/maps URLs used 5-character tokens. Our sample random scan of these URLs yielded 23,965,718 live links, of which 10% were for maps with driving directions. These include directions to and from many sensitive locations: clinics for specific diseases (including cancer and mental diseases), addiction treatment centers, abortion providers, correctional and juvenile detention facilities, payday and car-title lenders, gentlemen’s clubs, etc. The endpoints of driving directions often contain enough information (e.g., addresses of single-family residences) to uniquely identify the individuals who requested the directions. For instance, when analyzing one such endpoint, we uncovered the address, full name, and age of a young woman who shared directions to a Planned Parenthood facility. Conversely, by starting from a residential address and mapping all addresses appearing as the endpoints of the directions to and from the initial address, one can create a map of who visited whom.

Fine-grained data associated with individual residential addresses can be used to infer interesting information about the residents. We conjecture that one of the most frequently occurring residential addresses in our sample is the residence of a geocaching enthusiast. He or she shared directions to hundreds of locations around Austin, Texas, as shown in the picture, many of them specified as GPS coordinates. We have been able to find some of these coordinates in a geocaching database.

It is also worth mentioning that there is a rich literature on inferring information about individuals from location data. For example, Crandall et al. inferred social ties between people based on their co-occurrence in a geographic location, Isaacman et al. inferred important places in people’s lives from location traces, and de Montjoye et al. observed that 95% of individuals can be uniquely identified given only 4 points in a high-resolution location dataset.

What happened when we told them.

We made several attempts to report the security and privacy risks of short OneDrive URLs to Microsoft’s Security Response Center (MSRC). After an email exchange that lasted over two months, “Brian” informed us on August 1, 2015, that the ability to share documents via short URLs “appears by design” and “does not currently warrant an MSRC case.” As of March 2016, the URL shortening option is no longer available in the OneDrive interface, and the account traversal methodology described above no longer works. After we contacted MSRC again, they denied that these changes have anything to do with our previous report and reiterated that the issues we discovered do not qualify as a security vulnerability.

As of this writing, all previously generated short OneDrive URLs remain vulnerable to scanning and malware injection.

We reported the privacy risks of short Google Maps URLs to the Google Security Team. They responded immediately. All newly generated goo.gl/maps URLs have 11- or 12-character tokens, and Google deployed defenses to limit the scanning of the existing URLs.

How cloud services should use URL shorteners.

Use longer tokens in short URLs. Warn users that shortening a URL may expose the content behind the original URL to unintended third parties. Use your own resolver and tokens, not bit.ly. Detect and limit scanning, and consider techniques such as CAPTCHAs to separate human users from automated scanners. Finally, design better APIs so that leakage of a single URL does not compromise every shared URL in the account.
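
Two of these recommendations, longer tokens and scan detection, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any real shortener works: the 62-character alphabet, the 12-character default length, the `MissLimiter` class, and its thresholds are all hypothetical choices made for this example. It counts failed lookups because a scanner guessing random tokens will mostly hit tokens that do not exist.

```python
import secrets
import string
import time
from collections import defaultdict, deque

ALPHABET = string.ascii_letters + string.digits  # 62 symbols (assumption)


def new_token(length=12):
    """Generate an unpredictable token with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class MissLimiter:
    """Block clients with too many failed lookups inside a sliding window."""

    def __init__(self, max_misses=20, window_seconds=60.0):
        self.max_misses = max_misses
        self.window_seconds = window_seconds
        self._misses = defaultdict(deque)  # client id -> miss timestamps

    def record_miss(self, client, now=None):
        """Note that this client requested a token that does not exist."""
        now = time.monotonic() if now is None else now
        self._misses[client].append(now)

    def is_blocked(self, client, now=None):
        """True if the client reached max_misses within the window."""
        now = time.monotonic() if now is None else now
        misses = self._misses[client]
        # Drop misses that are older than the window.
        while misses and now - misses[0] > self.window_seconds:
            misses.popleft()
        return len(misses) >= self.max_misses
```

A resolver could call `record_miss` whenever a lookup fails and check `is_blocked` before serving the next request from the same client, falling back to a CAPTCHA rather than a hard block if false positives are a concern.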

This is one of the reasons I’m looking to move away from Google Photos. Previously we could securely share family albums with family based on Google Accounts, but now the whole thing relies on URLs being kept secret. They might be HTTPS, and they might be long, but once they’re out, they’re out. How do you share them? SMS? Email? Hangouts? Are they opened on shared devices? I expect better of Google; we could do secure sharing before, why can’t we now? 🙁

If I want to give a link on a PowerPoint slide for people to copy down, the unshortened URLs are hopeless. On many University email systems, the unshortened URLs get truncated at 70 characters and wrapped to the next line in such a way that they’re very hard to put together again. Slightly longer “short” URLs are probably the solution, as each extra character must decrease the probability of your folder being found. Allowing write access should be the exception, not the norm. However, if you’re prepared to send a read-only access short URL over Twitter or email, or give it out verbally, then you’d be a fool to think that you hadn’t already made that folder or document almost completely public. It’d be like sending your email password over Twitter and then complaining that third parties have access to your email account… So this is a bit of a non-story as far as read-only access is concerned. As far as write access is concerned, there is some merit in pointing out that hackers can inject malware too easily via such a system.

Off the top of my head, some contexts where one might reasonably want the ease of a short URL:

Handwritten notes.
Speaking a URL to somebody.
Printing a URL in a physical paper magazine or book (article or advertisement).
A URL in a poster or advertisement in a public space.
Displaying a URL in an image (e.g. PNG file) which you want to spread around.

“Why would I want a short URL in the first place? All the other person will do is click on it, so who cares if it’s not short?”

The issue is how the URL is shared, and whether a long URL *can* be clicked on. A common sharing method is email, and not all email clients handle long URLs well. I also see issues in newsgroups propagated by NNTP, where the newsreader client wraps long URLs on display and breaks them.

My preference when I use a short URL is the older TinyURL service, which has a preview option – click on the preview URL link, and it displays what the real underlying URL is so the recipient can choose whether to open the link.

My takeaway from this is that the issue isn’t the use of URL shorteners – it’s insecure setup in what the shortened URL points to.

The reason they said it wasn’t a security vulnerability is because it isn’t. It’s an obscurity vulnerability. If you rely entirely on URL obfuscation to secure your data, you will be pwned. The traversal method certainly makes discovery easier, but if you require authentication to get to data, it doesn’t matter.

That said, Microsoft and others should use this report to better inform users that sharing links does not protect their data.

I cannot speak to OneDrive specifically, but I have seen services (such as Dropbox) suggesting that open but unhindered access is “share with people with whom I have shared the link.” If a user who uncritically follows your interface would rely on the security of something merely obscure, I do think it a security concern if it is not awfully obscure.

public URLs are public, what a surprise. the only security issue is if someone understands short URLs as protection. they’re – like the name says – meant to shorten URLs, not for protection. if you want protected data use crypto/passwords

Freedom to Tinker is hosted by Princeton's Center for Information Technology Policy, a research center that studies digital technologies in public life. Here you'll find comment and analysis from the digital frontier, written by the Center's faculty, students, and friends.