Previous posts have covered trustworthy principles in general and some product specifics as well. Privacy is an important part of trustworthy computing. This post discusses one aspect of privacy on the web: third-party content.

When most people browse the web, they think what they see in the address bar and the site they are visiting are the same thing. However, web sites today typically incorporate content from many different web sites. For the sake of clear terminology, the site the user browses to directly (seen in the address bar) is the first-party site; the other sites that the first-party site incorporates in its site experience (but that the user hasn’t navigated to directly) are third-party sites.

When you browse to a first-party site, you know that it can collect information about how you use the site. What many users don’t realize is that technically, third-party sites can collect information about users as well. Users aren’t typically well-informed about which third-party sites are collecting what information, how the sites use this information today, or how the sites could use the information in the future.

Identifying Third-party Sites

Most websites today are actually mosaics, or mash-ups, of several different sites. To see this, you can bring up the Privacy Report in Internet Explorer (from IE7’s Page menu or IE6’s View menu, choose the Web Page Privacy Policy menu item) for any site you visit. Here’s part of the report for a news site, and another from a credit card site:

While the address bar shows the address of the current, first-party, site, this dialog shows the addresses of all the different web sites (including third-party sites) that the current web page includes content from. The browser visits every one of these sites in order to show the current web page’s content.

The way that sites can pull content in from other sites is useful, powerful, and typical on the web today. It’s part of the underlying design and structure of the web, and enables functionality (like an interactive map in the middle of a restaurant’s website, or a “share this” link in the middle of a news article) that people value.

Third-Party Sites and Privacy

At the same time, bringing information together from different websites has privacy implications. A good example of this issue that most people have experienced involves email. Many email systems treat email messages that come from unknown senders in a special way, blocking images in them and displaying a warning like this one:

The message body typically has some missing images (“red X’s”) with text nearby, like “Right-click here to download pictures. To help protect your privacy, Outlook prevented automatic download of this picture from the Internet.”

Why do email systems block these external images? The sender may have programmed some information in the external image that is unique to the recipient – for example, having the image’s file name or location include the recipient’s email address. When the sender sees that a particular image was downloaded, then the sender knows which email message arrived in a valid account and was opened. By not downloading the content, the email recipient prevents his email system from disclosing information and protects his privacy from the unknown sender. Potentially, the recipient protects himself from more unsolicited email.
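As a rough illustration of the email case, here is a minimal Python sketch of how an image address can carry information unique to the recipient. The host name, the `rcpt` and `c` parameter names, and both function names are made up for this example; real senders may encode the identifier quite differently (for instance, as an opaque token in the file name).

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical sender-controlled host, used only for illustration.
TRACKING_HOST = "https://images.sender.example"


def tracking_image_url(recipient_email: str, campaign: str) -> str:
    """Build an image URL that is unique to one recipient."""
    query = urlencode({"rcpt": recipient_email, "c": campaign})
    return f"{TRACKING_HOST}/logo.gif?{query}"


def recipient_from_request(requested_url: str) -> str:
    """Recover the recipient from the URL the mail client requested."""
    params = parse_qs(urlparse(requested_url).query)
    return params["rcpt"][0]


# The sender embeds this URL in the message sent to one person...
url = tracking_image_url("alice@example.com", "spring")

# ...and learns who opened the message as soon as the image is downloaded.
opened_by = recipient_from_request(url)
```

If the email system never downloads the image, the request never reaches the sender's server and nothing is disclosed.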

In general, every piece of web content that a computer requests from a website discloses information to that website. This basic technique enables a third-party site to track visitors across different first-party websites that include content from the same third-party. When several websites show content (like a syndicated photo or article) from the same third-party website, that third-party site can determine which of the websites a particular visitor has browsed to.

For example, say two totally unrelated sites, Site1.com and Site2.com, both include images from MySyndicatedPhotos.com. The user browses to both Site1.com and Site2.com, and the user’s browser calls MySyndicatedPhotos.com in order to get the images these sites include. MySyndicatedPhotos.com can figure out (by various means) that the same machine visited these two different sites.

As the user visits additional sites that show content from this same third-party site, this third-party site is in position to build a profile of the user’s activity across the different sites that include its content.
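The Site1.com and Site2.com example can be sketched in a few lines of Python. This is only a simplified model, not a description of what any particular site does: it assumes the third-party site logs, for each request, some identifier for the client (here an IP address, though it could be anything that recognizes the same machine) together with the address of the page that embedded the content, as a browser commonly sends in the HTTP Referer header. The function name and the log entries are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def sites_seen_per_client(request_log):
    """Group the embedding sites by client, as a third party could.

    Each log entry is (client_id, referer). client_id stands in for
    whatever the third party uses to recognize a machine; referer is
    the address of the first-party page that included the content.
    """
    seen = defaultdict(set)
    for client_id, referer in request_log:
        host = urlparse(referer).hostname
        if host:
            seen[client_id].add(host)
    return dict(seen)


# Hypothetical request log on the third-party photo site.
log = [
    ("203.0.113.7", "http://site1.com/news/today"),
    ("203.0.113.7", "http://site2.com/recipes"),
    ("198.51.100.4", "http://site1.com/sports"),
]

profiles = sites_seen_per_client(log)
# profiles["203.0.113.7"] now contains both site1.com and site2.com.
```

Every additional first-party site that embeds the same third-party content adds another entry to the same client's set.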

While cookies can definitely contribute here, and there’s been long-standing concern and confusion about “tracking cookies,” the fact is that any content coming from a third-party site can function like a tracking cookie. The intent of the content (a photo, article, logo, or site-specific analytics; image, text, or script) is technologically irrelevant to its potential use as a tracking mechanism. Note that even if the user had blocked all cookies, other content on third-party websites could still be used to build a profile. Third-party content isn’t inherently good or bad; it’s just technically possible to use it this way.

Actually Happening or Just Technically Possible, and Other Questions

To be clear, this post is about what a website can do when several other websites use content from it. It’s not what all third-party sites actually do when other sites refer to content on them. What is actually done with the available information is up to the third-party site, and in some ways very hard for anyone else to figure out. The third-party site could have a clear, well-written, and prominently posted privacy policy that guides its operations. It might not. The site could have an employee who loses a laptop with the data collected, or has malware on his machine and discloses collected information against policy. The site could have business arrangements with other sites that involve pooling data.

Also, this blog post isn’t meant as a technical deep-dive on the techniques sites can use to track users, or the different counter-measures technically-savvy users might take to avoid being tracked. The common technical theme here (as described above in the email case and here) involves ways that first-party sites enable information that can uniquely identify site visitors to flow to third-party sites. For example, many of the web addresses you’ll find in the Web Page Privacy Policy dialog are often quite long and contain unique identifiers. There are better discussions of this topic elsewhere. For example, a recent IRC discussion about developing new standards for rich websites covered aspects of this topic. While it’s quite long, some parts are very relevant, like this one (that people “are being tracked whether they send cookies or not”) and this one (“anyone who wants to track people across the web can trivially do so at this point, even without cookies…. you can pretty easily ‘fingerprint’ people through things like their user-agent string, ip address, screen size, other js- and http- accessible prefs, etc and then with a simple set of analysis scripts you can easily work out who is who just look at the ‘anonymised’ search query string data aol released”).
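To make the “fingerprint” idea in the quoted discussion a little more concrete, here is a minimal Python sketch. It assumes a site simply combines a few values it can observe without cookies (user-agent string, IP address, screen size, and a language preference, the last one chosen here as an example of an "http-accessible pref") and hashes them into a short identifier. The function name, the choice of fields, and the sample values are illustrative assumptions; real fingerprinting techniques vary widely.

```python
import hashlib


def fingerprint(user_agent: str, ip_address: str,
                screen_size: str, accept_language: str) -> str:
    """Derive a short, stable identifier from observable attributes."""
    # Join with a separator that will not appear in the values, so that
    # ("ab", "c") and ("a", "bc") do not collapse into the same string.
    joined = "\x1f".join([user_agent, ip_address, screen_size, accept_language])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Two requests from the same machine, seen via two different first-party sites.
visit_a = fingerprint("Mozilla/4.0 (compatible; MSIE 7.0)", "203.0.113.7",
                      "1280x1024", "en-us")
visit_b = fingerprint("Mozilla/4.0 (compatible; MSIE 7.0)", "203.0.113.7",
                      "1280x1024", "en-us")

# A machine that differs in just one attribute.
other = fingerprint("Mozilla/4.0 (compatible; MSIE 7.0)", "203.0.113.7",
                    "1024x768", "en-us")

same_visitor = visit_a == visit_b       # identical attributes match
different_visitor = visit_a != other    # one changed attribute does not
```

No cookie is involved at any point, which is why blocking cookies alone does not prevent this kind of correlation.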

Web browsing isn’t anonymous or perfectly private even without third-party sites. For example, the provider of Internet access (to a person’s home, hotel room, café table, or desk at school or work) can observe where the computer goes on the Internet. These providers typically provide terms of use, so users have clear notice and can choose to accept or decline connectivity under the stated terms. Any software running on the user’s machine can determine the websites the machine has visited; this is the basis of features like History, or toolbars that copy a user’s browser history up to the web so users can get at it from different machines. Again, terms of use and privacy policies are important tools here for users. The websites a user visits can determine information about the user (for example, the user’s likely location). Also, users give the sites they visit information directly in terms of what they click on and choose to do.

Third-Party Sites and Trust Issues

Given that web browsing isn’t anonymous and in some ways this is “how things work” on the web, what exactly is the trust issue? For many people, trust begins with security. The security risk here is plain: visiting one website exposes the user to potentially malicious content from other websites. The user visits one site and sees content on it that seems trustworthy (it’s on the site!) but actually comes from a different source. Finding examples of this problem on the web isn’t hard; it’s happened to visitors of several top-tier websites.

Trust includes privacy as well. The privacy concern involves users having a choice, and being able to exercise control about what information they share. Today, users are not in control of which websites can get information about their browsing activities. As a result, web sites that users aren’t aware that they’ve visited and don’t have a well-defined relationship with are in position to build a profile of the users’ browsing patterns.

A guiding principle for Internet Explorer (and Microsoft overall, as part of Trustworthy Computing) is that the user should be in control. Consumers have come to expect security protections from their browsers, and are starting to have higher expectations about privacy protections as well. Control here means that users have clear notice and can tell what sites they may be disclosing information to and under what terms. Control also means that users can exercise choice about what information they disclose to whom. Preventing information disclosure means blocking content; blocking content creates a possible impact to the appearance and functionality of the page.

Beyond these issues, accountability is a question here as well. When a user visits one site after another, and each one includes some third-party content, who is accountable and who takes responsibility for the information collected about the user? On today’s web, that’s not at all clear.

The privacy and trust issues around third-party content are complex and important. As discussed in this blog before, trustworthy browsing involves many industry challenges, and, like many other efforts (e.g. interoperability), requires cooperation and trade-offs. Web privacy involves more than just blocking cookies. Enabling users to be in control starts with making users aware of the issues. In another post, we’ll cover IE8 functionality that helps users stay in control of their information.

I’m not sure if this article is introducing a new policy, or just a new feature that lets people see what third party sites are present in the current page. Are MS going to block third party cookies by default? It’ll harm your own MSN web properties if you do. If advertisers aren’t able to track their ad performance (which is the only reason advertisers do tracking, they’re not interested in individual users) then they’ll become less effective places to advertise.

I currently have IE6-System IE7-Standalone installed for clarification (been needing to do some backwards compatibility testing of late)…

Looking at the options in IE7’s slider bar I’m a bit confused, what is a compact privacy policy? I’m not sure what IE8 has off hand, will take a second look when B2 comes out.

I think IE8B1 (when I had it installed) was not able to open my P3P’s URL. By chance has this issue been addressed in B2? I can wait and will try to remember to test it out when B2 comes out and if it hasn’t been resolved I’ll file a bug report.

I’m a bit confused as to what Microsoft considers third party these days.

Recently my company began researching whether we could offer the custom modules that we’ve developed for our WYSIWYG editor as widgets. To do this we were going to offer an iframe that references a page on our server that outputs the necessary HTML/CSS/JavaScript to run the module. The iframe/HTML/CSS/JavaScript would be on a third-party domain to be sure, but shouldn’t the JavaScript contained within the iframe be able to reference the iframe’s own document body (but obviously not the parent)? From what I’ve seen, Firefox, Opera, and Safari say yes because the JS and the iframe’s document are on the same domain, but IE reports that document.body does not exist. The only things that it sees are the HTML, HEAD, TITLE, and SCRIPT tags.

I understand that if they are on another domain that the iframe should absolutely not be able to reference content on the parent window, but what purpose is served from blocking access to the iframe’s own body?

Can you improve the UX? In the first picture, it says "some cookies" were blocked, but that’s the only way to distinguish it from the second. How about an icon or indicator that I was protected?

Also, it would be great to be able to see which cookies were blocked, and which were let through. It would be *awesome* to be able to control those cookies right there, maybe with a right-click or a checkbox.

I think this is a good thing as I believe end-users are probably being exploited more than they would be comfortable with if they were only to realise what was going on.

As a fairly tech-savvy person myself I’ve already taken a number of steps to reduce these possibilities, but believe the majority of end-users out there would probably have no clue, and thus no real defense against such potentially exploitative methods.

Any word on if the third-party (affiliate-type) cookies will be blocked by default? Jeez, that would be a huge blow to the affiliate industry. Irrational to strike at one of the largest economy blocks of the Internet… I don’t quite follow MS’s thoughts here.
