Common Issues in Assessing Browser Performance

I’m Christian Stockwell, a Program Manager on the IE team focused on browser performance.

Measuring the overall performance of websites and web browsers is important for users comparing the performance characteristics of competitive browsers, developers optimizing their websites for download times and responsiveness, browser vendors monitoring the performance implications of code changes, and everyone else who is generally interested in understanding website performance.

I thought it would be interesting to follow up on my previous posts on performance with a discussion of some of the issues impacting browser performance testing and the techniques you can use to effectively measure browser performance.

Measuring Browser Performance

A common way to approach browser performance testing is to focus on specific benchmarking suites. Although they can be useful metrics, it can be misleading to rely solely on a small number of targeted benchmarks to understand browser performance as users perceive it—we believe that the most accurate way to measure browser performance must include measuring real-world browsing scenarios. Measuring real sites captures factors that are difficult to isolate in other benchmarks and provides a holistic view of performance. Testing browsers against real-world sites does, however, introduce some key challenges and this post discusses some of the mitigations we’ve adopted to effectively measure IE performance as part of our development process.

Before delving too deeply into this post I wanted to say that effective performance benchmarking is surprisingly difficult. The IE team has invested a great deal of effort building a testing and performance lab in which hundreds of desktop and laptop computers run thousands of individual tests daily against a large set of servers, and our team rarely ends a day at work without a few new ideas for how we can improve the reliability, accuracy, or clarity of our performance data.

Part of the challenge in measuring browser performance is the vast number of different activities for which browsers are used. Every day users browse sites that run the gamut from content-heavy sites like Flickr to minimalist sites like Google. They may encounter interactive AJAX sites like Windows Live Hotmail or purely static HTML sites like Craigslist. Still others may use their browsers at work to run mission-critical business applications.

The performance of each of these categories of sites is often gated by different browser subsystems. For example, an image-heavy site may depend on the speed with which the browser can download and decompress images. In contrast, the performance of a simple site may be predominantly a factor of how fast browsers can render HTML. In another twist, AJAX website performance can be a factor of how tightly the JavaScript engine, CSS engine, and DOM are integrated—rather than the speed of any individual subsystem. When third-party controls like Flash and Silverlight enter the equation, performance is often related to how efficiently the control integrates itself into the browser.

I expect that some of the approaches I discuss here will lend more context to the performance work we’ve done for IE8 and give you some insight into our engineering process. Above all, I hope that this post gives you ideas for improving how some of you measure and think about browser and site performance.

Caching Behavior

All browsers are inherently dependent on the network and any tests need to reflect that reality to adequately measure performance.

One aspect of the internet's makeup that can impact browser performance measurement is how content is stored, or cached, at various levels by the servers that comprise it.

With regard to browser performance measurement, what it means is that when you visit www.microsoft.com, your browser may request that content from several servers in turn—from your corporate proxy, from a local server, or from a broader set of international servers.

In a similar vein, when measuring the performance of several browsers it's important to consider the impact of caching. For example, if I were to open ten tabs to ten different websites in one browser, and then open the same ten tabs in a second browser, I could wrongly conclude that the second browser was faster when in fact the difference was due primarily to the content being stored by a nearby server when the first browser requested the pages.

It's hard to rigorously control how servers may cache content, but one general principle of performance measurement is to never measure anything only once. Unless you are specifically trying to measure the impact of upstream caching, you should navigate to the sites you want to measure at least once before you start collecting any performance data. In fact, since proxies can cache content per user agent (browser), you should visit each site you intend to test against with every browser you will test.
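As a sketch of that warm-up pass, consider the helper below. The site list, user-agent strings, and the default `fetch` implementation are illustrative assumptions, not from the post; in practice you would drive the real browsers themselves, since their request patterns differ from a bare HTTP fetch.

```python
import urllib.request

def prime_caches(sites, user_agents, fetch=None):
    """Visit every site once per user-agent string before timing anything.

    Proxies may cache content per user agent, so the warm-up pass must
    cover every browser identity you plan to test. Any timings taken
    during this pass should be discarded.
    """
    if fetch is None:
        # Default: a plain HTTP fetch with a spoofed User-Agent header.
        def fetch(url, ua):
            req = urllib.request.Request(url, headers={"User-Agent": ua})
            urllib.request.urlopen(req).read()
    for ua in user_agents:        # one full pass per browser identity
        for url in sites:
            fetch(url, ua)
```

Passing a stub `fetch` lets you dry-run the visit order without touching the network.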

My summary of caching behavior is simplified. If you'd like more detail, many great resources describe the process in greater depth, including the HTTP protocol specification itself. The HTTP spec also makes great nighttime reading and is a conversation starter at any party.

Sample Size

Precisely because there are so many external factors involved in browser performance, the number of performance measurements you take can drastically change your conclusions.

I’ve mentioned that a general principle in performance measurement is to never measure anything only once. I’m going to expand that principle to “always measure everything enough times”. Many different schemes exist to determine what “enough times” means—using confidence intervals, standard deviations and other fun applications of statistics.

For much of the performance data we collect, we find that a pragmatic approach that avoids those relatively complex schemes is sufficient. In our lab, 7-10 repetitions is usually enough to collect a reliable set of data and identify trends, but more repetitions may be needed if your environment is less controlled.
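A minimal harness for that 7-10 repetition approach might look like the following sketch; the `action` callable and repetition count are placeholders for whatever scenario you are timing:

```python
import time

def time_scenario(action, reps=7):
    """Run `action` `reps` times and return one wall-clock sample per run."""
    samples = []
    for _ in range(reps):
        start = time.perf_counter()   # high-resolution monotonic clock
        action()
        samples.append(time.perf_counter() - start)
    return samples
```

If the returned samples vary widely, that spread itself is a signal: your environment probably needs tighter control, or the scenario needs more repetitions.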

Once you’ve collected your performance data you will likely want to summarize your results to draw conclusions. Whether you use the arithmetic mean, harmonic mean, geometric mean, or some other method, you should be consistent and fully understand the ramifications of how you are summarizing your data.

For example, let’s look at the following data points collected by testing two browsers navigating to a single webpage:

                  Browser A   Browser B
Sample 1              1.0         2.0
Sample 2              1.0         2.0
Sample 3              1.0         2.0
Sample 4              1.0         2.0
Sample 5             10.0         4.0
Arithmetic Mean       2.8         2.4
Geometric Mean        1.6         2.3
Harmonic Mean         1.2         2.2

In this contrived example it’s clear that how you summarize your data can change your interpretation of the data—whereas the arithmetic mean suggests that Browser B is faster than Browser A, both the geometric and harmonic means would lead you to the opposite conclusion.
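You can reproduce those summaries with Python's `statistics` module (the `geometric_mean` and `harmonic_mean` functions require Python 3.8 or later):

```python
from statistics import mean, geometric_mean, harmonic_mean

# Page-load samples from the example above (seconds; lower is faster)
browser_a = [1.0, 1.0, 1.0, 1.0, 10.0]
browser_b = [2.0, 2.0, 2.0, 2.0, 4.0]

for name, summarize in [("arithmetic mean", mean),
                        ("geometric mean", geometric_mean),
                        ("harmonic mean", harmonic_mean)]:
    a, b = summarize(browser_a), summarize(browser_b)
    winner = "A" if a < b else "B"
    print(f"{name}: A={a:.1f}  B={b:.1f}  -> Browser {winner} looks faster")
```

Running this shows the flip described above: the arithmetic mean favors Browser B, while the geometric and harmonic means favor Browser A.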

Bandwidth Competition

Sharing your network with other users means that—seemingly without rhyme or reason—your web browser may suddenly take much longer to perform the same action.

One benefit of working for a very large company like Microsoft is that the large number of employees makes certain phenomena reliable and measurable. For example, measuring page download times over the course of the day makes it clear that most of Microsoft starts working in earnest between 8am and 9am, and leaves between 5pm and 6pm.

I can tell because most Microsoft employees access the network fairly constantly over the course of the day. Whether we're browsing MSDN, reading documents on SharePoint, or rigorously testing the latest Xbox games, we're all competing for bandwidth. That sharing means that if I measure browser performance at 6am I will reliably get more consistent results than at 9am, when the entire company is getting to work and starting to email away.

Given the wide variety of networking configurations available in different companies it’s hard to predict the impact of bandwidth competition. To avoid having it distort your results I suggest that you try to collect performance data outside of core business hours if you’re collecting performance data at work.

If you're collecting performance data at home you may similarly be sharing bandwidth with your family or other people nearby. In those cases, time your measurements for periods when fewer people are likely to be browsing—during business hours, late at night, or very early in the morning.

Resource Competition

Sharing resources across applications on your own machine can affect browser performance just as severely as competing for bandwidth.

Testing two browsers side-by-side can produce the most distorted results. For example, on the Windows platform there is a limit of ten outstanding (half-open) outbound connection attempts at any one time; additional attempts are queued until pending connections succeed or fail. Testing two browsers side-by-side means you are likely to hit that limit, which could give one browser an unfair advantage by virtue of having started microseconds sooner.

I’ve offered up two simple examples and others certainly exist. Without going into far greater detail I think it’s clear that I advise against running multiple applications when trying to measure browser performance.

At a minimum you should take two steps to reduce the chance of interference from other applications:

Close any open applications, including those that may only appear in your notification area to the right of the taskbar. This is particularly important for other applications that are likely to use the network.

In a command window, run the following command to reduce system activity while you are testing: %windir%\system32\rundll32.exe advapi32.dll,ProcessIdleTasks

Server Interactions

Beyond interference from shared resources on your machine or your network, your performance results can also be impacted by the internal behavior of the servers you are visiting.

One overarching principle when taking performance measurements is to maintain a common state between tests. For cache management, that meant giving upstream servers a chance to reach a known state before collecting performance data; for the network, it meant conducting your tests in a consistent environment that reduces the impact of external sources.

For an example of application design characteristics that can affect benchmarking, consider an online banking application. For security reasons, some banking applications only provide access to account information when appropriate credentials are supplied. Assuming a benchmark is comparing two (or more) browsers at this online banking website, it's important to ensure the application is in a consistent state for each browser. By design, most online banking applications prevent a user from being logged in to two sessions at the same time: when one session logs in, the other is logged out. Failing to reset the web application's state before starting the test in the second browser could cause the server-side application to take extra time to analyze the second request, close the first session, and start a new one.

That setup and teardown process can impact benchmarking and is not limited to online banking applications, so you should try to remove it as a factor. More generally, you should understand how your sites behave before using them during performance testing.

Observer Effect

In many fields there is the potential that the action of taking a measurement can change the thing that you are trying to measure—that phenomenon is called the Observer Effect.

You can use any of a number of frameworks to simplify the task of measuring specific browsing scenarios. These frameworks are typically aimed at developers or technical users. One example of such a framework is Jiffy.

As with any infrastructure that may directly impact the results you are trying to measure you should carefully assess and minimize the potential for introducing changes to the performance due to the framework you are using for measurement.

As an aside, the IE team uses the Event Tracing for Windows (ETW) logging infrastructure for our internal testing as it provides a highly-scalable logging infrastructure that allows us to minimize the potential for the Observer Effect to distort our results.

Machine Configurations

Just as with humans, no two machines are exactly alike.

As I mentioned above, within the IE performance lab we have a very large bank of machines running performance tests every hour of every day. To maximize our lab's flexibility, early in IE8 we attempted to create a set of "identical" machines that could be used interchangeably to produce a consistent set of performance data. Those machines bore consecutive serial numbers and came from the same assembly line, and all their component parts were "identical". Despite those efforts, however, the data we collected on that set of machines has been sufficiently varied that we avoid directly comparing performance results from two different machines.

It should come as no surprise, then, that I suggest that unless you want to study how browser performance varies across different platforms you should test all browsers on a single machine.

Cold Start vs. Warm Start

The amount of time it takes to start a browser can depend on many factors outside of the control of the browser.

As with caching, measuring the speed of browser startup is susceptible to outside factors—particularly the first time you start the browser. Before the browser can start navigating to websites it needs to load parts of itself into memory—a process that can take some time. The first time you start the browser, it is difficult to know exactly how much may already be loaded into memory. This is particularly true for IE since many of its components are shared with other programs.

To collect more consistent data, open and close each browser at least once before you start testing against them. If you have no other applications running that should give your operating system the opportunity to load the required components into memory and improve the consistency and accuracy of your results. It should also provide a fairer comparison between browsers, especially in light of features like Windows Superfetch that may otherwise favour your preferred browser.

Website Content

Websites change constantly. Unfortunately, that also includes the time when you are attempting to test performance.

Within the IE team’s performance lab all website content is cached for the duration of our testing. One impact of that caching is that we can ensure that exactly the same content is delivered to the browser for each repetition of a test. In the real world, however, that is often not the case.

News sites, for example, may update their content as a story breaks. Visiting Facebook or MySpace twice may result in radically different experiences as your friends add new pictures or update their status. On many websites advertisements change continually, ensuring that any two visits to your favorite site are going to be different.

Outside of a lab environment it is hard to control that type of change. Approaches certainly exist, and you can use tools like Fiddler to manipulate the content your browser receives. Unfortunately, those approaches stand a very good chance of affecting any performance results. As a result, the pragmatic solution is to follow the advice I’ve outlined in my point on sample sizes above—and if you notice that a very heavy advertisement is appearing every few times you visit a page, I think it’s fair to repeat that measurement to get a consistent set of results.

Website Design

Not only can websites change under you, but site authors may also have written drastically different versions of their website for different browsers.

One tricky spin on the problem of ensuring that websites serve up the same content for each of your tests is sites that serve distinctly different code to different browsers. In most cases you should ignore these differences when measuring browser performance because they are valid representations of what users will experience when visiting different websites.

In some cases, however, websites can offer functionality that differs so widely between browsers that the cross-browser comparison is no longer valid. For example, I was recently investigating an issue where one of our customers reported that a website was taking several times as long to load in IE8 as it did in a competitive browser. After some investigation I discovered that the website was using a framework that provided much richer functionality in IE than in the other browser. Fortunately the website was not relying on any of that richer functionality, so they were able to slightly modify how they used the framework to make their site equally fast across browsers.

In that example the website was not using the extra functionality offered by their framework and they were able to update their site—but in many cases websites offer completely different user experiences depending on the browser. Assessing those websites is largely a case-by-case affair, but I typically consider those sites unsuitable for direct comparisons because their performance reflects the intentions of the site developers as much as the performance of browsers.

Identifying sites that differentiate between browsers is not simple, and in this case web developers generally have an upper hand on reviewers. Web developers should use profilers, debuggers, and other tools at their disposal to identify areas in which their websites may offer drastically different experiences across browsers.

Reviewers and less technical users should avoid measuring cross-browser performance on sites that clearly look and behave differently when you try to use them, since in those cases it is difficult to disentangle browser performance from website design.

“Done”

Can you define what “a webpage is done loading” means? How about for a complex interactive AJAX site?

One surprisingly intractable issue in performance measurement is defining what “done” really means in terms of navigating to a webpage. The problems involved are compounded as websites grow increasingly complex and asynchronous. Some web developers have used the HTML “onload” event as an indicator of when the browser has finished navigating to a webpage. The definition of that event is, unfortunately, interpreted differently across different browsers.

Within the IE team we use some of our internal logging events to measure page loads across sites. Since that logging is IE-specific it does not, unfortunately, provide an easy cross-browser solution to measure page loading performance across browsers. And, although cross-browser frameworks like Jiffy and Episodes exist that can help site developers define when their scenarios are “Done”, those indicators are not yet widely consumable by users at large.

Beyond specific code-level indicators some people use browser progress indicators to assess when a page is finished downloading—hourglasses, blue donuts, progress bars, text boxes, and other UI conventions. These conventions, however, are not governed by any standards body and browser makers can independently change where and when (and if!) they are displayed.

Faced with those realities, the pragmatic approach I encourage reviewers and users to adopt is to use browser progress indicators while validating those indicators against actual webpage behavior. For example, when you are testing how quickly a particular webpage loads, try to interact with it while it is loading for the first time. If the webpage appears to be loaded and is interactive before the progress indicators complete, you may want to ignore the indicators and use the page appearance for your measurements. Otherwise, the progress indicators may be enough for an initial assessment of how quickly a page downloads across browsers. Without validating that the actual page load corresponds closely to the browser's indicators, it is difficult to know when they can be trusted for performance measurement.

Browser Add-ons

Running an add-on when you are testing a browser means that you are no longer only testing browser performance.

As I discussed in my April post, add-ons can have a tremendous impact on the performance of a browser. In the data I receive through Microsoft's data channels it is not uncommon to see browsers with dozens of add-ons installed, and I suspect that my colleagues at the Mozilla Corporation could say the same of their browser.

Any of those add-ons may be performing arbitrary activity within the browser. Illustrating that impact by way of an anecdote, I’ve noticed that some users with a preferred browser sometimes find any alternative browser faster simply because it comes with a clean slate. For example, a Firefox user with several add-ons installed could move to IE and observe enormous performance improvements while an IE user could migrate to Firefox and observe the same performance benefit. Those results are not contradictory, but rather reflect the significant impact of browser add-ons.

As a result, in our performance lab we test both clean browser installations and installations with the most common add-ons. To disable all add-ons in IE8, click the "Tools" menu and select "Manage Add-ons". In the Manage Add-ons screen, ensure that you've chosen to show "All Add-ons", and disable each listed add-on. Alternatively, if you are comfortable with the command line, you can run IE with add-ons disabled using the "iexplore.exe -extoff" command.

Since most browser makers go to great lengths to ensure that add-ons continue to work as expected when upgrading, taking the time to follow these steps is particularly important when evaluating new versions of browsers as any performance improvements may be hidden by a single misbehaving add-on.

I know this post has been quite long, but I hope that by covering a few of the techniques we use when measuring IE performance you will be able to adapt some of them to your particular needs. Understanding how we think about performance testing may also give you a better understanding of our process and our approach to browser performance. Last but not least, I hope that I’ve given you a little more insight into some of the work going on behind the scenes to deliver IE8.
