A New Blind Spot for Web Performance Tools

Web performance tools rely on actual browsers to capture the performance data of a webpage and of any requests the page makes. Monitoring from the browser comes with some limitations, due to the complexity of both the browser and the internet, and these at times cause what we call “blind spots”. A “blind spot” occurs when the data provided by a tool lacks clarity.

The main “blind spot” with external monitoring is that you cannot always distinguish between network and application performance. At Catchpoint we introduced Catchpoint Insight and other features to remove this limitation and make the factors behind performance easier to understand.

Recently we came across another “blind spot” related to monitoring tools built on top of Internet Explorer. We internally refer to it as “Objects in IE might be faster than they appear”.

It all started when a client of ours engaged us in a performance study of the impact of their Iframe tag on the pages of one of their clients. Their client was observing quite high network times on the Iframe call in an IE7-based performance monitoring service. We were able to reproduce the problem on the webpage with various tools, including HttpWatch, dynaTrace Ajax, WebPagetest, the IE9 Developer Tools, and even the Catchpoint IE monitor.

The performance numbers we observed made no sense! The response content of the Iframe URL was less than 1000 bytes (small enough to fit in a single TCP packet), yet the tools were displaying 500ms+ for the time from the first byte to the last byte of the HTTP content. At the network level, the only way this could happen is if something was wrong at the TCP level and packets were being fragmented or lost.
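As a rough check of the single-packet claim, the sketch below computes how many TCP segments a payload needs. It assumes a standard 1500-byte Ethernet MTU with 20-byte IPv4 and TCP headers and no options, which gives a 1460-byte maximum segment size. The name `segmentsNeeded` is made up for illustration, and a real path may negotiate a smaller segment size.

```javascript
// Assumption: 1500-byte MTU minus a 20-byte IPv4 header and a 20-byte TCP header.
var ASSUMED_MSS_BYTES = 1500 - 20 - 20; // 1460

// Number of TCP segments needed to carry `payloadBytes` of data.
function segmentsNeeded(payloadBytes, mssBytes) {
  var mss = mssBytes || ASSUMED_MSS_BYTES;
  return Math.ceil(payloadBytes / mss);
}

console.log(segmentsNeeded(1000)); // 1
console.log(segmentsNeeded(3000)); // 3
```

Under these assumptions a body of just under 1000 bytes leaves roughly 460 bytes for the HTTP response headers before a second segment would be needed.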

To check whether it was an issue at the TCP level, we used Wireshark to capture the TCP packets while monitoring the pages with the other tools, and mapped the Wireshark data to the metrics displayed in the tools. The data confirmed that the URL content was always delivered in a single packet and that the response took less than 100ms. However, the monitoring tools built on top of IE still showed 500ms or more from first to last byte for the same request! Clearly a new “blind spot” with IE!

Since we had proved it was not the network, the only other possibility was that something was happening during browser execution! We looked through the 20+ JavaScript files referenced on the webpage and determined that the page executed JavaScript code when the DOMContentLoaded event was reached. The event is not native in pre-IE9 browsers, so the page relied on one of two workarounds, “doScroll()” or “script defer”, to approximate when the event has been reached. Once the event fired, JavaScript on the page made time-consuming DOM modifications. However, this JavaScript execution time was not being displayed in the tools as a gap.

To test what was happening, we created several simple pages that contained an Iframe pointing to a URL. Each page also contained a script that created 10,000 spans and appended them to a DIV on the page. How the script was triggered varied from page to page, relying on:
1. the “doScroll()” method to detect DOMContentLoaded and execute
2. the “Script Defer” method to detect DOMContentLoaded and execute
3. the native DOMContentLoaded for IE9 to execute the script
4. inline execution below the Iframe tag
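
The DOM work in these test pages can be sketched roughly as follows. This is not the original test code: the function name `appendSpans`, the span text, and the element id `target` are hypothetical, and only the native DOMContentLoaded variant is wired up here. The document object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Create `count` <span> elements and append them to `container`.
// `doc` is any DOM-like object exposing createElement and createTextNode.
function appendSpans(doc, container, count) {
  for (var i = 0; i < count; i++) {
    var span = doc.createElement("span");
    span.appendChild(doc.createTextNode("item " + i));
    container.appendChild(span);
  }
  return container;
}

// Native DOMContentLoaded variant (IE9 and later, plus other browsers).
// "target" is a hypothetical id for the DIV on the test page.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    appendSpans(document, document.getElementById("target"), 10000);
  });
}
```

For the inline variant, the same `appendSpans` call would simply sit in a script tag placed below the Iframe tag instead of inside an event listener.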

In all four test cases we observed that all the tools, including the IE9 Developer Tools, always added the JavaScript execution time to the network time of the Iframe request! We replicated the test cases with an image in place of the Iframe and were unable to reproduce the same results. Interestingly, the issue did not occur in Firefox or Chrome on Windows, although both clearly showed that JavaScript was executing and delaying the rendering of the Iframe content.

We believe the problem occurs because the browser executes JavaScript on a single thread, and that execution takes precedence over the Iframe creation. The monitoring tools rely on the browser to tell them when the Iframe is complete, but IE does not mark the Iframe complete until the JavaScript execution has finished. Hence, the JavaScript execution time is included in the Iframe response time!
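
The hypothesis can be illustrated with a toy model. It is not a description of IE internals: the function name `observedCompletion` and all timings are invented for illustration. The model only encodes one rule, that a completion arriving while a script is running is not seen until the script ends.

```javascript
// Toy model of a single-threaded browser: if the response finishes while a
// script is running, the completion is only observed when the script ends.
function observedCompletion(arrivalMs, scriptStartMs, scriptEndMs) {
  var blocked = arrivalMs >= scriptStartMs && arrivalMs < scriptEndMs;
  return blocked ? scriptEndMs : arrivalMs;
}

// Hypothetical timings, all relative to the start of the Iframe request.
var requestStartMs = 0;
var wireArrivalMs = 80;  // last byte seen on the wire
var scriptStartMs = 10;  // script begins right after the request starts
var scriptEndMs = 560;   // 550 ms of DOM work

var actual = wireArrivalMs - requestStartMs;
var reported =
  observedCompletion(wireArrivalMs, scriptStartMs, scriptEndMs) - requestStartMs;

console.log("actual: " + actual + " ms, reported: " + reported + " ms");
// actual: 80 ms, reported: 560 ms
```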

This means that monitoring tools relying on Internet Explorer might add the JavaScript execution time to the Iframe request time if the JavaScript executes right after the Iframe request starts. This does not mean that the server serving the Iframe is slow, and it does not mean that the Iframe slowed down the page. It simply means the JavaScript time was incorrectly attributed to the Iframe request. So the next time you see a very slow request in a monitoring tool, test the request on its own to make sure the problem is the request and not something else on the page.

At Catchpoint we understand that such “blind spots” have an impact on our users, so we have already started development work to address this issue in our waterfall charts. Our IE-based monitor will be able to clearly distinguish between network request time and JavaScript execution time.

Comments (6)

FWIW, the problem isn’t specific to iFrames. IE services network requests on its main thread (or synchronizes to its main thread), so if the main UI thread of the browser is busy, any network activity going on at that particular time can look like it took longer than it really did. It DID still take the browser that long to actually process the request, but you could end up trying to solve the wrong problem if you only look at the waterfall. This also isn’t likely to be an IE-only phenomenon but should show up in any tools that are looking at the network traffic from the browser’s perspective.

One of the BIG clues on WebPagetest is the CPU utilization line (and the network traffic line). If you see the CPU utilization spike and requests are taking a long time, then you really need to look into what is causing the CPU spike. Capturing a dynaTrace session of the test is usually the fastest way to find the offending JavaScript or page code (and you can run a dynaTrace profile run directly from WebPagetest so you get the waterfall and profile data together). You can also capture a tcpdump from WebPagetest directly, but that’s not for the faint of heart.

At some point I would like to mash the tcpdump data against the browser-level data to “fix-up” any network requests that look like this but it’s not very high on the priority list (and at some level you still care about when the browser actually processed things so I’m not sure yet how it should be represented in the UI).

We ran the same tests with image tags and style tags, and we were unable to reproduce the issue on Internet Explorer with any of the tools. This does not mean that it does not exist; it may simply be less of an issue for those tags, or the same scenario may be hard to replicate on the main browser thread.

The issue also does not occur on Chrome with the Developer Tools, or if it does occur, it is on the order of 1-10 milliseconds rather than hundreds of milliseconds. On Firefox with HTTPWatch we were able to replicate it if the iframe content was cached.

Dynatrace does capture the JavaScript activity correctly in our test cases, so it was easy to identify. However, in the original client pages with the DOMContentLoaded workarounds we were unable to clearly figure out it was the JavaScript. In some captures you would see the JavaScript execution parallel to the request – in other cases you wouldn’t.

The main reason for this article is to inform users of such tools, including ours, that what they see in the tool is not always the network activity and can be affected by other factors in the page. They therefore shouldn’t blame the server behind the iframe based on a waterfall result alone, but should take the time to confirm the iframe really is the cause (by testing the iframe standalone, with Dynatrace, or with a TCP dump).

Early versions of Firebug Net Panel and Speed Tracer had similar bugs where a long-executing (non-relinquishing) JS execution would produce network times that were too long. I found and reported bugs for those tools and others. But I’m surprised that this happened in tools that are not written in JS – like HTTPWatch and WebPagetest. I guess, tho, the issue is the same. IE is producing network events and the long JS execution is causing either the production of those network events or the consumption of same to be blocked. The fix in Firefox was to add the event epoch time to the event object data, so that it didn’t matter when the event was consumed.
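
The idea behind that fix can be sketched as follows. This is an illustration of producer-side timestamps in general, not the actual Firebug or Firefox code. The field name `producedAtMs`, the helper names, and the timings are all hypothetical.

```javascript
// A network event; `producedAtMs` is a hypothetical field stamped by the
// producer at the moment the event is created.
function makeNetworkEvent(requestId, phase, producedAtMs) {
  return { requestId: requestId, phase: phase, producedAtMs: producedAtMs };
}

// A tool's consumer. Without the producer stamp it records the time at which
// it finally got to run, which drifts whenever the thread was busy.
function recordEvent(timeline, event, consumedAtMs, trustProducerStamp) {
  var t = trustProducerStamp ? event.producedAtMs : consumedAtMs;
  timeline[event.requestId] = timeline[event.requestId] || {};
  timeline[event.requestId][event.phase] = t;
}

function durationMs(timeline, requestId) {
  return timeline[requestId].end - timeline[requestId].start;
}

// Hypothetical scenario: the response ends at 80 ms, but a long script keeps
// the consumer from running until 560 ms.
var startEvent = makeNetworkEvent("iframe", "start", 0);
var endEvent = makeNetworkEvent("iframe", "end", 80);

var naive = {};
recordEvent(naive, startEvent, 0, false);
recordEvent(naive, endEvent, 560, false);

var stamped = {};
recordEvent(stamped, startEvent, 0, true);
recordEvent(stamped, endEvent, 560, true);

console.log(durationMs(naive, "iframe"));   // 560
console.log(durationMs(stamped, "iframe")); // 80
```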

Interesting find. One saving grace is that hopefully web sites don’t have JS threads that execute for 2 seconds without relinquishing control. All JS developers should keep in mind that their JS should never execute for more than a few hundred milliseconds, and use callbacks to relinquish control. Screwy waterfall charts aren’t the worst side effect – that long-executing JS is also blocking all UI updates.
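
One common way to follow that advice is to split the work into chunks and hand control back between them. The sketch below is a minimal version of that pattern. The name `processInChunks` and its parameters are made up for illustration, and the chunk size would need tuning for real pages.

```javascript
// Process `items` in chunks of `chunkSize`, yielding between chunks so the
// browser can handle UI updates and other events. `schedule` defaults to
// setTimeout(fn, 0) and can be replaced, for example in tests.
function processInChunks(items, chunkSize, handleItem, done, schedule) {
  schedule = schedule || function (fn) { setTimeout(fn, 0); };
  var index = 0;

  function runChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      schedule(runChunk); // relinquish control before the next chunk
    } else if (done) {
      done();
    }
  }

  runChunk();
}
```

The first chunk runs immediately and each later chunk runs from a callback. For the 10,000-span case described in the article, a call such as `processInChunks(spanData, 500, addSpan)` would build 500 spans per chunk, where `spanData` and `addSpan` are placeholders.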

[…] A New Blind Spot for Web Performance Tools (03/02/2011): We cover a new monitoring blind spot with Internet Explorer tools when measuring iframe response time. If a JavaScript is executed right after iframe request starts then IE might include the JavaScript execution time in the iframe response time. This causes misleading data that incorrectly suggests the server for the iframe URL is responding slowly, when in fact is not. […]