According to http://support.microsoft.com/kb/944884, "when a large response or large responses are sent to a client over a slow network connection, the value of the time-taken field may be more than expected".
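The effect the KB article describes is easy to quantify: time-taken keeps counting until the response has drained over the client's link, so on a slow connection the logged value is dominated by transfer time, not server processing. A back-of-the-envelope sketch (the figures are purely illustrative, not from my logs):

```python
def expected_time_taken(response_bytes, link_bps, processing_s=0.1):
    """Rough lower bound on IIS time-taken (seconds) when the
    response must drain over a link of link_bps bits/second.
    Ignores TCP slow start, loss, and protocol overhead, so
    real-world values are usually higher."""
    transfer_s = response_bytes * 8 / link_bps
    return processing_s + transfer_s

# A 250 KB SOAP response over a healthy 10 Mbit/s link:
fast = expected_time_taken(250_000, 10_000_000)   # ~0.3 s
# The same response over a congested ~100 kbit/s link:
slow = expected_time_taken(250_000, 100_000)      # ~20.1 s
```

Same request, same 100 ms of server work, and the logged time-taken jumps from sub-second to 20 seconds purely from the transfer.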

I have a situation where a client will say, "I sent a request to your web server at 10:03:24 and it took 20 seconds, why?". I can see this in the IIS logs as well, but the server's ASP.NET module logged it as taking 100ms, and CPU and Disk counters were low.

I suspect that it's due to a slow network connection. How can I prove this?

Update:

1) These are SOAP web service requests, so there are no embedded graphics: just an HTTP POST whose response is a single XML document.

2) Also, I've reproduced this by throttling the network speed on the client side, and the symptoms are exactly the same.

3) The problem is intermittent, meaning the same request is normally fast for the client but occasionally slow. I can't reproduce this myself other than by throttling the network. The server's ASP.NET logging shows it always fast, but IIS logging shows it slow when the client says it's slow.

4) I only have access to the server, and need to provide as much information as possible to the client so they accept that the issue was not on the server and know what logging/tools to run on the client to find root cause.

Are these requests normal page views that require fetching embedding graphics and so on? Or are they automated queries that return only a single page? Are we actually measuring the time to load a page or the time to respond to a single HTTP request?
– David Schwartz, Jul 30 '12 at 8:18

3 Answers

I have a situation where a client will say, "I sent a request to your web server at 10:03:24 and it took 20 seconds, why?". I can see this in the IIS logs as well, but the server's ASP.NET module logged it as taking 100ms, and CPU and Disk counters were low.

I suspect that it's due to a slow network connection. How can I prove this?

It starts with looking for packet drops between the client's browser and every source of images / scripts / HTML for the page in question. If you find consistent packet drops, then you know for sure there is something in the network that needs to be fixed, even if it is just an overloaded link. Packet drops are not the only reason for a slow network, but they are the most common cause in my experience. Other possibilities include a misconfigured proxy or cache engine; sadly, I can't list every possible network culprit here.

However, people often blame the network when, in fact, the speed issues are well within their own control. Possible explanations:

The HTML for the page is written poorly and loads required scripts in the wrong order, so the whole page renders slowly even though almost all resources were in place.

The page is waiting for a resource that simply doesn't exist and times out while waiting.

A script is stuck in a slow loop that blocks for a while.

A cache engine takes a long time delivering an image.

Your CGI is looking up something in a database, and the lookup itself is slow.

You're using Google Analytics, which slows things down because of how the page is written.

I could go on, but the point is that you have to nail down the exact reason the page is slow yourself. A flawed network is possible; it is also possible that other factors are contributing to the slow performance.

To diagnose further:

If the page loads well in Firefox, then the Network tab in Firebug is your friend (hit F12, then go to the Network tab and reload the page). Firebug gives you a nice waterfall diagram showing how the page loads and where the delays are.

If the page loads well in Chrome, you can do something similar (hit Ctrl+Shift+I, click on the Network tab, and reload the page).

If the page is only supported in IE (btw, shame on your HTML developers), your best bet is to load each of the page's elements individually with curl until you find one that looks far too slow, then find out why that particular element is slow.

BTW, the Chrome and Firefox examples used a CGI query from Debian.org; this is a good example of a delay that comes from a CGI lookup.
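Since these are SOAP POSTs rather than browser page loads, you can script the same first-byte vs. total-time split that the browser waterfalls (or curl) give you. This is a minimal sketch using only the Python standard library; the URL and payload in the usage note are placeholders for your own endpoint. A small time-to-first-byte with a large total time means the server answered quickly and the remaining time went to draining the response over the network:

```python
import http.client
import time
from urllib.parse import urlsplit

def time_request(url, method="GET", body=None, headers=None):
    """Return (ttfb, total) in seconds for one HTTP request.
    ttfb approximates server processing plus one round trip;
    total - ttfb is roughly the body transfer time."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=30)
    start = time.monotonic()
    conn.request(method, parts.path or "/", body=body, headers=headers or {})
    resp = conn.getresponse()
    resp.read(1)                      # wait for the first body byte
    ttfb = time.monotonic() - start
    resp.read()                       # drain the rest of the body
    total = time.monotonic() - start
    conn.close()
    return ttfb, total
```

For a SOAP endpoint you would call something like `time_request("http://yourserver/service.asmx", "POST", body=envelope, headers={"Content-Type": "text/xml"})` in a loop, since the problem is intermittent and you need to catch a slow sample.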

When all else fails, you can capture a .pcap with Wireshark and run it through tcptrace; however, while tcptrace is very good at analyzing packet dumps, there is no guarantee that you can isolate the issue with tcptrace alone. See this answer for information on using tcptrace diagnostics.
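There is also a lightweight server-only check you can run before reaching for packet captures: mine the IIS logs you already have. For each request, dividing bytes sent by time-taken gives the effective throughput the client achieved; if the slow requests cluster at dial-up-class throughput while the fast ones don't, the time was almost certainly spent on the wire. This is a sketch with made-up thresholds, and the W3C field positions depend on your logging configuration, so adapt the parsing to your own log format:

```python
def effective_throughput(sc_bytes, time_taken_ms):
    """Effective client throughput in kbit/s implied by one IIS
    log entry (W3C fields sc-bytes and time-taken).
    bytes * 8 bits / milliseconds == kilobits per second."""
    if time_taken_ms <= 0:
        return float("inf")
    return sc_bytes * 8 / time_taken_ms

def flag_slow_transfers(entries, min_kbps=500, min_ms=1000):
    """entries: iterable of (url, sc_bytes, time_taken_ms) tuples
    pulled from the IIS log. Yields requests that were both slow
    and drained at low throughput -- the signature of a slow
    client link rather than slow server processing."""
    for url, sc_bytes, ms in entries:
        kbps = effective_throughput(sc_bytes, ms)
        if ms >= min_ms and kbps < min_kbps:
            yield url, ms, round(kbps, 1)

sample = [
    ("/service.asmx", 250_000, 100),     # fast: ~20,000 kbit/s
    ("/service.asmx", 250_000, 20_000),  # slow: ~100 kbit/s
]
print(list(flag_slow_transfers(sample)))
# → [('/service.asmx', 20000, 100.0)]
```

This is far cheaper than leaving a capture running, and it produces exactly the kind of per-request evidence ("your request drained at 100 kbit/s") you can hand to the client.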

See my updates above. While your info is very useful in the general case, I don't think it applies here. The page is only intermittently slow, and the symptoms are only reproducible when I throttle the network at the client side.
– Jon, Jul 31 '12 at 2:30

The waterfall charts in Firefox / Chrome support HTTP POST operations, as does curl... I am not sure how you concluded that the info doesn't apply, but that conclusion doesn't seem to involve a full application of the tools to the problem domain.
– Mike Pennington, Jul 31 '12 at 4:18

Firefox/chrome are client-side tools. I only have access to the server, and I can't repro using my own client. I need to tell, from the server only, if a particular request was slow due to network issues. That leaves packet capturing, but that is too heavy to leave on in production (consider 1 in 10,000 requests might be slow).
– Jon, Jul 31 '12 at 5:30

As a network engineer with over 15 years under my belt, may I respectfully suggest that you cannot diagnose a client-side HTTP services problem from the server alone; you simply don't have enough information (which is apparently your conclusion too... however, you don't seem to be open to living with this reality :-).
– Mike Pennington, Jul 31 '12 at 5:34

If packet capturing at the server can diagnose network issues (e.g. by seeing slow TCP ACKs), is it not reasonable to expect that a lighter-weight tool or logger could show the same?
– Jon, Jul 31 '12 at 5:42