If you got a 500 or something like that, it will show up as the result code for the page (which is normally 0 for success or 99999 for a content error), and a screen shot would be taken because it was an error (I assume you figured out how to grab screen shots on errors). If the page returned a 200 but failed gracefully, then it wouldn't.

Right now pagetest can only either grab screen shots on errors or do a full dump of screen shots and graphics (used for one-off testing). It wouldn't be hard to add the ability to grab a screen shot for every page, but be warned that the storage requirements could get out of hand pretty quickly if you're crawling a complicated site.

Are you talking about the page-level or request-level data? For bulk processing of the request-level data I wrote an app that parses the log files and splits the results out by domain, so we could see what all of the broken requests were for a given domain, regardless of property. Most of the bulk analysis has been targeted at looking for specific things (broken content, missing gzip, etc.), so it wasn't too hard to throw together a script that could just look for all occurrences.
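The domain-splitting step is simple to sketch. This is only an illustration, not the actual app: it assumes a hypothetical tab-separated request log where the URL sits in the fourth column, which you'd adjust to the real pagetest log layout.

```python
from collections import defaultdict
from urllib.parse import urlparse

def split_by_domain(lines):
    """Group tab-separated request records by the host of their URL column.

    The column index (3) is a placeholder -- match it to the real
    request-level log format before using this on actual files.
    """
    by_domain = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= 3:
            continue  # skip malformed rows
        host = urlparse(fields[3]).hostname or "unknown"
        by_domain[host].append(fields)
    return by_domain

# Invented sample records: run, status, bytes, URL
records = [
    "0\t200\t1523\thttp://example.com/a.js",
    "0\t404\t0\thttp://cdn.example.net/img.png",
    "0\t200\t8100\thttp://example.com/b.css",
]
groups = split_by_domain(records)
```

Once grouped this way, scanning one domain's bucket for broken requests (404s, missing gzip, etc.) is a single pass over its list.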

For non-crawled testing we store the results in a database and have a front-end for plotting the results and doing drill-downs. That's a fairly large and complex system, though.

Thank you for telling me what result code 99999 was; I had worked out the others from the PHP code, but noticed you read in records that were 0 or 999999, so I was a little confused.

Re screenshots: I think the problem I have is that for certain URLs in my URL file, the application is throwing an error server-side and JBoss returns a 500, but on its way out of Apache it gets munged into a 200. So urlblast sees a tiny page returned in a few ms. Without a screen shot I can't prove that. I agree the storage would be a consideration; I think overwriting the screen shots would be an acceptable tradeoff.

I was talking about page-level data. I'm scanning 100 URLs, I'm only interested in performance, and I'm running it around the clock so I can build up a trend over the month. I'm seeing very fast times in the early morning that slow down through the day until about 9pm, when they begin to speed up again. I want to see the patterns over weekends vs. weekdays, etc.

By doing this I can factor in the latency of the ISP used for collecting the data and the latency of the Internet in the UK, and get a real handle on how well those websites are performing.

I could just do with 20 more nodes!!

I'm going to read the TSV page file into MySQL and then drill down.
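One way to prototype that drill-down before setting up MySQL is with SQLite as a stand-in. The column names below ("date", "time", "url", "load_time_ms") and the sample rows are invented for illustration; they'd need to match the real TSV header.

```python
import sqlite3

# Hypothetical page-level rows: date, time, URL, load time in ms.
rows = [
    ("2010-05-01", "02:15", "http://example.com/", 850),
    ("2010-05-01", "14:15", "http://example.com/", 2300),
    ("2010-05-01", "14:45", "http://example.com/", 2100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (date TEXT, time TEXT, url TEXT, load_time_ms INT)"
)
conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", rows)

# Drill down: average load time per hour of day, which would expose the
# morning-fast / evening-slow pattern described above.
hourly = conn.execute(
    "SELECT substr(time, 1, 2) AS hour, AVG(load_time_ms) "
    "FROM pages GROUP BY hour ORDER BY hour"
).fetchall()
```

The same `GROUP BY` works in MySQL once the TSV has been loaded (e.g. via `LOAD DATA INFILE`); grouping on day-of-week instead gives the weekend-vs-weekday view.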

Cheers,

Stuart.

P.S.

Once I've completed my testing, I'd like to offer my node out for use from the main pagetest site.

I read this post with great interest. I was interested in trying to generalize these results. Although we know that website optimization is important, from a telco or operator perspective it is relevant to understand the common experience per webpage. I hope you can provide your feedback on whether this makes sense or not at all.

Your stats provide really interesting information. However, out of all the unique URLs, is it possible to know how many are, for instance, in the top 100 as defined by Google? My guess is that most people use the same sites frequently.

In my attempt to generalize your results I have taken metrics from Google, specifically requests per page and web page size, and tried to filter your data with those two items: http://code.google.com/speed/articles/web-metrics.html
My intention is to make the data from your set as similar as possible to the top 100 defined in this list, and also to remove some outliers.

I did the following:
(1) filter out all results that are
- below the Top 100 low 10% page size
- above the Top 100 high 10% page size

(2) filter further, removing results that are
- below the Top 100 low 10% number of objects
- above the Top 100 high 10% number of objects

(3) re-plot the chart and recalculate the averages for load time and page size.
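The two-pass filter above can be sketched as follows. The percentile bounds and the sample rows are made-up placeholders; the real numbers would come from Google's web-metrics article and the actual dataset.

```python
# Hypothetical 10th/90th percentile bounds for the Top 100 sites.
SIZE_LO, SIZE_HI = 100_000, 700_000   # page size in bytes (placeholder)
OBJ_LO, OBJ_HI = 10, 90               # number of objects (placeholder)

# Invented sample results: page size, request count, load time.
samples = [
    {"bytes": 320_000, "requests": 44,  "load_ms": 3100},
    {"bytes": 40_000,  "requests": 5,   "load_ms": 600},   # too small: dropped
    {"bytes": 900_000, "requests": 120, "load_ms": 9800},  # too big: dropped
]

# (1) keep pages within the Top 100 page-size band
step1 = [s for s in samples if SIZE_LO <= s["bytes"] <= SIZE_HI]
# (2) then keep those within the Top 100 object-count band
step2 = [s for s in step1 if OBJ_LO <= s["requests"] <= OBJ_HI]

# (3) recalculate the averages on the filtered set
avg_load = sum(s["load_ms"] for s in step2) / len(step2)
avg_size = sum(s["bytes"] for s in step2) / len(step2)
```

Applying the size filter first and the object-count filter second, as in the steps above, gives the same surviving set as applying both at once; the order only matters for seeing how many rows each pass removes.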

Would you be so kind as to give any thoughts on whether you believe this is representative/useful, or likely not to reflect the experience of an average user?

Best regards
-Andreas
Solution Architect, Japan

-------------
At a high-level, here are the average statistics across the whole sample set:


I'd love to see what the separation is between the top 100 sites and the not-in-the-top-100 sites. I'm not sure you could draw any conclusions from it, but it would be an interesting data point.

I don't know that there really is such a thing as "the average user". We see huge variations in browser installs, with even the most prevalent version usually not having more than 30-40% of the share for a given site. Put variations in geography and connectivity on top of that, and at least for the load times there is a huge distribution.

The page size and request count information should be fairly consistent (assuming the sites aren't doing much browser-specific work like data URIs). Even then, the "averages" might be interesting to look at over time, but I bet the distribution would be even more interesting (the average may stay the same, but are the heavy sites getting heavier and the light sites getting lighter?).
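That averages-vs-distribution point is easy to demonstrate with toy numbers. The page sizes below are invented purely for illustration: two samples share the same average while their histograms diverge.

```python
from collections import Counter

def summarize(sizes_kb, bucket_kb=200):
    """Return (average, histogram) for a list of page sizes in KB.

    The histogram buckets sizes into bucket_kb-wide bins, keyed by the
    bin's lower edge.
    """
    hist = Counter((s // bucket_kb) * bucket_kb for s in sizes_kb)
    return sum(sizes_kb) / len(sizes_kb), dict(sorted(hist.items()))

# Invented samples: same average, very different shapes.
year1 = [300, 320, 340, 360]   # everything clustered in the middle
year2 = [100, 120, 540, 560]   # light sites lighter, heavy sites heavier

avg1, hist1 = summarize(year1)
avg2, hist2 = summarize(year2)
```

Both averages come out identical, while the second histogram splits into a light bucket and a heavy bucket, which is exactly the kind of shift a plain average would hide.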