Posted by timothy on Thursday December 22, 2011 @04:18PM
from the includes-the-toaster-and-the-pool dept.

MrSeb writes "According to new research from HTTP Archive, which regularly scans the internet's most popular destinations, the average size of a single web page is now 965 kilobytes, up more than 30% from last year's average of 702KB. This rapid growth is fairly normal for the internet — the average web page was 14KB in 1995, 93KB by 2003, and 300KB in 2008 — but by burrowing a little deeper into HTTP Archive's recent data, we can discern some interesting trends. Between 2010 and 2011, the average amount of Flash content downloaded stayed exactly the same — 90KB — but JavaScript experienced massive growth from 113KB to 172KB. The amount of HTML, CSS, and images on websites also showed a significant increase year over year. There is absolutely no doubt that these trends are attributable to the death throes of Flash and emergence of HTML5 and its open web cohorts." If you have a personal home page, how big is it?

Well, I don't mind bragging about mine. It was 100k, but has now swollen to 150k this year. As to *real* servers, I try to keep our ecommerce pages below 250k for gateway pages. Until this year, I tried to keep them under 150k. Up until 2008, 100k was the target. Before 2003, 50k. This is kind of light, and a few pages bust this, but very few. Before 2000, I used to spend lots of time just optimizing graphics; now I just use some common sense, PS, and very little time.

What I have found is that the total KB of data isn't as important as the number of items and hosts the page calls. I find I can make my pages faster by using image maps, which make for a larger image size (one image of all 12 items instead of 12 separate images) but load faster because they take fewer connections. There are a few tools online that can help you figure out total load times. Nowadays, load time is NOT purely a function of the size of the data. If you can cut down on the number of GETs and cross-domain GETs (i.e. DNS lookups), you can radically cut load time and improve reliability.
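Roughly the kind of thing I mean - a toy sketch that spits out the image map markup for a hypothetical 4x3 grid of 80x80 product thumbnails packed into a single image, so the page makes one image GET instead of twelve (the file names, links and dimensions are all made up):

    // Emit an <img> plus <map> for a made-up 4-column grid of 80x80 thumbnails.
    const cols = 4, cell = 80;
    const links: string[] = [
      "/p/widget-1", "/p/widget-2", "/p/widget-3", "/p/widget-4",
      "/p/widget-5", "/p/widget-6", "/p/widget-7", "/p/widget-8",
      "/p/widget-9", "/p/widget-10", "/p/widget-11", "/p/widget-12",
    ];
    let areas = "";
    for (let i = 0; i < links.length; i++) {
      const x = (i % cols) * cell;
      const y = Math.floor(i / cols) * cell;
      areas += `  <area shape="rect" coords="${x},${y},${x + cell},${y + cell}" href="${links[i]}">\n`;
    }
    console.log(
      `<img src="/img/products.jpg" usemap="#products" alt="Products">\n` +
      `<map name="products">\n${areas}</map>`
    );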

Also, pages that don't need to be dynamic shouldn't be. Our gateway (to product categories) pages are generated as we update the site and stored static. This allows them to be cached. It sounds old fashioned, but the fact is that it greatly reduces perceived latency. I am amazed at how many websites are generated via PHP and SQL on the fly, yet aren't updated more than a couple of times a day, or less. That is a lot of wasted CPU cycles on the server, and a lot of wasted potential for caching, both locally and down the line. And yes, generating everything on the fly makes your website load slower, making it seem like your pages are larger than they are.
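For what it's worth, the "generate at update time" step can be dead simple. A minimal sketch, assuming a Node/TypeScript build script with hypothetical template and category names:

    // Regenerate gateway pages as static HTML whenever the catalogue changes,
    // instead of hitting PHP/SQL on every request. Names here are made up.
    import * as fs from "fs";

    const categories = [
      { slug: "garden", title: "Garden Tools" },
      { slug: "kitchen", title: "Kitchen" },
    ];

    // gateway.html is a hypothetical template containing {{title}} and {{slug}}.
    const template = fs.readFileSync("templates/gateway.html", "utf8");

    for (const cat of categories) {
      const html = template
        .replace(/{{title}}/g, cat.title)
        .replace(/{{slug}}/g, cat.slug);
      fs.mkdirSync(`public/${cat.slug}`, { recursive: true });
      // Served as a plain file afterwards, so browsers and proxies can cache it.
      fs.writeFileSync(`public/${cat.slug}/index.html`, html);
    }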

No rational ecommerce site designs for a 1920-wide screen. You always design for the lowest common denominator. Most customers aren't computer geeks, and you want everyone's money, not just the smart people's. Personal websites (the topic of the story), yes, but never for money-making websites. We still design around a maximum width of 900 pixels (menus plus 760px of actual content), and will for some time. This allows for scroll bars and a little buffer on a 1024x768 screen, or for half of a 1080p screen. It also makes it readable from tablets and phones without having multiple code bases, which is prohibitively expensive for a small to medium sized ecommerce site.

Yes, it is a good thing. We get unlimited 3G data for an extra 3 euro/month per phone on top of the basic cellphone service. It's only a 384kbps data service, but it can be used non-stop without incurring any extra fees. For the three cellphones I pay for (me and two kids), the combined bill rarely reaches 20 euro/month, including taxes and all calls.

The phone issue is interesting. I was just on the "How Much Data Do You Need?" page for a local provider. You slide the bar for various things, like how many web pages you visit in a month (as if anyone really knows that). Their assumption was 0.17MB/page. I know there are mobile versions of pages and such, but this still seems like a gross underestimate given this story.

With the growth of JavaScript libraries like jQuery for more UI features, plus more images, I can see it reaching that high.

Meanwhile, web developers don't care because more and more people are getting faster and faster broadband speeds. So as long as the page-load metric works OK on their rig, or perhaps on what they envision most of their viewers have... they think it's all OK.

Web developers don't care because the majority of their images/CSS/JS is cacheable by each visitor (and most people have jQuery cached from the official site, and many sites link to that directly). A 1MB page might be only 45k on the next visit.

That'd be an interesting refinement, attempting to get numbers for typical data loaded after caching. Would be hard to come up with a "typical" user profile to use at various times in history for comparison, though.

It would be interesting if the browser vendors collected anonymous usage data, though. Not even which sites you visited, but just stuff that would be useful to developers, like page size, time to download, time to render, cache hit frequency. Obviously many people would want to turn this off, but if you got enough people using it, you could get some really interesting data. As long as you don't collect information about which sites are being visited, but rather the characteristics of the sites.

Yup. Google helps us out here. [google.com] If we're using offsite resources like that, there's a fair likelihood that it's cached in the user's browser even if it's the first time they've visited the site.

Certainly, JS frameworks do contribute to the total size of a page, but the framework is generally cached and isn't re-downloaded on subsequent pages on the same site. So, your 965KB page just dropped to 800KB after the first page load. Images that are carried through a site (logos, widget buttons, backgrounds) should also only be counted against the first page load.

I tend to focus on keeping things small, reusing anything I can. Some web developers do care...at least I do.

Well, stuff like jQuery/Dojo/etc. libraries shouldn't be loading every time you view a page. On the first view, your browser will need to load all the associated CSS, HTML, etc. After that, included files should hopefully be cached, and only the page content need be loaded.

Also, with JS libraries and AJAX, one should be able to build pages that load the overall template once, but don't require pulling large HTML files for updates (rather just pull content with AJAX).
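Something along these lines - a bare-bones sketch using plain XMLHttpRequest (jQuery's $.get is the one-liner version); the fragment URL and element id are hypothetical:

    // Load only the article fragment over AJAX and drop it into an already
    // rendered page template, instead of fetching a full HTML page.
    function loadFragment(url: string, targetId: string): void {
      const xhr = new XMLHttpRequest();
      xhr.open("GET", url);
      xhr.onload = () => {
        const target = document.getElementById(targetId);
        if (xhr.status === 200 && target) {
          target.innerHTML = xhr.responseText;
        }
      };
      xhr.send();
    }

    // e.g. swap in the next article without reloading the page template
    loadFragment("/fragments/article-42.html", "content");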

I don't know about other developers, but I do care, and try to keep pages small. More and more people are accessing the web on mobile devices, so minimizing the data going back and forth, and round trips to the server, is important to user experience. In the design community, designing with mobile devices in mind is a growing practice.

Forgive me piggybacking here, but I've a web question. I read Slashdot predominantly on my phone (doesn't everyone?), but once you get 5 or 6 levels of replies in, the posts become unreadable. Each reply has a shorter width than the one above, meaning you end up with a handful of characters per line and the rest of the horizontal space as just that - space. Is that really how it's supposed to be - completely unreadable? Is there no way of overriding it and saying 'look, I know it's a reply from the context'? I've tried poking around in the various options within Slashdot, but I don't understand what most of them do, and the so-called help is completely useless and doesn't describe what the options mean or how to use them. I think the problem is that the designers of Slashdot assume everyone is using a monitor, where you'd probably need to be about 30 or 40 levels in before hitting the same problem.

I'm using Dolphin HD on Android but a friend with an Apple phone has the same problem. Is there an answer?

Good web developers do take that into consideration. I've had both lead and non-lead positions, and one thing we always did was check the actual size of the complete page, load times, and several other things. I've even gone so far as to check sites on a dial-up account to see how they performed. Sometimes it gets a little nitpicky, but if you care about the user experience it's something you have to do.

10 years ago online video was virtually nonexistent, and where it did exist it was never larger than 320x240. Pictures were equally low resolution and page formatting was minimal. Allowing user comments was rare, and user-contribution-based sites like YouTube and Wikipedia were nonexistent. Oh yeah, and the "blink" tag was still popular. So yes, I would say the amount of information has increased significantly.

and the 3G users, and the satellite users, and everyone else that has a low-bandwidth and/or high cost per byte connection.

My parents can't get DSL or cable. They're stuck with 22k dial-up, and use AdBlock Plus, NoFlash, and Propel accelerator with compression set to the point where you can barely recognize photos, and it still takes 2 minutes for a reasonably normal page (CNN, MSNBC) to load, much less anything with a ton of Javascript or Flash.

Can't websites automatically detect connection speed the first time a client visits, and store a cookie so that us slow people get a nice, simple website?
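A rough sketch of how that "speed cookie" idea could be done client-side - time a small beacon image of known size on the first visit and remember the result; the beacon path, its size and the cut-off are all made-up values:

    // Only test once; after that the server can read the cookie and decide
    // which flavour of the site to send.
    if (document.cookie.indexOf("linkspeed=") === -1) {
      const start = Date.now();
      const beacon = new Image();
      beacon.onload = () => {
        const seconds = (Date.now() - start) / 1000;
        const kbps = (30 * 8) / seconds;              // beacon assumed to be ~30KB
        const speed = kbps < 256 ? "slow" : "fast";   // arbitrary cut-off
        document.cookie = "linkspeed=" + speed + "; path=/; max-age=2592000";
      };
      beacon.src = "/img/beacon-30k.jpg?nocache=" + start;
    }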

Oh, and Propel, please move to JPEG2000 and XZ compression. Some people need every byte they can get.

They don't have any cell reception, barring standing in the right spot in the yard. They live at the bottom of a large valley that blocks cell signals.

One thing I have thought about is buying two antennas and building a passive reflector to beam some signal into their valley, but I'm waiting for Verizon (or anyone, for that matter) to roll out 4G before I spend money on it.

Can't websites automatically detect connection speed the first time a client visits, and store a cookie so that us slow people get a nice, simple website?

Ooooh, ooooh, I have a better idea! Can't websites ditch the f**king crap they now use (reaching a 1MB average?) and just redo their sites as the nice, simple website you describe?
I will never understand why any website would have a legitimate need for background music. Or an interactive (a-la-DVD opening screen) navigation with a 1-second delay and bu

I've used Lynx since the early '90s, but it's not a realistic solution for my parents. Since they have both Chrome and Firefox installed, I disabled loading photos in Firefox to give them something Lynx-like, but it renders a number of sites unreadable, primarily ones that validator.w3.org barfs at.

I have a homepage, and it's only 4.92KB. Granted, it is the "It Works!" page for CentOS, which has all of the other text and icons and such, but who needs more than that? Do people really have personalized home pages now that Facebook is around (other than some hobbyists or professionals who run a side business)?

I wonder what the average "Facebook" homepage size is... since that is what most people will be seeing regularly.

Why, what is your theory? The only reason that page even exists is because I have to host the Google validation page to show ownership of my domain (Google Apps) and didn't have anything else interesting to put up on the site.

I'm going to laugh if you think it's an old age / generational thing too, but I am curious.

Yes, compression helps (and is generally done automatically in any good Apache configuration). What helps even more from a user's perspective is combining files; basically, on the backend we combine all our JavaScript and CSS (or as much as is reasonable) into one file each instead of serving multiple, separate files linked from the current page. This cuts down on HTTP requests massively and speeds up site loading. Yahoo has a great list of best practices for speeding up sites [yahoo.com] if you're interested.
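As a toy example of the combine step (hypothetical file names, assuming a Node/TypeScript build script; a minifier would normally run over the result as well):

    // Concatenate the site's JS into one bundle so the page makes one script
    // request instead of several. Apache's mod_deflate gzips it on the way out.
    import * as fs from "fs";

    const scripts = ["js/jquery.js", "js/plugins.js", "js/site.js"];

    const bundle = scripts
      .map((path) => "/* " + path + " */\n" + fs.readFileSync(path, "utf8"))
      .join(";\n");

    fs.writeFileSync("js/bundle.js", bundle);
    console.log("Wrote js/bundle.js from " + scripts.length + " files");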

Persistent HTTP connections were tacked on to HTTP 1.0 years ago and are widely supported, but you still have the "can I have that bit now please?" overhead with the associated latency between retrieving each file on each connection. 100ms of latency multiplied by a dozen assets soon adds up. HTTP 1.1's pipelining means you can ask for many things at once so only suffer that hit once (or twice - page then assets), but in practice browser support for pipelining is poor.
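To illustrate what persistent connections buy you (and what they don't): with a keep-alive agent, several small asset requests can reuse one TCP connection, so only the first pays the connection setup cost, but each request still costs its own round trip. A sketch with the host and paths made up:

    import * as http from "http";

    // One keep-alive agent, capped to a single socket, so all three requests
    // queue onto the same reused connection.
    const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

    for (const path of ["/css/site.css", "/js/bundle.js", "/img/logo.png"]) {
      http.get({ host: "www.example.com", path, agent }, (res) => {
        res.resume(); // discard the body; we only care about connection reuse
        console.log(path, res.statusCode);
      });
    }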

On most sites that I go to with a paragraph-per-page model, I just click the "Print" button/link on the site, and they combine the pages for printing. Then I read it without needing to print it. Sometimes they actually require printing it; if they do, I am less likely to read the article at all.

This only matters if people go to the first page, and never go to any additional ones.

For most websites these days, you'll take the initial hit from javascript and the 'branding' images when you first get to the site... but the changing content per page is much lower.

If websites are using standard javascript libraries being served by Google's CDN [google.com], then it's possible that someone visiting your page already has jquery, mootools or similar cached and doesn't need to load yet another copy.
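The usual belt-and-braces version of that is to add a local fallback in case the CDN copy doesn't load for some reason; the local path and version here are hypothetical:

    // Placed after the CDN <script> tag: if the shared copy failed to load,
    // pull in a locally hosted copy instead.
    if (typeof (window as any).jQuery === "undefined") {
      const fallback = document.createElement("script");
      fallback.src = "/js/jquery-1.7.1.min.js";
      document.getElementsByTagName("head")[0].appendChild(fallback);
    }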

I also didn't see if they had any comparison between transferred size vs. used size (e.g., JavaScript that's sent compressed)... and as this is from a new archive... does anyone know if Archive.org could analyze their holdings to see what the longer term trends are?

My home page remains where it has been since 1993 at the Calgary Unix Users Group: http://www.cuug.ab.ca/branderr [cuug.ab.ca]...clocks in at 9.2K, plus a 15K GIF and a 9.1K JPG (if you "turn on images" in your browser - remember when it was a realistic option not to?)

I have held the line, while Viewing With Alarm (VWA) the growth of web pages for the entire 18 years since. I wrote Bob Metcalfe when he had a column at InfoWorld 15 years back, and he was Viewing With Alarm the exponential growth in Internet traffic and predicting the "collapse of the Internet" (had to eat those words - literally) because of it. My letter pointed out that his column constituted 2K of text - that was all the generated content that was bringing in the readers (unless you count the 10K GIF of Bob Metcalfe, and I don't) - and the page had an additional 100K of framing and advertising-related image GIFs. His reply was somewhat defensive.

This last year, I had occasion to travel on the Queen Mary 2, where all internet access is via satellite at a minimum of 34 cents per minute with their bulk plan. How quickly I grew to resent the giant Flash blobs that would be automatically downloaded with every page of a newspaper so I wouldn't miss the animated ads for the latest in car buys. At QM2 speeds, I'd have to wait about two minutes before I even had an "X" mark to click on to dismiss the ad. I was rather quickly cured of almost any interest in Internet content at ALL: I did my e-mail, checked the Google News headlines (fewest high-byte ads), and logged off.

My point: 90% of mail is spam. So are 90% of web page bytes. We just don't call it spam. We call it "the whole outside frame around the news page that we try not to see, but keeps jumping around into our field of view".

Blue background. Why? Are you trying to accomplish some artistic purpose we're not privy to?

Why are the pictures laid out vertically rather than horizontally? Why is there lots of text to the right of the second picture rather than to the right of both pictures? That means that your contact info is obscured/invisible to potential readers -- it's also out of context in that place.

Why do your anchors span multiple sentences rather than just a few semantically relevant key words?

Sorry to be a dick but you're bragging about that page? Really? You know when they say "size doesn't matter"? Yeah - sometimes it also means being as small as possible is not necessarily a good thing. I would have thought that page was trash ten years ago when Geocities webpages were everywhere so, now, it's really not good... Seriously, stop bragging about it and spend some time designing a real page.

There is absolutely no doubt that these trends are attributable to the death throes of Flash and emergence of HTML5 and its open web cohorts.

No, it's not about HTML 5. A lot of it is about bloated content management systems and templates.

I was looking at a Wall Street Journal page recently, and I brought it into an HTML editor so I could eliminate all non-story content. The story required an HTML page with only 72 lines. The original page was over 4000 lines. It contained a vast amount of hidden content, including the entire registration system for buying a subscription. All that junk appears on every page, inline, not in an included file.

On top of that, there are content management systems which create a custom CSS page for each content page. So there's no useful caching in the browser.

Remember those people who said CSS was going to make web pages shorter? They were wrong. Look at Slashdot - bloated, slow pages that don't do much, yet consume CPU time when idle.

I would mod you up, but I have already commented. It's a major problem, and it seems worse with old-school publishers, unfortunately. Luckily the one I work for has finally seen the light; let's hope I see some changes next year.

I agree that the reason things are getting bigger is because of extra "crap" getting served. Commercial pages are the biggest chunk of it, but even stuff like WordPress can toss out a lot of junk with templates. With bigger screen resolutions and assumed high-speed internet, I'm seeing many sites being much more sloppy with large graphics too. The Slashdot question at the end makes it sound like personal pages are relevant to this statistic. What percentage of the population actually has a personal homepage?

No, it's not about HTML 5. A lot of it is about bloated content management systems and templates.

What do you think HTML5 is all about?

An all new way to deliver bloated CMS's. Why do some people think HTML5 is some kind of magic fix for all the ills of the web?

The problem is bad design and lack of care. No one gets punished for creating a crap system, bad developers get coddled, customers are coerced, sweet talked and sometimes forced into accepting bad CMS's.

My personal site's home page? Fairly large, 18k of which 11k is images. I mean, it's a home page not an image gallery or something like that where you expect a lot of large content.

I've seen some of those sites with large pages, and mostly I hate visiting them. The loading makes them feel like I'm wading through molasses, and the amount of stuff they're loading and the complexity of the scripts means more and more glitches and things that break when the network isn't perfect or they didn't expect the exact

My site's a Pyjamas application; it is therefore 1,000 lines of Python... or, when compiled (and therefore including the pyjs runtime, which does stuff like dict, list, exceptions etc. all emulated in JavaScript, as well as the library of widgets that are used on the page), it's 1.3MB of really obtuse but functionally correct JavaScript.

I've been able to run both CPU and GPU based CFD and 3D visualisation on my laptop without any problems, yet some flash games which are just doing 2D animation will roast a 2.7 GHz CPU to the point that the kernel decides to call it a day and shut down the whole system.

Unbelievably, these flash games aren't doing anything more complex than playing a retro 2D platform game. I'm guessing that this is due to the way in which all the separate texturemaps/pixelmaps are treated as generic webpage images rather than as a single DOOM style WAD file.

It's caused by inept code that just runs at maximum frame rate regardless of display. Those 2d animations are probably being generated at several thousand FPS, just because the programmer didn't know how to limit it to something more reasonable.

No doubt you ran into a game using the Flixel library for AS3. 99% of the time I see the Flixel logo pop up before a game I know my laptop fan is going to turn on and the game is going to run choppy. I don't know what that library does behind the scenes, but it's an amazing CPU hog....And yet in my own games when I implement the built-in AS3 BitmapData.copyPixels() routines to move around massive amounts of sprites, my CPU doesn't even break a sweat.

The abuse of __VIEWSTATE on certain pages makes the actual viewstate bigger than the page itself, growing and growing with every click. That must count for something in these numbers. I have always wondered how Microsoft thought this through, or maybe it comes down to the lack of education of its "developers".

My personal web site's home page is 2KB. It's HTML5, no CSS, no JS.
My research group site has a bit of all three plus a handful of images and comes in at 125KB.
Big website I sysadmin weighs in at 1.1MB.
A nice variety there. I think my personal site claims the crown as the fastest loading and quickest to render.

Some sites use JavaScript to display semi-static data that really should be assembled on the server side before being sent to the user. For example, a news site where the stories are loaded by JavaScript.

Some sites even have pages that are entirely blank if Javascript is turned off. It seems that some of these "web programmers" don't even know how to dynamically build a page with server-side scripting instead of Javascript.
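For comparison, assembling that kind of page on the server is not much code. A minimal sketch (made-up story data and port), which sends finished HTML so the page is readable even with JavaScript off:

    import * as http from "http";

    // Made-up stories standing in for whatever the CMS/database returns.
    const stories = [
      { title: "Average Web Page Approaches 1MB", body: "HTTP Archive says..." },
      { title: "Flash Content Holds Steady at 90KB", body: "JavaScript grows..." },
    ];

    http.createServer((_req, res) => {
      const items = stories
        .map((s) => "<h2>" + s.title + "</h2><p>" + s.body + "</p>")
        .join("\n");
      res.writeHead(200, { "Content-Type": "text/html" });
      // The client gets finished HTML; no JavaScript needed just to read text.
      res.end("<html><body>" + items + "</body></html>");
    }).listen(8080);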

Actually that's a really smart thing to do from a bandwidth point of view. There are all kinds of reasons not to do that (some of those are gradually disappearing, since now Google's crawler is starting to run some javascript to build the page as the user will see it), but if you are concerned about bandwidth, having javascript build your page for you is a very good way to do it.

From 1995 to 2003, 26.7% annual growth (take the eighth root of (93/14) and then subtract 1). From 2003 to 2008, 26.4%. From 2008 to 2010, 53.0%. Last year's growth was 37.5%. All percentages rounded to the nearest tenth of a percent.
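The same arithmetic, spelled out so anyone can check the compounding against the figures quoted in the summary (14KB/1995, 93KB/2003, 300KB/2008, 702KB/2010, 965KB/2011):

    // Compound annual growth rate between two page-size figures.
    function cagr(startKB: number, endKB: number, years: number): number {
      return Math.pow(endKB / startKB, 1 / years) - 1;
    }

    console.log((cagr(14, 93, 8) * 100).toFixed(1));   // 1995-2003: 26.7
    console.log((cagr(93, 300, 5) * 100).toFixed(1));  // 2003-2008: 26.4
    console.log((cagr(300, 702, 2) * 100).toFixed(1)); // 2008-2010: 53.0
    console.log((cagr(702, 965, 1) * 100).toFixed(1)); // 2010-2011: 37.5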

What's worse is that the "payload" of text is less and less interesting. Bandwidth isn't the problem. I have more than enough bandwidth for these pages. When they hit the browser, they take forever just to render. There are a handful of web sites I still use, Slashdot among them. Most new sites I just back right up. If your site does that on day 1, it's not worth the bother. I'm not buying a new machine just to look at your crap web site that's probably just a rehash of every Internet meme.

Riiigght... JavaScript increases by about 50KB, so it's responsible for the other several hundred KB of increase over the last few years?

Everyone realizes that gzipped jQuery is only 31KB, right? I'm sick of people blaming JavaScript for bloat. Do you realize how much work it would be to produce several hundred KB of it? Much less think of reasons to produce that much?

I've been a web designer for years, and where the increases in page size I've seen actually come from is just plain old images. Monitors are

Well maybe if you include all the images and the PDFs. I have a rather extensive website and if I recall, even when I backed up the entire thing, it came out to maybe 76MB, and that included all the image hosting I was doing for a different website.

The problem is the same problem we're having now with "windows" software. Bloated because it's being generated by machine rather than hand-coded. All these WYSIWYG HTML code generators that allow people to just drag and drop text and pictures and let Dreamweaver

About half my regular blog readers are based in emerging markets / less developed countries. I began to notice that hittership was dropping in Africa and India. Reviewing about a thousand posts, I noticed that the more photos and "blogger apps" I put on the web page, the lower the readership in countries with low bandwidth. I've been more conscientious now about which photo resolution I post and tend to avoid videos. And a lot of the cool little blogger widgets don't seem as important when measured in seconds to open the page. http://retroworks.blogspot.com/2010/12/blog-has-widget-fever.html [blogspot.com] Of course my content also sometimes sucks, and it also helps if I lay off the haiku.

The "home" page at our home web server is 9.5kB, including some Javascript, but it will load about 80kB of Logos from various FOSS sites (Gimp, Scribus, Inkscape, SciLab, etc.). Most of the index pages in different areas are also rather less than 10kB in size, but some of them link to pages containing albums of photos and videos. The entire site contains 15.6 GB of files which can be served up, mostly in these albums.

CSS is for prima donnas and Flash is for artistes. PHP is for chatterboxes and Perl is for psychics. Javascript is for the clinically insane, and Ruby is for hipsters. Drupal is for geeks and Ajax is for nerds.