Wednesday, August 1, 2007

Shared Library Delivery over CDN?

A slew of enabling libraries and frameworks are available to enhance the functionality, usability, and interactivity of the web applications we build ... mootools, prototype, scriptaculous, dojo, yui, to name a very, very few.

While these libraries afford developers tremendous agility to build advanced applications, some grow quite large, and they are often served directly from those applications in less-than-optimal ways (uncompressed, or without caching headers), resulting in perceptible sluggishness. This is where a highly optimized Content Delivery Network (CDN) serving such libraries can dramatically improve application performance.

Yahoo's inaugural blog post on the subject goes over the advantages of leveraging their delivery framework:

(...) Moreover, Yahoo!’s hosting network is configured to serve JavaScript and CSS using gzip compression. We minify YUI JavaScript before pushing it to our servers; in combination with gzipping, this results in a 90% reduction in transmitted filesize as compared to the footprint of YUI’s raw (and commented) source. CSS files weigh 60% less on the wire using gzip compression. If your current host does not support mod-gzip or mod-deflate, the advantages of using Yahoo! hosting could be dramatic. (...)

... while they also caution:

Serving YUI from Yahoo! servers won’t be the right decision for all implementers; if you’re aggregating or customizing YUI source code and serving it from a highly performant host, there will be little reason to switch. However, for some implementers the provision of free, robust, edge-network hosting will have significant upside.
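To get a feel for where savings like the ones Yahoo quotes come from, here is a rough Python sketch that measures a source file's size raw, after a minification pass, and after gzip. The `naive_minify` and `transfer_sizes` helpers and the sample input are hypothetical. The minifier only drops whole-line `//` comments and indentation, which is far less than a real minifier does, and the repetitive sample compresses unusually well, so the numbers are illustrative only and not comparable to Yahoo's 90% figure.

```python
import gzip


def naive_minify(js: str) -> str:
    """Toy minifier (hypothetical): keeps each non-empty line with its
    leading/trailing whitespace stripped, dropping whole-line // comments.
    Real minifiers also shorten names, strip inline and block comments, and
    understand strings and regexes; this one does none of that."""
    kept = []
    for line in js.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            kept.append(stripped)
    return "\n".join(kept)


def transfer_sizes(js: str) -> dict:
    """Byte counts for the raw source, the minified source, and the
    minified source after gzip compression."""
    raw = js.encode("utf-8")
    minified = naive_minify(js).encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(minified),
        "gzipped": len(gzip.compress(minified)),
    }


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample: it compresses far better than
    # a real library would, so the percentage overstates typical savings.
    sample = (
        "// Adds a CSS class to an element if it is not already present.\n"
        "function addClass(el, cls) {\n"
        "    if ((' ' + el.className + ' ').indexOf(' ' + cls + ' ') < 0) {\n"
        "        el.className += ' ' + cls;\n"
        "    }\n"
        "}\n"
    ) * 50
    sizes = transfer_sizes(sample)
    print(sizes, f"reduction vs raw: {1 - sizes['gzipped'] / sizes['raw']:.0%}")
```

The two steps compound: minification removes bytes that carry no meaning, and gzip then squeezes out the repetition that remains, which is why a host without mod_gzip or mod_deflate leaves so much on the table.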

Should a large number of sites elect to load the YUI libraries directly from the same Yahoo CDN URL, the caching benefits and efficiencies could be tremendous.

A person browsing the web could load a YUI library into their browser's cache upon first visiting "Site A". Since Yahoo sets an aggressive "Expires:" HTTP header, the user's browser will likely not even try to "revalidate" the file with a conditional HTTP GET for quite some time during subsequent visits to "Site A". Later, the same person might visit "Site B", which also happens to load the same YUI library from the same Yahoo CDN URL. The browser will recognize the URL, realize it already has the file in its cache, and, in theory, not even try to revalidate it with a conditional HTTP GET. Meanwhile, "Site B" might feel "impressively fast", loading quickly even though it was the user's first visit to "Site B". That's because "Site A" laid the groundwork!

... You get the idea.
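The Site A / Site B scenario hinges on the browser cache being keyed by URL, not by the page that referenced it. Below is a simplified Python model of that behavior, under stated assumptions: it honors only the `Expires:` and `Last-Modified:` headers (real browsers also weigh `Cache-Control`, heuristics, and user settings), and the `BrowserCache` class, the `fake_origin` function, and the CDN URL are all hypothetical. A fresh entry is served with no request at all; a stale one triggers a conditional GET with `If-Modified-Since`, and a 304 reply reuses the cached body.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple

# (status, body, headers) returned by the stand-in origin server.
Response = Tuple[int, bytes, Dict[str, str]]
# Hypothetical stand-in for the network: (url, If-Modified-Since) -> response.
Fetcher = Callable[[str, Optional[str]], Response]


@dataclass
class CacheEntry:
    body: bytes
    expires: datetime     # parsed from the Expires: header
    last_modified: str    # sent back as If-Modified-Since when stale


class BrowserCache:
    """Simplified browser cache: one entry per absolute URL, shared by
    every page (Site A, Site B, ...) that references that URL."""

    def __init__(self, fetch: Fetcher) -> None:
        self.fetch = fetch
        self.entries: Dict[str, CacheEntry] = {}
        self.network_requests = 0

    def get(self, url: str, now: datetime) -> Tuple[bytes, str]:
        entry = self.entries.get(url)
        if entry is not None and now < entry.expires:
            # Still fresh per Expires: no request at all, not even conditional.
            return entry.body, "cache"
        self.network_requests += 1
        ims = entry.last_modified if entry is not None else None
        status, body, headers = self.fetch(url, ims)
        if status == 304 and entry is not None:
            # Stale but unchanged: keep the cached body, extend its lifetime.
            entry.expires = parsedate_to_datetime(headers["Expires"])
            return entry.body, "revalidated"
        self.entries[url] = CacheEntry(
            body,
            parsedate_to_datetime(headers["Expires"]),
            headers["Last-Modified"],
        )
        return body, "network"


T0 = datetime(2007, 8, 1, tzinfo=timezone.utc)
LIB = "http://cdn.example.com/yui/yahoo-min.js"  # hypothetical CDN URL


def fake_origin(url: str, ims: Optional[str]) -> Response:
    """Hypothetical CDN: far-future Expires, fixed Last-Modified."""
    headers = {
        "Expires": format_datetime(T0 + timedelta(days=365), usegmt=True),
        "Last-Modified": format_datetime(T0, usegmt=True),
    }
    if ims == headers["Last-Modified"]:
        return 304, b"", headers
    return 200, b"/* library source */", headers


if __name__ == "__main__":
    cache = BrowserCache(fake_origin)
    print("Site A, first visit:", cache.get(LIB, T0)[1])
    print("Site B, first visit:", cache.get(LIB, T0 + timedelta(days=3))[1])
```

Run as a script, it prints `network` for the first visit to Site A and `cache` for the first visit to Site B three days later: both pages reference the same URL, so Site B never touches the network.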

Yahoo's optimizations around HTTP Performance and Caching, many of which they've outlined in their 13 Rules, ought to go a long way toward limiting the costs of operating their Library Content Delivery Network.

There are, however, many other widely used libraries beyond the ones Yahoo authors that could benefit from such a model. If I were to look at my browser's cache right now, I could see a dozen copies of the same scriptaculous library, loaded from a dozen social networks I've visited in the past. It's getting silly, and inefficient.

As both Library Authors and Implementers, we'd like to think of ways the developer community could benefit from an optimized framework similar to Yahoo's model, which is where a Shared Content Delivery Network for client-side libraries might become interesting.

Such a framework would allow site owners and developers to "register for the right to include a library on their site from the shared content delivery network URL". Let's face it, bandwidth and CDN infrastructure cost money, so access to such a service should be contingent upon modest charges, tied to a PayPal or Google Checkout account specified during the registration process.

As the nature of these libraries is to be embedded within documents, the HTTP Referer (sic) header should be sent with every request, at which point the "CDN Service" could verify that the originating site is actually registered. If it isn't, an HTTP 403 (Forbidden) response would be returned. If no "Referer" header is present in the request at all, a 403 would be returned as well. Each registered "Hit" would be tallied to a given account, and settled via PayPal or Google Checkout at the end of the month (or any other recurrence pattern).
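The registration and Referer check proposed above could boil down to something like the following Python sketch. Everything in it is hypothetical: `SharedLibraryGate`, the host-to-account mapping, and the per-hit price passed to `statement` are illustrative names, not part of any existing CDN service. It assumes a site is identified by the host name in its Referer URL and that each served request counts as one billable hit.

```python
from collections import Counter
from decimal import Decimal
from typing import Dict
from urllib.parse import urlsplit


class SharedLibraryGate:
    """Hypothetical request gate for the proposed shared CDN: serves a
    library only when the Referer belongs to a registered site, and tallies
    one billable hit per served request."""

    def __init__(self, registered: Dict[str, str]) -> None:
        # Registered site host name -> billing account id (e.g. the PayPal
        # or Google Checkout account given at registration).
        self.registered = {host.lower(): acct for host, acct in registered.items()}
        self.hits: Counter = Counter()

    def handle(self, headers: Dict[str, str]) -> int:
        """Return the HTTP status code to send for one library request."""
        # HTTP header names are case-insensitive.
        lowered = {name.lower(): value for name, value in headers.items()}
        referer = lowered.get("referer")
        if not referer:
            return 403  # no Referer header: Forbidden
        host = urlsplit(referer).hostname  # lower-cased by urlsplit, or None
        account = self.registered.get(host or "")
        if account is None:
            return 403  # Referer from an unregistered site: Forbidden
        self.hits[account] += 1  # settled at the end of the billing period
        return 200

    def statement(self, price_per_hit: Decimal) -> Dict[str, Decimal]:
        """Amount owed per account for the period; the pricing is invented."""
        return {acct: price_per_hit * n for acct, n in self.hits.items()}


if __name__ == "__main__":
    gate = SharedLibraryGate({"www.site-a.example": "acct-1"})
    print(gate.handle({"Referer": "http://www.site-a.example/index.html"}))  # 200
    print(gate.handle({"Referer": "http://site-c.example/"}))               # 403
    print(gate.handle({}))                                                  # 403
    print(gate.statement(Decimal("0.0001")))
```

Matching on the host name alone is a simplification chosen here; the scheme and path of the referring page are ignored, and `Decimal` is used so per-hit charges don't accumulate floating-point error.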

Beyond site owners, essentially "the consumers/implementers of enabling libraries", we need to consider Library Authors. How do we determine who gets to put their Library on the Shared CDN? A human-driven application process might be appropriate.

This is only the tip of a likely large iceberg of custom functionality, and such a framework definitely falls outside the typical "file pushing", "out-of-the-box" features you might find on most commercial CDNs. However, it ought to be possible to leverage some of their more advanced features to build this custom framework. One of the CDNs might elect to build it in-house, or one of their clients might build a prototype.

Akamai comes to mind. They support deployment of custom J2EE apps onto their network. But there are others too. Most sport a large worldwide infrastructure for optimized content delivery: edge caching for static content, and efficient network routing for transient content.

Developing this framework would likely let them gain some revenue by catering to a more "Long-Tail" clientele, be regarded as innovative pioneers among developers (potentially leading to larger accounts), and become de facto "parts" of vital Web Infrastructure, further cementing their longevity.

Update 11/21/2007: See Ajaxian.com's entry on CacheFile.net. It looks promising! As of this writing, it doesn't offer CDN-backed delivery or compression, and it ought to provide some sort of revenue stream (as proposed above) beyond donations, if only to cover its operating costs. Let's watch them closely as they evolve! :)

2 comments:

G: How would you handle the referrer-checking at the edge servers, tied to a database of "registered sites", without a custom application? It doesn't have to be an application that actually uses the full J2EE stack; I just seem to remember they supported deployment of .ear apps at the edges. The app itself can simply be a handful of servlets and JSPs.
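As a rough illustration of how small such an edge application could be, here is the Referer check proposed in the post written as a single Python WSGI callable backed by a database of registered sites, standing in for the "handful of servlets" (Python rather than J2EE, purely for brevity). The table layout, `app`, and `LIBRARY_BODY` are hypothetical, and an in-memory SQLite database stands in for whatever store an edge deployment would actually use.

```python
import sqlite3
from urllib.parse import urlsplit

# Hypothetical schema: registered sites and a log of billable hits.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sites (host TEXT PRIMARY KEY, account TEXT NOT NULL)")
db.execute("CREATE TABLE hits (account TEXT NOT NULL)")

LIBRARY_BODY = b"/* shared library source would be served here */"


def app(environ, start_response):
    """WSGI app: serve the library only to registered Referer hosts."""
    referer = environ.get("HTTP_REFERER", "")
    host = urlsplit(referer).hostname if referer else None
    row = db.execute(
        "SELECT account FROM sites WHERE host = ?", (host or "",)
    ).fetchone()
    if row is None:
        # Missing Referer, or Referer from an unregistered site.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]
    # Record one billable hit for the matching account.
    db.execute("INSERT INTO hits (account) VALUES (?)", row)
    start_response("200 OK", [("Content-Type", "application/javascript")])
    return [LIBRARY_BODY]
```

Locally, it could be exercised with the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.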