Site Building Hotness with Edge Side Includes

We've covered caching a
fair amount recently.
We've discussed how front-end caching can be a powerful asset when building
sites. As is often the case, though, using a powerful tool effectively takes
some getting used to. It also means learning new ways of doing things. Caching
is no exception.

When most developers build web apps, they have a model in mind that is so
ingrained that they scarcely know it's there. What is it? It can be summed up
by a phrase you've heard many times; 'The user requesting this page...' It is
the assumption that the page is being built right now for one person who is
waiting for their browser right now to show the page. The assumption that
follows is that you are building the entire page at once.

When you add caching to your application, the first assumption immediately
fails. You are no longer building the page for a single person, you are
building the page for potentially every user of your site at once. This
requires some re-thinking of how you build your pages, and how you use
per-user information to customize your pages.

In order to cope with this change, you have to start generalizing your pages,
so the content is not specific to any particular user. This is a topic I
touched on at the end of last year's article. We briefly covered
a couple of options for solving that problem, mostly involving JavaScript.
What you may not have noticed, though, is that in doing so we were directly
trying to cope with the realization that our second assumption had failed; we
can no longer build the entire page at once.

So, here we sit, the fundamental pillars of our web development process
shattered around us. What do we do now? Well, we do what we always do, we
start building again. The JavaScript / AJAX options I mentioned for loading
data into the page at view time make a lot of sense; but in some sense it
feels wrong, like a blank spot in your newspaper that says 'go and look in our
weekly magazine to see this article.' It works, but it doesn't feel right.

We've moved our processing to the client, where we know we have very little
control over what happens, if anything happens at all. Additionally it
barely solves our problems, and doesn't solve them in every case.

So what do we do? For a long time the answer was 'nothing.' We were out of
luck. Use JavaScript and cookies to customize anything on the page. Trim our
cache times to their absolute minimum in order to allow the dynamism that our
sites and our users have become accustomed to. It worked, but again, not
entirely, not necessarily every time, and it was complicated.

Things have changed, though. There is a tool which until recently was only
available to those willing to pay for hosted caching systems. With the
2.0 release of the Varnish cache software, we
finally get easy access to Edge Side Includes, or ESI.

What is ESI?

So what exactly is ESI and what does it do to address our shattered
development process? Well, put simply, ESI allows you to include data at the
edge of your web application, the cache. You can use ESI to break every one of
your pages into smaller pieces and let the cache retrieve them separately and
assemble the whole page at request time.

What this means for you is that each 'page' that you create doesn't have to
know the details of every piece of content that will show up on the fully
rendered page. Instead, it just needs to know what, on a high level, will go
there. It leaves actually filling in those details to the cache, which is the
point in your system that the browser is actually talking to anyway.

It also means that each piece of content can have its own cache time. Up
until now, we've had to be very conservative about our cache times. Our
content could only live in the cache for as long as the piece of content with
the shortest time to live. If we have a 'recently published stories' block in
the sidebar, we have to cache for minutes, even though our story is unlikely
EVER to change once it's published. If the 'recently published stories' block
can be handled separately, then the story page itself can last much longer.

A game changer

I have seen MVC catch a lot of flak lately in the web development
realm. MVC itself originated in the GUI realm and has been somewhat
successfully applied elsewhere. I've read blog posts and comments about
how MVC doesn't translate to the web and how it is not the right pattern
to use. Most often the support for this argument relates to the web's
transactional nature or its lack of state. Some people are familiar
enough with the origins of MVC to bring up another point. They say one
of the points of MVC was to allow the view to retrieve its data as it
was needed, which was necessary to really provide an interactive
application, an application that responds to us the way we expect
real-world objects and machines to respond to us. All of these argments
are correct.

That said, for me even though there is something of a mismatch between web
development and MVC in its truest sense, there are other benefits of the MVC
pattern that make it worthwhile anyway. Separating the problem space into
manageable pieces is important. Forcing the conceptualization of the whole
problem and breaking it down properly is essential for good software. The code
reuse and maintainability that comes from breaking the problem down according
to the MVC pattern is more important in my experience than the slight mismatch
in the presentation portion of the application.

Even so, there is still a mismatch. When you load a whole screen at once, the
dynamism that the original GUI MVC pattern provided is lost. This issue is
made worse when you start caching. You're not only not getting an immediately
interactive and dynamic view into the data you are interested in, you are in
fact getting guaranteed OLD data. That's not what MVC is about.

So what does all this MVC stuff have to do with ESI? Good question. ESI goes
beyond solving our dynamic content problem. If you let it, it will change
everything about how you develop your web applications. If you pause to think
about it, you begin to realize that with ESI every piece of repeated content
can be served up separately. Every piece of content can contain its own rules
about how it is displayed, and even more interestingly, how it is cached.

Having ESI in our web-development toolbox moves us one step closer to true MVC
development on the web. Each view of a piece of content is a self-contained
item. It has its own view rules, its own source of data, and its own
controller logic, all of which contain only the details they need to do their
very specific job. For anyone who has done desktop GUI programming (especially
with systems like Cocoa or QT), this will begin to feel familiar.

With ESI you get another bit of the MVC pattern that GUI developers take for
granted. You get reuse. We're not talking about the 'factor out this bit of
code and call it from lots of places' type of reuse that we as web developers
are all familiar with. With ESI we finally get to have small, self contained
bits of functionality that exist entirely on their own and can be reused
easily anywhere and any time you choose! With ESI we can finally have
self-contained widgets that we can plop into a site anywhere we want. I don't
have to tell you that this is a big deal.

When the editor asks 'can't we put that most recent articles box on this
page,' we can finally comfortably say 'Yes, no problem, give me about 10
minutes.' We can finally say 'Oh, you want a custom box that shows only
stories that are of interest to the current user? Sure. On every page?
alright.' I know some server admins who would have nightmares just
overhearing that statement.

With ESI, we finally have the toolset that lets us do the kind of things we've
wanted to do with our web applications forever. And what's more, it's easy,
it's flexible and unlike the other solutions to this problem that have been
foisted on us before, it actually makes our web applications faster.

What does ESI look like?

Now that I've showed you that ESI is something you want to know, it's time
to start actually exploring it. ESI in its entirety is complex. There are many
bits of functionality available in the complete ESI specification. Some of
them are very useful, some of them are only occasionally useful. We won't
explore all of the abilities of ESI in this article, but we will cover the
most important ones, which happen to also be the ones that Varnish provides.

There are three bits of ESI that you need to know. They are esi:include,
esi:remove, and the <!--esi ... -- > constructs.

The esi:include construct does exactly what you think it does. It allows
you to include something into the current page. This is the piece of ESI you
will use most often, as it allows you to load another piece of content within
the current page.

The esi:remove construct also does what you think it does. Anything between
the <esi:remove > tag and the </esi:remove > will be removed from the
page. This is useful for providing content you want to include if ESI
processing is not enabled. If ESI is not processed, the receiving browser will
ignore the <esi:remove > tag, and the contents will appear in the browser.

The final construct is <!--esi ... -- > construct. When the ESI processor
encounters this construct, it will remove it, but leave the contents intact.
This basically allows you to hide your esi markup from the client browser, in
case the ESI is not processed. If you didn't wrap your <esi:include > tags
within it, and ESI was not active, the content could show up in the users
browser. It also lets you have other content that is not ESI included, but is
only shown if the page is being ESI processed.

Since our main concern is the include routine, this will be our focus.

An Example

We've talked about why to use ESI. We've talked briefly about how to use
ESI. It's time to take some concrete examples and show how it all works.

For demonstration purposes, we will create a hypothetical site. Let's suppose
we have a blog site. The site has multiple authors and people post new entries
throughout the day. Under each article, we will have a block that displays
links to the last 5 articles added to the site.

In order for this to work, our article view action must contain the logic to
load the article being requested. It also must contain the logic to figure out
what the most recent five articles are. If you were smart you probably
factored out the most recent five article logic, but you still have to forward
to it within your view action to fill in the 'recentarticles' stash data. If
you had done this, and you are already using the caching method outlined
last year your actions
might look something like this:

This works. It's not too bad. It's well factored out; the logic for recent
articles is self contained so you can call it from wherever. Where it starts
to get a little ugly is the template. Under normal circumstances we might have
a template that looks like this:

Still not too bad. Pretty normal actually. But what happens when the editor
(or the designer) decides that they want to show the most recent posts on
another page, say in a little callout-box on the static 'about us' page? Well
now we have a problem. If your static page isn't handled by Catalyst, you
can't call the recent article routine at all. Let's assume, though, that your
static pages are handled by Catalyst. The action might look something like
this (error checking excluded for simplicity):

Ok... So in order to add the recent articles block, first we have to edit the
template and add our recent-articles html to the page. Unfortunately, though,
the html won't work if we don't populate the ' recentarticles ' stash variable.
Not a big deal... we just have to forward to the recentarticles action. Our
page action would now look like this:

Have you spotted the problem yet? There are actually two. The first is that
now your database is hit for every static page view, whether it displays the
recent articles block or not. The second is that now your static pages are
forced to cache at the very small 5 minute interval that the recent article
block requires.

The particularly experienced among you are thinking of ways to avoid
putting the recentarticles call into every page. Calling the
routine from within the template, perhaps, or some sort of lazy-loading
hackery. Wouldn't it be nice if you didn't have to hack it in?

Using ESI

Let's tackle the same problem now, only this time let's use ESI. We get to use
the recentarticles action as-is, because we factored it out well in the first
place. With ESI, though, we will be calling recentarticles directly, so we
will need to give it its own template. Let's see what it would look like.

Pretty straightforward. Now that we've done this, we can start using it via ESI.
Let's start cleaning up. We'll start by revisiting our view template, this
time pulling in the recentarticles block via ESI.

That's a bit simpler, isn't it? For both the the static pages and the article
view routine, the page you see looks the same. What is happenning on each
request has changed significantly however.

First, we are not calling our recent article logic on every request; we are
also not hitting our database on every request. There's something else that's
changed as well. Both our article content and our static page content are not
limited to 5-minute caching anymore. They are back to caching at 1 and 2 hours
respectively.

But what about the recent articles content? That can't be cached for 2 hours.
Don't worry. It isn't. The recent article information is only cached for its
5 minute limit. So what's going on? How can the article be cached for an hour
but the recent article list be cached for 5 minutes? They are on the same
page, it's got to be one or the other.

You are absolutely correct. The full page that is delivered to the browser
must have a single limited cache time. This is where Varnish + ESI begins to
really shine. Each piece of content is cached according to its own rules, and
Varnish assembles the page at request time from the data it has. So the
article view page, which may be 30 minutes old, is pulled in from the cache,
and the recentarticles block which may be only a minute old, is also pulled
in. The content is combined and served up to the waiting browser.

Your site has just experienced a drastic reduction in processing required to
serve up each page. The article view page for any given article is requested
at most once every hour. The recentarticles block is requested more
frequently, but even that has reduced your processing, as only the recent
articles logic is run, instead of the combined article retrieval and recent
article logic.

Per-user content

This gets more interesting when you have a block of content that is customized
for each user. On a site that doesn't use ESI, any page that contains a block
that is personalized for the end-user can not be cached at all. This is a
major problem. It puts you in the position of choosing personalization or
performance.

Again, ESI opens up new options for us. Not only can ESI serve up blocks of
content that are cached for different times, it can serve up blocks of
content that aren't cached at all. If you have a 'your favorite articles'
block, that block can be passed through to the web server, even while the page
it's included into is served from the cache.

All you have to do to make this work is what we did above for recentarticles
with the minor addition of setting the cache time to zero. Your users get
personalized content and you still get to have caching.

Turning it on

There's only one thing left for us to do. We need to turn ESI on in our
Varnish cache. This is actually quite easy. If we use the vcl file
we were looking at in our last article, we just need to add a bit of code to
our vcl_fetch routine. At the very top of our vcl_fetch routine we add the
following:

if (obj.http.Content-Type ~ "html") {
esi;
}

This tells Varnish that if the content-type of the data being received from
the web server is some form of html, we should process it with ESI. We need to
put this test in place because we do not want to run esi processing on binary
files. You can expand the content-type match if you want to use ESI in other
files, like CSS or JSON responses.

And that's it. Enable the above in your Varnish config and you get ESI
instantly.

One note I will add is that once you start using ESI, you need to make sure
that you use the s-max-age header instead of max-age. Varnish is smart enough
to pay attention to the various headers for all of the ESI included content.
However, the browser is not aware of the sub-components, so it will obey the
max-age header and cache the entire page. Chances are you don't want that.

Trying it out

One of the more challenging parts of working with ESI is that often you are
doing development in a different environment than production. Not many people
have a cache in their dev environment, and rightly so. There is nothing quite
as frustrating as trying to track down a bug for two hours only to realize you
fixed it an hour and 50 minutes ago, but have been looking at a cached copy
since then. Without Varnish (or some other ESI processor) in front of your dev
site processing the ESI, though, your site will look awfully strange. So what
do you do?

Once again, we are developers, so we build something. In this case, to solve
this problem I wrote an ESI parser in javascript that runs in the browser. You
can place it in your ESI enabled pages within an <esi:remove > block, and
your production site will never be the wiser, but without the Varnish cache in
front you will still be able to do your development.

The code to do this is
here.
I am releasing this code under the GPL, so you are welcome to download it and
use it for your own development. It requires jQuery
to function, which for simplicity is included with the tar file. The tar file
contains the ESI parser module and some example files that show its usage.

Summary

Including ESI in your our development toolkit gives us a lot of options that
we've never had before. It makes it possible to break your application into
pieces that are more manageable and easy to work with. It makes it possible to
avoid the speed for dynamism trade-off.

Again, though, don't take my word for it. Grab the JS based ESI parser
and try it out. It's an easy way to explore ESI with your own app before
committing to a Varnish installation. I'm sure that once you see what it can
do, though, you'll be looking at Varnish again.