A Link: rel=preload Analysis From the Chrome Data Saver Team

Some of us in the Chrome Data Saver (Flywheel) team at Google (Simon Pelchat, Michael Buettner and Tom Bergan) experimented with the new preload directives as a means to improve page load performance for Chrome Data Saver users. We found our initial assumptions about preload to be wrong and decided to share our experiences in the hope they might benefit others. This post summarizes our findings.

For those who don’t have time to read through this rather long post, our conclusion is that:

Preload can greatly improve performance on some pages. We saw time to first contentful paint (TTFCP) improvements of up to 20% in our experiments. However, in some cases preload can degrade performance. We came up with three rules of thumb to use as a starting point when experimenting with preload:

Preload usually works well for doc.written resources.

Preload works better earlier in the page load.

Use the link tag and place it after the resource that will request the preloaded resource.

What is Preload?

The preload keyword (note that this is still a draft specification) is being added to the Link HTTP header and link HTML element. To quote the spec:

This keyword provides a declarative fetch primitive that initiates an early fetch and separates fetching from resource execution.

The main use cases mentioned by the spec are:

Early fetch of critical resources: Some resources are not discoverable by the browser’s preload scanner. The browser may only learn about them when executing Javascript or loading CSS. At this point, the browser usually needs the resource right away, but will have to wait one round trip time (RTT) before receiving the response. Preload can be used to let the browser know a resource will be needed before it is discovered.

Early fetch and application-defined execution: The preload link element provides async-like semantics for non-script elements. This lets the web page trigger fetches for resources early and apply the resource at a later (application-defined) time.

When Do Preloads Happen?

The spec states that, unlike prefetch, preload is a high-priority fetch that is necessary for the current navigation: the browser is required to fetch the resource. However, the spec leaves the prioritization specifics undefined. In particular, there is no mention of how preload requests should be prioritized relative to other requests the browser will make. This is unfortunate, but understandable, given that request prioritization within a page load is already unspecified, and the prioritization mechanism available varies with the browser and the network protocol used (e.g. HTTP/2 has multiplexing and dependency trees, whereas HTTP/1.1 has no multiplexing and browsers typically open at most six connections per origin).

Nonetheless, this makes it difficult to use preload. We want to preload resources to promote them earlier in the fetch order, but if we don’t know where they will be promoted to, it is hard to reason about whether they should be promoted at all. Since the spec leaves this undefined, we must assume some definition.

We try to use a definition that is as general as possible for most of our examples, so that our analysis remains valid even if browsers change their implementation. For this reason, as well as for simplicity, we mostly limit ourselves to analyzing pages containing only scripts. While this is not realistic, we show that even in this restricted case, preload is hard to reason about and has some non-trivial implications. We have seen these issues on real pages and distilled them to the simplest examples we could think of. Adding more resource types complicates the analysis with browser-specific implementation details.

Our definition is:

preloads from the link HTML element are prioritized so that they will be fetched immediately after the preceding resource in the HTML

preloads from the Link header are prioritized as if they came first in the HTML

We think this definition is reasonable because it seems to be intuitively (though not explicitly) stated by the spec. It also coincides with Chrome’s implementation for the pages that we’re analyzing.
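The assumed definition can be sketched as a small ordering function. This is only an illustration of the model used in this post, not of any browser's actual scheduler; the function name, the entry format and the file names (first.js, second.js, hidden.js) are hypothetical.

```python
from typing import List, Tuple

# Each document entry is ("resource", url) for an ordinary reference such as a
# script tag, or ("preload", url) for a <link rel=preload> element.
Entry = Tuple[str, str]


def fetch_order(header_preloads: List[str], document: List[Entry]) -> List[str]:
    """Return the fetch order of URLs under the definition assumed in the post."""
    order: List[str] = []
    # Preloads from the Link header are treated as if they came first in the HTML.
    for url in header_preloads:
        if url not in order:
            order.append(url)
    # Link elements and ordinary resources follow in document order, so a
    # preload is fetched right after the resource that precedes it.
    for _kind, url in document:
        if url not in order:
            order.append(url)
    return order


# Hypothetical page: first.js, then second.js, where second.js later inserts
# hidden.js (which the preload scanner cannot see).
page = [("resource", "first.js"), ("resource", "second.js")]
with_header = fetch_order(["hidden.js"], page)
with_element = fetch_order([], page + [("preload", "hidden.js")])
print(with_header)   # ['hidden.js', 'first.js', 'second.js']
print(with_element)  # ['first.js', 'second.js', 'hidden.js']
```

The only difference between the two variants is where the preloaded URL lands in the order, which is the extra information the link element carries.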

Moreover, we assume resources are fetched sequentially, in priority order. This coincides with what Chrome does when using HTTP/2 over a single origin. Apart from certain resource types like progressive images and videos, this is an optimal bandwidth allocation since resources are only useful once they have fully been fetched.

Finally, although the spec makes no distinction between the Link header and the link element, there is a key difference between them. The latter has a position in the document, which means that it implicitly contains more information than the Link header; i.e., its relative ordering with respect to other resources in the document. However, when enabling preload from a web server or web proxy, the Link header has the advantage of not requiring the server to parse and modify the page’s HTML, which could break the page.

How Chrome Prioritizes Preloads

We note here some details on how a real browser implements preload prioritization. These details will matter on real pages with different resource types when deciding when to use preload.

For resource prioritization, Chrome uses a few priority buckets that are mostly separated by resource type (for example, CSS has a higher priority than images). Among requests in a given bucket, Chrome prioritizes them in the order they are discovered. These buckets are the only difference between Chrome’s implementation and the definition we used above.

When using HTTP/2 over a single origin, Chrome will fetch resources sequentially, from the most important resource (first discovered resource of the most important bucket) to the least important (last discovered resource of the least important bucket).
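A rough sketch of this bucket-then-discovery ordering follows. The bucket ranks are assumptions made for illustration: the only relation taken from this post is that CSS outranks images, and the file names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical bucket ranks (lower is fetched first). Only "CSS before images"
# comes from the post; the position of scripts is an assumption.
BUCKET_RANK = {"css": 0, "script": 1, "image": 2}


def chrome_like_order(discovered: List[Tuple[str, str]]) -> List[str]:
    """Order (url, type) pairs by bucket first, then by discovery order."""
    indexed = list(enumerate(discovered))
    indexed.sort(key=lambda item: (BUCKET_RANK[item[1][1]], item[0]))
    return [url for _index, (url, _type) in indexed]


discovered = [
    ("hero.jpg", "image"),
    ("a.js", "script"),
    ("main.css", "css"),
    ("b.js", "script"),
]
print(chrome_like_order(discovered))
# ['main.css', 'a.js', 'b.js', 'hero.jpg']
```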

Preloads from the Link header are discovered in the order that they appear in the header, before any resource included in the HTML. Preloaded resources are otherwise treated similarly to non-preloaded resources and use the same prioritization scheme as mentioned above. Given that preloads are meant for high-priority fetches, this implementation is reasonable.

Page Load Metrics

In this document, we focus on using preload to improve TTFCP. For the most part, our discussion is not specific to improving TTFCP: one could imagine using preload to improve a different metric. However, as we’ll see later, preload is mostly useful early on in the page load, which makes TTFCP a natural candidate to optimize.

Example Where Preload Improves TTFCP

If we add the header Link: <b.js>; rel=preload; as=script to the HTML response, then preload works as expected and we save one RTT when fetching b.js. The page load waterfall will look something like the following (the blue bar represents TTFCP, filled green space is download time and yellow space is idle network time):

If we do as before and add the header Link: <b.js>; rel=preload; as=script to the HTML response, then preload actually increases the TTFCP because b.js is fetched before c.js and c.js is the only resource required to trigger first contentful paint.

Preload with Link header

If instead we had not used preload, the waterfall would have been:

No preload

While we did save 1 RTT in the time to fetch all resources, we delayed TTFCP.

The issue here is that while b.js is indeed needed, the browser does not know how to prioritize the request for b.js relative to other requests. From the HTML, it can tell that c.js is needed before a.js, but it has no idea when b.js will be needed.

The Preload Link Element Improves Resource Ordering

Note that this particular problem can be fixed by using the link element instead of the Link header:

This is optimal. The ordering of the link HTML element relative to other resources determines its priority. In many cases, this prioritization is key. Indeed, if we limit ourselves to resources inserted synchronously and executed synchronously, such as doc.written resources, then inserting a link tag after the resource that will need the preloaded resource will always yield an optimal fetch schedule.
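The three schedules discussed above (no preload, Link header, link element) can be compared with a minimal sequential-fetch model. The numbers are assumptions, not measurements: each script is taken to need 1 RTT of download time, c.js precedes a.js in the HTML, a.js inserts b.js, and c.js alone triggers first contentful paint.

```python
from typing import Dict, List, Tuple

# A step is (url, download_time, stall_before), all in units of one RTT.
# stall_before models idle network time while a late-discovered resource is
# requested and its response has not yet started to arrive.
Step = Tuple[str, float, float]


def finish_times(schedule: List[Step]) -> Dict[str, float]:
    """Completion time of each resource when fetched strictly one after another."""
    clock = 0.0
    done: Dict[str, float] = {}
    for url, download_time, stall_before in schedule:
        clock += stall_before + download_time
        done[url] = clock
    return done


# Without preload, b.js is only requested once a.js has run: 1 RTT of idle time.
no_preload = finish_times([("c.js", 1.0, 0.0), ("a.js", 1.0, 0.0), ("b.js", 1.0, 1.0)])
# Link header: b.js is treated as if it came first in the HTML.
link_header = finish_times([("b.js", 1.0, 0.0), ("c.js", 1.0, 0.0), ("a.js", 1.0, 0.0)])
# link element placed after a.js: b.js follows a.js with no stall.
link_element = finish_times([("c.js", 1.0, 0.0), ("a.js", 1.0, 0.0), ("b.js", 1.0, 0.0)])

for name, done in [
    ("no preload", no_preload),
    ("Link header", link_header),
    ("link element", link_element),
]:
    print(name, "TTFCP:", done["c.js"], "all fetched:", max(done.values()))
# no preload TTFCP: 1.0 all fetched: 4.0
# Link header TTFCP: 2.0 all fetched: 3.0
# link element TTFCP: 1.0 all fetched: 3.0
```

Under these assumptions the Link header saves 1 RTT overall but delays TTFCP by 1 RTT, while the link element keeps the TTFCP and still saves the RTT.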

Example With Asynchronous Resources

In reality, page loads are much more complex than the above toy examples. Images do not block rendering (i.e. they are “executed” asynchronously), many scripts are asynchronous, and CSS blocks Javascript execution, but not HTML parsing. This is great from a performance point of view, but makes reasoning about the page load process very difficult.

Both a.js and b.js insert a script (a-hidden.js and b-hidden.js respectively) in the document. In turn, a-hidden.js and b-hidden.js both insert text in the body. Assume that a.js and b.js are small (0.5 RTT to download), and a-hidden.js and b-hidden.js are bigger (1 RTT to download). The waterfall will look something like this:

No preload

We can see there is 0.5 RTT where the network is idle when waiting for a-hidden.js. It seems like a good opportunity to use preload. So we preload a-hidden.js and b-hidden.js:

But what if the user had b-hidden.js cached? We ignore the disk cache latency for this example (assuming it is much faster than the network). The original (without preload) waterfall would have been:

No preload, b-hidden.js cached

The waterfall with preload will be:

Preload, b-hidden.js cached

Preloading here will actually increase the TTFCP by 0.5 RTT.

The reason why prioritizing preload requests over other requests can cause performance regressions is that when the rendering process is not fully synchronous and linear, the critical path of a metric is highly unstable. We will explain what we mean by critical path in the next section.

Critical Path

In the context of first contentful paint, we can loosely view the critical path as the set of resources required to reach first contentful paint. This is a simplification of the critical path concept, but this is good enough for our discussion; see the WProf paper for a more precise definition.

The critical path varies based on:

cache state

network conditions: assuming one path has three RTTs but fewer bytes, and the other has two RTTs but more bytes, the critical path will depend on the relative latency and bandwidth of the connection

performance characteristics: if one path is CPU-bound, but the other is IO-bound (either disk or network), then the critical path will vary depending on the specs of the device or even what other processes are running on the device

scheduling in the browser: the scheduling of requests/rendering/Javascript execution in the browser is nondeterministic and can affect the critical path

In the previous example, a.js and a-hidden.js are the critical path in the case where the cache is empty (a-hidden.js is what triggers first contentful paint and a.js is needed to load a-hidden.js). However, when b-hidden.js is in the cache, then the critical path consists of b.js and b-hidden.js (b-hidden.js is what triggers first contentful paint).

The fact that the critical path is unstable is what makes it difficult to correctly position preload link tags. In our example, we have to guess as to which of a-hidden.js or b-hidden.js will execute first, to decide which path we want to optimize with preload (optimizing one is usually at the cost of the other given that bandwidth is limited). For fully synchronous page loads, the critical path does not change (all resources are always evaluated in the same order), so this is not a problem.
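The network-conditions factor listed above can be made concrete with a back-of-the-envelope cost model: time is roughly the number of round trips times the RTT, plus bytes divided by bandwidth. The byte counts and link parameters below are invented for illustration, and the model ignores slow start, CPU time and caching.

```python
def path_time(round_trips: int, total_bytes: int, rtt_s: float,
              bandwidth_bytes_per_s: float) -> float:
    """Rough completion time of a dependency path, in seconds."""
    return round_trips * rtt_s + total_bytes / bandwidth_bytes_per_s


def faster_path(rtt_s: float, bandwidth_bytes_per_s: float) -> str:
    """Return which of two hypothetical paths completes first on a given link."""
    chatty = path_time(3, 50_000, rtt_s, bandwidth_bytes_per_s)   # 3 RTTs, fewer bytes
    bulky = path_time(2, 200_000, rtt_s, bandwidth_bytes_per_s)   # 2 RTTs, more bytes
    return "chatty" if chatty < bulky else "bulky"


# Low latency, limited bandwidth: bytes dominate, so the chatty path wins.
print(faster_path(rtt_s=0.05, bandwidth_bytes_per_s=500_000))
# High latency, plenty of bandwidth: round trips dominate, so the bulky path wins.
print(faster_path(rtt_s=0.4, bandwidth_bytes_per_s=10_000_000))
```

The same two paths swap places depending on the connection, which is why a preload placement tuned on one network may not help on another.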

Note that images also result in unstable critical paths because they are asynchronous in their execution. Suppose we have the page

where a.js inserts big.jpg in the body. The critical path for this page is usually a.js and small.jpg. However, by preloading big.jpg after a.js, one would change the critical path to be a.js and big.jpg, which makes performance worse.

Low-Priority Preloads

One way to use preload in such a way that it will never make page loads worse is to use the lowest possible priority for preload requests, until the browser discovers the corresponding resource through conventional means, at which point it can set the correct priority. Note that this implies using a different definition of preload than the one we have been using so far and requires modifying the preload implementation in browsers. This implementation is a middle ground between the current preload implementation, which gives high priority to preloaded resources, and Link rel=prefetch, which is meant for resources that may be needed in a future page load. Unfortunately, there are several issues with such an implementation of preload.

Lower Potential Gains

By only using idle time to fetch preloaded resources, we significantly limit the potential gain from using preload. If the network is busy downloading low-priority resources during the RTT between two important requests, then preload will have no benefit. For example, if a page full of images includes a synchronous script in its head which calls document.write to include another script, then preloading the doc.written script will have no effect on this page: the first script will be fetched, it will trigger a request for the second script, but 1 RTT will be spent downloading images before the response for the second script arrives.

Bufferbloat

While the possible gains are lower, it still seems tempting to implement “low-priority” preloads, since in theory they cannot negatively affect performance. However, in practice they can negatively affect performance because of bufferbloat.

When there is buffering between the server and the client (this could be a kernel send buffer in the server, or queues in the network), the effective RTT of the connection increases significantly: the time between the moment a server writes a byte and the client receives it is the time it takes to flush the buffers plus 0.5 * RTT.

This means client-directed scheduling is less responsive, and sending low-priority bytes on otherwise idle network time can delay later requests because the server will fill the buffers with low-priority bytes.
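The delay formula above (time to flush the buffers plus half an RTT) can be written out directly. The bandwidth, RTT and buffer size below are made-up values chosen so that the buffer holds 0.5 RTT worth of bytes.

```python
def delivery_delay(buffered_bytes: float, bandwidth_bytes_per_s: float,
                   rtt_s: float) -> float:
    """Time between the server writing a byte and the client receiving it."""
    flush_time = buffered_bytes / bandwidth_bytes_per_s
    return flush_time + 0.5 * rtt_s


RTT_S = 0.2               # assumed round trip time
BANDWIDTH = 1_000_000     # assumed bandwidth in bytes per second

empty = delivery_delay(0, BANDWIDTH, RTT_S)
# 100,000 buffered bytes take 0.1 s to flush here, i.e. 0.5 RTT of extra delay.
bloated = delivery_delay(100_000, BANDWIDTH, RTT_S)
print(f"empty buffers: {empty:.2f} s, bloated buffers: {bloated:.2f} s")
```

With these values, low-priority bytes sitting in the buffers double the delay seen by the next high-priority response.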

For example, while we imagine preloading a low-priority image (preloaded.jpg) would fill RTT 4 as below:

Preload without bufferbloat

In practice, we see something like:

Preload with bufferbloat

The server fills the buffers with the response for preloaded.jpg. When it gets the request for doc-written.js, there is 0.5 RTT worth of buffers to flush. Had preloaded.jpg not been requested, the buffers would have been empty and the client would have received the response for doc-written.js sooner.

Bufferbloat with High-Priority Preload

Note that even if we are not using low-priority preload, bufferbloat is still very relevant to preload.

For example, it is tempting to preload all low-priority requests for which we know the browser will correctly assign a low priority (e.g. hidden images).

However, in practice, the server may take more time to generate responses for some requests. For example, the server may be a CDN, where images are cached at the CDN and therefore image responses will be ready faster than, for example, a higher-priority Javascript request that needs to be dynamically generated by the origin. This means the server will start filling the network buffers with image bytes until the higher-priority response has been generated, at which point the network buffers will already be full of low-priority bytes.

This problem is not specific to preload, but preloading low-priority resources increases the number of low-priority requests sent to the server, which increases the chances that the server will have a low-priority response ready sooner than high-priority responses.

Somewhat counter-intuitively, preload can also mitigate bufferbloat. Let’s ignore the previous issue we mentioned and assume the server can generate all responses immediately. Suppose we have:

Here, the browser requests a.js and c.js first. It then discovers b.js, which is higher-priority than c.js because the latter is asynchronous. This is a Chrome implementation detail: for this example, we are deviating from our simple model where resources are prioritized strictly in discovery order. Unfortunately, the network buffers have been filled with bytes from c.js so it takes some time for the server to switch to serving b.js.

Using preload, we get:

Bufferbloat, with preload

Here, since the browser knows about all three scripts right away and fetches the resources with the correct priority, bufferbloat no longer affects the page load. Therefore, preload saves 1.5 RTT: 0.5 RTT more than is expected in the absence of bufferbloat.

The takeaway here is that bufferbloat can affect page loads in non-intuitive ways. This makes experimenting in realistic conditions all the more important.

A promising solution to the bufferbloat problem is to use a bandwidth-aware congestion control protocol, like the new BBR congestion-control algorithm, which could be used in web servers to avoid filling intermediate buffers in the network.

Resources Inserted by Event Handlers

So far we’ve focused on resources inserted during the page load process through initial script execution or by CSS @imports.

One important use case we’ve ignored is resources inserted by event handlers such as onload and onclick. The issue here is that if one preloads these resources by including link tags in the initial HTML, there is no logical place to put them. It’s not clear how we can tell the browser that a resource will be needed after a certain event handler is triggered.

The best alternative is to either include the link tags at the end of the HTML, or to dynamically insert link tags later on in the page load, but (hopefully) before the event of interest fires. Including the link tag at the end of the HTML can be problematic for certain content types like CSS because, as mentioned earlier, Chrome considers CSS to be high-priority and will fetch the lazy-loaded CSS before most other resource types.

What Should We Preload in Practice?

We condensed our findings into three rules of thumb for using preload:

Preload usually works well for doc.written resources: The benefit of preloading such resources is usually so great that it dwarfs other potential downsides. However, it’s not safe in general to preload doc.written resources that appear late in the page load. For example, they might be at the end of the HTML, where they don’t actually block rendering and they might use bandwidth that would otherwise be used for above-the-fold images.

Preload works better earlier in the page load when things are more synchronous: Early in the page load, browsers are requesting mostly synchronous render-blocking resources, such as CSS and synchronous scripts. At this point, the critical path is more stable. It’s therefore easier to reason about the page load process and it’s easier to use preload without having a negative effect on performance. As the page load progresses, there are more asynchronous requests and the order of loading events is less deterministic, which makes the critical path less stable. At this point it becomes very hard to reason about the page load process and to know what to preload.

Use the link tag and place it after the resource that will insert the preloaded resource: This is a blunt way to give the browser some information on how to prioritize the preloaded request.

Given the complexity of page loads, our rules of thumb are only general guidelines. There are cases where those rules can be violated and also cases where following those rules will degrade performance. When in doubt, measure performance in realistic scenarios, such as with a partially warm cache, and varying network conditions. Even better, measure on live traffic with real users.

Live Experiment

Strategy

We tried automatically inserting Link preload headers into page loads going through Chrome Data Saver. We used the Link header (contrary to our suggestion), because automatically modifying HTML is difficult and can break pages (e.g. a script might assume that the element following it is a <div>, so inserting a link element there might break the page). Instead, to preserve the ordering of the resources to preload relative to non-preloaded resources, our Link header lists resources in the order they are evaluated by Chrome. Chrome orders preload fetches in the order they are specified in the Link header. This is roughly equivalent to using the preload HTML element as we have defined it.
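As a sketch of what such an ordered header might look like, the helper below joins several preload entries into one Link header value using the standard Web Linking form `<url>; rel=preload; as=type`. The helper name and the URLs are hypothetical, this is not the Data Saver implementation, and the sketch does not escape special characters in URLs.

```python
from typing import List, Tuple


def preload_link_header(resources: List[Tuple[str, str]]) -> str:
    """Build a Link header value from (url, as_type) pairs, preserving order."""
    parts = [f"<{url}>; rel=preload; as={as_type}" for url, as_type in resources]
    return ", ".join(parts)


# Hypothetical hidden render-blocking resources, listed in evaluation order.
hrbrs = [("/css/imported.css", "style"), ("/js/doc-written.js", "script")]
print("Link: " + preload_link_header(hrbrs))
# Link: </css/imported.css>; rel=preload; as=style, </js/doc-written.js>; rel=preload; as=script
```

Because the list order is preserved, the order of entries in the header mirrors the order in which the page would evaluate the resources.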

Given the conclusions above, we preloaded only hidden render-blocking resources (HRBRs) that block first contentful paint. HRBRs are resources that cannot be discovered by the preload scanner and that block rendering, such as scripts and CSS inserted using document.write and CSS @imports. We discovered HRBRs on pages by cloud rendering and instrumenting them.

Results

We found that 90% of page loads going through Chrome Data Saver correspond to pages that do not have HRBRs blocking first contentful paint. Of the pages with HRBRs, about 65% of the page loads included a request for an HRBR. This means that 35% of the time, all HRBRs are cached at the client (for popular sites, this figure was significantly higher). Therefore, at most 6.5% of page loads going through Chrome Data Saver can possibly benefit from preload. This made it difficult to find a substantial set of pages for which we had enough data to know if preload had an impact.
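The 6.5% upper bound follows from multiplying the two proportions quoted above; exact fractions are used here only to avoid floating-point rounding in the illustration.

```python
from fractions import Fraction

# 90% of page loads have no HRBR blocking first contentful paint, so 10% do.
loads_on_pages_with_hrbr = 1 - Fraction(90, 100)
# Of those, about 65% actually request an HRBR (the rest have them cached).
loads_requesting_hrbr = Fraction(65, 100)

upper_bound = loads_on_pages_with_hrbr * loads_requesting_hrbr
print(float(upper_bound))  # 0.065, i.e. 6.5% of page loads
```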

We selected a set of popular pages that contain HRBRs. Of those pages, there were 22 which had a significant number of loads that requested at least one HRBR per experiment arm (control vs preload). Of these pages, 8 had significant (Mann-Whitney test with p=0.005) improvements in TTFCP (12% on average, from 6% to 16%), and none had a significant degradation in TTFCP.

Conclusion

Preload is a great new tool in developers’ toolboxes. It has a lot of potential for addressing common performance issues. That said, it should not be used indiscriminately. The key to using preload well is to carefully analyze the page load, see which (if any) resources it makes sense to preload, and measure the impact in realistic conditions. Good candidates for preload are important resources that are not discoverable by the preload scanner that will be needed early on in the page load.

Reviewed by Addy, Gray, Yoav and others passionate about loading on the web.