Saturday, February 5, 2011

Chrome's 10 Caches

While defining the set of page load time metrics that we think are most important for benchmarking, Mike Belshe, James Simonsen and I went through a seemingly simple exercise: enumerate the ways in which Chrome caches data. The resulting list was interesting enough to me that I thought it worthwhile to share.

When most people think of "the browser's cache" they envision a single map of HTTP requests to HTTP responses on disk (and perhaps partially in memory). This cache may arguably have the most impact on page load times, but to get to a truly stable benchmark, we identified 10 caches that need to be considered. An understanding of the various caches is also useful to web page optimization experts who seek to maximize cache hits.

HTTP disk cache

Stores HTTP responses on disk as long as their headers permit caching. Lookups are usually significantly cheaper than fetching over the network, but they are not free as a single disk seek might take 10-15ms and that doesn't include the time to read the data from disk.

The maximum size of the cache is calculated as a percentage of available disk space. The contents can be viewed at chrome://net-internals/#httpCache. It can be cleared manually at chrome://settings/advanced or programmatically by calling chrome.benchmarking.clearCache() when Chrome is run with the --enable-benchmarking flag set. Note that for incognito windows this cache actually resides in memory. source

HTTP memory cache

Similar to the HTTP disk cache, but entirely unrelated in code. Lookups in this cache are fast enough that they may be thought of as "free."

This cache is limited to 32 megabytes, however, when the system is not under memory pressure the effective limit may be higher due to its use of purgeable memory. Conversely, when multiple tabs are open, the limit may be divided among the tabs. It is cleared in the same manner as the HTTP disk cache. source

DNS host cache

Caches up to 100 DNS resolutions for up to 1 minute each. It is somewhat unfortunate that this cache needs to exist in Chrome, but OS level caching cannot be trusted across platforms.

A unique and important optimization in Chrome is the ability to remember the set of domains used by all subresources referenced by a page. Upon the next visit to the page, Chrome can preemptively perform DNS resolution and even establish a TCP connection to these domains.

Compilation can be an expensive step in executing JavaScript. V8 stores compiled JS keyed off of a hash of the source for up to 5 generations (garbage collections). This means that two identical pieces of source code will share a cache entry regardless of how they were included. source

SSL session cache

Caches SSL sessions to disk. This saves several round trips of negotiation when connecting to HTTPS pages by allowing the connection to skip directly to the encrypted stream. Implementation and limits vary by platform, as an example, when OpenSSL is used, the limit is 1,024 sessions. source

TCP connections

Establishing a TCP connection takes about one round trip time. Newer connections also have a smaller window so they have a lower effective bandwidth. For this reason Chrome keeps connections open for a period in hopes that they can be reused. This can be thought of as an in-memory cache.

Connections may be viewed at chrome://net-internals/#sockets and cleared programmatically by calling chrome.benchmarking.closeConnections() when Chrome is run with the --enable-benchmarking flag set. source

Cookies

While not usually thought of as a cache, this is web page state which is persisted to disk. The presence of cookies can have a large impact on performance. They can bloat requests and change how the client and server behave in limitless ways.

While currently only used by Google Search, the SDCH protocol requires a shared dictionary to be downloaded periodically. A performance hit is taken infrequently to download the dictionary which makes future requests much faster. source

I hope you found this as interested as I did. Please let me know if I left anything out.

Edit: Will Chan points out additional caches for proxies, authentication, glyphs and backing stores. Proxy and authentication caches were intentionally omitted because they aren't relevant to our benchmark, however glyphs and backing stores are two additional things we need to consider. Thanks!