Magento 2 Caching Overview

Web accelerators (caching) can make a Magento installation deliver a more responsive user experience with less hardware. A famous quote however is “There are only two hard things in Computer Science: cache invalidation and naming things” – Phil Karlton. Yes, caching was mentioned first! Magento 2 includes a lot of changes from Magento 1 to improve its built-in support for caching, making it easier to deploy caching and taking some of the complexity out of caching for developers. This post provides a high-level summary of the new caching strategy.

What is Caching?

First, a quick summary of the basics of HTTP caching. An HTTP GET request to fetch a page has headers that say how long a returned page can be trusted as being up-to-date. This allows a web browser to save a copy of the page so that if the user comes back later, the page does not have to be downloaded again. This is also true for images on a page (each image typically triggers a separate HTTP request), and for all the JavaScript and CSS files on a page. So one page for a customer may be the result of many HTTP requests, and each returned asset might have different caching requirements.
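As a minimal sketch of the freshness rule (illustrative helpers, not Magento code), a cache can decide whether a stored response is still usable from the max-age directive in its Cache-Control header:

```python
import re
import time
from typing import Optional

def parse_max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age directive (in seconds) from a Cache-Control value."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None

def is_fresh(stored_at: float, cache_control: str, now: Optional[float] = None) -> bool:
    """True if a response stored at `stored_at` may still be served from cache."""
    now = time.time() if now is None else now
    if "no-store" in cache_control:
        return False
    max_age = parse_max_age(cache_control)
    return max_age is not None and (now - stored_at) < max_age

# A response with "Cache-Control: public, max-age=300" is reusable for 5 minutes.
```

Real caches honor many more directives (no-cache, s-maxage, Expires, validation via ETags), but this is the core age check.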

It was not long before people realized that, as well as caching in a web browser, caching is also useful on the server side, as the same request might come from different users. For most pages containing dynamic content (e.g. content from a database), serving from a cache is faster than hitting your main web server, which means you can serve more requests per second. Every so often you should get a new copy of the page from the real web server and put that into your cache to keep the served content fresh, or discard old content that has not been accessed to reclaim cache space for other pages.
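The refresh-or-serve cycle above can be sketched as a tiny server-side cache (illustrative only; the `fetch` callback stands in for the real web server):

```python
import time

class TtlCache:
    """Serve stored pages until their TTL expires, then refetch from the backend."""

    def __init__(self, ttl_seconds, fetch):
        self.ttl = ttl_seconds
        self.fetch = fetch      # called on a miss to get a fresh copy of the page
        self.store = {}         # url -> (stored_at, body)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(url)
        if entry and now - entry[0] < self.ttl:
            return entry[1]     # fresh hit: the backend is not touched
        body = self.fetch(url)  # miss or stale: refresh the stored copy
        self.store[url] = (now, body)
        return body
```

Products like Varnish add eviction policies, grace modes and much more, but the fundamental trade is the same: the backend is only consulted when the stored copy is missing or stale.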

Using a web accelerator such as Varnish is one approach to caching. You can run Varnish in your data center next to your web server and it will reduce the traffic that hits your web server, allowing the site to handle more traffic.

Content Delivery Networks (CDNs) support another form of caching, “edge caching”. Akamai is one of the well-known CDN providers. The idea is to reduce network latency by getting the cache closer to the user’s web browser. With Varnish you would have only one cached copy of the page (in your data center). With a CDN your content can be cached in multiple locations around the world to get your content that bit closer to the end user. (Note: Varnish itself can be used at the edge in a CDN solution. In this post I only talk about Varnish being used in the data center to take load off the web server.)

It is common for a site to combine both forms of caching; the importance of this will become more apparent when we get to cache invalidation strategies below.

How Long to Cache For?

So how long should content in a cache last for? There is no single answer to this question. One approach is to cache pages for a very long time (days, weeks, years) so returning users get better performance. Another approach is to cache for a very short time (minutes or seconds). Say what? Cache for minutes or seconds? Well, for high volume sites with hundreds of requests per second, a cache even for seconds can still reduce the traffic by an order of magnitude or more. The short cache lifetime also means you don’t need to worry about how to invalidate the cache – it will expire by itself in a few seconds anyway.
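The order-of-magnitude claim is easy to check with back-of-the-envelope arithmetic (a sketch, not a real traffic model):

```python
def backend_requests_per_second(client_rps, ttl_seconds):
    """For a single hot page behind a TTL cache, roughly one request per TTL
    window reaches the backend; every other request is served from cache."""
    if client_rps * ttl_seconds > 1:
        return 1.0 / ttl_seconds
    return client_rps  # traffic too low for the cache to help this page

# 200 client requests/second with just a 10-second TTL -> ~0.1 backend
# requests/second for that page: a ~2000x reduction.
```

The flip side is visible in the second branch: on a page requested less than once per TTL window, the cache barely helps, which is why short TTLs pay off mainly on high-volume sites.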

So how long to cache for on your site? If you are a Merchant, this is where I would point you toward finding a good partner to work with! Remember, “there are two hard things…”

What Can’t You Cache?

Hey, caching sounds great, you get more done with reduced costs and better performance. So what can’t you cache? Examples on a commerce site include the customer’s name (if they have logged on) and the number of items in their shopping cart. This will be different for each user (it is private to that user) – it cannot be cached in a shared (public) cache.

Edge Side Includes (ESI) are one technology to address this. An edge side include is where a web server returns an HTML page that can be cached, but where parts of the page are replaced with an “include” reference (URL) that will return just the content for that part of the page (such as the customer’s name). Using ESI there is reduced load on the web server – most of the heavy work is cached, with HTTP requests for small parts of a page where the content needs to be different per user.
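Conceptually, an ESI-capable cache assembles the final page by replacing each include tag with whatever its URL returns. A minimal (far from spec-complete) sketch of that substitution step:

```python
import re

def resolve_esi(page_html, fetch_fragment):
    """Replace each <esi:include src="..."/> tag with the fragment its URL
    returns, the way an ESI-capable cache assembles a page before serving it."""
    def substitute(match):
        return fetch_fragment(match.group(1))
    return re.sub(r'<esi:include src="([^"]+)"\s*/>', substitute, page_html)

# The cacheable skeleton holds the include; only the fragment is per-user.
fragments = {"/esi/greeting": "Hello, Jane"}
page = '<p><esi:include src="/esi/greeting"/></p>'
resolve_esi(page, fragments.__getitem__)  # -> '<p>Hello, Jane</p>'
```

In production the skeleton is served from cache and each fragment URL is a real HTTP request, which is exactly the per-include overhead discussed next.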

If a page has 10 edge side includes, though, it may end up slower. There is an overhead per HTTP request even if the request is quick for a web server to respond to. It requires careful thought and tuning to get the right number of includes on a page to make sure performance is optimized.

Magento 2 and Private Content

Magento 2 includes improved caching support built in. It has, however, chosen not to use ESI for private content (content specific to one user, such as their name). Magento 2 does use ESI in some circumstances, but for a different use case. This is a really important point to let sink in – it is a common source of confusion. When you see ESI in Magento 2, it is not for private (uncacheable) content. More on this later.

What Magento 2 does instead is to fetch a (typically) 95% complete page and then rely on JavaScript and AJAX to inject the last 5% of user specific content onto the page. For example, the skeleton page might leave the user name area on the page empty. There are several benefits of this approach:

A single AJAX call can fetch all the private user content instead of one request per part of a page to be replaced (as is done with ESI). This can reduce the number of HTTP requests.

The private content is also cacheable by the web browser. A customer’s name is not likely to change for example, so why not keep it in the web browser cache and avoid future AJAX calls?

There is the question of how to refresh the private content cached in the web browser (for example, if a customer adds an item to their cart then the ‘number of items in cart’ will change). This is addressed by flushing the web browser cache every time an HTTP POST request is made. HTTP POST is how you send a form or perform some action on the site. So if a user browses around a site reading pages they will just be doing GET requests, which can be completely cached. If they do a POST (e.g. clicking a button to add an item to their cart) then the cache in the web browser will be flushed and an AJAX call will be made to fetch an updated copy of the private content (so details such as the number of items in the cart will be updated).
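A toy sketch of this flow (illustrative only – not Magento's actual customer-data JavaScript): private sections are fetched once in a single call, reused from the local cache on subsequent pages, and dropped after any POST:

```python
class PrivateContentStore:
    """Sketch of the pattern: private sections cached on the client side and
    invalidated whenever the user performs a POST (a state-changing action)."""

    def __init__(self, fetch_sections):
        self.fetch_sections = fetch_sections  # one AJAX call returns all sections
        self.cache = None

    def get(self):
        if self.cache is None:                # miss: a single request fetches everything
            self.cache = self.fetch_sections()
        return self.cache

    def on_post(self):
        self.cache = None                     # any POST flushes the private cache
```

Compare this with ESI hole punching: here one request refreshes every private region at once, and ordinary GET browsing never touches the backend for private data at all.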

(Please note: I have left all sorts of details out here for simplicity. For example, different pages may have different private content, so there is complexity about what private content needs fetching and how to cache it. I leave that level of detail for the official documentation.)

Magento 2 and Public Content

So what about public content? This is where I again refer back to the “two hard things” statement at the top of the post. It would seem simple to cache public content wouldn’t it? The problem is no content really lives forever. So let’s consider some different types of content.

Images tend to be long lived. It is generally safe to cache them a long time. If you want to change a product image, give it a different URL.

Details on a specific product might be able to be cached fairly well, as product details do not change so often (e.g. the description). But when the details do change, how do you flush the caches to get the new details onto the site?

What if the available quantity of an item is added to a product page? Each time the product is sold, the product page would need to be refreshed.

What if a page has two blocks of somewhat expensive content to generate (maybe a category page with a merchandising block on the side that uses some expensive algorithm)? If the category page hides out-of-stock products, you may want to update that part of the page without re-computing the merchandising data.

The basic building blocks supported by Magento 2 caching allow different approaches to cater for the above use cases.

Public cacheable content can be returned with tags for use by the cache (such as Varnish). Tags hold identity information, such as the product number of the product(s) shown on the page. If an administrator updates a product, Magento can then send Varnish a PURGE request based on the tag to tell it to “flush all pages containing this identity (pages with this tag) from your cache”. This allows selective cache invalidation, instead of wiping the whole cache (which would trigger a large spike of load on the server).
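The mechanism can be sketched as a tag-indexed cache (the tag names here are made up for illustration and are not Magento's exact tag format):

```python
from collections import defaultdict

class TaggedCache:
    """Cache entries carry tags (e.g. product identities); purging a tag evicts
    every page that showed that product, without wiping the whole cache."""

    def __init__(self):
        self.pages = {}                   # url -> cached body
        self.by_tag = defaultdict(set)    # tag -> urls carrying that tag

    def put(self, url, body, tags):
        self.pages[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        for url in self.by_tag.pop(tag, set()):
            self.pages.pop(url, None)

cache = TaggedCache()
cache.put("/shirt", "<html>...</html>", tags=["product_42"])
cache.put("/category/tops", "<html>...</html>", tags=["product_42", "category_7"])
cache.purge_tag("product_42")  # product 42 changed: both pages are evicted
```

Note that the category page is evicted too, even though the administrator only edited the product – that is the point of tagging identities rather than URLs.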

Different content can be returned with a different Time To Live (TTL) value. This can be used in combination with ESI requests so that different parts of a page can be cached with different lifetimes.

Remember first that in Magento ESI is only used to cache shared content, not private content for a specific customer. So the use of ESI is not to embed private (uncacheable) content on a cacheable page, but instead to allow different parts of a page to be cached for different lengths of time. This is really only of benefit if there are different parts of a page worth caching separately. Otherwise it is simpler to just cache the whole page and regenerate it when required.

Magento 2 Cache Implementation

Right now there are two forms of caching supported in Magento 2. This may be refactored a bit to improve modularity (e.g. to make it easier for other caches to be supported via extensions), but the current two approaches are a built-in cache and support for an external Varnish instance. The Varnish cache will give superior performance and is recommended for production usage. The built-in cache is mainly included for developers to use in their personal development environment. (This is not to say a small site could not take advantage of it, but the benefit of caching on a low-volume site is lower.) You could enable both caches if you wanted to, but there is little reason to believe the built-in cache will deliver much improvement beyond what the external Varnish cache would catch.

Form Keys and Caching

In Magento 1 “form keys” were introduced for added security against Cross Site Request Forgery (XSRF) attacks. This involved putting a random (secret) hidden string into a form so the web server can verify that POST requests come from an HTML page it returned. This however meant caching of pages with form keys was unsafe. (The form key must be different per retrieved HTML page.)
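A rough sketch of the classic pattern described above (not Magento's implementation): a secret token is stored server-side for the user and embedded in the form as a hidden field, then compared on POST. Because the token differs per user (or per retrieved page), any page embedding it cannot safely live in a shared cache:

```python
import hmac
import secrets

def issue_form_key(session):
    """Generate a random form key and remember it in the server-side session."""
    session["form_key"] = secrets.token_urlsafe(16)
    return session["form_key"]

def hidden_field(form_key):
    """The hidden input embedded in the form -- this is what defeats shared caching."""
    return '<input type="hidden" name="form_key" value="%s"/>' % form_key

def verify_form_key(session, submitted):
    """On POST, accept the request only if the submitted key matches the session's.
    compare_digest avoids leaking information through timing differences."""
    expected = session.get("form_key", "")
    return hmac.compare_digest(expected, submitted)
```

An attacker's site cannot know the victim's key, so a forged cross-site POST fails verification – but the HTML now contains a per-user secret, which is the caching problem Magento 2's newer approach avoids.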

Magento 2 also supports the concept of form keys, but uses a new approach such that the returned HTML page does not need to include a random string. This makes the form cacheable again. I am not going to go into the technical details here – the only point you need to know is that caching of HTML pages with forms is possible with Magento 2 out of the box, without missing out on the additional security provided by form keys.

Conclusions

Hopefully this post gives a useful overview of how caching in Magento 2 has improved over Magento 1. It has been reworked based on lessons from Magento 1 (such as from form keys), as well as improving the cacheability of content (such as private content) compared to techniques such as ESI.

For those with previous ESI experience, it is however important to realize that ESI in Magento 2 is not used for hole punching of cached pages (it is not used to embed user-specific private content into a returned page, which ESI is normally used for). It is to allow finer-grained control over when to recompute parts of an expensive page. It may be common for sites not to use ESI at all, as the base caching support may be sufficient.


22 comments

Another downside of ESI is that includes tend to get processed synchronously. And even if they were processed asynchronously, the sessions end up getting locked, so each request would end up being processed synchronously anyway. My personal testing showed that more than 2 ESI entries per page tend to make the page slower. ESI is a great concept but, like salt in Pierogies, is best used judiciously.

I’ve talked with some Varnish guys at the Varnish Summit in Frankfurt in October. They are aware of this issue and plan to enhance the ESI capabilities to process ESI requests in parallel. With this you could even generate ESI blocks from different backends, which should completely remove the issues you currently see.

Thanks Alan for the great overview. We have been using Varnish with ESI for a couple of years and have learned many things about how to use it best. One thing you should consider is that ESI requests can themselves be cached in Varnish. This allows caching private data inside Varnish and inserting it for the right person only (e.g. a welcome message), which reduces requests to the backend server. There are also non-private, reusable ESI blocks that can be shared across more than one user, like “Last Viewed Products” or even the mini cart block. The trick here is to dynamically add the GET parameters to the ESI URL in vcl_recv to get a unique hash for the ESI content.

I’m a big advocate for using ESI over AJAX. With a hot Varnish cache you won’t have a backend hit for a long time. With the AJAX approach you often have a backend request for every page view and visitor. Also, the page content may flicker because the content is inserted after the AJAX request completes.

Great info Alan! A question about Varnish caching vs. built-in caching. With built-in, if I update the price of a product, the catalog page is updated right away. With Varnish caching, if I update the price, the catalog page does not update. I have to wait for the varnish cache to expire.

1. You mention that “Public cacheable content can be returned with tags for use by the cache”. Is this not built into Magento 2 currently? i.e. will Magento selectively auto flush the Varnish cache when public content is updated? Or is this something that has to be added into Magento?

2. For the built-in cache, is the flush of the data selective when a price is updated (just affecting those pages where the product appears), or is the whole FPC flushed / invalidated?

What you describe is not what should happen. Magento is meant to issue a PURGE request to Varnish to immediately flush the affected pages from the Varnish cache using tags. (There should be a tag including the product number or similar.) Could you check your configuration, and if you think it is right please raise a GitHub issue to get it checked. This is the whole purpose of tagging content. The built-in cache is meant to act just like Varnish, to make things easier for developers and to make moving from the built-in cache to external Varnish a painless experience.

Raising it on GitHub is best. I know the overall design, but not every precise configuration detail. It is fastest to interact directly with the relevant engineers via GitHub. My guess is some configuration setting is not right and the PURGE requests are not reaching Varnish. But the fact that you cannot tell it’s misconfigured is, to me, also a “bug”.

I expected it to purge Varnish, but it does not seem to. As far as I can tell I have everything set up correctly. I see the tags in the http response headers. But the purge does not seem to take effect. vcl file being used that Magento generated, standard varnish startup:
/usr/local/sbin/varnishd -P /var/run/varnish.pid -a :80 -T 127.0.0.1:6081 -f /usr/local/etc/varnish/varnish.vcl -s malloc,1G

Alan, it seems Varnish purge is an Enterprise-only feature. In CE, only the VCL is provided. There is no support for specific flushes when using Varnish as the cache option. It seems odd that the built-in FPC supports auto tagging / flushing in CE, but Varnish does not. At least it solves my “banging my head against a wall” attempts to make it work. 🙂

As mentioned by Robert (and confirmed here https://github.com/magento/magento2/issues/948#issuecomment-70563048), purging the Varnish cache is only available in the Magento Enterprise edition. As a Magento developer, I find this very disappointing! I get it, you want to make money from the software, but excluding a core feature like this is just insulting, when the community is helping find bugs, test code, submit patches and more, in order to help make the next version of Magento a success. I hope eBay re-thinks this!

Here’s a comment from Yoav Kutner (founder of Magento and former employee of eBay) which sums up the situation.

“I have learned eBay and the folks at X.commerce don’t really understand the meaning of open and have a hard time explaining and defining it to themselves and to others.”

Thanks for the feedback. We make no apologies that we charge for some features. It funds platform development. The platform is open so people can write and plug in their own extensions. I am chasing up internally exactly what is going where and will update this post appropriately.

Regarding openness, I think you will find the M2 code base now under eBay is significantly more open than it has ever been in the past, including prior to the eBay purchase of Magento. I think that was an old comment and not reflective of where we are today. eBay has made and is continuing to make improvements here.

Thanks for the reply, I understand leaving out features is necessary in order to make the Enterprise edition more appealing to the big wigs, but I strongly feel this is a feature that belongs in the code base for everyone to benefit from, not just the businesses with deep pockets.

I can only think of one reason why eBay would not provide the ability to purge the cache in the Community edition. So they can market this feature and attract large businesses wanting the best performance for their website. Which is fine, but has eBay weighed up the negative impact this might have if they were to exclude it?

If this segmentation continues and the devoted Magento developers see past the smoke and mirrors, would it be a fair assumption that they would lose interest and stop contributing ideas, improvements and bug fixes (e.g. https://github.com/magento/magento2/issues?q=varnish)?

While I have moved on from WordPress (to bigger and better things, aka Magento), I have great respect for what Matt Mullenweg has been able to achieve. I have no doubt someone could create a better blog/CMS platform, but why haven’t they? Because WordPress has an unbelievable community and support behind it!

I just ask that you think carefully about the features you leave out. Offering live page editing on the front-end makes sense as an Enterprise feature, being able to purge Varnish cache is not.

Just to follow up, the Varnish partial cache invalidation support is planned to go into CE, subject to resource availability. It is not there yet, but we have been discussing it internally and do agree it makes sense in CE. Not a guarantee, and I cannot give a date yet, but that certainly is the intent.

If the Varnish module (as a replacement for EE’s FPC) is part of CE in M2, it does not make sense to offer it without purging. You can’t put a caching layer in front of an ecommerce platform and not provide a utility to clean its cache.
Imagine a simple CMS page that a store owner can’t change in the frontend because the hosting administrator isn’t available to execute “varnishadm ban.url .” on the shell… That’s a basic use case everyone using Varnish has to deal with. And the implementation of purging is neither a mystery nor hard to implement, as M2 already “tags” all pages.

If it is a matter of resources, we are happy to contribute purge capabilities for M2, as we already have that in our PageCache powered by Varnish module, which is already quite sophisticated.

Thanks for the feedback. I am using it in some internal discussions going on.

Ignoring that for a moment, I have heard of high-traffic Magento 1 sites getting significant benefit from Varnish just using, say, a 10-second TTL on the cache. If you get enough traffic then caching for a few seconds could still reduce your PHP web server load by a factor of 100. (Obviously this depends on the traffic level you are getting within the cache expiry time.) For this sort of site, the Merchant would never need to purge the cache – they just wait a few seconds.

For a low-traffic site Varnish does not need to reduce the load on the server. There is by definition not much load in the first place! I believe here the desire for Varnish caching support is to improve the response time customers experience for a page. Very roughly, server response time is say 1/3 of the time the user experiences at present, so having Varnish might deliver say a 1/6th speed-up in user perception of page load time. Other features we are looking at, including CSS and JS merging, I think will have a greater impact on the user’s perceived page load time (by reducing the number of files to be downloaded). We got some great feedback recently that we are feeding into this design work at present.

Oh, and let me know if I have missed a point in the above. I would like to know if I am missing something!

Yeah, I can agree with you Alan!
I have seen small TTL values for the Varnish cache on many projects (ranging between a few seconds and a few minutes).
In most cases product information doesn’t get changed very often (every X minutes), so a small TTL makes sense for such projects.
