Don’t Cache WP_Query Objects

WP_Query is one of the most complex classes in the WordPress codebase. It’s extremely powerful and flexible, but that flexibility often results in slower database queries, especially when working with metadata. To speed things up, WordPress developers tend to cache the results of such queries, but there are a few pitfalls you should be aware of.

Caching Queries

The Transients API is likely the top choice for caching WP_Query results, but we often see developers storing the whole WP_Query object in a transient, and we believe that’s a mistake. In code, it looks something like this:
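A minimal sketch of that pattern, assuming a loaded WordPress environment. The function name, the transient key, the meta key and the one-hour expiry are hypothetical and only there for illustration:

```php
<?php
// Anti-pattern, shown only to illustrate the problem: the entire
// WP_Query object ends up in the transient.
function demo_get_featured_query() {
    // 'demo_featured_query' is a hypothetical transient key.
    $query = get_transient( 'demo_featured_query' );

    if ( false === $query ) {
        // A typical slow lookup by meta value (hypothetical meta key).
        $query = new WP_Query( array(
            'posts_per_page' => 10,
            'meta_key'       => 'demo_featured',
            'meta_value'     => '1',
        ) );

        // The whole object is serialized and stored for an hour.
        set_transient( 'demo_featured_query', $query, 3600 );
    }

    return $query;
}
```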

Here you can see that we’re passing the $query object directly to set_transient(), so whenever we get a cache hit, we’ll have our query object available, along with all the useful WP_Query properties and methods.

This works, or at least seems to work, but it’s a bad idea, and you’ll want to know what happens behind the scenes when you call set_transient() in this particular case.

Serialize/Unserialize

By default, transients in WordPress are stored through the Options API. If you’re familiar with how options work internally, you’ll know that the values are serialized before hitting the database, and unserialized when retrieved. This is also true for most persistent object caching drop-ins, including Memcached and Redis.

As an example, just look at what happens when we serialize a small object in PHP:
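Here is a small plain-PHP illustration that needs no WordPress at all. The Point class is a hypothetical stand-in for any small object:

```php
<?php
// A deliberately tiny class (hypothetical) with two public properties.
class Point {
    public $x = 1;
    public $y = 2;
}

$serialized = serialize( new Point() );
echo $serialized, "\n";
// O:5:"Point":2:{s:1:"x";i:1;s:1:"y";i:2;}

// Unserializing gives back an equal copy of the original object.
$copy = unserialize( $serialized );
var_dump( $copy == new Point() ); // bool(true)
```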

This allows us to store the object, along with all its properties, as a string, which works well in a MySQL table, in a Redis database, etc. When deserializing (or unserializing) such a string, the result is an identical copy of the object we previously had. This is great, but let’s consider a more complex object:
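To try this on your own site, something along these lines should do. It assumes WordPress is loaded, and the helper name is hypothetical:

```php
<?php
// Hypothetical helper: serializes a freshly run query so the
// resulting string can be inspected.
function demo_serialize_query() {
    $query = new WP_Query( array( 'posts_per_page' => 10 ) );

    return serialize( $query );
}
```

Printing the result, or just its length with strlen( demo_serialize_query() ), gives a quick feel for how much data ends up in the transient.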

The first thing you’ll notice is that the output is extremely long. Indeed, we’re serializing every property of our WP_Query object, including all query variables, parsed query variables, the loop status and current position, all conditional states, a bunch of WP_Post objects we retrieved, as well as any additional referenced objects.

Whoops! But that’s not all. That wpdb object we’re storing as a string in our database will contain our database credentials, all other database settings, as well as the full list of SQL queries along with their timing and stack traces if SAVEQUERIES is turned on.

The same is true for other referenced objects, such as WP_Meta_Query, WP_Tax_Query, WP_Date_Query, etc. Our goal was to speed that query up, and while we did, we introduced a lot of unnecessary overhead serializing and deserializing complex objects, and leaked potentially sensitive information.
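The way referenced objects get dragged along can be reproduced without WordPress. The two classes below are stand-ins invented for this sketch, not WordPress code; they only mimic an object that holds a reference to a database handle:

```php
<?php
// Stand-in for a database handle (hypothetical, NOT the real wpdb).
class DemoDb {
    public $dbuser     = 'wp_user';
    public $dbpassword = 'not-a-real-password';
    public $queries    = array( 'SELECT 1' );
}

// Stand-in for a query object that keeps a reference to the handle.
class DemoQuery {
    public $posts = array();
    public $db;

    public function __construct( DemoDb $db ) {
        $this->db = $db;
    }
}

$serialized = serialize( new DemoQuery( new DemoDb() ) );

// The nested object, credentials included, is part of the string.
var_dump( strpos( $serialized, 'not-a-real-password' ) !== false ); // bool(true)
```

PHP classes can limit what gets serialized by implementing __sleep() or __serialize(), but that is up to the author of each class, so it is not something to rely on when storing third-party objects.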

But the overhead does not stop there.

Metadata, Terms, Posts & the Object Cache

Okay, so now we have a huge serialized string containing the posts that we wanted to cache, along with a bunch of unnecessary data. What happens when we deserialize that string back to a WP_Query object? Well, nothing really…

When deserializing strings into objects, PHP does not run the constructor method (thankfully), but instead runs __wakeup() if it exists. It doesn’t exist in WP_Query, so nothing extra happens, except of course populating all our properties with all those values from the serialized string, restoring nested objects, and objects nested inside those objects. It should be pretty fast, hopefully much faster than running our initial SQL query.
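The constructor behaviour is easy to confirm in plain PHP. DemoCounter is a hypothetical class that simply counts how often its constructor runs:

```php
<?php
// Hypothetical class that counts how often its constructor runs.
class DemoCounter {
    public static $constructed = 0;
    public $value;

    public function __construct( $value ) {
        self::$constructed++;
        $this->value = $value;
    }
}

$original = new DemoCounter( 42 ); // The constructor runs once here.
$copy     = unserialize( serialize( $original ) );

var_dump( DemoCounter::$constructed ); // int(1): unserialize() skipped the constructor.
var_dump( $copy->value );              // int(42): the property was restored anyway.
```

Any setup work that normally happens in a constructor, or in a method called after construction, is therefore skipped for the restored copy.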

And after we’re done deserializing, even though at that point the WP_Query object is a bit crippled (serialize can’t preserve live resources, such as an open database connection), we can still use it:

while ( $query->have_posts() ) {
    $query->the_post();
    the_title();
}

This doesn’t cause any additional queries against the wp_posts table, since we already have all the necessary data in the $query->posts array. Until we do something like this:
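A sketch of such a loop, assuming WordPress is loaded and $query came out of the transient. The function name and the 'demo_subtitle' meta key are hypothetical:

```php
<?php
// Renders a loop from a WP_Query object that came out of the cache.
function demo_render_cached_loop( WP_Query $query ) {
    while ( $query->have_posts() ) {
        $query->the_post();
        the_title();

        // 'demo_subtitle' is a hypothetical meta key.
        echo get_post_meta( get_the_ID(), 'demo_subtitle', true );
    }

    wp_reset_postdata();
}
```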

The Object Cache

When running a regular WP_Query, the whole process (by default) takes care of retrieving the metadata and terms data for all the posts that match our query, and storing all that in the object cache for the request. That happens in the get_posts() method of our object (_prime_post_caches()). But when re-creating the WP_Query object from a string, the method never runs, and so our term and meta caches are never primed.

For that reason, when running get_post_meta() inside our loop, we’ll see a separate SQL query to fetch the metadata for that particular post. And this happens for every post. Separately. Which means that for 10 “cached” posts, we’re looking at 10 additional queries. Sure, they’re pretty fast, but still.

Now let’s add something like the_tags() to the same loop, and voilà! We have another ten SQL queries to grab the terms.

And finally… This is the best part. Let’s add something often done by a typical plugin that alters the post content or title in any way:
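For illustration, here is a hypothetical filter of that kind. It assumes WordPress’s the_title filter, which passes the title and the post ID; the function name and what it does to the title are made up for this sketch:

```php
<?php
// Hypothetical filter: appends the post type to every non-"post" title.
function demo_label_title( $title, $post_id = 0 ) {
    // get_post() reads from the object cache, or the database on a miss.
    $post = get_post( $post_id );

    if ( $post && 'post' !== $post->post_type ) {
        $title .= ' (' . $post->post_type . ')';
    }

    return $title;
}

// Guarded so this file also loads outside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_title', 'demo_label_title', 10, 2 );
}
```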

Now we’ll see an additional ten database queries for the posts. How did that happen? Didn’t we have those posts cached?

Yes, we did, but we had them in our $query->posts array. get_post() doesn’t know or care about any queries; it simply fetches data from the WordPress object cache and falls back to the database on a miss. It was WP_Query’s job to prime those caches with the data, which it failed to do upon deserializing. Tough luck.

So ultimately, by caching our WP_Query object in a transient, we went from four database queries (found rows, posts, metadata and terms) to only two (transient timeout and transient value), plus an additional thirty queries (posts_per_page * 3, with ten posts per page) if we want to use metadata, terms or anything that calls get_post().

To be fair, those thirty queries are likely much faster than our initial posts query because they’re lookups by primary key, but each one is still a round-trip to the (possibly remote) MySQL server. Sure, you can probably hack your way around it with _prime_post_caches(), but we don’t recommend that.

The Alternatives

Now that we have covered why you shouldn’t cache WP_Query objects, let’s look at a couple of better ways to cache those slow lookups.

The first, easiest and probably best method is to cache the complete HTML output, and PHP’s output buffering functions will help us implement that without moving too much code around:
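A minimal sketch of this approach, assuming WordPress is loaded. The function name, transient key, meta key, markup and one-hour expiry are all hypothetical:

```php
<?php
// Returns the rendered list, served from the transient when possible.
function demo_get_featured_html() {
    // 'demo_featured_html' is a hypothetical transient key.
    $html = get_transient( 'demo_featured_html' );

    if ( false === $html ) {
        $query = new WP_Query( array(
            'posts_per_page' => 10,
            'meta_key'       => 'demo_featured',
            'meta_value'     => '1',
        ) );

        ob_start(); // Capture everything echoed by the loop.
        while ( $query->have_posts() ) {
            $query->the_post();
            echo '<h2>';
            the_title();
            echo '</h2>';
        }
        wp_reset_postdata();
        $html = ob_get_clean();

        // Only the string is stored; it expires after an hour.
        set_transient( 'demo_featured_html', $html, 3600 );
    }

    return $html;
}
```

In a template this becomes a single echo demo_get_featured_html(); call, and on a cache hit no WP_Query object is created at all.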

This way we’re only storing the actual output in our transient, no posts, no metadata, no terms, and most importantly no database passwords. Just the HTML.

If your HTML string is very (very!) long, you may also consider compressing it with gzcompress() and storing it as a base64-encoded string in your database, which is especially efficient if you’re working with memory-based storage, such as Redis or Memcached. The compute overhead to compress and uncompress is usually negligible compared to a database round-trip.
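A small sketch of the compression step in plain PHP, assuming the zlib extension is available. The two helper names are hypothetical:

```php
<?php
// Compress, then base64-encode so the value is safe in a text column.
function demo_pack_html( $html ) {
    return base64_encode( gzcompress( $html ) );
}

// Reverse of demo_pack_html(); returns false if the value is corrupted.
function demo_unpack_html( $packed ) {
    $binary = base64_decode( $packed, true );

    return false === $binary ? false : @gzuncompress( $binary );
}
```

The packed string is what would be handed to set_transient(), and demo_unpack_html() is applied to whatever get_transient() returns. Keep in mind that base64 adds roughly a third to the compressed size, so this pays off only when the HTML compresses well.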

The second method is to cache post IDs from the expensive query, and later perform lookups by those cached IDs which will be extremely fast. Here’s a simple snippet to illustrate the point:
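A minimal sketch, assuming WordPress is loaded. The function name, transient key, meta key and one-hour expiry are hypothetical; the 'fields', 'no_found_rows', 'post__in' and 'orderby' arguments are regular WP_Query arguments:

```php
<?php
// Returns a WP_Query for the cached post IDs; the expensive query
// runs only when the transient is missing.
function demo_get_featured_posts_query() {
    // 'demo_featured_ids' is a hypothetical transient key.
    $post_ids = get_transient( 'demo_featured_ids' );

    if ( false === $post_ids ) {
        // The slow query: a lookup by meta value, returning IDs only.
        $slow_query = new WP_Query( array(
            'fields'         => 'ids',
            'no_found_rows'  => true,
            'posts_per_page' => 10,
            'meta_key'       => 'demo_featured',
            'meta_value'     => '1',
        ) );

        $post_ids = $slow_query->posts;
        set_transient( 'demo_featured_ids', $post_ids, 3600 );
    }

    // An empty post__in is ignored by WP_Query and would match all
    // posts, so fall back to an ID that never exists.
    if ( empty( $post_ids ) ) {
        $post_ids = array( 0 );
    }

    // The fast query: a lookup by primary key, in the cached order.
    return new WP_Query( array(
        'post__in'            => $post_ids,
        'orderby'             => 'post__in',
        'posts_per_page'      => count( $post_ids ),
        'ignore_sticky_posts' => true,
    ) );
}
```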

Here we have two queries. The first query is the slow one, where we can fetch posts by meta values, etc. Note that we ask WP_Query to retrieve IDs only for that query, and later do a very fast lookup using the post__in argument. The expensive query runs only if we don’t already have an array of IDs in our transient.

This method is a bit less efficient than caching the entire HTML output, since we’re (probably) still querying the database. But the flexibility is sometimes necessary, especially when you’d like to cache the query for much longer, but have other unrelated things that may impact your output, such as a shortcode inside the post content.

Profile

Caching is a great way to speed things up, but you have to know exactly what you’re caching, when, where and how, otherwise you risk facing unexpected consequences. If you’re uncertain whether something is working as intended, always turn to profiling: look at each query against the database, look at all PHP function calls, watch for timing and memory usage.
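As a starting point for the database side, here is a small sketch built around the SAVEQUERIES constant mentioned earlier. It assumes entries shaped like those in $wpdb->queries, where index 0 is the SQL string and index 1 is the elapsed time in seconds; the helper name is hypothetical:

```php
<?php
// Summarizes entries shaped like $wpdb->queries: index 0 is the SQL
// string, index 1 is the elapsed time in seconds.
function demo_summarize_queries( array $queries ) {
    $total = 0.0;

    foreach ( $queries as $entry ) {
        $total += (float) $entry[1];
    }

    return sprintf( '%d queries, %.4f seconds', count( $queries ), $total );
}

// Inside WordPress, with define( 'SAVEQUERIES', true ) in wp-config.php:
// add_action( 'shutdown', function () {
//     global $wpdb;
//     error_log( demo_summarize_queries( $wpdb->queries ) );
// } );
```

Comparing that summary before and after introducing a cache quickly shows whether the number of queries actually went down.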