Garbage collect transients

Description

Per an IRC discussion and long-considered changes:

In wp_scheduled_delete(), look for expired transients and purge them.

On DB upgrade, purge all transients, regardless of expiration. (This should probably happen on auto-update as well, but let's let that part slide for now, as DB upgrade will run on all major releases these days.)

Should transients also be purged when an object-cache.php file is dropped in? Or at least should a task be scheduled which could do so? This way when someone eventually adds a custom object cache they are not stuck with hundreds of transient records in the database (even non-expired but now no longer used ones).

Is there anything we can do to help progress on this issue? It just bit me - I found my options table had bloated to 250,000 entries, after just 2 months of the site being live.

David

I can't answer as to when it will be looked at, but can I assume you've tried my plugin in the meantime (link above)? This will, for the time being, perform the same processing and keep the entries cleared.

I'm looking to implement a further option to report on which plugins are causing it. Technically they're not doing anything wrong but there may be an advantage to them re-thinking their caching strategy (as I've been doing with my own).

Yes - I'd seen the plugin and was about to deploy it. The code using transients was my own! I've used transients as expiring access-tokens on many and various sites. I thought it was great that I *didn't* then have to spend any time thinking about cacheing strategy since WordPress had core functionality to do that for me.

Deploying a plugin is little effort, of course. But having this in core seems like a no-brainer - is there any good reason to *not* garbage collect expired transients?

Yes - I'd seen the plugin and was about to deploy it. The code using transients was my own! I've used transients as expiring access-tokens on many and various sites. I thought it was great that I *didn't* then have to spend any time thinking about cacheing strategy since WordPress had core functionality to do that for me.

Deploying a plugin is little effort, of course. But having this in core seems like a no-brainer - is there any good reason to *not* garbage collect expired transients?

Not that I'm aware of.

I'm not a Core developer so not in a position to do it myself. However, I know Andrew (Nacin) is keen to get this included - I guess it's just a case of time and priorities.

The Australia Post API responses are cached using transients, and the cache key(s) depend on the volume/dimensions/destination of each customer's cart contents, which means there can be quite a few different transient cache keys on each store.

Garbage collection of expired transients would be a fantastic addition I think.

Expired transients should probably be removed using either delete_option() — akin to how get_transient() removes an expired transient, rather than announcing it with delete_transient()'s hooks — or since we are already doing straight database calls, with a DELETE. I don't think there is a need to do this noisily. It's kind of like something silently vanishing from a cache backend.

Expired transients should probably be removed using either delete_option() — akin to how get_transient() removes an expired transient, rather than announcing it with delete_transient()'s hooks — or since we are already doing straight database calls, with a DELETE. I don't think there is a need to do this noisily. It's kind of like something silently vanishing from a cache backend.

I totally agree. I was using a simple selection of the transient by SQL in my plugin but using delete_transient to then remove them - amongst other things this isn't very efficient at removing large numbers of un-housekeep transients. I'm coding a change this week to replace it with a single SQL to achieve it.

Having worked on the aforementioned plugin to provide this functionality can I highlight two concerns that need to be considered?

As a user has mentioned above some people have a LARGE number of expired transients. Can I suggest that whichever package this is implemented in should perform a total wipe of transient data even if a database upgrade is not due? Although it will only affect the first run, performance does need to be considered - a single SQL statement (or as few as possible) if at all possible.

I have also experienced issues with orphaned transients - where one of the transient record pairs has either gone missing or, due to truncation of the name, are now mis-matched. This will mean that the regular housekeeping is likely to miss these but I would suggest that the database upgrade 'clear all' should handle these. In my case I've, basically, performed a delete on the options table for all '_transient_' records - it's rough and basic but is needed to ensure that all transients truly are removed.

Deleting all transients, regardless of expiration? Won't that bite a lot of people?

e.g. In UpdraftPlus (​http://wordpress.org/plugins/updraftplus - highest rated backup plugin), a transient is used to track the job status. A backup job may need several attempts to complete (e.g. if it's creating a big zip file). Potentially a backup job may be half-complete when the transients get cleared. Potentially then I'll get dozens or hundreds of support requests (since the plugin has an install base of ~60,000) ("hey, what happened to my nightly backup job last night?") every time a new WordPress release causes a database upgrade and wipes out the job mid-progress.

I think any and all issues that boil down to "this needs to rely on transient because..." is up for refactoring. Transients are inherently designed and designated to be caching API, which explicitly cannot be relied on.

Between API design and different cache backends there are many things that can flush transient out there in the wild. Flush as rare as core upgrade is relatively rare and insignificant event for them.

I totally see what you're saying, but isn't the reason why this ticket has come to exist (and garnered so much interest) that lots of programmers are using the transient API simply as generic temporary data storage (without respect to the documentation stating that its intention is for cacheing?).

If programmers were only using it for cacheing, then I think the problem of huge amounts of un-cleared, expired transients wouldn't exist (or at least, not to the extent it does). If the ticket is responding to that on-the-ground situation, then the solution seems like the nuclear option so that programmers learn about the officially supported uses of the transient API the hard way.

i.e. if the transient API is not meant to be used for generic temporary storage, then this ticket should just be ignored, as it deals with an officially non-existent problem. If on the other hand we recognise that it *is* widely used that way, then the solution (killing off lots of data, a few times a year) is much worse than the problem (excessive growth in database size).

The issue is much like users keeping important files in recycle bin (which I actually encountered in real life :) It happens, but you cannot make decisions for everyone based on some people doing things that are Really Bad Idea.

If programmers were only using it for cacheing, then I think the problem of huge amounts of un-cleared, expired transients wouldn't exist (or at least, not to the extent it does)

You make assumption here that caching use cases do not produce transients garbage. There is no basis for that. Garbage produced comes from faults (or natural limits) in implementation, not the functional intent of that implementation (be it right or wrong in regards to API purpose).

My "or at least, not to the extent it does" was intended to acknowledge that there is transient garbage from caching use cases. But, from following the discussion of this issue since the start of this year, that doesn't seem to me to be where the interest in the bug is coming from.

The Recycle Bin analogy suggests that people using the Transient API for non-caching uses are novices or a bit stupid. That's not my experience from where I've seen people discussing the Transient API on mailing lists. Basically up until this point, "transient API = only for cacheing" has been a reality only in the documentation, not in the vast majority of implementations (e.g. unless you deploy memcached or another very specialised use-case, in which you'll know what you're doing). The patch for this bug enforces that reality in a way that will cause a lot of pain for programmers that I think is unnecessary. Sticking to principles has its place, but in the real world I foresee a lot of trouble from this patch.

Your experience might indicate that. Mine doesn't - as per link at start of thread I have been looking into the issue since before this ticket existed and had never got impression of rampant non-caching use cases for transients.

The solution is in line with API's intended and documented purpose. The alternative is to ignore the issue with API to cater to use cases which aren't in that line. I would find that detrimental to quality of API, not beneficial.

One thing to remember about transients is this - They're stored in a object cache if one is available, external object caches are not guaranteed data storages, set an expiration of an hour, and it may expire in 5 minutes, or 30 minutes, but it'll definately expire before, or on, the expiration time specified. If no expiration is specified, it may last 5 seconds, an hour, or maybe a year, it's entirely up to the object cache as to when it purges the transient data.

Anyone using a transient to store data about their plugin, or, to track when actions took place are _doing_it_wrong() and should be using options instead of transients, of course, this doesn't mean that people understand this and do the right thing, add any API and people will abuse it in weird and amazing ways.
That's the entire problem here, people have been abusing the transient API by expecting that their transients will be cleaned up, by using dynamic transient names, by setting no expiration (which really means, never expire this, but i've seen used as "Clean this up next time you come through"), and never cleaning up after themselves, or re-using the same transient. It's ended up with some people with thousands of unused transients, and others with hundreds of transients each with many 100K's of data in each (so suddenly MySQL is sending over 1MB of data to WordPress on every page view).

Something has to change, and as a result, plugins that expect their transients to always exist are probably the ones that have to break, they're using it wrong and negatively affecting others

Transient data is transient. Even if an expiration time is attached to cached data, it exists under the expectation that the data will disappear at any time and should therefore have a method available to immediately regenerate data or provide temporary data until it has been regenerated.

There are so many possible object caching configurations in the wild right now, any attempts to treat transients as anything other than an unreliable* and temporary data store are likely failing already.

Everyone seems to misunderstand how transient expiration works, so the long and short of it is: transient expiration times are a maximum time. There is no minimum age. Transients might disappear one second after you set them, or 24 hours, but they will never be around after the expiration time.

I'd suggest that we need to spread the word about this, as I've seen a lot of people think that expiration is used for both a minimum and maximum, then find out their plugins are breaking with object caches. (And I've spoken to experienced developers that believe this too.)

On db upgrades the delete process will have to be hard and simple.
When it comes to general garbage collection, only expired transients will be removed.

Plugins that uses transients for anything but caching (of data that can and will be regenerated in a few seconds when the transient is no longer available), are already broken (will malfunction).

Data stored in memory, as many persistent object caching systems, may disappear any time. If you're just a bit unlucky they disappear just before you planned to use them. So this rare "issues" that someone might experience from the latest patch is a good thing in the long run.

I think everyone on this ticket can understand the intended purpose of transients - to store data that is capable of being regenerated.

The issue I'm seeing is that because of the state of the documentation in Codex, many coders have come to use them another way. If you read through ​http://codex.wordpress.org/Transients_API, then you can easily come away with the understanding that the expiration time is the *maximum*. The fact that transients can (and should be expected to) expire at *any* time is implicit/in the background, and can easily be missed. If you read the "Overview" section through, trying to read it as if it spread first one view, then the other, you'll see what I mean. dd32 says that people are "abusing" the transients API. That's correct - but I contend that the documentation has led, and is still leading, people into that.

Note that I'm not saying that WP should change it's intended approach to transients. What I am suggesting is that:

a) The documentation needs changing to make the real intention clear. A second aspect of this is that the documentation for the Options API also needs clarifying. As dd32 says, people *should* be using the Options API for this. But again, this is easy to miss. If you read the summary of the Options API, you'll get the idea that it is for... options. The fact that it's actually the intended API for all kinds of stored data (including 'transient' data, where the transience is intended to be effected with a *maximum* expire time, and that the Transients API should *not* be used in that case), is something rather subtle - it is likely that only a seasoned WP developer will realise this, after some time and experience.

b) In the light of that, I am suggesting that WP should handle the change in the patch applied in this bug more gradually. The concern has been expressed that the handling of transients during a DB upgrade should be 'clean and fast' - preferably in one SQL command. In fact, as see in David Artiss's plugin, it is possible to clean expired transients (only) whilst using only one SQL command; here it is:

Thus, it seems to me that the main reason why people want to enforce "delete all transients" does not exist: it can already be achieved another way, without adding complexity. Look at the pros and cons on either side: yes, 'delete all transients' is in keeping with the true purpose of the Transients API, and will force programmers to learn that fact very quickly. However, I contend that it's too quick. Developers will realistically only wake up to this once the strange and hard-to-reproduce bug reports come in for their already-existing applications. Given that the current version of WP doesn't delete any expired transients (unless they are accessed), switching suddenly to force-deleting all transients in 3.7 instead of just the expired ones will give significant real-world pain for little gain. Much better to just delete the expired ones, whilst educating programmers about the intended purpose of the transients API over a period of time, before switching to deleting them all at a later stage once developers have had time to adapt (if it's felt that it is beneficial to enforce the purpose of the API). That way, you get all the gain without the pain.

David, I get your point of how the documentation might have lead some devs to use the Transients API in other/"wrong" ways. Those devs' problems will then however not so much come from [25416], but from external object caches being used on more and more servers.
Due to that, those plugins are heavily prone to breaking now already, so that there's no point in being extra conservative with that garbage collecting.

​Per IRC discussion, in which I asked: "Should we consider a slightly more conservative different approach to transient garbage collection? There is just one valid argument I've heard against this: we do not want updates to be blamed for breaking sites." —

For 3.7, we're going to go with deleting only expired transients, and we're going to do this in the same location (so, on every DB upgrade).

We also need to make sure we clear site_transients from the sitemeta table for networks. Forgot about that.

I've seen a few different SQL queries proposed to do this. We'd like to find the cleanest and clearest ones. It shouldn't lock up or crash the tables, and it should be safe to run on massive options tables.

I missed IRC, but I think this is the wrong decision. I agree we should not risk breaking sites by deleting unexpired transients, but I cannot see any reason (in the chat or on this ticket) as to why this can only be done on a db upgrade, rather than as a (weekly?) scheduled cron, particularly if an external object cache is being used.

If a plugin author sets an expiry date on a transient, he should not have to do garbage collection himself, nor wait three months or more for a database upgrade.

Deleting expired transients through a cron job will not break sites (and the more often it's done, the less likely it is to fail through a timeout).

Unexpiring transients, that happens to be stored in the database, may be garbage collected every few minutes. When stored in memory, plugin authors have absolutely no guarantee they will be there when retrieved, some those plugins/sites are broken by their own nature.

Garbage collecting expired transients should run often. On DB upgrades all transients should be flusehd, just like a cache flush may happen whenever reasonable to regain storage space, in memory or on disk.

The point that upgrades should not be blamed, rightfully or not, for breaking things, or adding weight to things already broken, is though a valid concern. But if that is important, then core should let this be plugin territory for now. Either you dare or you don't. Only doing expired transients is to have chickened out, at least for 3.7. Punt. Issue a warning in the release, that 3.8 will do it, brutally.

I'm cool with attachment:20316.diff​​. It took me a little while to parse how the query worked, so it'd be worth including a comment explaining what it's doing.

The other way of doing this (if making the query simpler was the goal) would be to have a query to find all the transients we need to delete, then generate the DELETE query in PHP. That felt fairly fragile, would result in a bunch more code, and is probably slower, so I think the single query DELETE is the best way to go.

Reverts [25416], which had all transients being cleared. This leaves much to be desired, but we don't want a core update to be blamed for breaking a site that incorrectly assumes transients aren't transient.