21:02:48 <wm-labs-meetbot> The meeting name has been set to 'e169_rfc_meeting__psr_6_cache_interface_in_mediawiki_core'

21:03:05 <addshore> *waves*

21:03:11 <SMalyshev> oh nice. I was wondering why nobody's here :)

21:04:08 <addshore> So. The outcome I would personally like from the RFC is to have a cache Interface to be used within MediaWiki core. I don't personally have strong opinions of what it should be / look like but I feel we should of course consider PSR6

21:04:54 <robla> #info <addshore> So. The outcome I would personally like from the RFC is to have a cache Interface to be used within MediaWiki core. I don't personally have strong opinions of what it should be / look like but I feel we should of course consider PSR6

21:05:11 <robla> #link https://phabricator.wikimedia.org/E169

21:05:12 <DanielK_WMDE> addshore: you mean an interface different from BagOStuff?

21:05:23 <robla> #link https://phabricator.wikimedia.org/T130528

21:05:32 <DanielK_WMDE> addshore: there's also the more recent WANObjectCache

21:05:44 <addshore> So, the issue that I ran into that started me off looking at this is that BagOStuff is not an Interface

21:05:48 <Scott_WUaS> Hi All

21:06:02 <bd808> addshore: does that matter?

21:06:14 <bd808> it is our common (and crappy) cache abstraction

21:06:33 <DanielK_WMDE> addshore: that's easy to fix. Just mkae it an interface, and rename the current implementation to bag=Stuff base or AbstractBagOStuff

21:06:36 <addshore> bd808, yes. I ran into issues while just trying to extend bagOfStuff to make a Taggable / Indexed BagOfStuff

21:07:19 <addshore> DanielK_WMDE: indeed, but we currently have 3 different caches with different appearances in core, so tieing them together (at least a bit) would probably make sense?

21:07:31 <DanielK_WMDE> addshore: one important querstion is - should we try to cover all zuse cases with a single interface (or hierarchy of interfaces)? Or should we have different interfaces for different use cases?

21:07:35 <SMalyshev> reading through the ticket and critique of PSR6, it looks to me like it's way overdesigned, and I don't see clear advantage to what we have now

21:07:44 <bd808> I explained the things I explictly dislike about PSR6 at https://phabricator.wikimedia.org/T130528#2245415

21:08:17 <robla> #info <bd808> I explained the things I explictly dislike about PSR6 at https://phabricator.wikimedia.org/T130528#2245415

21:08:39 <DanielK_WMDE> addshore: i'm not sure... if things look the same, people expect them to behave the same. the current BagOStuff system makes basically no guarantees, and there is no way for code to check that the object thea have meets the local requirements

21:08:43 <SMalyshev> I think having unnecessary cache object API just to be able to cache nulls is an overkill

21:09:32 <DanielK_WMDE> bd808: you kind of convinced me that this is an issue, yea

21:10:01 <bd808> switching from memcache to apc is not just a matter of changing the backend

21:10:05 <DanielK_WMDE> addshore: i'm all for improving the caching infrastructure and interface. but i'm not clear on what the best way is.

21:10:22 <addshore> yeh, right now its sounding like the best way is to not ;)

21:10:26 <DanielK_WMDE> so we need tagged caching. i *really* want tagged caching. Do we want a TaggedbagOfStuff interface?

21:11:12 <SMalyshev> Maybe. Since PSR-6 doesn't seem to address tagging anyway, that may be the way to go

21:11:46 <TimStarling> what is tagged caching?

21:11:46 <robla> one question we seem to be debating is do we a) need a better caching abstraction than BagOStuff? or b) is the status quo fine or c) should BagOStuff be deprecated, and not replaced with anything?

21:12:18 <TimStarling> in what sense is PSR-6 better than BagOStuff? it has far fewer features

21:12:26 <SMalyshev> my concern is that we don't seem to get any added value from PSR6 and since we'd need to extend cache model anyway, we'd better extend the devil we know (namely BagOfStuff)

21:12:29 <bd808> they want to have a way to purge N cache entries that are related somehow

21:12:44 <bd808> e.g. all cached things that are dependent on Q12345

21:13:02 <SMalyshev> TimStarling: tagged caching is basically assigning each item 0..N tags and being able to purge all items that have specific tag

21:13:11 <SMalyshev> (I hope that's what DanielK_WMDE meant :)

21:13:12 <DanielK_WMDE> TimStarling: tags = buckets

21:13:45 <TimStarling> but PSR-6 doesn't provide that either

21:14:11 <addshore> No, I think at this stage the consensus is to ignore PSR6

21:14:57 <SMalyshev> more restricted case is when you have hierachical tags i.e. you can have cache purge by prefix. It's a subset of tagging cache but may be easier to implement

21:15:03 <TimStarling> memcached doesn't have tagged caching does it?

21:15:09 <TimStarling> or redis?

21:15:24 <TimStarling> is the proposal to switch to some other server which would support this feature?

21:15:24 <DanielK_WMDE> addshore: ok, so rfc rejected ;) shall we repurpose the meeting to discuss alternatives / next steps? what are the needs we have?

21:16:11 <TimStarling> if we have an external library that requires PSR-6 then it would be fairly easy and harmless to add a PSR-6 adaptor for BagOStuff

21:16:32 <DanielK_WMDE> TimStarling: my impression was that there are psr-6 based implementations of tagging, but i didn't investigate. but discussing this use case in particular was probably not the intention of the rfc.

21:16:49 <TimStarling> but the idea of deprecating BagOStuff in favour of PSR-6, I think that is rejected

21:16:58 <addshore> TimStarling: indeed, that was my first plan, but the library I was planning on using turned out to not be great

21:17:11 <DanielK_WMDE> TimStarling: +1

21:17:39 <psychoslave> The topic was related to https://phabricator.wikimedia.org/T91162, right?

21:17:46 <robla> does my earlier proposed breakdown make sense? "a) need a better caching abstraction than BagOStuff? or b) is the status quo fine or c) should BagOStuff be deprecated, and not replaced with anything?"

21:17:53 <addshore> for the library see https://github.com/php-cache/taggable-cache and issue see https://github.com/php-cache/issues/issues/52

21:18:53 <DanielK_WMDE> addshore: so... we want tagging. and we want bagOStuff to be an actual interface. and we want to clarify the relationship between the different caching systems/interfaces we have (ProcessCacheLRU, BagOStuff, WANObjectCache...)

21:19:04 <addshore> DanielK_WMDE: yup

21:19:08 <bd808> addshore: I think you are going to have that class of problem with any backend that doesn't natively support lock for update

21:19:43 <DanielK_WMDE> i'd also love to see bd808's concern addressed: how do we distinguish between, say, memcached and hhvm's built-in cache? Calling code shouldn't know about the concrete implementation, but it *should* know about the totally different eviction behavior.

21:19:52 <TimStarling> when you say you want it, do you mean you want some crappy simulation of it like we have in ParserCache, JobQueue etc.?

21:20:32 <robla> DanielK_WMDE: do we need an abstraction?

21:20:59 * robla is concerned that DanielK_WMDE is racing off to abstraction-land when it's not clear that's wanted/needed

21:21:55 <bd808> How about we back up to the actual problem that needs a solution?

21:21:58 <addshore> My initial rough draft of some sort of indexed bagostuff can be seen at https://gerrit.wikimedia.org/r/#/c/278908/ indexing based on the parts that a key is constructed with and ekeping the same interface as bagostuff, but this failed due to having to extend the base rather than just implement something

21:22:35 <addshore> My plan was to use this in the WatchedItemStore to allow more caching as cache purging would actually be possible on mass

21:22:38 <bd808> right now this is a solution in need of a problem (or 3) to become a useful utility

21:22:50 <DanielK_WMDE> robla: all of programming is an abstraction :) but i'm indeed trying to figure out whether we want more abstraction of caching, or perhaps less.

21:23:32 <robla> #info question discussed: a) need a better caching abstraction than BagOStuff? or b) is the status quo fine or c) should BagOStuff be deprecated, and not replaced with anything?

21:23:47 <DanielK_WMDE> addshore: so, the use case driving this is indeed just tagging/buckets? i thought thjat was just the one i'm interested in ;)

21:23:51 <TimStarling> addshore, the usual way to do mass purging is to have a single key which invalidates the collection, which you check on fetch

21:23:59 <SMalyshev> I think we're still not clear which exactly problem we're solving. If it's tagging, then I'm not sure we know how we want to do it yet, do we?

21:24:26 <robla> #info question discussed: what problem was this RFC trying to solve?

21:24:57 <bd808> Agree with TimStarling that indirect keys are a common solution to mass invalidation assuming the cached data has a TTL based cleanup when abaindoned

21:25:21 <DanielK_WMDE> TimStarling: what happens when that key gets evicted=

21:25:43 <bd808> all the things it keyed are "lost"

21:25:44 <TimStarling> when the collection key is evicted? the whole collection is invalidated obviously

21:25:58 <TimStarling> but it never is because it's the hottest key in the collection

21:26:08 <TimStarling> since it is accessed as often as all the others put together

21:26:19 <DanielK_WMDE> if the algorithm is LRU or otherwise decent

21:26:32 <DanielK_WMDE> if it's FIFO, you lose

21:26:42 <DanielK_WMDE> but yea, i see the point

21:26:58 <TimStarling> this is how it is done in the parser cache, and the job queue has a similar concept of "root job"

21:27:21 <SMalyshev> so I think if we had an API implementing this as simple operations get/set/delete it may be helpful

21:27:23 <TimStarling> and the message cache has a similar concept of a hash key which invalidates a local store

21:27:24 <DanielK_WMDE> yea, it would be nice to generalize this a bit

21:27:35 <DanielK_WMDE> the way this is implemented in ParserCache is a bit... arcane.

21:27:48 <TimStarling> well, the use case is very special

21:28:18 <TimStarling> the code is only as complex as the reality it models :)

21:28:23 <DanielK_WMDE> also, this doesn't allow me to enumerate the stale keys, right? so i can't actually purge, or update

21:28:34 <TimStarling> right

21:28:54 <bd808> a cache that allows you to enumerate keys is a very special type of cache

21:29:04 <TimStarling> all available operations are O(1) which probably helps programmers to implement efficient code

21:30:18 <TimStarling> I called it a crappy simulation of tagged caching because invalidation is not quite the same as deletion

21:30:25 <DanielK_WMDE> though it's no guarantee ;)

21:31:06 <TimStarling> specifically, invalidation doesn't create free space in the cache, it relies on later eviction to make free space

21:31:45 <TimStarling> deleting invalidated keys would be better than LRU eviction since those keys are guaranteed to not hold useful data

21:31:56 <SMalyshev> I wonder what happens with sharded cache and collections... Would that mean whereever the collection key is stored, everybody talks to that place?

21:32:01 * robla put a list of questions in https://phabricator.wikimedia.org/T130528#2265793

21:32:25 <TimStarling> SMalyshev: yes

21:32:43 <DanielK_WMDE> TimStarling: so, the wikidata use case is triggering RefershLinks for all pages affected by a change on wikidata.org. We are currently doing this using a tracking table in the db. but the tracking table is effectively for tracking parser cache entries, and may need to be updated when things get rendered on the fly (causing a master insert in a GET request).

21:33:00 <DanielK_WMDE> for that use case, we need enumeration

21:33:06 <SMalyshev> that may create some undesired imbalances... not sure if it's a real problem or imagined

21:33:28 <TimStarling> DanielK_WMDE: you could use redis

21:34:45 <TimStarling> SMalyshev: memcached is pretty fast, in the olden days of 100 Mbps network connections we used to saturate ports by accident but it's not such a problem now

21:34:57 <TimStarling> a single memcached server can handle a very hot key

21:35:40 <DanielK_WMDE> anyway - i didn't want us to get hung up on that use case specifically

21:36:08 <DanielK_WMDE> #info <TimStarling> addshore, the usual way to do mass purging is to have a single key which invalidates the collection, which you check on fetch

21:36:40 <TimStarling> yeah, read the redis manual, it is quite nice, and several things are already using it

21:37:01 <TimStarling> you can always build a huge abstract hierarchy on top of it if it makes you feel better

21:37:11 <DanielK_WMDE> hehe, not really :P

21:37:48 <DanielK_WMDE> just a thin lyer or two... ;)

21:38:23 <addshore> okay, well, I think that may wrap everything up?

21:38:25 <SMalyshev> speaking of which, so do we want a generic tagging implementation doing this collection key thing?

21:38:44 <DanielK_WMDE> addshore: do you have everything you need to continue? are you happy with the meeting?

21:39:02 <DanielK_WMDE> SMalyshev: i'm for it :D

21:39:09 <addshore> as am I

21:39:26 <TimStarling> I would be happier with an implementation that has each key in a single collection

21:39:27 <SMalyshev> so that seems to be one meeting outcome/todo

21:39:30 <TimStarling> instead of 0..N

21:39:49 <TimStarling> until we actually have a use case for tagging

21:39:59 <SMalyshev> TimStarling: we could start with that... I didn't look into use cases

21:40:08 <TimStarling> assuming that is enough for addshore

21:40:37 <bd808> We have 1..N use cases for Varnish cache invalidation but that is a different beast

21:40:52 <TimStarling> for Daniel's use case he just needs to use redis :)

21:40:54 <bd808> And I think bb.lack is already thinking about that

21:41:23 <bd808> LADD my_thing

21:41:32 <addshore> would that work in the case of watched items? I would want to purge all items for a user while also being able to purge all items for a page?

21:42:35 <addshore> wait, yes, I guess :)

21:42:47 <TimStarling> addshore: by purge you mean make the cache key go away?

21:43:11 <addshore> yes ;) so essentially it would be 2 sets of collections

21:43:23 <TimStarling> yes, it would work I think

21:43:36 <bd808> s/LADD/LPUSH/ (memory fail)

21:43:56 <addshore> hmm, but perhaps in that case If I invalidate the colleciton key for say the Berlin article said key could also remain in some user collections?

21:44:50 <TimStarling> addshore: where is your use case documented again?

21:46:42 <DanielK_WMDE> TimStarling: the skin shows whether user U watches current page X. We don't want to hit the DB for that, so we cache. When the user (bulk?) edits their watchlist, we need to purge that.

21:46:56 <DanielK_WMDE> not sure when we'd need to purge all waicthes for a page, though... when moving it?

21:47:09 <addshore> DanielK_WMDE: when the notification timestamp is updated

21:47:13 <addshore> on edit

21:47:23 <DanielK_WMDE> ah, right, the timestamp.

21:48:57 <addshore> While looking at all of this was well I looked at the usage of the caching (which was carried over form User) https://grafana.wikimedia.org/dashboard/db/mediawiki-watcheditemstore and the level of cache hits is very low really IMO, thus another option could be just remove this particular caching? https://gerrit.wikimedia.org/r/#/c/286892/

21:51:15 <addshore> DanielK_WMDE: perhaps, to increase the level of caching, some form of tagging is needed, or to reduce complexity we can just remove it causing roughly a 10% increase in the slave db hits to retrieve watched item detials

21:51:19 <robla> addshore: were those links an answer to TimStarling's question about use cases?

21:51:33 <TimStarling> robla: probably nowhere

21:51:39 <bd808> addshore: a HashBagOStuff cache is just an in process cache for the duration of a request

21:51:47 <addshore> bd808: yes

21:51:53 <TimStarling> I think that is the answer

21:53:16 <robla> #info addshore agrees with bd808 that a HashBagOStuff cache is just an in process cache for the duration of a request

21:53:22 <DanielK_WMDE> #info watchlist caching might be worth an rfc on its own.

21:53:32 <bd808> addshore: you don't need to worry about invalidation at all in an in-process cache

21:53:34 <addshore> bd808: It can't cache over processes without being able to invalidate stuff, in process caching is just what has been taken from User

21:53:37 * bd808 is confused

21:53:43 <bd808> ah

21:54:03 <addshore> bd808: yes, but to actually make the caching do something usefull it needs to not be in process and thus you need to be able to invalidate things!

21:54:25 <bd808> a 10% hit rate on an in process cache is actually not bad. the cost of said cache is really really small

21:54:46 <TimStarling> I guess I need to do a review of WatchedItemStore

21:55:03 * robla notes we're coming up on the end of the hour. RFC is declined; rest of the discussion is about cache abstraction

21:55:23 <addshore> bd808: I'm quite surprised about the hits at all, I was wondering why the caching was in User