get_blogs_of_user() can be very slow when a user is a member of thousands of sites

Description

When a user belongs to thousands of sites, the array of $blogs can become rather large. Considering get_blogs_of_user is called in many places on every page-load when a user is logged in, Some optimization should happen here. Not counting the multisite admin pages, get_blogs_of_user() is called from the functions is_user_member_of_blog(), get_dashboard_url(), and WP_Admin_Bar::initialize(). I propose the results of the function to at least be stored to a static variable to prevent multiple lookups to the same information. I also propose that a pre-fetch filter (pre_get_blogs_of_user ?) be added so that a plugin can override that information and provide something more efficient if the need arises.

I recognize this area will not get much attention as it is pretty much an edge-case, but the proposed filter is the least invasive method and will allow custom handling for this situation. Would love to get some more eyes/feedback on this ticket.

I would also love to see this filter implemented as the default WP admin bar calls get_blogs_of_user() twice (once directly and once indirectly) AND admin-header.php calls it indirectly as well. That's a huge number of queries when a user has lots of sites.

The use of the static means this isn't unit testable when the function is re-called with a different user ID, and thus causes the tests in Tests_Multisite_User to fail. The static could be an array keyed by user ID, maybe.

If the pre_get_blogs_of_user filter returns an empty array because the user doesn't have any blogs, the rest of the processing in the function still runs. The empty() check should probably be removed.

Thanks for the feedback John. 31746.2.diff has been modified to 1) keep a keyed array of user blogs and 2) defaults the original pre_get filter value to null (which follows a pretty common precedent among other pre_ filters), and only check if the result of the filter is an array. The tests in Tests_Multisite_User now all pass, and I've added an additional test for the pre_get filter.

I think the patch proposed here is a good interim measure, it at least allows for hooking in to modify the number of blogs returned and some temporary caching of results.

However long term, I think it'd be better to introduce paging to this. I realize there would be a number of implications to any ui that uses this function but that's something that needs improved anyways (for user's that legitimately _belong_ to a large number of blogs anyways).

The good thing about this patch though, is it helps makes it possible for a feature plugin to be built that tests out using paging and what that affects.

Not sure off the top of my head. Someone would need to take a good look at the ways a user is added to and removed from a site. remove_user_from_blog() and add_user_to_blog() exist, but I'm not sure if there would be other places to tap into. cc @jeremyfelt

Blog membership is determined by caps, which is stored in wp_x_capabilities in usermeta. Invalidation could happen at a low-level, in WP_User->add_role()set_role()remove_role() etc, or even by intercepting update_option() calls if we really wanted to cover all possible points.

Alternatively, we could try to avoid calling get_blogs_of_user() in the places @jtsternberg mentioned in the original post.

There still would need to be a replacement for what its called for. Long term, there still needs to be something to allow get_blogs_of_user() to be more performant. The main reason why this function is so slow is because of this line:

$keys = get_user_meta( $user_id );. All info about the blogs a user belongs to is inferred from data used to store capabilities for a user on a blog AND thus ALL user meta has to be retrieved in order to grab that data. Then each meta key/value is LOOPED through and an additional query is done to get the blog details (querypalooza). This means (contrary to my earlier suggestion) efficient paging can't be done.

Long term, there needs to be a better way to indicate the blogs a user belongs to that can be

easily queried.

can be paged.

A couple ways (so far) that I can think of for doing this are:

Option 1: Track the blogs a user belongs to via its own user_meta key/value pair.

Each blog a user belongs to is a record with a common prefix (maybe user_blog => {$blog_id}). With that, the query for getting all the blogs a user belongs to can be simplified and pages (and even to some degree, joined with the blog table.

Option 2: Add a new table

Another way of accomplishing a more performant way to get the blogs a user belongs to is to rework how capabilities (gasp another core wp system) are done a bit so capabilities are tracked in a separate table called user_capabilities where we have:

user_cap_id | blog_id | user_id | capabilities

On single site installs blog_id will always be 1 and this table will have little information. However on multi-sites this table because immediately more useful because its more scalable. It serves as both a join table for user -> blog and tracks the capabilities the user has on that blog. Feasibly this could also be used to track user information that is specific to blog.

Wrap Up

Option one is definitely the quicker win because it requires less code changes elsewhere. However, option 2 gives us the benefits of less data stored in the database, more scalability, and clearer representation of the data. The tradeoff of course is the impact on the capability system (ugh). The ONLY reason I suggested using this for storing capabilities is because it doesn't make sense to create a brand new table just for joining blogs to users (although there's benefits to doing so to) and IF we create a new table, we should try to think of other problems this new table could help solve.

Just thought of an Option 3 add a blog_id column to the user_meta column. Then instead of having to prefix any meta_key specific to a blog you just put the blog_id in its OWN column. Then any queries retrieving the blogs a user belongs to can just grab distinct blog ids.

I guess I'm biased, but I really wish we would push the patch through. It's A) helping with performance by bringing those queries down to 1 per page load, and B) provides a filter whereby we can do alternate performance feasibility testing (new table, caching, etc). The patch provides an instant performance improvement on systems where users are part of many blogs, and provides an 'off' switch on systems where users belong to too many sites for it to be a useable UI.