What if performant cross-shard transactions are a red herring, and the thing we should be looking into more is reliable automatic data colocation, so that cross-shard transactions are avoided as much as possible? There's a decent amount of academic research around this, with projects like SWORD [1] and Schism [2] that treat shard load balancing as a graph or hypergraph partitioning problem. It seems like this might be worth incorporating into commercial distributed database projects.
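To make the partitioning idea concrete, here is a minimal sketch of workload-driven placement. It is a simple greedy heuristic, not the algorithm from SWORD or Schism: keys that appear together in transactions are merged into groups, and groups are then spread over shards. All names (`coaccess_weights`, `greedy_colocate`, `max_group_size`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def coaccess_weights(transactions):
    """Count how often each pair of keys appears in the same transaction."""
    weights = defaultdict(int)
    for txn in transactions:
        for a, b in combinations(sorted(set(txn)), 2):
            weights[(a, b)] += 1
    return weights


def greedy_colocate(transactions, num_shards, max_group_size):
    """Greedy stand-in for a real graph partitioner (assumption, not Schism/SWORD)."""
    keys = sorted({k for txn in transactions for k in txn})
    parent = {k: k for k in keys}
    size = {k: 1 for k in keys}

    def find(k):
        # Union-find lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Merge the most frequently co-accessed pairs first, capping group size
    # so that no single group grows without bound.
    weights = coaccess_weights(transactions)
    for (a, b), _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= max_group_size:
            parent[rb] = ra
            size[ra] += size[rb]

    groups = defaultdict(list)
    for k in keys:
        groups[find(k)].append(k)

    # Place the largest groups first, each on the currently least-loaded shard.
    load = [0] * num_shards
    placement = {}
    for members in sorted(groups.values(), key=lambda g: (-len(g), g)):
        shard = load.index(min(load))
        for k in members:
            placement[k] = shard
        load[shard] += len(members)
    return placement


def cross_shard_count(transactions, placement):
    """Number of transactions whose keys live on more than one shard."""
    return sum(1 for txn in transactions if len({placement[k] for k in txn}) > 1)
```

Comparing `cross_shard_count` for this placement against a workload-oblivious one (for example, hashing keys to shards) on a recorded transaction trace gives a rough sense of how much colocation could save. A production system would also need to weigh data size, write load, and the cost of moving data, none of which this sketch models.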

Edgestore's API is set up to shepherd users into good collocation patterns by default, and a lot of work over the past year or two went into improving collocation and educating users about best practices. The collocation efforts were actually orthogonal to implementing cross-shard transactions, but they were obviously very beneficial.

Thanks -- just curious, am I correct to interpret this to mean that, thus far, the performance of the system relies on users explicitly defining good colocation within their application-level data model?

For some reason this reminds me of something like the entity group concept in Google's Megastore [1].

There is a point somewhere between where we are today and a completely uncollocated free-for-all at which the system would fall over. A separate axis would come into play sooner: the rate at which users request transactions (with locks) versus non-transactional writes using optimistic concurrency control. Our guidance is therefore that users reserve transactions for cases where there are correctness-critical reasons why two objects need to be updated together, and rely on asynchronous primitives to handle "eventually consistent" mutations.
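For readers unfamiliar with the distinction, here is a rough sketch of what a non-transactional write under optimistic concurrency control looks like: read a version, compute the new value, and write only if the version is unchanged, retrying on conflict. This is an in-memory, single-process illustration, not Edgestore's API; `VersionedStore`, `write_if_version`, and `update_with_retry` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the stored version no longer matches the one that was read."""


class VersionedStore:
    """Toy in-memory store (assumption: a stand-in for a single shard)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Missing keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def write_if_version(self, key, expected_version, value):
        # Compare-and-set: no lock is held between the read and this write.
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(key)
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, fn, max_attempts=5):
    """Apply fn to the current value, retrying if another writer got in first."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            return store.write_if_version(key, version, fn(value))
        except VersionConflict:
            continue
    raise VersionConflict(key)
```

The trade-off this illustrates: a conflicting writer costs a retry rather than a wait on a lock, which is cheap for single-object updates but does not by itself give atomicity across two objects. That is the case where a transaction is still needed.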