Well, we could give up on regret bounds and instead just consider algorithms that asymptotically approach Bayes-optimality.

I am not proposing this. I am proposing doing something more like AIXI, which has a fixed prior and does not obtain optimality properties on a broad class of environments. It seems like directly specifying the right prior is hard, and it’s plausible that learning theory research would help give intuitions/models about which prior to use or what non-Bayesian algorithm would get good performance in the world we actually live in, but I don’t expect learning theory to directly produce an algorithm we would be happy with running to make big decisions in our universe.

Yes, I think that we’re talking about the same thing. When I say “asymptotically approach Bayes-optimality” I mean the equation from Proposition A.0 here. I refer to this instead of just Bayes-optimality, because exact Bayes-optimality is computationally intractable even for a small number of hypothesis each of which is a small MDP. However, even asymptotic Bayes-optimality is usually only tractable for some learnable classes, AFAIK: for example if you have environments without traps then PSRL is asymptotically Bayes-optimal.