An infographics entry in the New York Times about the distribution of a commute between two arbitrary subway stations in New York City, including a comparison of the distribution of a similar (?) commute by Tube in London. Showing that in most cases, the tail is thinner in London than in New York City. (Warning: the comparison may switch scales.)

Here is a bit of an outlier:

given that the two distributions hardly overlap and still share a similar median commute time!

A paper by Au and Beck (2001) was mentioned during a talk at MCqMC 2018 in Rennes and I checked Probabilistic Engineering Mechanics for details. There is no clear indication that the subset simulation advocated therein is particularly effective. The core idea is to obtain the probability to belong to a small set A by a cascading formula, namely the product of the probability to belong to A¹, then the conditional probability to belong to A² given A¹, &tc. When the subsets A¹, A², …, A constitute a decreasing embedded sequence. The simulation conditional on being in one of the subsets is operated by a random-walk Metropolis-within-Gibbs scheme, with an additional rejection when the value is not in the said subset. (Surprisingly, the authors re-establish the validity of this scheme.) Hence the proposal faces similar issues as nested sampling, except that the nested subsets here are defined quite differently as they are essentially free, provided they can be easily evaluated. Each of the random walks need be scaled, the harder a task because this depends on the corresponding subset volume. The subsets themselves are rarely defined in a natural manner, except when being tail events. And need to be calibrated so that the conditional probability of falling into each remains large enough, the cost of free choice. The Markov chain on the previous subset can prove useful to build the next subset , but there is no general principle behind this remark. (If any, this is connected with X entropy.) But else, the past chains are very much wasted, compared with, say, an SMC treatment of the problem. The paper also notices that starting a Markov chain in the set means there is no burnin time and hence that the probability estimators are thus unbiased. (This creates a correlation between successive Markov chains, but I think it could be ignored if the starting point was chosen at random or after a random number of extra steps.) The authors further point out that the chain may fail to be ergodic, if the proposal distribution lacks energy to link connected regions of the current subset . They suggest using multiple chains with multiple starting points, which alleviates the issue only to some extent, as it ultimately depends on the spread of the starting points. As acknowledged in the paper.