Michel wrote: This was based on a pure implementation of your method though. If I understood correctly, you keep track in your computation of the number of accepted moves at each ply. I don't know how this affects the result.

Well, that makes a large difference, because it effectively changes the process into deciding which of the moves you pick, rather than whether you pick a move. So the variance of the single probe becomes the intrinsic variance of the sub-tree sizes. In the unnormalized case it would be the variance of the sum of N draws with acceptance probability 1/N, which is N times the variance of a single draw with distribution P(X = x) = 1/N and P(X = 0) = 1 - 1/N, where x is the sub-tree size of the corresponding move.

So we get var(x) + (1 - 1/N)*E^2(x), which for large N means the variance gets dominated by E^2(x). So if the frontier nodes have 20-30 moves, i.e. SD(x) = 5 and E(x) = 25, the unnormalized result has a roughly 5 times larger SD than the normalized result, for the ply leading to them.
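To check the arithmetic, here is a small sketch that computes both variances exactly rather than by sampling. The sub-tree sizes below are hypothetical, chosen so that N = 20, E(x) = 25 and SD(x) = 5; the exact binomial computation agrees with var(x) + (1 - 1/N)*E^2(x) up to a lower-order term.

```python
import math

def normalized_var(xs):
    # Normalized probe: pick exactly one of the N moves uniformly.
    # The (per-move) estimate is just x_i, so its variance is the
    # intrinsic population variance of the sub-tree sizes.
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def unnormalized_var(xs):
    # Unnormalized probe: accept each of the N moves independently with
    # probability 1/N and sum the accepted sub-tree sizes.  Each term
    # B_i * x_i has variance x_i^2 * (1/N) * (1 - 1/N), and the terms
    # are independent, so the variances add.
    n = len(xs)
    p = 1.0 / n
    return sum(x * x * p * (1 - p) for x in xs)

# Hypothetical frontier node: N = 20 moves, E(x) = 25, SD(x) = 5.
xs = [20] * 10 + [30] * 10
print(normalized_var(xs))                                     # 25.0
print(unnormalized_var(xs))                                   # 617.5
print(math.sqrt(unnormalized_var(xs) / normalized_var(xs)))   # ~4.97
```

The SD ratio comes out at about 4.97, i.e. the factor of 5 quoted above.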

I think it is easier to resolve this issue by running tests. Can you run one test with your default acceptance probability of 1/32, and another with Peter's modification of selecting exactly one random move and multiplying by the number of legal moves? The tests should use a fixed time control, selected so that it is equivalent to the time it takes you to run 100 million simulations. One test should start from depth=0 and another from depth=5. Is that alright?
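For reference, the two probe variants being compared can be sketched like this. Everything here is a stand-in: the "position" is just a remaining depth in a toy uniform tree, and `legal_moves` is a hypothetical move generator, so the sketch is self-contained rather than anyone's actual engine code.

```python
import random

# Toy stand-in for a move generator: a uniform tree with BRANCH moves at
# every node down to a fixed depth.
BRANCH = 3

def legal_moves(depth):
    return list(range(BRANCH)) if depth > 0 else []

def probe_one_move(depth):
    # Peter's variant: follow exactly one random move and multiply by
    # the number of legal moves.
    moves = legal_moves(depth)
    if not moves:
        return 1
    random.choice(moves)  # which child is followed; irrelevant in this uniform toy tree
    return 1 + len(moves) * probe_one_move(depth - 1)

def probe_accept(depth, p=0.5):
    # Acceptance-probability variant: accept each legal move
    # independently with probability p and weight accepted branches by
    # 1/p.  (The discussion uses p = 1/32; p = 0.5 here only so the
    # small toy tree gets probed at all.)
    total = 1.0
    for _ in legal_moves(depth):
        if random.random() < p:
            total += (1.0 / p) * probe_accept(depth - 1, p)
    return total
```

Both are unbiased estimators of the node count; on a uniform tree `probe_one_move` is even deterministic (121 nodes for depth 4, branching factor 3), while `probe_accept` fluctuates around the same value.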

What is wrong with calculating the weights from the back-propagated mean and variance, instead of back-propagating the weights directly (unless I misunderstood you)?

Nothing. Back-propagating the weights seemed more elegant, but it is apparently hard to do for a directed graph that is not a tree.

Yes, it is very difficult, since there is a square root term in there. I am relying on the central limit theorem when I back-propagate variances. Maybe there is something that can be improved there, so I am re-reading Remi's CG 2006 paper.

No, this is not relevant. The back-propagated weight of a node can be taken to be the sum of the weights of the children. Then the leaf nodes are selected with the correct frequencies (this is not quite the same as recalculating the weight from the back-propagated mean and variance, but it works).
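A minimal sketch of that scheme, using nested Python lists as a hypothetical stand-in for the tree (a leaf is a number carrying its weight, an internal node is a list of children):

```python
def weight(node):
    # A leaf carries its own weight; an internal node's back-propagated
    # weight is simply the sum of its children's weights.
    if isinstance(node, (int, float)):
        return node
    return sum(weight(c) for c in node)

def leaf_probs(node, p=1.0):
    # Probability of reaching each leaf when, at every internal node, a
    # child is chosen with probability proportional to its weight.
    if isinstance(node, (int, float)):
        return [p]
    w = weight(node)
    probs = []
    for c in node:
        probs.extend(leaf_probs(c, p * weight(c) / w))
    return probs

tree = [[1, 3], [2, [1, 1]]]   # hypothetical tree, leaf weights 1, 3, 2, 1, 1
print(leaf_probs(tree))        # [0.125, 0.375, 0.25, 0.125, 0.125]
```

Each leaf is reached with probability equal to its weight divided by the total weight (here 8), which is exactly the "correct frequencies" property.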

The problem is that a node may have several parents, and _all_ the parents should be updated. But you have arrived at this node from a particular parent, so you don't know what the other parents are.
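One way to make this possible (a sketch, not necessarily what either program does) is to store back-pointers to all parents when an edge is created, so a weight change can be pushed to every parent rather than only along the path taken:

```python
class Node:
    def __init__(self, weight=0.0):
        self.weight = weight
        self.parents = []   # *all* parents, not just the one we came from

def add_edge(parent, child):
    child.parents.append(parent)
    parent.weight += child.weight   # parent's weight = sum over its children

def update_weight(node, delta):
    # Push a weight change to every parent.  A grandparent reachable
    # along two paths correctly receives the change twice, because its
    # weight sums both children that contain the updated node.
    node.weight += delta
    for p in node.parents:
        update_weight(p, delta)
```

For example, in a diamond-shaped DAG where leaf `c` is shared by parents `a` and `b`, an update of `c` propagates through both, and the root's weight stays equal to the sum of its children's weights.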

Well, that is the second problem you have when updating the weights _directly_. If you start breaking down the formula anyway, updating (mean^2 + variance) instead of its square root, why not back-propagate the mean and variance directly and do what you want with them later? We agree you need a tree data structure, since we have multiple moves at a node; anything else doesn't make sense.
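A sketch of that alternative: each node keeps running statistics (Welford's online algorithm for mean and variance), and the weight is recomputed on demand instead of being updated directly. The weight formula sqrt(mean^2 + variance) is an assumption here, inferred only from the "(mean^2 + variance) instead of its square root" wording above.

```python
import math

class Stats:
    # Per-node running statistics; the weight is derived lazily.
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # sum of squared deviations (Welford)

    def add(self, x):
        # Welford's online update for mean and variance.
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self._m2 += d * (x - self.mean)

    def variance(self):
        return self._m2 / self.n if self.n else 0.0

    def weight(self):
        # Hypothetical weight of the shape the discussion hints at:
        # the square root of (mean^2 + variance).
        return math.sqrt(self.mean ** 2 + self.variance())
```

This sidesteps the square-root difficulty entirely: only additive quantities are back-propagated, and the nonlinear step happens once, at the node where the weight is actually needed.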