I haven't read your posts in detail yet (I am traveling today), but it is true that I have no heuristic to balance exploitation versus exploration. I will have to
look into the UCT approach for this.

In my (proposed) method exploitation amounts to using the weights in the known subtree (and making them closer to optimal along the way).
Exploration amounts to splitting leaf nodes (selected by their estimated
defect). This is an "investment".

Your comment that splitting a leaf node in my method seems to lose information from prior visits is true. I have to look into that as well.

But before I do anything more I have to make an implementation.

Sorry, I didn't intend the second comment for your method only. The investment applies to any splitting method, including UCT. If my measurement is correct I lose about 20% for that. I have tried to combine them, but the result can be bad depending on the parameters I choose. If you still want to proportion the probabilities with the node ratio, the second formula gives exactly what you need. However, I am now beginning to think that taking the log of the node counts may be more appropriate, so I am trying the following instead.

The idea is that the tree size follows a log-normal distribution, so taking logs and summing variances over the branching factors may work better. That is what I did when trying methods that estimate perft from a BF estimate: first there is a variance for each BF, increasing from ply 0 to ply N, and then I _added_ those variances to get the total variance. I am getting a better result with this method after 85 iterations so far. Will post when it finishes.
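A minimal sketch of the log-space summation (the per-ply samples below are made up for illustration; the real runs obviously use many more):

[code]
# Sketch: estimate perft(N) from per-ply branching-factor samples by working in
# log space.  If log(perft) = sum over plies of log(BF_ply) and the per-ply terms
# are treated as independent, the variances of the log terms simply add.
# All numbers below are made-up placeholders, not real measurements.
import math

bf_samples = [
    [20.0, 20.0, 20.0],          # ply 0 is exact for the start position
    [19.5, 20.5, 21.0, 20.0],    # ply 1
    [21.8, 23.1, 22.4],          # ply 2
    # ...
]

log_mean_total = 0.0
log_var_total = 0.0
for samples in bf_samples:
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((x - mean) ** 2 for x in logs) / (n - 1) if n > 1 else 0.0
    log_mean_total += mean        # log of the size estimate accumulates
    log_var_total += var / n      # variances of the (independent) log terms add

estimate = math.exp(log_mean_total)
rel_error = math.sqrt(log_var_total)   # ~ relative 1-sigma error in log space
print("perft estimate ~ %.3e  (+/- %.2f%% in log space)" % (estimate, 100 * rel_error))
[/code]

Note that exp of the summed log means is a geometric-mean style estimate (the median under the log-normal assumption), and the summed log-variances give an approximate relative error.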

Results after 100 million simulations for the proportional-to-subtree-size method. The uniform allocation gave a slightly better result. It could be luck though.

Michel wrote: The reason I do not want to use UCB or UCT without proper justification
first is that these algorithms were designed for somewhat different problems.

In the multi-armed bandit problem you ultimately want to find
the best arm to pull. This is a maximization problem with incomplete
information. The performance metric used is the "regret".

In perft you want to compute a sum with incomplete information.
I think here the performance metric is (cpu time)*variance.

I think it is similar. Here too you want to pick the best node, the one which will lead to the highest reduction of variance, and if it doesn't, there is regret. So it is quite similar. Anyway, my point was that even after eliminating UCB completely, and using a formula that we thought would reduce variance greatly, i.e.

I didn't get a favourable result. It is even worse than uniform sampling. Why? That is the best dynamic allocation so far, isn't it? It allocates visits to each node exactly in proportion to their size at each iteration, and yet it is worse than uniform sampling. Maybe the variance is not proportional to sub-tree size after all?
UCB allocates exponentially more visits to the best selected node. That is the only difference from the other formulas I provided.
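One possible explanation, sketched as a toy (all numbers below are invented, this is not the simulator): for independent per-move sample means, Var(total) = sum_i sigma_i^2 / n_i, and the visit split that minimises it is proportional to each move's per-visit standard deviation sigma_i, not to its sub-tree size. So whether proportional beats uniform depends on how sigma_i actually scales with size:

[code]
# Toy illustration (made-up numbers): which allocation of a fixed visit budget
# minimises the variance of the summed estimate depends on how the per-visit
# standard deviation sigma_i of each root move scales with its subtree size.
# Var(total) = sum_i sigma_i^2 / n_i for independent per-move sample means.

true_sizes = [8.0e9, 6.5e9, 5.0e9, 3.0e9, 9.0e8, 2.0e8]   # invented subtree sizes
budget = 6000                                             # total visits to spend

def var_total(sigmas, visits):
    return sum(s * s / n for s, n in zip(sigmas, visits))

def allocate(weights):
    total = sum(weights)
    return [max(1, round(budget * w / total)) for w in weights]

uniform      = allocate([1.0] * len(true_sizes))
proportional = allocate(true_sizes)                       # visits ~ subtree size

for label, sigmas in [
    ("sigma_i proportional to size", [0.5 * s for s in true_sizes]),
    ("sigma_i roughly constant",     [0.5 * true_sizes[0]] * len(true_sizes)),
]:
    vu = var_total(sigmas, uniform)
    vp = var_total(sigmas, proportional)
    print(f"{label}: uniform/proportional variance ratio = {vu / vp:.2f}")
[/code]

If the per-visit sigma grows roughly linearly with sub-tree size, proportional allocation wins; if it is roughly the same for every move, uniform wins, which would be consistent with the observation above.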

This last one gave the best result for a non-splitting type of simulation.
But splitting adds more complexity; in fact I get the best result with the UCB formula when splitting. But I do not want to discuss that before we understand how the visits should be proportioned given the _exact_ sub-tree size for each move. I could even take the exact values from the perft(12) computation and construct the proportions from them, but I think they would match exactly the first method I have, which adjusts the probabilities to match the current sub-tree ratio estimates.

You can see the UCB formula is very similar to the rest, except that it adds an inverse form of the second term instead of subtracting it as the rest do. But the effect is the same.
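For reference, the standard UCB1/UCT selection rule (the exact variant and constant used in my runs may differ) is

[code]
score_i = mean_i + C * sqrt( ln(N) / n_i )
[/code]

where mean_i is the current estimate for move i (here a sub-tree size ratio rather than a win rate), n_i is its visit count and N is the parent's total visit count; the exploration term shrinks like 1/sqrt(n_i), which is the "inverse" behaviour mentioned above.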

Edit:
I see that the methods I came up with in fact have a formal name: they are "probability matching" variants, which are used for solving MAB problems.
These are methods which choose levers (moves) according to a certain probability distribution. Uniform sampling is, I think, the simplest of these. Now I have no doubt that this is indeed an MAB problem, with the reward being the reduction in variance. Since that was difficult to quantify, I was using the sub-tree size ratio to estimate the possible rewards.
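As a minimal sketch of what probability matching looks like here (simplified, with invented estimates; not the actual simulator code): the move to simulate next is drawn from a distribution proportional to the current sub-tree size estimates instead of always taking the maximum.

[code]
# Probability-matching move selection: sample a root move with probability
# proportional to its current (estimated) sub-tree size.  Estimates are invented.
import random

est_sizes = [8.0e9, 6.5e9, 5.0e9, 3.0e9, 9.0e8, 2.0e8]   # running estimates

def pick_move(estimates):
    total = sum(estimates)
    r = random.random() * total
    acc = 0.0
    for i, e in enumerate(estimates):
        acc += e
        if r <= acc:
            return i
    return len(estimates) - 1   # guard against floating-point rounding

counts = [0] * len(est_sizes)
for _ in range(100000):
    counts[pick_move(est_sizes)] += 1
print(counts)   # visit counts end up tracking the size proportions
[/code]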

Uniform selection, now with _tree splitting_, seems to be good too. Even better than UCT did. Here is the result after 14 million iterations.
As you can see, the amount of wasted nodes can reach as high as 40% when a lot of splitting is done in a given iteration. Edit: Despite the promising start it ended up worse than its non-splitting counterpart, with about 20% wasted simulations and expansion up to depth 4. All moves are probably expanded by about the same amount.

Tree splitting is introducing more puzzles. The best performing formula
so far for the non-splitting version was log-proportioning, followed by uniform.
But for tree splitting, uniform performs the best, and by a huge margin at that.
Second comes regular UCT with constant = 2. Here are the results.

The last two are orders of magnitude away from the true value after 30 million simulations.
I think I will go back to the non-splitting version now and determine the best resource allocation
scheme, i.e. the visit proportions.

Here is a summary in case you are lost in the dispersed posts. The results with tree splitting seem bad when I do many iterations (100 million) compared to doing it at the root only. Also, surprisingly, tree-size proportioning does the worst in both cases.

It has become the new number one for the non-splitting version. And the worst for the splitting version!!
I don't know what splitting is introducing that I can't grasp. The variance propagation is properly done, as the summed variance seems to follow the sub-tree sizes in general.
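To spell out what I mean by the summed variance, here is a bare sketch with hypothetical node fields (not the actual data structure): a split node's estimate is the sum of its children's estimates, so, assuming the children are independent, its variance is the sum of their variances.

[code]
# Sketch of variance propagation in a split tree, assuming independent children.
# Node fields are hypothetical; they do not correspond to any actual implementation.

class Node:
    def __init__(self, mean=0.0, var=0.0, children=None):
        self.mean = mean          # running estimate of this subtree's size
        self.var = var            # variance of that estimate (leaf: from samples)
        self.children = children or []

def propagate(node):
    """Recompute mean/variance bottom-up: a parent is the sum of its children."""
    if node.children:
        for c in node.children:
            propagate(c)
        node.mean = sum(c.mean for c in node.children)
        node.var = sum(c.var for c in node.children)   # Var(sum) = sum of Var
    return node.mean, node.var

root = Node(children=[Node(4.0e9, 1.0e16), Node(2.5e9, 4.0e15), Node(1.0e9, 9.0e14)])
print(propagate(root))
[/code]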

I am glad to see how Shawul's UCT run and also Adam Hair's new estimate, based on estimating the BF with polynomials and logarithms, fit within my interval. I have worked a little on my 'method' and I discovered that it is a little better than I expected! It can also estimate Perft for even values of n (at first I thought it could only estimate odd numbers of plies). I have fitted another polynomial for even values, because the 'odd-even effect' is brutal when I work with my alpha-beta pair of parameters. The idea is to split odd and even values of n (I think I said that some weeks ago), and Reinhard also showed this effect with his inverse interpolation.

First I worked with cubic functions of alpha and beta for both even and odd values of n; then I switched to a quartic function in alpha and beta for the odd case, while keeping the cubic function for alpha and beta in the case of even values of n. The results were incredibly similar (less than 1% of difference between the cases). I had no access to the true known Perft values, so I did the estimates in a clumsy way: I remembered all the Perft values except Perft(10), which I remembered was around 6.9352e+13, and I did the calculations (with an ordinary scientific calculator) assuming Perft(10) ~ 6.93525e+13 (trying to reduce the difference). I have rounded the final values because it is too messy to give all the digits when they are not exact, due to the issue with Perft(10). The estimates up to Perft(20) are:

These estimates are like a copy of Labelle's estimates, but all the math took me some hours (including adjusting the polynomials by hand, though that was fast), and the ways of reaching these numbers are completely independent.

It is amazing to me how similar my values are. The BFs are almost the same as Labelle's, and my estimates are always a little lower than his. These results are the arithmetic averages of the results given by the alpha and beta estimates. They also have error bars, which I have omitted, but they were smaller in the even cases than in the odd cases. Moreover, the error bars grow as n grows: Perft(14) to Perft(20) (only even values of n) gave error bars from ~ ±0.0053% to ~ ±0.0566%; Perft(15) to Perft(19) (only odd values of n) gave error bars from ~ ±0.05% to ~ ±0.17% (remember that the error bar was ~ ±0.01022% for n = 13).

Splitting the polynomials is crucial: at first I tried to estimate Perft(14) to Perft(20) with a single polynomial, and the results were horrible! Although Perft(14) and Perft(15) were around 0.65% below the final results I post, the BFs from n = 16 to n = 20 were 26, 9(?), 18... and the Perft(20) estimate was ~ 9.85912e+27 (horrible!).
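Just to illustrate the odd/even splitting in a very simplified form (this is not my alpha-beta parameterisation, which I have not written out here; it merely fits separate polynomials to log10 of the known Perft values for odd and even n and extrapolates):

[code]
# Illustrative only: fit separate polynomials to log10(perft(n)) for odd and
# even n, then extrapolate.  The real method uses alpha/beta parameters that
# are not shown here.
import numpy as np

known = {   # published exact perft values from the start position
    1: 20, 2: 400, 3: 8902, 4: 197281, 5: 4865609, 6: 119060324,
    7: 3195901860, 8: 84998978956, 9: 2439530234167, 10: 69352859712417,
    11: 2097651003696806, 12: 62854969236701747,
}

def fit(parity, degree):
    ns = np.array([n for n in known if n % 2 == parity], dtype=float)
    ys = np.log10(np.array([known[n] for n in known if n % 2 == parity], dtype=float))
    return np.polyfit(ns, ys, degree)

odd_poly = fit(1, 3)    # cubic fit on odd n
even_poly = fit(0, 3)   # cubic fit on even n

for n in (13, 14, 15):
    poly = odd_poly if n % 2 else even_poly
    print(f"perft({n}) ~ {10 ** np.polyval(poly, n):.4e}")
[/code]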

Maybe it is better to increase the degree of the polynomials by one as soon as it can be done (when new values are calculated), but it was simply too much work for me.

Fortunately, Mr. Edwards has posted again. His effort computing Perft(13) is very much appreciated here. Please keep up the good work!

Here's another piece of perft(13):
[D]rnbqkbnr/1ppppppp/p7/8/8/2NP4/PPP1PPPP/R1BQKBNR b KQkq - 0 2
The perft(10) for the above is 165,358,518,306,919. How well does your estimator algorithm do with this?