Lyudmil Tsvetkov wrote:
Please see the difference at short and long TC. +220 games for no redundant rook at STC, -40 games at LTC! So that, I repeat again, knowledge scales well. At even longer TC, this would be even more evident.

Pretty much sums up your posts on talkchess:

Talk like you have expert understanding of a subject which in reality you understand nothing about.

To sum up:
1. You are not a titled chess player, or a master-level player, but you act like you have a GM-level understanding of chess.
2. You are not a programmer, and you claim that you do not understand code, yet you lecture programmers about code.
3. You haven't the faintest clue about statistics, yet you are happy to point out how fishtest is based on incorrect statistics.

'We' as in referring to yourself, I suppose, because I never agreed to any such thing.

There was sufficient game evidence for that. Jerry Donald also posted that 7 bishops perform better than 7 knights vs 3 queens.

I don't think this can be based on anything, as to my knowledge there doesn't exist any engine equipped with an evaluation that understands this end-game. It seems very easy for the Queens to draw against 7 Bishops: just sacrifice the Queens one on one against the Bishops on the minority color, and the remaining 4 Bishops have no mating potential. But engines not aware of that fact would of course never attempt it.
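
The counting behind that drawing idea can be written down as a small sketch. It only checks whether there are enough Queens to trade off every Bishop on the minority color; whether those trades can actually be forced on the board is not checked. The function names and the square notation are my own assumptions for illustration.

```python
def square_is_light(square: str) -> bool:
    """Return True for a light square, given algebraic notation such as 'h1'."""
    file_index = ord(square[0]) - ord("a")  # a -> 0 ... h -> 7
    rank_index = int(square[1]) - 1         # '1' -> 0 ... '8' -> 7
    # a1 is dark and has index sum 0, so odd sums are light squares.
    return (file_index + rank_index) % 2 == 1


def bishops_left_after_trades(bishop_squares, queen_count):
    """Count Bishops remaining if each Queen is traded for one minority-color Bishop.

    Returns (remaining_bishops, all_on_one_color). This is pure counting:
    it assumes every trade can be carried out.
    """
    light = sum(square_is_light(sq) for sq in bishop_squares)
    dark = len(bishop_squares) - light
    minority = min(light, dark)
    trades = min(queen_count, minority)
    remaining = len(bishop_squares) - trades
    return remaining, trades == minority


# 4 light-squared and 3 dark-squared Bishops against 3 Queens.
bishops = ["b1", "d1", "f1", "h1", "a1", "c1", "e1"]
print(bishops_left_after_trades(bishops, 3))
```

With 7 Bishops the minority color can hold at most 3 of them, so 3 Queens always suffice on paper, leaving at least 4 Bishops that all run on one color.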

It is also about pawn spread, but in 80-90% of cases pawns will be spread on both wings, so this is the natural state of things. Having pawns just on one wing is an exception. With pawn spread on both wings, 2 knights perform even worse than a single knight, so redundancy is felt.

Grouping things together that need different treatment is a very bad idea. Evaluations must be precise and accurate, if you want an engine to ever rise above the level of micro-Max. An engine that cannot see the difference between the 10-20% where having Knights is good and the 80-90% where having them is bad, will almost always be suckered by an opponent that does know that difference into the 10-20%. The 10-20% is only "the natural state of things" when both players are aware of the difference. It won't remain so when one of the players is ignorant.

I have seen that in my Xiangqi engine. When it did not know the rules for perpetual chasing, only 1.5% of the games ended in a perpetual chase (declared a loss by the GUI). So I figured perpetual chasing was too rare to worry about. But once I programmed the engine to recognize it, playing against the version that didn't resulted in 18% of the games being won by perpetual chasing. Because the engine that did know about it just suckered the engine that didn't into chasing him.

Lyudmil Tsvetkov wrote:
Why do you need a gigantic undertaking for 2 or 3 elo at most?
Those are your words, but I say there are more than 50 elo in imbalance and piece values, I have absolutely no doubt about that.

Correct. 2-3 elo only. Which is why it is not a priority by any means. 50 elo is simply ridiculous, no doubt about it. You are just naive.

Lyudmil Tsvetkov wrote:
I congratulate also Joerg for his efforts, but tuned imbalance can not substitute for everything. I repeat it again: you can never do queen vs 3 pieces with the imbalance tables, never. Joerg claimed to have simplified his ad-hoc rule when he improved the imbalance tables, but you now see SF fully does not understand queen vs 3 pieces, so that the improvement went somewhere else. You do not know quite what you are doing with the imbalance tables.

I will let Joerg explain his thoughts on what possibly went wrong there.

Lyudmil Tsvetkov wrote:
The same is true for merging redundant rook into imbalance: are you sure imbalance now handles that better? You can not in any way, you do not know where the change has gone, besides, removing failed at LTC.

It did not fail. It passed with high probability that it is a 0-elo change. That was the simplification test before SPRT (-3, 1) was introduced.
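
For readers unfamiliar with the notation, an SPRT with bounds (-3, 1) compares the hypothesis "the patch is worth -3 elo" against "the patch is worth +1 elo" and stops once the log-likelihood ratio crosses a threshold. Below is a minimal sketch using a normal approximation and plain logistic elo. Both are assumptions on my part; this is not necessarily how fishtest computed it at the time, and the function names are hypothetical.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score per game under the logistic elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0).

    Uses a normal approximation on the per-game score (1, 0.5 or 0).
    """
    games = wins + draws + losses
    if games == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / games
    variance = (wins + 0.25 * draws) / games - mean ** 2
    if variance <= 0.0:
        return 0.0  # degenerate sample, no information in this approximation
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return games * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * variance)


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05):
    """Stopping thresholds: accept H0 below `lower`, accept H1 above `upper`."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


lower, upper = sprt_bounds()
llr = sprt_llr(wins=1000, draws=2000, losses=1000, elo0=-3, elo1=1)
print(lower, upper, llr)
```

A dead-even result gives a positive ratio here, because 0 elo lies closer to +1 than to -3. That is the sense in which a simplification can pass while being a 0-elo change.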

Lyudmil Tsvetkov wrote:
I will tell you simply why you can not do queen vs 3 pieces in the imbalance tables, but you will not understand me. Queen imbalances involve imbalances like Q vs R+minor and Q vs 2Rs, which are by far the most frequent ones. It also includes imbalances like Q vs 3 pieces (3 minors, R+2 minors or 2Rs+ minor), which are rare. How are you going to tune Q vs 2Rs and Q vs R+minor and Q vs 3 pieces at the same time with the same parameter values in the same tables, when in Q vs 2Rs or Q vs R+minor you will have to leave all piece values unchanged, while in Q vs 3 pieces the R, B and N value should go up, and very much at that?

Please tell me how, but first think a bit? Simply impossible. When you tune the whole, you will tune for the most frequent case, and that will leave Q vs 3 pieces imbalance unresolved for ever. Do you understand now?

That is why I have been urging and am urging Joerg again to resubmit a corrected Q vs 3 pieces patch to fishtest. You simply must have an ad-hoc rule for Q vs 3 pieces, there is no other way. Besides, imbalance tables and Q vs 3 pieces imbalance are totally unrelated.
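
To make concrete what an ad-hoc rule of this kind would have to detect, here is a minimal sketch of a material-signature check for Q vs 3 pieces, using the three cases listed above (3 minors, R+2 minors, 2Rs+minor). It is not the patch that was submitted to fishtest. The function name and the piece-count dictionaries are assumptions for illustration, and no bonus value is suggested because none is given here.

```python
def queen_vs_three_pieces(white: dict, black: dict):
    """Return 'white' or 'black' for the side holding the extra Queen, else None.

    `white` and `black` map piece letters ('Q', 'R', 'B', 'N') to counts.
    A match requires exactly one extra Queen on one side and exactly three
    extra pieces on the other: at most two Rooks and at least one minor.
    """
    diff = {p: white.get(p, 0) - black.get(p, 0) for p in "QRBN"}

    def matches(sign: int) -> bool:
        # Extra material of the side WITHOUT the Queen, as positive numbers.
        rooks = -sign * diff["R"]
        bishops = -sign * diff["B"]
        knights = -sign * diff["N"]
        return (
            sign * diff["Q"] == 1
            and min(rooks, bishops, knights) >= 0
            and rooks + bishops + knights == 3
            and rooks <= 2
            and bishops + knights >= 1
        )

    if matches(1):
        return "white"
    if matches(-1):
        return "black"
    return None


# White: Q + 2R against Black: 2R + 2B + N, i.e. Queen against three minors.
print(queen_vs_three_pieces({"Q": 1, "R": 2}, {"R": 2, "B": 2, "N": 1}))
```

Q vs 2Rs and Q vs R+minor deliberately fall through to None, since those are the cases where the piece values are meant to stay unchanged.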

Obviously these are hypothetical values with no other material on the board. Such endgames are theoretical draws, so they do not matter in this discussion. Most imbalanced positions have other material on the board too (like pawns and other minor pieces) so all those have to be considered too. Tracing which values are being added to what is a pain in the arse, and I don't have the time to do it. In either case, I see no reason why simply tuning the array cannot help. I would like to hear Joerg's thoughts about this. Your thoughts don't matter to me.

Lyudmil Tsvetkov wrote:
And Arjun, one last thing to you: I tell you again, it is knowledge that scales at longer TC and with bigger hardware. Hardware power increases with each year, what are you going to do at longer TC and huge hardware but calculate more terms? That is the right thing to do, that is why you have much time and big hardware. More knowledge will avoid randomness. If you have few terms, you simply calculate more, or much more, but, please note, random lines. Random lines usually bring you nothing, you need specific lines.

I thought beancounting has already been discarded.

Reiterating the same misconception again and again without proof does not make it right.

And your logic is also flawed. With better hardware, search depth also increases. Adding redundant nonsense does not increase strength. Searching more nodes almost certainly increases strength.

[d]2bq2k1/pppppppp/8/8/8/8/PPPPPPPP/1N1Q2K1 w - - 0 1
Anyone doubting the redundancy theory should look at the above position. The bishop is a stronger piece than the knight, especially with pawns on 2 sides, but Q+N usually at least equal Q+B, and sometimes are even better. So there is some redundancy/complementarity effect here.

Of course, the answer is easy. Queen and knight lack absolutely any redundancy, while in queen and bishop you have the partial redundancy of the Q and B in their diagonal capacities. So not giving that partial redundancy penalty for Q+B would be wrong. But how many actually do it?

[d]8/5n2/3nk3/8/8/2R1B3/4K3/8 w - - 0 1
Another proof for the redundancy of knights. Did you know that this is won ... in 222 moves for white? Take a look at the interesting Wikipedia article on pawnless endgames.

There is no pawn span there involved, nothing, so most of the burden for the loss with only 150cps less material is due to the badly coordinating knights. You can not say that the knights are slow above, because they do not need to be fast. But they coordinate badly.

So really, believe me or not, I would introduce a very simple redundancy system, and then try to perfect it. Nothing succeeds without tuning, but some people tune only mobility and salient pawn features.

zullil wrote:
I really appreciate your many posts concerning evaluation. Although I haven't read each one (there are quite a few, after all), I've thought about many of them.

Now please don't take this the wrong way, but the main effect of your posts has been to convince me that this is not the way to go. Too many terms, too much hidden overlap (non-orthogonality), too many artifacts of the human approach to chess, motivated by our very limited search abilities.

At a fundamental level, it seems to me that evaluation comes down to mobility and attack. Even the material value of pieces is simply more or less an encapsulation of their mobilities. Admittedly this is overly simplistic (and focused on the opening and midgame), but a position is good for us if we can move to lots of squares and if we attack a lot of enemy material. After all, the goal of the game is to reduce the opponent's mobility to zero while simultaneously attacking his most valuable piece!
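
A toy version of such a "mobility + attack" evaluation might look like the sketch below. It is only an illustration: it counts pseudo-legal moves for knights and sliding pieces, ignores pawns, king moves, pins and checks, and the weights, piece values and board representation are all hypothetical choices of mine.

```python
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
DIAGONALS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
STRAIGHTS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
RAYS = {"B": DIAGONALS, "R": STRAIGHTS, "Q": DIAGONALS + STRAIGHTS}
PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}  # hypothetical values


def on_board(square):
    return 0 <= square[0] < 8 and 0 <= square[1] < 8


def targets(board, square, piece):
    """Yield pseudo-legal destinations for a knight or slider (nothing for P and K)."""
    kind = piece.upper()
    if kind == "N":
        for df, dr in KNIGHT_STEPS:
            t = (square[0] + df, square[1] + dr)
            occupant = board.get(t)
            if on_board(t) and (occupant is None or occupant.isupper() != piece.isupper()):
                yield t
    for df, dr in RAYS.get(kind, []):
        t = (square[0] + df, square[1] + dr)
        while on_board(t):
            occupant = board.get(t)
            if occupant is None:
                yield t
            else:
                if occupant.isupper() != piece.isupper():
                    yield t  # capture square: counts as mobility and as an attack
                break
            t = (t[0] + df, t[1] + dr)


def evaluate(board, mobility_weight=1, attack_weight=2):
    """Score from White's point of view; uppercase letters are White pieces."""
    score = 0
    for square, piece in board.items():
        sign = 1 if piece.isupper() else -1
        for t in targets(board, square, piece):
            score += sign * mobility_weight
            victim = board.get(t)
            if victim is not None:
                score += sign * attack_weight * PIECE_VALUE[victim.upper()]
    return score


# Board keys are (file, rank) with a1 = (0, 0): White queen on a1, Black knight on h8.
print(evaluate({(0, 0): "Q", (7, 7): "n"}))
```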

You've almost inspired me to revisit my primitive engine and, after improving the basic move generation and search, to focus on an evaluation based on mobility + attacking. Mostly as an academic exercise, since any engine I create would be just for fun (and also pretty weak).

Thanks Louis.

And I thought I posted just a few messages...

It depends on how you look upon it, I understand that very well. Different people see different things. Nothing bad about that. But we are talking here about how to perfect things. You are completely right, mobility is the most important thing (I would say immediately after space advantage), but mobility comes in different ways: it comes with attacking, it comes with space, it comes with pawn features and it comes with imbalance evaluation. As mobility is reflected and substantiated in different ways, you can have a whole grasp of it only if you consider those different ways. And the more ways you consider, the closer you are to understanding mobility in-depth.

Now, I have started hating that word - orthogonal. Lucas talks about orthogonality, Arjun talks about orthogonality, and now you have also started doing this. You do not know what is orthogonal and what is not until you test it. However, I completely agree that you do not need too many features in eval; but you need the most important ones. You can certainly get rid of unimportant terms.

I tell you again: the more terms you have in eval in a reasonable way, the better, as they omit fewer possibilities of game development. This has been proven in engine history to be true, but people are always suspicious and unaccepting of new suggestions.

So you have an engine, hope to play a game against it one day.

Your messages and especially output also always inspire me.

There is a key point you miss here. These programs literally search millions (or even tens of millions) of nodes per second. Basically an exhaustive search although to a variable depth. Every evaluation term you add has to be 100% correct, because otherwise your huge search space is guaranteed to walk over those positions where it doesn't work.

As you add more terms, there are more and more unexpected interactions that will cause things to add (or subtract) in a completely unexpected way and produce a score that makes you drop a pawn (or worse) in the blink of an eye.

Most of us that have done this for a while agree that one can't get very far on a minimalistic evaluation, but also one can't get very far with an overly complex evaluation either, because it is so difficult to debug an evaluation where sometimes the terms work in a coordinated way, while at other times they diametrically oppose each other and lead you to think things are equal when they are anything but.

The search space has gotten so big, your "more terms is better" can quickly and easily backfire and generate something so complex it is difficult for a human to make heads or tails as to what is going on inside the tree.