Why we can't use humans to measure Bayesian Regret of election methods

The "Bayesian regret" of an election method E is the "expected avoidable human unhappiness"
caused by using E.
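
One way to write that definition down, as a sketch in notation that is an assumption here rather than anything fixed above: let U_v(c) be the utility voter v receives if candidate c wins, let w_E be the winner that method E selects, and take the sum of all voters' utilities as society's total happiness. Then

```latex
\mathrm{BR}(E) \;=\; \mathbb{E}\!\left[\, \max_{c} \sum_{v} U_v(c) \;-\; \sum_{v} U_v(w_E) \,\right]
```

The first term is the best total happiness any candidate could have delivered, the second is what E actually delivered, and the expectation runs over whatever random voters, candidates and votes are being considered. The difference is the "avoidable" part.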

So many people have asked: "Why, then, do you evaluate Bayesian regret through
computer simulations? Why not use actual humans and genuine elections?"

There are good reasons for that.
The trouble with humans is that you can't easily measure "utility" of different
election alternatives for them. Why? Because there are no tangible, commonly agreed
units (like "money") for measuring "utility" or human "happiness."
And even if such units existed, you still could not measure utility directly.
And if you ask humans, they lie to you (or don't even know their own utilities).
How do you distinguish the utility-lies from the utility-truth?
And imagine the controversies we'd get into if we said something like
"Electing Chirac clearly would have been better for the people of France, by 765 utility units."
This would not be "science."
It would be "a mess."

And even if we somehow could do all that (and one can to a very partial extent in some cases),
then we'd have another problem:
Human elections are very infrequent and expensive, meaning we could
only acquire a small amount of
data (every bit of which would be battled over forever);
not enough data to answer difficult questions with high statistical confidence.

Computers don't have those problems.
We can make artificial "candidates" and "voters"
inside our computer. We can read their "minds" without any possible lies
to determine their exact happiness values in agreed-upon units. There is no
controversy and any data you get has an exactly known meaning and is valid for
all future time. The expense is tiny and the speed is huge, so we can gather enormous
amounts of data.
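
To make that concrete, here is a minimal sketch of such a simulation. Everything in it is an illustrative assumption, not the code behind the experiments discussed here: the function names are made up, utilities are drawn as independent standard normal numbers, voters are assumed to be honest, and ties are broken in favor of the lowest-numbered candidate.

```python
import random

def random_utilities(num_voters, num_candidates, rng):
    """One simulated electorate: utilities[v][c] is voter v's utility for candidate c.
    Assumption: utilities are independent standard normal draws."""
    return [[rng.gauss(0.0, 1.0) for _ in range(num_candidates)]
            for _ in range(num_voters)]

def social_utilities(utilities):
    """Total utility of each candidate, summed over all voters."""
    return [sum(column) for column in zip(*utilities)]

def plurality_winner(utilities):
    """Honest plurality: each voter names their single favorite candidate."""
    tallies = [0] * len(utilities[0])
    for row in utilities:
        tallies[row.index(max(row))] += 1
    return tallies.index(max(tallies))  # ties go to the lowest index

def range_winner(utilities):
    """Honest range voting: each voter rescales their utilities so that their
    worst candidate scores 0 and their best scores 1; highest total wins."""
    totals = [0.0] * len(utilities[0])
    for row in utilities:
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1.0  # avoid dividing by zero for an indifferent voter
        for c, u in enumerate(row):
            totals[c] += (u - lo) / span
    return totals.index(max(totals))  # ties go to the lowest index

def bayesian_regret(method, num_voters=25, num_candidates=4, trials=2000, seed=0):
    """Average, over many simulated elections, of (best achievable social
    utility) minus (social utility of the candidate the method elected)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        utilities = random_utilities(num_voters, num_candidates, rng)
        social = social_utilities(utilities)
        total += max(social) - social[method(utilities)]
    return total / trials
```

Calling bayesian_regret(plurality_winner) and bayesian_regret(range_winner) with the same seed compares the two methods on identical simulated electorates, since the electorates depend only on the seed and not on the method.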

Of course... we have to admit... computerized "voters" and "candidates" are not the real thing.
They are only as good as the (oversimplified) models we program.
The best response to this criticism is to program lots of models with lots
of parameters, and have the computer simulate elections in all of them.
That way, you can be reasonably confident you've captured a lot of kinds
of situations in your data, and at least some of them are realistic.
Range voting was the best (i.e. lowest Bayesian regret) voting method in our
experiments in all 720 different models.
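
The "lots of models with lots of parameters" approach can be sketched as a sweep over a grid. The grid below is a tiny hypothetical one (two utility generators, a few voter and candidate counts) and is not the 720 models of our experiments; all names, the honest-voter assumption and the tie-breaking rule are illustrative choices.

```python
import itertools
import random

def argmax(xs):
    """Index of the largest element (lowest index wins ties)."""
    return max(range(len(xs)), key=xs.__getitem__)

def iid_utilities(nv, nc, rng):
    """Model A (assumed): every utility is an independent normal draw."""
    return [[rng.gauss(0, 1) for _ in range(nc)] for _ in range(nv)]

def issue_space_utilities(nv, nc, rng):
    """Model B (assumed): voters and candidates sit on a 1-D issue axis;
    utility is the negative distance between voter and candidate."""
    candidates = [rng.gauss(0, 1) for _ in range(nc)]
    voters = [rng.gauss(0, 1) for _ in range(nv)]
    return [[-abs(v - c) for c in candidates] for v in voters]

def plurality(u):
    """Honest plurality: each voter names their favorite."""
    tallies = [0] * len(u[0])
    for row in u:
        tallies[argmax(row)] += 1
    return argmax(tallies)

def honest_range(u):
    """Honest range voting with each voter's scores rescaled to [0, 1]."""
    totals = [0.0] * len(u[0])
    for row in u:
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1.0
        for c, x in enumerate(row):
            totals[c] += (x - lo) / span
    return argmax(totals)

def regret_table(methods, generators, voter_counts, candidate_counts,
                 trials=200, seed=0):
    """Return {(generator name, voters, candidates): {method name: mean regret}}.
    Within a model, every method is scored on the same simulated electorates."""
    table = {}
    grid = itertools.product(generators.items(), voter_counts, candidate_counts)
    for (gen_name, gen), nv, nc in grid:
        rng = random.Random(seed)
        sums = dict.fromkeys(methods, 0.0)
        for _ in range(trials):
            u = gen(nv, nc, rng)
            social = [sum(col) for col in zip(*u)]
            best = max(social)
            for name, method in methods.items():
                sums[name] += best - social[method(u)]
        table[(gen_name, nv, nc)] = {m: s / trials for m, s in sums.items()}
    return table

table = regret_table(
    {"plurality": plurality, "range": honest_range},
    {"iid": iid_utilities, "issue": issue_space_utilities},
    voter_counts=[5, 20], candidate_counts=[3, 5], trials=100)
for model, regrets in sorted(table.items()):
    print(model, regrets)
```

Each printed row is one model with the mean regret of every method in it. Counting the rows in which a given method has the lowest regret gives the kind of "best in how many models" summary quoted above.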