Author Misreads Expert Re Crowds?

James Surowiecki starts his book The Wisdom of Crowds by telling how Francis Galton used a crowd's guesses of an ox’s weight, a result Galton reported in Nature in 1907:

Galton borrowed the tickets from the organizers and ran a series of statistical tests on them. Galton arranged the guesses (which totaled 787 in all, after he had to discard thirteen because they were illegible) in order from highest to lowest and graphed them to see if they would form a bell curve. Then, among other things, he added all the contestants’ estimates, and calculated the mean of the group’s guesses. That number represented, you could say, the collective wisdom of the Plymouth crowd. If the crowd were a single person, that was how much it would have guessed the ox weighed. … The crowd had guessed that the ox, after it had been slaughtered and dressed, would weigh 1,197 pounds. After it had been slaughtered and dressed, the ox weighed 1,198 pounds.

David Levy and Sandra Peart say Surowiecki got it all wrong. Galton did not [AT FIRST] even bother to calculate a mean, as he saw his data was clearly not normally distributed. He used the median (of 1207), which was much further off than the mean, but by modern standards clearly the better estimator. It was Karl Pearson in 1924 who calculated the mean. (The sentence about Pearson was later struck through, and the bracketed clarifier was added later.)

This description of what Galton did with the guesses misrepresents what Galton actually did. Galton was clear that the distribution of guesses was not normal, writing that "The abnormality of the distribution of the estimates now becomes manifest, …” (Galton 1907b, p. 451). Surowieki has replaced Galton’s statement with the claim that Galton "graphed them [the guesses] to see if they would form a bell curve" – allowing the remaining possibility that the guesses might be normal. Galton’s principled opposition to the mean as the voice of the people, which Pearson supplemented by the use of the mean, is now described as Galton’s use of the mean. Finally, the reported estimate of the vox populi has been changed from 1207 to 1197.

Several authors (Sunstein 2006, p. 24; Solomon 2006; Caplan 2007, p. 8) copied Surowiecki’s error [VERSION], and several recent papers have argued about how close prediction market prices are to mean beliefs. The original Galton paper can be found reprinted in Levy and Peart’s book Vanity of the Philosopher and online.

Added: In the comments, Surowiecki says Levy and Peart are very wrong: Galton did too mention the mean, when responding a few weeks later to a letter that mentioned the mean. He cited this letter in his book's footnotes. Hopefully we can get Levy and Peart to respond.

Added: Here are Surowiecki’s comments and Levy and Peart’s responses in full:

"Galton did not even bother to calculate a mean, as he saw his data was clearly not normally distributed. He used the median (of 1207), which was much further off than the mean, but by modern standards clearly the better estimator. It was Karl Pearson in 1924 who calculated the mean."

Robin, before repeating falsehoods, you might want to go back to the original sources — or, in this case, to the footnotes to my book. Galton did, in fact, calculate the mean, long before Karl Pearson did. Galton’s calculation appeared in Nature, Vol. 75, No. 1952 (3/28/07), in a response to letters regarding his original article. One of the correspondents had gone ahead and calculated a mean from the data that Galton had provided in his original piece, and had come up with the number 1196. Galton writes, "he makes it [the mean] 1196 lb. . . . whereas it should have been 1197 lb."

I find it astounding in its own right that Levy and Peart wrote an entire article about Galton (and, to a lesser extent, about my use of him) and never went back and checked the original sources. (They actually wonder in the paper, "However the new estimate of location came to be part of Surowieki’s account," as if the answer isn’t listed right there in the footnotes.) What makes it even more astounding, though, is that they’ve written an entire paper about the diffusion of errors by experts who "pass along false information (wittingly or unwittingly)" while passing along false information themselves.

It also seems bizarre that Levy and Peart caution, "The expectation of being careful seems to substitute for actually being careful," and yet they were somehow unable to figure out how to spell "Surowiecki" correctly. The article is a parody of itself.

I’m happy to enter into a discussion of whether the median or the mean should be used in aggregating the wisdom of crowds. But whether Galton himself thought the mean or the median was better was and is irrelevant to the argument of my book. I was interested in the story of the ox-weighing competition because it captures, in a single example, just how powerful group judgments can be. Galton did calculate the mean. It was 1197 lbs., and it was 1 lb. away from the actual weight of the ox. The only "falsehoods" being perpetrated here are the ones Levy and Peart are putting out there, and the ones that you uncritically reprinted.

There’s no reason for debate here. Levy and Peart say "Pearson’s retelling of the ox judging tale apparently served as a starting point for the 2004 popular account of the modern economics of information aggregation, James Surowieki’s Wisdom of Crowds." It wasn’t the starting point. The starting point was Galton’s own experiment, and his own reporting of the mean in "The Ballot Box." Robin writes: "Galton did not even bother to calculate a mean." He did calculate it, and he did report it. This fact shouldn’t be listed as an "addendum" to the original post. The original post should be rewritten completely — perhaps along the lines of "Surowiecki and Galton disagree about which estimate is a better representation of group judgment" rather than "Author Misreads Expert" — or else scrapped.

David Levy and Sandra Peart respond (by email):

Surowiecki is correct that Galton reports the mean in his letter to Nature of March 28, 1907. He reports it there in response to a query. And that letter to Nature is in the references to the Wisdom of Crowds which (ironically, in a note about carefulness and checking) we did not check. Pearson required both the mean and the standard deviation to compute the calibrating normal. So, he needed to do the recomputations. Our next version will clarify this with thanks to Surowiecki, who has rightly made the point that Galton reported the mean.

We can now focus on the larger point: that the account which reports Galton’s mean (but not his defense of the median) leads to a conflation of what Galton defended with what we may wish him to defend, the mean. When people quote Galton through Surowiecki, they tell Surowiecki’s tale, not Galton’s. Though Galton reported the mean in response to a question, he did not defend the use of the mean or use it in his report of the ox tale either before or afterwards.

Here are the results and the conclusion in the original Vox Populi article.

According to the democratic principle of "one vote one value," the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or too high by a majority of the voters (for fuller explanation see "One Vote, One Value," Nature, February 28, p. 414). Now the middlemost estimate is 1207 lb., and the weight of the dressed ox proved to be 1198 lb.; so the vox populi was in this case 9 lb., or 0.8 per cent. of the whole weight too high. … (p. 450)

This result is, I think, more creditable to the trustworthiness of a democratic judgment than might have been expected. (p. 451)

This conclusion is reproduced in the later Memories and is quoted by Surowiecki (p. xiii). Here is the conflation of what Galton did with what Surowiecki evidently thinks he should have done.

The crowd had guessed that the ox, after it had been slaughtered and dressed, would weigh 1,197 pounds. After it had been slaughtered and dressed, the ox weighed 1,198 pounds. In other words, the crowd’s judgment was essentially perfect. Perhaps breeding did not mean so much after all. Galton wrote later: "The result seems more creditable to the trustworthiness of a democratic judgment than might have been expected." That was, to say the least, an understatement.

Here’s "The Ballot Box," where Galton defends the median 1) on the basis of democratic theory and 2) as a way to bound the influence of any single estimate. After the defense he reports the sample mean.

Mr. Hooker, in Nature of March 21, seems not to have quite appreciated my principal contention in the letters "One Vote, One Value" and "Vox Populi" of February 28 and March 7 respectively. It was to show that the verdict given by the ballot-box must be the Median estimate, because every other estimate is condemned in advance by a majority of the voters. This being the case, I examined the votes in a particular instance according to the most appropriate method for dealing with medians, quantiles, &c. I had no intention of trespassing into the technical and much-discussed question of the relative merits of the Median and of the several kinds of Mean, and beg to be excused from not doing so now except in two particulars. First, that it may not be sufficiently realised that the suppression of any one value in a series can only make the difference of one half-place to the median, whereas if the series be small it may make a great difference to the mean; consequently, I think my proposal that juries should openly adopt the median when estimating damages, and councils when estimating money grants, has independent merits of its own, besides being in strict accordance with the true theory of the ballot-box. Secondly, Mr. Hooker’s approximate calculation from my scanty list of figures, of what the mean would be of all the figures, proves to be singularly correct; he makes it 1196 lb. … whereas it should have been 1197 lb.
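Galton's first particular, that deleting one value moves the median by at most half a place while in a small series it can move the mean a great deal, is easy to check. The sketch below uses a short, entirely hypothetical list of estimates (not Galton's data) containing one wild guess, and reports the largest shift each statistic suffers when any single estimate is removed. The helper name `max_shift` is illustrative only.

```python
import statistics

# Hypothetical estimates in lb (NOT Galton's data): a small series with one wild guess.
estimates = [1150, 1180, 1195, 1200, 1207, 1215, 1230, 1250, 1800]

def max_shift(stat, values):
    """Largest change in `stat` caused by deleting any one value from `values`."""
    full = stat(values)
    return max(abs(stat(values[:i] + values[i + 1:]) - full)
               for i in range(len(values)))

median_shift = max_shift(statistics.median, estimates)
mean_shift = max_shift(statistics.mean, estimates)
print(f"largest median shift: {median_shift:.1f} lb")  # prints 4.0 lb
print(f"largest mean shift:   {mean_shift:.1f} lb")    # prints 66.3 lb (dropping the 1800 guess)
```

The median only slides to the midpoint of its two neighbours, whereas the single 1800 lb guess alone accounts for a 66 lb swing in the mean of nine values.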

Did Galton change his mind? Here’s the 1908 account in the Memories (pp. 280-1), in which the vox populi is clearly the median. The same concern with outliers is found. The mean is nowhere in sight.

A little more than a year ago, I happened to be at Plymouth, and was interested in a Cattle exhibition, where a visitor could purchase a stamped and numbered ticket for sixpence, which qualified him to become a candidate in a weight-judging competition. An ox was selected, and each of about eight hundred candidates wrote his name and address on his ticket, together with his estimate of what the beast would weigh when killed and "dressed" by the butcher. The most successful of them gained prizes. The result of these estimates was analogous, under reservation, to the votes given by a democracy, and it seemed likely to be instructive to learn how votes were distributed on this occasion, and the value of the result. So I procured a loan of the cards after the ceremony was past, and worked them out in a memoir published in Nature [176-7]. It appeared that in this the vox populi was correct to within 1 per cent. of the real value; it was 1207 pounds instead of 1198 pounds, and the individual estimates were distributed in such a way that it was an equal chance whether one of them selected at random fell within or without the limits of -3.7 per cent, or +2.4 per cent of the middlemost value of the whole.

The result seems more creditable to the trustworthiness of a democratic judgment than might have been expected. But the proportion of the voters who were practised in judging weights undoubtedly surpassed that of the voters in ordinary elections who are versed in politics.

I endeavoured in the memoirs just mentioned to show the appropriateness of utilising the Median vote in Councils and juries, whenever they have to consider money questions. Each juryman has his own view of what the sum should be. I will suppose each of them to be written down. The best interpretation of their collective view is to my mind certainly not the average, because the wider the deviation of an individual member from the average of the rest, the more largely would it affect the result. In short, unwisdom is given greater weight than wisdom. In all cases in which one vote is supposed to have one value, the median value must be the truest representative of the whole, because any other value would be negatived if put to the vote. If it were more than the median, more than half of the voters would think it too much; if less, too little.
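Galton's majority argument can be written as a small check: a proposed sum is rejected if more than half the voters think it too much, or more than half think it too little. The sketch below applies that test to a hypothetical set of jury awards; the figures and the function name `majority_verdict` are illustrative assumptions, not anything Galton computed. With an odd number of voters only the median survives, while the mean, dragged upward by one extreme vote, is rejected as "too much".

```python
import statistics

def majority_verdict(proposal, votes):
    """Return "too much" or "too little" if a strict majority of voters would
    reject `proposal` in that direction, or None if no majority rejects it."""
    n = len(votes)
    too_much = sum(1 for v in votes if v < proposal)    # these voters wanted less
    too_little = sum(1 for v in votes if v > proposal)  # these voters wanted more
    if too_much > n / 2:
        return "too much"
    if too_little > n / 2:
        return "too little"
    return None

# Hypothetical awards proposed by seven jurors (odd count, so the median is unique).
awards = [800, 1000, 1000, 1200, 1500, 2000, 9000]
mean_award = statistics.mean(awards)  # pulled upward by the single 9000 vote

candidates = sorted(set(awards)) + [mean_award]
survivors = [c for c in candidates if majority_verdict(c, awards) is None]
print(f"mean {mean_award:.0f}: {majority_verdict(mean_award, awards)}")  # mean 2357: too much
print(f"survives a majority vote: {survivors}")                          # [1200]
```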

We were in error not to check all of Surowiecki’s citations. The result he reported is something which Galton computed. On this important issue he is right, we were wrong. But our larger point remains: that Galton defends the use of the median and attacks the use of the mean for the basis of democratic judgment in his first and his last words on the subject. Indeed, in the letter in which he reports the mean, he defends the use of the median for juries and councils when they are making decisions involving money.

Surowiecki replies:

I appreciate Levy and Peart admitting their mistake. But they seem not to recognize that their mistake undermines the critique that’s at the center of their paper. Their paper, they write, is about the misconstruing of Galton’s experiment. "A key question," they write, "is whether the tale was changed deliberately (falsified) or whether, not knowing the truth, the retold (and different) tale was passed on unwittingly." But the account of Galton’s experiment was not changed deliberately and was not falsified. It was recounted accurately. Levy and Peart want to use my retelling of the Galton story as evidence of how "experts pass along false information (wittingly or unwittingly) [and] become part of a process by which errors are diffused." But there’s no false information here, and no diffusion of errors, which rather demolishes their thesis. If they really want to write a paper about how "experts" pass along false information, they’d be better off using themselves as Exhibit A, and telling the story of how they managed to publish such incredibly shoddy work and have prominent economists uncritically link to it.

The paper has been taken down at Adam Smith Lives for rethinking. We offer our apologies to James Surowiecki.

http://adamsmithlives.blogs.com/thoughts/2007/10/experts-and-i-1.html

One paragraph which will go into the next version is this:

One of Galton’s defenses of the sample median as the vox populi was that it bounds the influence of any individual voter. Replication and checking of the work of experts may be a way to bound the influence of experts. It is important for readers to know that in an earlier version we denied the existence of Galton’s mean. This emphasizes the importance of replication and competition precisely to bound the influence of such errors.

Here’s what we are prepared to defend:

The majority-rule context of Galton’s publications is lost when the sample median, upon which Galton put such stress, is no longer reported.

I have had time to reflect and now I would like to offer a more detailed personal apology than what we’ve jointly posted before. When I failed to find Galton’s mean, in spite of your sufficient directions, I should have asked you directly for help. From these two failures of mine, and because Sandy trusted my work, we were led to the wrong conclusion that your account of Galton’s mean was false instead of the right conclusion that your account was simply different from our account of Galton’s median. If the accounts are merely different then we have many ways of asking which of the two estimators one might prefer. We began that helpful exercise. We did not stop there. When we said that your account was false, and asked a rhetorical question of how this came to be, we called into question your own intentions. We also wrongly called into question the care which scholars took in citing your work.

Here is an extract from an article by Francis Galton published in Nature on March 7, 1907:
“According to the democratic principle of “one vote one value,” the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or too high by a majority of the voters. Now the middlemost estimate is 1207 lb., and the weight of the dressed ox proved to be 1198 lb.; so the vox populi was in this case 9 lb., or 0.8 per cent. of the whole weight too high. The distribution of the estimates about their middlemost value was of the usual type, so far that they clustered closely in its neighbourhood and became rapidly more sparse as the distance from it increased.”

James, I’ve put a line through the clearly incorrect sentence of mine re Pearson, and added a [clarifier] to my first sentence. I also edited the "Added" notes in the direction of Eliezer’s suggestion, and added a question mark to the title. That’s about as far as I think it reasonable to go to correct the post.

I agree Levy and Peart were wrong about Pearson 1924 having first calculated the mean. But I do think they have a point that it is ironic that Galton made quite an effort to emphasize and prefer the median, in part because the data did not look like a bell curve, while your retelling focuses on him calculating a mean after checking for a bell curve.

I do not think Galton’s experiment is valid; there is far too little systematic bias, given the undoubtedly huge variance in the individual estimates, and the difficulty of trying to weigh an object by sight. I defy the data.

Douglas Knight

Thank you for the opportunity to learn about the power and subtlety of indignation.

It looks like there wasn’t much variance in the individual estimates that Galton checked! Perhaps people really are unbiased weighers, with experience at county fairs. (I would previously have agreed with Tom McCabe.) Post facto rationalization: Perhaps this is because if people systematically overestimate or underestimate at county fairs, they can correct this with relatively little experience: “Gee, that’s the third time in a row my estimate was too high.”
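The spread Galton reported in the Memories passage ("an equal chance" of falling within -3.7 or +2.4 per cent of the middlemost value) amounts to the lower and upper quartiles expressed as percentage deviations from the median. The sketch below shows that calculation on a hypothetical list of estimates; Galton's own 787 figures are not reproduced, the helper name `quartile_limits` is illustrative, and Galton's interpolation of the quartiles may differ slightly from the default method of Python's `statistics.quantiles`.

```python
import statistics

def quartile_limits(estimates):
    """Return the median and the lower/upper quartiles as percentage deviations
    from it; roughly half of the estimates fall between the two limits."""
    med = statistics.median(estimates)
    q1, _, q3 = statistics.quantiles(estimates, n=4)  # default 'exclusive' method
    return med, 100 * (q1 - med) / med, 100 * (q3 - med) / med

# Hypothetical estimates in lb (NOT Galton's data).
estimates = [1100, 1150, 1170, 1190, 1200, 1207, 1215, 1225, 1240, 1260, 1300]
med, low_pct, high_pct = quartile_limits(estimates)
print(f"median {med} lb; about half the estimates lie between "
      f"{low_pct:+.1f}% and {high_pct:+.1f}% of it")  # -3.1% and +2.7%
```

A narrow band like this is what "not much variance" means here, and the asymmetry of the two limits is one sign that the distribution is not a symmetric bell curve.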

In my book, I reference Surowiecki’s “guess-the-weight-of-an-ox” anecdote. My colleague David Levy and his co-author Sandra Peart show that isn’t…

James Surowiecki

To finish, Levy and Peart insist that their really important point still stands, which is that “When people quote Galton through Surowiecki, they tell Surowiecki’s tale, not Galton’s,” and that this is a problem because Galton’s thinking is being misrepresented. But as I said earlier, “The Wisdom of Crowds” was not intended to be a discussion of Francis Galton’s opinions on what’s the best method to capture group judgment, nor, as far as I know, has anyone who has retold “Surowiecki’s tale” of the Galton example used it to analyze Galton’s opinions. People aren’t quoting the Galton story because they’re interested in what Galton himself thought about the median vs. mean. They’re quoting it because they’re interested in the bigger idea, which is that group judgments (and this is true whether you use the median, the mean, or a method like parimutuel markets) are often exceptionally accurate. Levy and Peart have constructed a straw man — and, in this case, a straw man based on a falsehood — and then tried to knock it down.

Robin writes: “it is ironic that Galton made quite an effort to emphasize and prefer the median, in part because the data did not look like a bell curve, while your retelling focuses on him calculating a mean after checking for a bell curve.” What’s ironic about this? He did check for a bell curve, and he did calculate the mean. It’s the data themselves, not Galton’s interpretation of them, that I was writing about. (If he hadn’t calculated the mean, I would have happily told the story with the median, since it was also remarkably accurate, and demonstrated the same point about the wisdom of crowds.)

Finally, on the substantive question, Robin (and Levy and Peart) seem to think that because the distribution of guesses wasn’t normal, that makes using the mean a mistake. But this is precisely what’s so interesting: if the group is large enough, even if the distribution isn’t normal, the mean of a group’s guesses is nonetheless often exceptionally good.
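Whether a non-normal crowd's mean stays accurate depends on what the individual errors look like. The simulation below is a sketch under one stated assumption: each guess is the true weight plus an independent, right-skewed error whose expected value is zero. The error model, the 50 lb spread and the seed are invented for illustration. Under that assumption the mean homes in on the true weight as the crowd grows, while the median settles on a point about 15 lb too low. If the crowd instead shares a systematic bias, no amount of averaging removes it, and with heavy-tailed errors the median is usually the safer choice.

```python
import math
import random
import statistics

TRUE_WEIGHT = 1198  # lb, the dressed weight of Galton's ox
SPREAD = 50.0       # hypothetical scale of individual errors, in lb

def simulate_guesses(n, seed=1):
    """Hypothetical crowd: each guess is the true weight plus an independent,
    right-skewed error whose expected value is zero (an ASSUMPTION)."""
    rng = random.Random(seed)
    # expovariate(1 / SPREAD) has mean SPREAD, so subtracting SPREAD centres the error at 0.
    return [TRUE_WEIGHT + rng.expovariate(1 / SPREAD) - SPREAD for _ in range(n)]

for n in (10, 100, 100_000):
    guesses = simulate_guesses(n)
    mean_err = statistics.fmean(guesses) - TRUE_WEIGHT
    median_err = statistics.median(guesses) - TRUE_WEIGHT
    print(f"n={n:>6}: mean off by {mean_err:+6.1f} lb, median off by {median_err:+6.1f} lb")

# For this error model the median converges to SPREAD * (ln 2 - 1) below the truth.
print(f"theoretical median offset: {SPREAD * (math.log(2) - 1):+.1f} lb")  # -15.3 lb
```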

Are Levy and Peart the pair who have been smearing the Galtonians for years? Jeez, you’d think they would have something better to do with their time…

Galton was 85 years old when he did the guess-the-weight study and realized that his preconception was completely wrong. So, rather than forget about it and take a nap, he reported his new finding to Nature, which became the starting point for the “Wisdom of Crowds” school of thought that James S. wrote about.

Galton was a great scientist and a great man.

For a less tendentious assessment of Galton’s many achievements (and fewer, but still real, shortcomings), see Jim Holt’s article in The New Yorker: