Comments on BishopBlog: "The Amazing Significo: why researchers need to understand poker"

Søren K. Andersen (2016-02-05 14:05) — http://www.abdn.ac.uk/psychology/people/profiles/skandersen
I think that much of the problem is based on the artificial bisection of effects into significant vs. non-significant, while effect sizes are ignored much of the time. The rules of the game are that you're allowed to claim that 'X does Y' if the corresponding p value is <0.05. If 'X does Y' pertains to something funky that a general audience can get excited about, you might be able to publish it in the 'glamour' journals, and the media are likely to pick it up. Great for the career!

However, p<0.05 (or any other alpha) DOES NOT mean that X does Y. It means that X does Y sometimes, but at other times it may do the opposite or nothing at all. How consistently X does Y is determined by the effect size, not by the p-value!

In my ideal world, there would be an automatic editing process, applied to all papers in all journals, that replaces every statement of 'X does Y' (especially in the title) with a more accurate assessment based on the consistency of the effect in the data. Many papers that have attracted interest and raised controversy would then have titles like 'X does Y sometimes, although it has the opposite or no effect at all in many other cases and we don't really know why'. Papers with p-values close to 5% for the main conclusion would likely be entitled 'Limited evidence that we did not just measure random numbers in our attempt to examine whether X does Y'.
That way, overselling weak effects to 'glamour' journals, the media and the public would become an entirely new challenge...

Lowering alpha thresholds would exert higher pressure to study bigger, more reliable effects, and I think that is where the real benefit lies. Even with alpha at 1%, sustaining a career or research field based on spurious effects would become much harder (the current '1/20 + inflation by p-hacking' chance of finding unexpected and exciting results may be too high to deter this).

deevybee (2016-02-03 07:11)
Absolutely. That is how it is supposed to work: initial exploratory study, generate hypothesis, test hypothesis with new data.

Anonymous (2016-02-01 20:22)
Relevant: http://xkcd.com/882/

Cynthia Cheney (2016-02-01 19:14)
I have not studied statistics but had heard of this prohibition once, and now you have explained it; thank you!

One question: suppose a researcher notices, among multiple variables, one that is unexpectedly significant. Then they do a new study, sound in other ways, to investigate that one variable.
Would that be reliable?

jrkrideau (2016-01-27 17:14)
I understand the point and agree that, in practical terms, it would solve much of the current problem in psychology and perhaps in a number of other areas. Moving to 5 sigma or, as Colquhoun seems to suggest, 3 sigma would likely be a vast improvement. (Though, of course, Bayes is really the way to go :)).

There certainly is nothing sacred about .05. A significance level selected for agricultural research in the early 1900s may not be totally appropriate anymore.

However, I still hold (stubbornly) to my position, which I did not explain clearly, that it does not provide a cure for the actual behaviour of p-hacking. That is why I disagreed with Anonymous. There probably is no complete cure for p-hacking, but better training in graduate school might have some effect: the implication in some papers I read is that the researcher is simply unaware of the issue rather than deliberately trying to game the system.

I enjoyed David Colquhoun's article. It seems one of the best, most straightforward explanations of the problem that I have seen. The accompanying R script is interesting.

deevybee (2016-01-27 11:06)
Thanks to you and @Anonymous for comments. On the basis of simulations, though, I share the view of Anonymous that shifting to require p < .001 instead of p < .05 would pretty much fix the problem in psychology. For typical experimental parameters in psychology, there would just be too few low p-values to p-hack.
See also David Colquhoun's article here: http://rsos.royalsocietypublishing.org/content/1/3/140216

David Mellor (2016-01-26 19:00)
Great post and thank you for writing!
We are trying to encourage more people to preregister their analyses with the Preregistration Challenge: 1,000 researchers will win $1,000 for publishing the results of their preregistered work: https://cos.io/prereg

jrkrideau (2016-01-26 15:56)
Not really a Bayesian, but I did notice it :)
I was reading Ben Goldacre's COMPare info the other day and was quite impressed.

It is amazing what some good digging can turn up. The problems in doing so are (a) it is usually not as exciting as doing new research and (b) it can take a lot of time and resources.

@Anonymous: No, switching to 5 sigma just means that we need a bigger effect; the problems of multiple tests without corrections and of p-hacking remain.

It's the actual practices, not the specific criteria, that are the problem.

Anonymous (2016-01-26 07:46)
Really nicely explained. We'd get rid of a lot of these problems if, as a profession, we just switched from 2 sigma to 5 sigma like they do in physics...
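The arithmetic behind this thread — the '1/20' chance, the xkcd jelly-bean cartoon, and the claim that a stricter alpha leaves too few low p-values to p-hack — can be checked with a short simulation. Under a true null hypothesis, a well-behaved p-value is uniformly distributed on (0, 1), so a lab that runs 20 independent null tests has a 1 − 0.95^20 ≈ 64% chance of at least one 'significant' result at alpha = .05, but only about 2% at alpha = .001. This is a minimal sketch, not code from the post; the function names are illustrative.

```python
import random

def fishing_expedition(n_tests, alpha, rng):
    """Run n_tests independent tests of TRUE null hypotheses.

    Under a true null, a well-behaved p-value is Uniform(0, 1),
    so each test is simulated as a single uniform draw.
    Returns True if any test comes out 'significant'.
    """
    return any(rng.random() < alpha for _ in range(n_tests))

def false_discovery_rate(alpha, n_labs=20000, n_tests=20, seed=1):
    """Fraction of simulated labs that find at least one spurious hit."""
    rng = random.Random(seed)
    hits = sum(fishing_expedition(n_tests, alpha, rng) for _ in range(n_labs))
    return hits / n_labs

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha={alpha}: P(at least one 'significant' null result) "
          f"~ {false_discovery_rate(alpha):.3f}")
```

The simulated rates land near the analytic values 1 − (1 − alpha)^20: roughly 0.64 at .05, 0.18 at .01, and 0.02 at .001 — consistent with both jrkrideau's point (a stricter threshold does not make multiple uncorrected tests legitimate) and deevybee's (at p < .001 there is far less spurious 'significance' available to harvest).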