Statistical conclusions are probably wrong!

A university assistant once told a student to write in his master's thesis: “the result of this statistical test is not significant, but if we had data on more cases, it would probably have been significant”. A test that is not significant is no evidence that a bigger sample would have been. When I heard this, and also that a real statistics professor would judge the quality of the master's thesis, I strongly advised the student not to write this crap in his thesis.

To start with: some people have no problem at all with statistical significance, because they don’t care. Example: I know people who test their direct marketing campaigns against a control group that is much too small, with only a handful of “hits”. Result: in a lot of cases they shout: hurray, our campaign does a lot better than our control group. In fact, what they see is merely a random pattern.
Say you run an email campaign with 100,000 emails and get a 5% conversion rate (5,000 people bought your product).
Then you compare this result with a much too small control group of 100 people who got no email, of whom only 3 (= 3%) bought your product. Conclusion: the campaign delivered two extra percentage points. WRONG. A simple 2×2 table analysis shows that you have roughly one chance in four of getting that result even if your email campaign had no influence whatsoever on buying behaviour!
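The one-in-four figure can be checked with a few lines of Python. This is only a rough sketch, not the full 2×2 table analysis: it assumes the email has no effect, so that the control group also buys at the campaign's 5% rate, and then asks how often a group of 100 would produce 3 or fewer buyers. The helper name prob_at_most is made up for this illustration.

```python
from math import comb


def prob_at_most(k, n, p):
    """Probability of k or fewer buyers among n people who each buy with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


campaign_rate = 5000 / 100_000  # 5% conversion in the emailed group
control_size = 100
control_buyers = 3

# Assumption: the email changes nothing, so the control group also buys at 5%.
p_chance = prob_at_most(control_buyers, control_size, campaign_rate)
print(f"Chance of {control_buyers} or fewer buyers out of {control_size}: {p_chance:.2f}")
```

Under these assumptions the probability comes out at about 0.26, so a control group this small looks "worse" than the campaign about one time in four purely by chance.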

And this brings us to the other problem with statistical significance. If the probability of finding a pattern just by chance is below 5%, we generally call it statistically significant, which means we conclude that this is not chance but the observation of a real pattern. A lot of people forget what this threshold really means: out of 100 tests in which there is in reality no pattern at all, on average 5 will still show a “significant” pattern!
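A small simulation makes the 5-in-100 point concrete. The sketch below is an illustration under assumptions of my own choosing: two groups of 500 people who both buy at the same 5% rate (so there is never a real effect), compared with a two-proportion z-test at the 5% threshold. The function names, the group size and the random seed are arbitrary.

```python
import math
import random


def two_sided_p_value(buyers_a, buyers_b, n):
    """Two-proportion z-test for two groups of equal size n (normal approximation)."""
    pooled = (buyers_a + buyers_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # nobody (or everybody) bought in both groups: no difference to test
    z = (buyers_a / n - buyers_b / n) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_buyers(rng, n, rate):
    """Number of buyers among n people who each buy with probability rate."""
    return sum(rng.random() < rate for _ in range(n))


rng = random.Random(42)
n_tests, group_size, true_rate = 1000, 500, 0.05

false_alarms = 0
for _ in range(n_tests):
    # Both groups have the same true rate, so any "effect" is pure chance.
    a = simulate_buyers(rng, group_size, true_rate)
    b = simulate_buyers(rng, group_size, true_rate)
    if two_sided_p_value(a, b, group_size) < 0.05:
        false_alarms += 1

false_alarm_rate = false_alarms / n_tests
print(f"{false_alarms} of {n_tests} no-effect tests came out 'significant' ({false_alarm_rate:.1%})")
```

The share of "significant" results should land somewhere near 5%, even though not a single one of these tests has a real effect behind it.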

So the lesson is: can you repeat the test or the treatment and still find the same pattern?
That is why, in data mining, we should always test the model on a separate hold-out dataset, to verify whether the conclusion still holds on new data.
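To see why the hold-out check matters, here is a toy sketch on pure noise. Everything in it is an assumption made for illustration: 20 random "features", a random target, and a very crude form of data mining that simply picks the feature with the strongest correlation on the first half of the rows. It is not a real modelling workflow, only a demonstration of the effect.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(7)
n_repeats, n_rows, n_features = 100, 100, 20
half = n_rows // 2

train_strength = holdout_strength = 0.0
for _ in range(n_repeats):
    # Pure noise: no feature has any real relation to the target.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

    # "Data mining": keep the feature that looks best on the first half of the data.
    best = max(features, key=lambda f: abs(correlation(f[:half], target[:half])))

    train_strength += abs(correlation(best[:half], target[:half]))
    holdout_strength += abs(correlation(best[half:], target[half:]))

mean_train = train_strength / n_repeats
mean_holdout = holdout_strength / n_repeats
print(f"average |correlation| of the 'best' feature: "
      f"training {mean_train:.2f}, hold-out {mean_holdout:.2f}")
```

On the data used to pick it, the "best" feature looks clearly related to the target; on the hold-out half the relation largely disappears, because it was never there in the first place.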