Safe is Risky...

January 20, 2009

The Lady Tasting Tea: Visual Summary

This post is going to serve as a good visual narrative of the development of the frequentist world. If you love stats and have worked on identifying patterns in data, you have almost certainly met a host of tests with different names from different fields. Statistics is one field where contributions have come from all kinds of disciplines: agriculture, clinical psychology, math, finance, etc.

WHY do you think so many people contributed to this field? Just pause for a few seconds and think about it.

I had never paused to think about it, and the opening remarks of this book provided that insight, though it is very obvious in hindsight. A significant part of statistics deals with the way experiments need to be conducted, and experiment/test/learn is a way of life for any scientist, irrespective of the domain he or she works in. So contributions to the field of stats are going to come from all possible fields.

OK, let me attempt a visual summary of this book. The book is a fantastic narrative of the history of statistics and the people who contributed to it, and its peek into their idiosyncrasies, likes, and dislikes is a delight for readers. I will try to cover most of the personalities mentioned in the book and their contributions.

Karl Pearson

Pearson was the first to look at the world through the eyes of a "distribution". What we see are nothing but realizations of a distribution; if we collect more data, we know more about the distribution. Under this assumption, he went on to create families of distributions described by the mean, standard deviation, skewness, and kurtosis. Experimental results are thus a distribution of numbers, and distribution equations tell us the probability of occurrence of these numbers. In this view, the measurements themselves have a probability distribution, rather than only the errors in measurement, which was the prevalent thought before him (Galton, Abraham de Moivre, Carl Gauss).

Pearson also needed a tool to check how well measurements fit a distribution, and he came up with a powerful one, the chi-square "goodness of fit" test, which is used to this day. (A related idea in modern practice: when selecting among a host of ARIMA models for a given realization, one compares how well each model fits using a criterion such as AICc and chooses the best one.)
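A minimal sketch of Pearson's chi-square goodness-of-fit statistic, assuming the expected counts come from a fully specified distribution, so no degrees of freedom are lost to estimated parameters. The die-roll counts are hypothetical. Here the statistic works out to 42/10 = 4.2 with 5 degrees of freedom, well below the commonly tabulated 5% critical value of about 11.07, so the data give no reason to doubt that the die is fair.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 60 rolls of a die we suspect might be loaded.
observed = [8, 9, 12, 11, 6, 14]
expected = [sum(observed) / 6] * 6  # a fair die gives 10 per face
stat = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # categories minus one (no parameters estimated)
```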

Gosset

Gosset was a classical empiricist. He felt Pearson's theory was good but difficult to put into practice, because in reality one has to deal with small sample sizes. He worked on this problem after his work hours (a classic success of the "sex and cash" theory). He showed that two estimates, the mean and the standard deviation, are enough to say something useful about the distribution. He published under the pseudonym "Student", which is why we speak of Student's t-distribution. This was very useful, think about it. In Pearson's approach, you estimate four parameters, then you need to estimate the uncertainty of those four estimates, and so on: an infinite loop. With Gosset's insight, you stop at the first computation. Wow! It is a marvelous achievement.
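To make Gosset's point concrete, here is a minimal sketch of the one-sample t statistic, which needs only the sample mean and sample standard deviation. The five measurements and the hypothesized mean `mu0 = 2` are hypothetical. For this sample the statistic is exactly sqrt(2) (about 1.41) with 4 degrees of freedom, which one would compare against the t table (two-sided 5% critical value about 2.776 for 4 degrees of freedom).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Student's t statistic for H0: population mean == mu0.

    Uses only the sample mean and the sample standard deviation,
    which is what makes the test usable for small samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


# Hypothetical small sample of five measurements.
t, dof = one_sample_t([1, 2, 3, 4, 5], mu0=2)
```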

Ronald Aylmer Fisher

Fisher, a towering personality, had tremendous influence on the development and use of statistics. His "Studies in Crop Variation" produced gems of results: ANOVA, MANOVA, ANCOVA, etc. Randomization was the philosophy that made it possible to separate out the main effects of an experiment. "Statistical Methods for Research Workers", a book stripped of complex mathematical equations and written for easy use, was instrumental in the adoption of Fisherian thought everywhere. Fisher was the person who introduced the term "degrees of freedom", maybe because he was always inclined to think geometrically.
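A minimal sketch of the idea behind one-way ANOVA: compare the spread of group means around the grand mean with the spread of observations within each group. The plot yields and treatment groups are hypothetical. Note how degrees of freedom appear naturally as (groups - 1) and (observations - groups); for this toy data the F statistic is exactly 27 on (2, 6) degrees of freedom.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group vs within-group mean squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean (the treatment effect).
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean (the noise).
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical yields from plots under three fertilizer treatments.
f_stat, dof = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```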

The fundamental difference between Pearson's philosophy and the Fisherian view is this: Pearson believed that the data represented the distribution itself. Fisher believed that the distribution is an abstract concept and that all one can do is find a statistic that estimates it. This statistic can be anything: the mean, median, IQR, etc. All these statistics are random, and hence one needs to study the behavior of these estimates in their own right. He was also instrumental in developing maximum likelihood estimation (MLE), a way to find the parameter values that make the observed data most probable under an assumed distribution, often by iterating toward the best values. Today, with computers, MLE is a command away, whereas this powerful procedure was once mathematically demanding and laborious to do by hand.
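A minimal sketch of MLE done iteratively, assuming exponentially distributed data (the waiting times are hypothetical). Golden-section search is used here as a simple stand-in for the numerical optimizers real statistics packages use; the exponential case is chosen because its closed-form answer, 1 / sample mean, lets us check that the iteration lands in the right place (0.8 for this data).

```python
import math


def exponential_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential observations for a given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


def golden_section_argmax(f, lo, hi, tol=1e-7):
    """Iteratively narrow [lo, hi] around the maximizer of a unimodal f."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum cannot lie in (d, b]
        else:
            a = c  # the maximum cannot lie in [a, c)
    return (a + b) / 2


# Hypothetical waiting times (in minutes).
waits = [0.5, 1.0, 1.5, 2.0]
rate_hat = golden_section_argmax(
    lambda r: exponential_log_likelihood(r, waits), 1e-6, 10.0
)
closed_form = len(waits) / sum(waits)  # 1 / sample mean
```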

Fisher also popularized p-values, which went on to become the basis for hypothesis testing.

Tippett and Emil Julius Gumbel

Predicting the 100-year flood is difficult with Fisherian concepts, because it concerns rare extremes rather than typical values. Tippett and Gumbel studied this problem and made a significant contribution to extreme value theory. Today we know part of this work as the Gumbel distribution.
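A minimal sketch of how a Gumbel model turns annual maxima into a "100-year" level: fit the distribution, then solve F(x) = 1 - 1/T for T = 100. The method-of-moments fit is an assumption chosen for brevity (practitioners often use maximum likelihood or L-moments instead), and the river peaks are hypothetical. For a standard Gumbel (location 0, scale 1) the 100-year level is about 4.60.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments Gumbel fit: returns (location mu, scale beta)."""
    # Gumbel variance is (pi * beta)^2 / 6 and its mean is mu + gamma * beta.
    beta = statistics.stdev(annual_maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(annual_maxima) - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, years):
    """Level exceeded on average once every `years` years under a Gumbel model.

    Solves F(x) = 1 - 1/years with F(x) = exp(-exp(-(x - mu) / beta)).
    """
    return mu - beta * math.log(-math.log(1 - 1 / years))


# Hypothetical annual peak river levels (metres).
peaks = [3.1, 2.7, 4.0, 3.5, 2.9, 3.8, 4.4, 3.0, 3.3, 3.6]
mu, beta = fit_gumbel_moments(peaks)
hundred_year_level = return_level(mu, beta, 100)
```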

Neyman

Neyman was the first to ponder the question of how p-values connect to hypothesis statements. He figured out that an experiment needs both a null hypothesis and an alternative hypothesis for p-values to be used. One can only reject the null hypothesis or fail to reject it; nothing is said about causality. This was his biggest contribution. He also developed confidence intervals and gave them their interpretation.
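Neyman's interpretation is about the procedure, not any single interval: if we repeated the experiment many times, about 95% of the 95% intervals would contain the true value. A minimal simulation sketch, assuming normal data with a known standard deviation (so the simple z-interval applies); the true mean, sigma, and sample size are hypothetical. The simulated coverage should come out close to 0.95.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width


def coverage(true_mean=10.0, sigma=2.0, n=10, reps=4000, seed=1):
    """Fraction of repeated experiments whose interval covers the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= true_mean <= hi
    return hits / reps
```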

Bayes

Bayes, a Presbyterian minister by profession, made an outstanding contribution to the world of stats. He dealt with inverse probability: look at the data and adjust the prior hypothesized distribution. That thought was very, very radical. For some reason, Bayesian statistics is still not taught properly in MBA and finance courses across the world, in spite of the fact that Google made tons of money using Bayesian fundamentals. Umpteen disciplines are crying out for the application of Bayesian principles. Risk management, for certain!
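A minimal sketch of "look at the data and adjust the prior": Bayes' rule over a small set of hypotheses, using exact fractions so the arithmetic is easy to follow. The fair-versus-biased coin setup and its probabilities are hypothetical. After three heads in a row, the probability that the coin is fair drops from 1/2 to 125/637 (about 0.2).

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior(h) * P(data | h), renormalized."""
    unnormalized = {h: p * likelihood(h) for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(data)
    return {h: w / evidence for h, w in unnormalized.items()}


# Hypothetical example: is a coin fair (heads prob 1/2) or biased (4/5)?
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(4, 5)}

# Data: three tosses, all heads.
posterior = bayes_update(prior, lambda h: heads_prob[h] ** 3)
```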

Andrey Kolmogorov

Kolmogorov built probability on the foundations of measure theory, lifting it from stepsister treatment within mathematics to the grand status it enjoys today. Without Kolmogorov, it is very unlikely that the field would have developed at this pace. He pondered two questions and spent his entire life on them:

What are the mathematical foundations of probability?

If there is a set of data over a time interval, how does one interpret it?

He also coined the term "stochastic process", which is the bread and butter of quants.

Florence Nightingale David

Many people hear the name and think of Florence Nightingale, the founder of the nursing profession, after whom she was named, but that is a different person. For stats folks, F. N. David's contribution to the theory of statistics is path-breaking. Pick any paper on stats, and within two or three handshakes you will find a reference to F. N. David's work.

Wilcoxon, Mann and Whitney (Mann-Whitney test)

Why not do away with parameters altogether? This thought, revolutionary in hindsight, was first envisaged by Wilcoxon, Prof. Mann, and his student Donald Whitney. Their efforts gave rise to an entire branch of nonparametric statistics.
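A minimal sketch of the Mann-Whitney U statistic, which needs no distributional parameters at all: it only counts, over every pair of observations from the two samples, how often one beats the other (ties count one half). The scores are hypothetical. The two U values always add up to the number of pairs, here 3 × 4 = 12; turning U into a p-value (exact tables or a normal approximation) is left out of this sketch.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties counting 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical scores under two treatments; only their ordering matters.
x = [3, 5, 8]
y = [1, 4, 4, 6]
u_x = mann_whitney_u(x, y)
u_y = mann_whitney_u(y, x)
```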

Prasanta Chandra Mahalanobis

Mahalanobis is credited with introducing randomized sampling in place of "opportunity" or "judgment" sampling. Under Nehru, then the Indian Prime Minister, he went on to create economic indices that became crucial for tracking the performance of the various five-year plans.

Jerome Cornfield

Carrying out the first inversion of a 24×24 matrix and helping a Nobel Prize-winning economist are just a few of Jerome Cornfield's contributions. In a way, one can say that he helped bring about the popular input-output economic analysis.

Many of Professor Cornfield's numerous contributions to both biostatistics and public health grew out of his research on the health effects of smoking. He became interested in the use of case-control studies after reading seminal papers by Doll and Hill (1950) and Wynder and Graham (1950), which used this methodology in early discoveries of the association between smoking and lung cancer. Professor Cornfield then demonstrated that case-control studies can be used to estimate the risk of disease as a function of smoking status so long as the rate of disease in the population is known. He also showed that the odds ratio, an approximation to the relative risk for rare diseases, can be estimated either prospectively or retrospectively. These results form the basis for much of the modern era's epidemiologic research.
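A minimal sketch of Cornfield's point about rare diseases, using a hypothetical cohort of 10,000 smokers and 10,000 non-smokers in which the disease is rare. The relative risk is 3.0 and the odds ratio is about 3.006, so the odds ratio approximates the relative risk well. The odds ratio is also unchanged if the controls are subsampled, which is why a case-control study (sampling by disease status) can still estimate it.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed with disease     b: exposed without disease
    c: unexposed with disease   d: unexposed without disease
    Scaling the disease column (a, c) or the control column (b, d) by a
    constant leaves the ratio unchanged, so sampling by disease status
    (case-control) or by exposure (prospective) gives the same value.
    """
    return (a * d) / (b * c)


# Hypothetical cohort: 30 of 10,000 smokers and 10 of 10,000 non-smokers fall ill.
rr = relative_risk(30, 10_000, 10, 10_000)
or_ = odds_ratio(30, 10_000 - 30, 10, 10_000 - 10)
```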

Gertrude Cox

Cox was the first woman elected to the International Statistical Institute. In 1950, Cox and William G. Cochran wrote the book Experimental Designs, which became a classic in the design and analysis of replicated experiments.

Stella Cunliffe

Another towering woman in the statistical world, and the first woman to serve as president of the Royal Statistical Society.

Samuel S Wilks

Samuel Stanley Wilks was an American mathematician and academic who played an important role in the development of mathematical statistics, especially in regard to practical applications. Wilks worked with the Educational Testing Service in developing standardized tests such as the SAT, which have had a profound effect on American education. He also worked with Walter Shewhart on statistical applications in quality control in manufacturing.

George Box and David Cox

They are famously remembered for the Box-Cox transformation, a power transformation used to stabilize variance, make data more closely resemble a normal distribution, improve the correlation between variables, and support other data stabilization procedures.
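A minimal sketch of the Box-Cox transformation for a single positive value. In practice the parameter lambda is usually chosen by maximum likelihood; here it is fixed by hand, which is an assumption for illustration. With lambda = 0 (a log transform), the hypothetical right-skewed values 1, 2, 4, ..., 32 become evenly spaced.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y with parameter lam.

    (y**lam - 1) / lam for lam != 0, and log(y) in the limit lam -> 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# Hypothetical right-skewed data; lam = 0 pulls in the long right tail.
skewed = [1, 2, 4, 8, 16, 32]
transformed = [box_cox(v, 0) for v in skewed]
```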

Deming

Deming is credited with driving the quality movement. His contribution to quality came from a statistical standpoint: he looked at variation in output as a combination of common cause variation and special cause variation. He championed the reduction of common cause variation, and that changed the face of Japan.
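A minimal sketch of separating the two kinds of variation in the Shewhart control-chart style: points inside 3-sigma limits are treated as common cause noise, and points outside are flagged as candidate special causes. Estimating sigma with the plain sample standard deviation of a baseline is a simplification (real charts usually estimate it from moving ranges or subgroup ranges), and the shaft diameters are hypothetical. Here the readings 11.0 and 9.5 (indices 1 and 3) fall outside the limits.

```python
from statistics import fmean, stdev


def control_limits(baseline, k=3.0):
    """Lower limit, center line, and upper limit from an in-control baseline."""
    center = fmean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def special_cause_points(values, limits):
    """Indices of values outside the limits (candidate special causes)."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical shaft diameters (mm) from a stable period, then new production.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
limits = control_limits(baseline)
flagged = special_cause_points([10.3, 11.0, 9.7, 9.5], limits)
```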

Paul Pierre Lévy

Lévy was dissatisfied with the counting methods of probability. He developed the theory of martingales, and today martingales are used in so many domains: clinical trial studies, finance, modeling, you name it. "Martingale" has entered the common vocabulary, thanks to Lévy.

David Dickey (Dickey-Fuller unit root test, a stationarity test)

To check the stationarity of residuals, one of the methods is the Dickey-Fuller test, which tests for a unit root.
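A minimal sketch of the simplest Dickey-Fuller regression: regress the change in the series on its previous level, with no constant, trend, or extra lags (an assumption for brevity; practical tests usually include some of these). Under a unit root the slope gamma is zero. Its t-statistic is compared not to the ordinary t table but to Dickey-Fuller critical values, roughly -1.95 at the 5% level for this no-constant case. The simulated series are hypothetical.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic of gamma in dy_t = gamma * y_{t-1} + e_t (no constant, no lags)."""
    lagged = y[:-1]
    diffs = [curr - prev for prev, curr in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(v * d for v, d in zip(lagged, diffs)) / sxx  # OLS slope
    resid = [d - gamma * v for v, d in zip(lagged, diffs)]
    s2 = sum(r * r for r in resid) / (len(diffs) - 1)  # one parameter estimated
    return gamma / math.sqrt(s2 / sxx)


def simulate_ar1(phi, n, seed):
    """Hypothetical AR(1) series y_t = phi * y_{t-1} + noise; phi = 1 is a random walk."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


t_stationary = dickey_fuller_t(simulate_ar1(0.2, 500, seed=42))
t_random_walk = dickey_fuller_t(simulate_ar1(1.0, 500, seed=42))
```

A strongly negative statistic (as for the stationary series) rejects the unit root; the random walk's statistic typically stays near zero.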

Raghu Raj Bahadur

An India-born mathematical statistician, considered by his peers to be "one of the architects of the modern theory of mathematical statistics". He is popularly known in the context of the Anderson-Bahadur algorithm.

Wald

The inverse Gaussian distribution, also called the Wald distribution, came from Abraham Wald. He is also popularly known for the Wald chi-square test, a statistical test typically used to check whether an effect exists. In regression, for example, it tests whether an independent variable has a statistically significant relationship with the dependent variable.
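A minimal sketch of the single-parameter Wald test: square the estimate's distance from the null value in standard-error units, then read the p-value from a chi-square distribution with 1 degree of freedom. The coefficient 0.5 and standard error 0.2 are hypothetical; they give W = 6.25 and a p-value of about 0.012, so the effect would be called significant at the 5% level.

```python
import math


def wald_test(estimate, std_error, null_value=0.0):
    """Wald chi-square statistic (1 degree of freedom) and its p-value."""
    w = ((estimate - null_value) / std_error) ** 2
    # For 1 degree of freedom, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2))
    return w, p_value


# Hypothetical regression coefficient of 0.5 with standard error 0.2.
w, p = wald_test(0.5, 0.2)
```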

Brad Efron

Credited with the bootstrap, a resampling method that is one of the greatest breakthroughs in the field of statistics.
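A minimal sketch of the bootstrap idea: resample the data with replacement many times, recompute the statistic each time, and use the spread of those recomputed values as a measure of uncertainty. The median is used because it has no simple standard-error formula; the sample is hypothetical, and the percentile interval shown is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_median(data, reps=2000, level=0.95, seed=0):
    """Bootstrap standard error and percentile interval for the median."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(data, k=len(data))) for _ in range(reps)
    )
    se = statistics.stdev(medians)
    alpha = (1 - level) / 2
    lo = medians[int(alpha * reps)]
    hi = medians[int((1 - alpha) * reps) - 1]
    return se, (lo, hi)


# Hypothetical sample of twelve measurements.
sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 10, 17, 19]
se, (lo, hi) = bootstrap_median(sample)
```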

Finally, the author who has put together a fascinating account of the above personalities, and many more, all in one book: Dr. David Salsburg.

As someone who worked for over 36 years as a test engineer in the semiconductor industry, I found this book wonderfully fascinating. It's a superb historical introduction to statistics and probability theory. The global semiconductor industry would have been impossible without statistical tools that were pioneered by the men and women described in this book.