The Fourth Paradigm

Microsoft Research published a book based on Dr. Jim Gray’s research into computer science, called “The Fourth Paradigm”. It’s about 300 pages long, and it fascinated me so much that I stayed up reading it last night. I took so many notes that unpacking them will take me weeks. If you’re a data professional, I highly advise that you take some time to read it.

The title refers to Dr. Gray’s theory of scientific paradigms. The first, the empirical paradigm, deals with what we can see, feel, and touch. Greek and Chinese scholars worked in this realm, all the way up through the time when the Arabic numeral system became prevalent in the West.

The second paradigm was theoretical – meaning that you could build a model, or a hypothesis, about some natural phenomenon and test it.

The third paradigm is computational. Here simulations become paramount, and mathematics becomes the basis for theory and testing.

And so that brings us to the fourth paradigm – a time when we can gather more data than the human mind can effectively deal with, so we use computers to mine and analyze that data. This technology, which Dr. Gray was working on when he died, represents the next scientific frontier. He worked with the SQL Server team and helped us develop and test SQL Server 2008. A version of SQL Server 2008 is used in the Pan-STARRS project at Berkeley, taking in terabytes of data, well on its way to becoming a multi-petabyte system within a year. (Who says SQL Server can’t scale!)

The book covers the natural sciences, biology and the health sciences, and even literary science. I highly recommend it.

There’s a section in it that deals with the 1/f distribution, which I’ll talk about tomorrow. It has real-world applications in how you develop a database.
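As a quick preview, the 1/f pattern (often called Zipf’s law) says that the n-th most frequent value in a dataset tends to occur with a frequency proportional to 1/n. Here’s a minimal Python sketch of that idea – the word counts below are invented purely for illustration, not taken from the book:

```python
from collections import Counter

# Hypothetical values, as you might find in a heavily skewed database column.
# The counts are constructed to follow a 1/f shape: each rank's count is
# roughly (top count) / rank.
words = ["the"] * 60 + ["of"] * 30 + ["and"] * 20 + ["to"] * 15 + ["a"] * 12

# Rank the values by frequency, most common first.
counts = Counter(words).most_common()

for rank, (word, count) in enumerate(counts, start=1):
    # Under a 1/f law, count * rank stays roughly constant across ranks.
    print(f"rank {rank}: {word!r} appears {count} times (count*rank = {count * rank})")
```

The practical upshot for database work is that a few values dominate: knowing that a column is Zipf-distributed rather than uniform changes how you think about indexing and statistics on it.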