Monday, November 29, 2010

SPSS Guru embraces the freeware, R

From Forbes:

Power in the Numbers
Quentin Hardy, 05.24.10, 12:00 AM ET

We clock up more new data every couple of weeks than humanity made from its start until the year 2000. This mass of data promises freedom, efficiency and power to the societies that know how to manipulate it. Norman Nie taught us how to manipulate data.

As a newly minted political science Ph.D. in 1968, Nie co-invented the Statistical Package for the Social Sciences. Along with innovations from a few other pioneers like the SAS Institute, it marked the birth of analytic and predictive statistical software, stuff that tells corporations how to make, price and sell things. Now 67, Nie is championing a new style of fast and cheap prediction analytics that will give statistical tools to laymen, just as word processors made us all publishers and YouTube made us all film producers.

"We are at the beginning of a process that will change the nation-state," he says. "People will look at reactions to a product or a policy in ways never before known. It could mean highly empowered individuals or better manipulation and control." Either way, he figures, we need to get smart about it.

Statistics is a science of comparatively recent invention, beginning in the 17th century with analysis of gamblers' odds and flowering in the early 20th century with tools for averaging, weighing and guessing. The human brain is a statistical manipulator in that it detects patterns and likelihoods from a large collection of observations. You might decide whether to eat at a new restaurant by judging factors like how crowded it is, whether there is a maître d', and whether you've liked that type of food before. What our brains do with a hunch, computers do by looking at data points, sometimes trillions of them.

Nie's SPSS started out as an academic exercise and soon became a serious business (one that IBM bought for $1.2 billion last summer, two years after Nie left). It's a collection of software programs that let experts quickly detect patterns and thus make predictions, whether of how pricing affects Mother's Day Internet flower sales or how demographic changes determine the number of prisons needed.

Combine that commercial success with the open-source software movement and you have a new company, Revolution Analytics. The firm, which was founded in 2007 in Palo Alto, Calif., now has around 30 employees.

Revolution takes as its starting point the statistical programming language R, a freebie invented in 1993 by some academics in New Zealand and since then enriched by many volunteers. The freely available library of R add-on packages numbers about 2,500.

Using an R package originally for ecological science, a human rights group called Benetech was able to establish a pattern of genocide in Guatemala. A baseball fan in West Virginia used another R package to predict when pitchers would get tired, winning himself a job with the Tampa Bay Rays. An R promoter in San Francisco, Michael Driscoll, used it to prove that you are seven times as likely to change cellphone providers the month after a friend does. Now he uses it for the pricing and placement of Internet ads, looking at 100,000 variables a second.

R is a powerful tool but difficult for novices to use. Nie's Revolution Analytics aims to make it more accessible with a better-organized library, capabilities for bigger jobs and a user interface that lets users drag and drop statistical analyses into place, outputting easily read charts. Revolution offers a free stripped-down version for academia and a deluxe version for business, which Nie says will undercut SPSS' and SAS' prices by 80%. Revolution has won customers like Pfizer, Yale Cancer Center, Bank of America and Motorola.

Nie may even hope to do some good. He got rich from SPSS but remained an academic at both the University of Chicago and Stanford. An early book, The Changing American Voter, used statistical analysis to display a growing sophistication and tolerance among the electorate. A later volume, Education and Democratic Citizenship in America, charted how our more educated population was more tolerant of difference but not more participatory in government.

Nie thinks his own creation has played a role in a current problem in America, the rise of extremes in political parties. These days, he says, cheaper publishing technologies and, yes, sharper statistical analysis have given us a more riven politics.

"Polling has been a blessing because you can find out what people really want," he says, "but you change the standard deviation around the middle when you can slice and dice more and more. ... Technologies are more efficient, but they seem to pull apart society, identifying the most ideologically active."

The primary system was supposed to take power from the special interests in smoky rooms but created low-participation polls that attracted zealots on both sides of the aisle who could be found using social science statistics. They elected candidates as extreme as themselves. Between those two factors, says Nie, "we are in deep trouble."

His answer: Change politics further with R-powered statistics. Nie posits that statisticians can act as watchdogs for the common man, helping people find new ways to unite and escape top-down manipulation from governments, media or big business.

"Everyone can, with open-source R, afford to know exactly the value of their house, their automobile, their spouse and their children--negative and positive," he says, presumably joking about the last bits. "It's a great power equalizer, a Magna Carta for the devolution of analytic rights."

He is also a realist and says the opposite could also happen, particularly as more and more of what we do and say is captured as statistical phenomena. "The customer pressure on business has never been better, but knowledge of manipulation and control has never been better, either," he says.

Predictably, both SAS and IBM have taken steps to accommodate R and position themselves in the middle of the predictive analytics revolution. IBM says that algorithms written in R will be accessible in its system, and SAS is making room for R results to be displayed in its technology. "The more perspectives you have on analysis, the better," says Anne Milley, a senior director at SAS. "Science used to be considered deterministic, but we live in a probabilistic world."

Nie may also take flak from R's open-source community, which includes plenty of fanatics who think software wants to be free. Says R co-inventor Robert Gentleman, who sits on the Revolution Analytics board: "If he comes out saying that he's better and faster, he's just going to annoy people."

Nie grew up in St. Louis, dropping out of high school to go to Mexico, where he became a published fiction writer while in his teens. He came home to take a degree in sociology and political science at Washington University in St. Louis. His Ph.D. dissertation at Stanford involved heterogeneous data from seven different countries.

"There was no way I could get through all of that by hand," Nie says. "I had to figure out a way to program for statistical procedures." Working with Hadlai (Tex) Hull, a recent M.B.A., and Dale H. Bent, a doctoral candidate in operations research, he developed SPSS as a shortcut to crunching different types of data. SPSS quickly caught on, and people began asking if they could use it on their university's computers. The developers figured they could sell tapes of the code for $400, about what a junior professor could pay without departmental permission.

Nie went to the University of Chicago in 1969 and continued working with Hull on developing SPSS. In 1973 the Internal Revenue Service told Chicago it had a profit-making company inside the school, and Nie was advised to take a year off to develop his project. SPSS was incorporated in 1975, with no venture capital or financial backing but strong ties to academic buyers. Nie and Hull shared ownership after buying out Bent, who returned to Canada.

It is all in the rearview mirror. "R is an absolutely massive advancement on the kind of analytics I invented," he says. "It's an opportunity to change the game in the fastest-growing field in software."

Nie is working on a new book about great disruptive technologies in history: the printing press, the cotton gin, birth control pills, the Internet. Analysis software, he feels, will change the world again, likely in ways we still do not understand.

Across the street from Revolution's office, Stanford offers a graduate course titled "The Elements of Statistical Learning," whose 700-page textbook is dominated by R functions. Many of its graduates are building social sites like Facebook and Twitter.

"Large corporations creating data, people in international social groups, Internet translation of languages--it erodes national boundaries," Nie says. "What's all this mean--highly empowered individuals or better control? It does possibly create a more anarchical and unethical world. Maybe business can't be observed as much, but maybe it can be observed better. All technologies are two-edged blades."

About Me

I am an associate professor at the Ted Rogers School of Management at Ryerson University. I am the author of Getting Started with Data Science: Making Sense of Data with Analytics.
My academic interests are analytics, housing and transport markets in urban contexts. My other interests are South Asian culture, politics, and economics.