Comments, observations and thoughts from two left coast bloggers on applied statistics, higher education and epidemiology. Joseph is a new assistant professor. Mark is a marketing statistician and former math teacher.

Thursday, April 18, 2013

Part of the Reinhart-Rogoff fallout (see here for Joseph's take) has been a discussion of the role of Excel and similar programs in analytic work. Andrew Gelman has a post up on the subject that includes this quote from an unnamed statistics professor:

It’s somewhat surprising to see Very Serious Researchers (apologies to Paul Krugman) using Excel. Some years ago, I was consulting on a trademark infringement case and was trying (unsuccessfully) to replicate another expert’s regression analysis. It wasn’t until I had the brainstorm to use Excel that I was able to reproduce his results – it may be better now, but at the time, Excel could propagate round-off error and catastrophically cancel like no other software!

Followed by this assessment by Gelman himself:

Microsoft has lots of top researchers so it’s hard for me to understand how Excel can remain so crappy. I mean, sure, I understand in some general way that they have a large user base, it’s hard to maintain backward compatibility, there’s feature creep, and, besides all that, lots of people have different preferences in data analysis than I do. But still, it’s such a joke. Word has problems too, but I can see how these problems arise from its desirable features. The disaster that is Excel seems like more of a mystery.
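
For readers who haven't run into the "round-off error" and "catastrophic cancellation" the professor mentions, here is a minimal Python sketch of the general phenomenon. It is only an illustration: the data and function names are made up, and nothing here is meant to reproduce what Excel actually did in the case he describes. It compares the one-pass "calculator" formula for the sample variance, which subtracts two huge, nearly equal numbers, with the two-pass formula that subtracts the mean first.

```python
def naive_variance(xs):
    """One-pass 'calculator' formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Subtracts two very large, nearly equal quantities, so most of the
    significant digits cancel and only rounding error is left.
    """
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: small spread around a large offset.
# The deviations from the mean are -6, -3, 3, 6, so the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("one-pass:", naive_variance(data))
print("two-pass:", two_pass_variance(data))
```

In double precision the two-pass version recovers the variance of 30, while the one-pass version lands nowhere near it and can even come out negative. An error like that inside a regression routine would be enough to make results impossible to replicate in better-behaved software.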

James Kwak, by contrast, offers a more favorable view:

Microsoft Excel is one of the greatest, most powerful, most important software applications of all time.** [** But, like many other Microsoft products, it was not particularly innovative: it was a rip-off of Lotus 1-2-3, which was a major improvement on VisiCalc.] Many in the industry will no doubt object. But it provides enormous capacity to do quantitative analysis, letting you do anything from statistical analyses of databases with hundreds of thousands of records to complex estimation tools with user-friendly front ends. And unlike traditional statistical programs, it provides an intuitive interface that lets you see what happens to the data as you manipulate them.

As a consequence, Excel is everywhere you look in the business world—especially in areas where people are adding up numbers a lot, like marketing, business development, sales, and, yes, finance. For all the talk about end-to-end financial suites like SAP, Oracle, and Peoplesoft, at the end of the day people do financial analysis by extracting data from those back-end systems and shoving it around in Excel spreadsheets. I have seen internal accountants calculate revenue from deals in Excel. I have a probably untestable hypothesis that, were you to come up with some measure of units of software output, Excel would be the most-used program in the business world.

But while Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets—badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.*** [*** PowerPoint has an oft-noted, parallel problem: It’s so easy to use that people with no sense of narrative, visual design, or proportion are out there creating presentations and inflicting them on all of us. ]

To the extent there's a conflict here, I'm with Kwak on this one. For all their problems, I'm still a big fan of Excel and similar programs (such as the OpenOffice version I have on my laptop). They are indispensable in business for just the reasons Kwak lists.