Building Models with Pivot Tables

Pivot Table Love @ Smurf on Spreadsheets
This is a really interesting post, mainly because of the discussion it’s generated – there’s definitely a diversity of opinion out there about the use of pivot tables. Personally, I have used them more often than not when building models, for the simple reason that once you’ve gotten over the initial learning curve on the way to using them they’re a very easy way to make sense out of large amounts of data.

My use of pivot tables has evolved over time, in tandem with my growth in understanding of best practice in spreadsheet modelling. When I first started using them, all I would do was put a truckload of data on one worksheet and use the wizard to build a table and present some summary stats on another. Which can be fine, but I think what I failed to appreciate at that point is the fact that there are lots of ways to extract summary stats, many of which are far less expensive in terms of file size. For instance, array functions, SUMIF/COUNTIF, or one of the many variations on the use of SUMPRODUCT (if you’re not using too many of them), can be very convenient in that they will save on the expense of creating the pivot cache. The database functions (DCOUNT, DSUM, DAVERAGE) are also very useful, and often enable the extraction of stats using far more complex criteria than would otherwise be available, directly from the dataset.

What I find myself doing more now is using pivot tables, in combination with GETPIVOTDATA, as an intermediate step on the way to building the inputs and calculations that do the work in the model. The question of file size often comes up, but unless the dataset is under, say, 10K rows, I’ll keep it outside the model. The SAS add-in for Office is a great option in that regard, with the option of opening a SAS dataset directly into a pivot table, or running stored processes or even adhoc queries on the server from inside the worksheet.

The really attractive feature of pivot tables, though, is that once they’re built they’re very flexible, and reasonably intuitive for the end user to change as needed. Of course, whether I use them in that way or not is going to depend on the objective of the model and the needs of the end user.