Fast money, slow data

The other day I bought lunch from a food truck on the Harvard campus, paying with a debit card that my server swiped through one of those little plastic doodads attached to an iPhone. Ten minutes later, while I ate my bánh mì in a shady corner of Harvard Yard, I opened my laptop and logged on to my bank’s website. The $7 transaction had already been posted to my account.

So here’s a question to think about: If the credit card networks and the banks can track the movement of money minute by minute, why does it take months to calculate the overall level of economic activity in the U.S.?

The Department of Commerce reports on gross domestic product (GDP) at quarterly intervals. This number (or its first derivative) is often cited as a kind of benchmark of economic wellbeing. The initial announcement comes a month after the close of each quarter; for Q1 2015, the news was released April 29. That news wasn’t cheering: a measly 0.2 percent rate of growth. The New York Times coverage began, “Repeating an all-too-familiar pattern, the American economy slowed to a crawl in the first quarter of 2015….”

But that wasn’t the last word on the first quarter. At the end of May, a revised report came out, saying the winter months had been even bleaker than first thought: not +0.2 percent but –0.7 percent. Then yesterday a third and “final” estimate was released. It was a Goldilocks number, –0.2 percent, pretty near the mean of the two earlier figures. Said the Times: “While nothing to brag about, the economy’s performance in early 2015 was not quite as bad as the number-crunchers in Washington had thought.”
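For anyone who wants to check the Goldilocks arithmetic, here is a minimal Python sketch. The three figures are the ones quoted above; the variable names are my own, and the "spread" is simply the gap between the highest and lowest estimate, not an official statistic.

```python
from statistics import mean

# Q1 2015 growth estimates, in percent, as quoted in the text.
initial, revised, third = 0.2, -0.7, -0.2

# Mean of the first two estimates, and how far the third lands from it.
midpoint = mean([initial, revised])
gap = third - midpoint

# Distance between the most and least optimistic of the three reports.
spread = max(initial, revised, third) - min(initial, revised, third)

print(f"mean of first two estimates: {midpoint:.2f} percent")
print(f"third estimate minus that mean: {gap:.2f} points")
print(f"spread across all three reports: {spread:.1f} points")
```

The third estimate sits within a small fraction of a point of the mean of the first two, while the swing between the first two reports is much larger than the growth rate either one announced.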

If we take it for granted that the end-of-June GDP estimate is somewhere near correct, then the first two reports were worse than useless; they were misleading. Taking action on the basis of those numbers—making an investment, say, or setting an interest rate—would have been foolish. It seems that if you want to know how the economy is doing in the first quarter, you have to wait until the second quarter ends. And of course we’re about to repeat the cycle. How did business fare this spring? Check back at the end of September, when my $7 food-truck sandwich may finally register in the government’s books.

Why so slow? To answer this, I thought I ought to learn a little something about how the Department of Commerce calculates GDP. I learned that little something from a paper by Landefeld et al.

It’s worse than I had guessed. The baseline for all these “national accounts estimates” is an economic census conducted every five years (in years congruent to 2 mod 5). For the 19 quarters between one economic census and the next, numbers are filled in by extrapolation, guided by a miscellany of other data sources. Landefeld et al. write:
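The calendar arithmetic in that paragraph can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the count of 19 assumes one reading of the text, in which each census pins down a single benchmark quarter and everything strictly between two benchmarks must be extrapolated.

```python
def is_census_year(year: int) -> bool:
    """True when the year is congruent to 2 mod 5 (2002, 2007, 2012, ...)."""
    return year % 5 == 2


def quarters_strictly_between(year_a: int, year_b: int) -> int:
    """Quarters lying strictly between the same quarter of two years.

    Assumption: each census supplies one benchmark quarter, so the
    benchmark quarters themselves are not counted.
    """
    return (year_b - year_a) * 4 - 1


census_years = [y for y in range(2000, 2021) if is_census_year(y)]
print(census_years)

# Gap between each census and the next one in the list.
for a, b in zip(census_years, census_years[1:]):
    print(a, b, quarters_strictly_between(a, b))
```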

The challenge lies in developing a framework and methods that take these economic census data and combine them using a mosaic of monthly, quarterly, and annual economic indicators to produce quarterly and annual GDP estimates. For example, one problem is that the other economic indicators that are used to extrapolate GDP in between the five-year economic census data—such as retail sales, housing starts, and manufacturers shipments of capital goods—are often collected for purposes other than estimating GDP and may embody definitions that differ from those used in the national accounts. Another problem is some data are simply not available for the earlier estimates. For the initial monthly estimates of quarterly GDP, data on about 25 percent of GDP— especially in the service sector—are not available, and so these sectors of the economy are estimated based on past trends and whatever related data are available. For example, estimates of consumer spending for electricity and gas are extrapolated using past heating and cooling degree data and the actual temperatures, while spending for medical care, education, and welfare services are extrapolated using employment, hours, and earnings data for these sectors from the Bureau of Labor Statistics.
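To make the idea of extrapolating from an indicator concrete, here is a deliberately crude Python sketch. It is not the Department of Commerce's actual procedure: it assumes spending moves in exact proportion to a single indicator (degree days, say), and every number in it is invented for illustration. The function name `extrapolate_by_indicator` is hypothetical.

```python
def extrapolate_by_indicator(benchmark_value, indicator_at_benchmark, indicator_series):
    """Scale a benchmark value by the relative movement of an indicator.

    Assumption: the quantity being estimated is proportional to the
    indicator, which real national-accounts methods do not require.
    """
    if indicator_at_benchmark == 0:
        raise ValueError("indicator at benchmark must be nonzero")
    return [benchmark_value * x / indicator_at_benchmark for x in indicator_series]


# Made-up figures: spending in the benchmark period (arbitrary units)
# and heating degree days for the benchmark and three later periods.
benchmark_spending = 100.0
degree_days_at_benchmark = 1000.0
degree_days_later = [1000.0, 1100.0, 900.0]

estimates = extrapolate_by_indicator(
    benchmark_spending, degree_days_at_benchmark, degree_days_later
)
print(estimates)
```

The weakness is visible even in this toy: if the link between the indicator and the thing being measured drifts, the estimate drifts with it, and nothing corrects the error until the next benchmark arrives.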

Is this methodology really the best way to measure the state of the economy in the age of Big Data?

Actually, there’s a lot to be said for the quinquennial economic census. It goes to a large sample (four million businesses), and the companies are compelled by law to respond, which mitigates selection bias. The data series goes back to 1934; maintaining continuity with past measurements is valuable. Furthermore, the census and other survey-based instruments probe much more than just transactional data. They try to quantify what a company manufactures (or what services it provides), what labor and material inputs are consumed, and where the company’s revenue comes from. The analysis includes not just income and expenditures but also depreciation and amortization and other scary abstractions from the world of accounting. You can’t get all that detail just by following the money as it sloshes around the banking system.

Still, can we afford to wait five years between surveys? Three months before we get a reliable (?) guess about what happened in the previous quarter? Consider the predicament of the Federal Reserve Board, trying to walk the narrow line between encouraging economic growth and managing inflation. This is a problem in control theory, where delayed feedbacks lead to disastrous instabilities. (Presumably Janet Yellen and her colleagues have access to more timely data than I do. I hope so.)
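The control-theory point can be seen in a toy simulation. The Python sketch below is not a model of the economy or of Fed policy; it is a bare proportional controller, under my own assumptions, that tries to push an error toward zero but only sees the error as it stood `delay` periods ago. The function name, the gain of 0.8, and the delay of 3 are all arbitrary choices for illustration.

```python
def simulate(gain, delay, steps=40, initial_error=1.0):
    """Proportional correction based on a stale observation of the error.

    Each period the controller subtracts gain * (error observed `delay`
    periods ago) from the current error.
    """
    # Pad the history so that delayed lookups work from the first step.
    errors = [initial_error] * (delay + 1)
    for _ in range(steps):
        observed = errors[-1 - delay]  # stale by `delay` periods
        errors.append(errors[-1] - gain * observed)
    return errors[delay:]  # drop the padding; keep periods 0..steps


fresh = simulate(gain=0.8, delay=0)  # controller sees current data
stale = simulate(gain=0.8, delay=3)  # controller sees three-period-old data

print("final error with fresh data:", abs(fresh[-1]))
print("largest recent error with stale data:", max(abs(e) for e in stale[-10:]))
```

With fresh data the error shrinks at every step. With the same gain and a three-period lag, the controller keeps correcting for conditions that no longer hold, and the error swings back and forth with growing amplitude.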

Could we really create an up-to-the-minute measure of national economic health by mining credit card data, bank accounts, supermarket inventories, and food-truck receipts? I really don’t know. But I’ll quote a 2014 Science review by Liran Einav and Jonathan Levin:

Whereas the popular press has focused on the vast amount of information collected by Internet companies such as Google, Amazon, and Facebook, firms in every sector of the economy now routinely collect and aggregate data on their customers and their internal businesses. Banks, credit card companies, and insurers collect detailed data on household and business financial interactions. Retailers such as Walmart and Target collect data on consumer spending, wholesale prices, and inventories. Private companies that specialize in data aggregation, such as credit bureaus or marketing companies such as Acxiom, are assembling rich individual-level data on virtually every household….

One potential application of private sector data is to create statistics on aggregate economic activity that can be used to track the economy or as inputs to other research. Already the payroll service company ADP publishes monthly employment statistics in advance of the Bureau of Labor Statistics, MasterCard makes available retail sales numbers, and Zillow generates house price indices at the county level. These data may be less definitive than the eventual government statistics, but in principle they can be provided faster and perhaps at a more granular level, making them useful complements to traditional economic statistics.

I suppose I should also mention worries about giving government agencies access to so much personal financial data. But that horse is out of the barn.

5 Responses to Fast money, slow data

There’s always the question of how representative this transactional Big Data is. It’s not the full population of all transactions—just a non-random sample, and it’s hard to extrapolate accurately from those.

First of all, it misses out on all cash transactions.

Second, who’d be in charge of collecting the data?
If the government buys this transactional data from a commercial vendor (say from MasterCard), they won’t have control or detailed knowledge about how it gets collected and whether it’s representative of the population (even just the population of non-cash transactions).
Or, the government could require all such vendors to give them all this data. Good luck pushing that legislation through.

I don’t mean to say that it’s a bad idea to get faster GDP updates from such data! It’s just a harder-than-it-seems problem, which hasn’t been solved yet in a statistically-responsible and politically-acceptable way :)

Our group is currently studying Big Data for a university IT module, and it is quite nice to see an article I read, “The Pathologies of Data”, linked from this blog that we are following. It fits well with this post and supports the point that Big Data is not necessarily Fast Data.