How do regulators cope with terabytes of data?

Traditionally, securities regulators have coped with the deluge of high-frequency data by not asking for the data in the first place. The exchanges are supposed to be the front-line regulators, and leaving the dirty work to them allows the US SEC and its fellow regulators around the world to avoid drowning under terabytes of data.

But the flash crash seems to be changing that. The US SEC had to figure out what happened in those few minutes on May 6, 2010. When it attempted to reconstruct the market using data from different exchanges, it ended up with an estimated five to ten terabytes of data. The SEC says in its joint report with the CFTC on the preliminary findings about the flash crash:

To conduct this analysis, we are undertaking a detailed market reconstruction, so that cross-market patterns can be detected and the behavior of stocks or traders can be analyzed in detail. Reconstructing the market on May 6 from dozens of different sources and calibrating the time stamps from each source to ensure consistency across all the data is consuming a significant amount of SEC staff resources. The data are voluminous, and include hundreds of millions of records comprising an estimated five to ten terabytes of information. (page 72)

It turns out that the CFTC, which regulates the futures exchanges, is well ahead on the learning curve as far as terabytes of data are concerned:

The CFTC also collects trade data on a daily, transaction date + 1 (“T+1”), basis from all U.S. futures exchanges through “Trade Capture Reports.” Trade Capture Reports contain trade and related order information for every matched trade facilitated by an exchange, whether executed via open outcry or electronically, or non-competitively (e.g., block trades, exchange for physical, etc.). Among the data included in the Trade Capture Report are trade date, product, contract month, trade execution time, price, quantity, trade type (e.g., open outcry outright future, electronic outright option, give-up, spread, block, etc.), trader ID, order entry operator ID, clearing member, opposite broker and opposite clearing member, order entry date, order entry time, order number, customer type indicator, trading account numbers, and numerous other data points. Additional information is also required for options on futures, including put/call indicators and strike price, as well as for give-ups, spreads, and other special trade types.

All transactional data is received overnight, loaded in the CFTC’s databases, and processed by specialized software applications that detect patterns of potentially abusive trades or otherwise raise concern. Alerts are available to staff the following morning for more detailed and individualized analysis using additional tools and resources for data mining, research, and investigation.

Time and sales quotes for pit and electronic transactions are also received from the exchanges daily. CFTC staff is able to access the market quotes to validate alerts as well as reconstruct markets for the time periods in question. Currently, staff is working with exchanges to receive all order book information in addition to the executed order information already provided in the Trade Capture Report. This project is expected to be completed within the next year; at present such data remains available to staff through “special calls” (described below) requesting exchange data. (page B-15 in the Appendix)
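To make the shape of this data concrete, here is a minimal Python sketch of a record holding a subset of the fields the report lists, together with a toy overnight check. The class name `TradeCaptureRecord`, the field names, the sample trades and the `self_match_alerts` rule (flagging trades with the same account on both sides) are my own illustrative assumptions. They are not the CFTC's actual schema or surveillance logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class TradeCaptureRecord:
    # Hypothetical subset of the fields named in the report; names are illustrative.
    trade_date: str
    product: str
    contract_month: str
    execution_time: datetime
    price: float
    quantity: int
    trade_type: str
    buy_account: str
    sell_account: str


def self_match_alerts(records: List[TradeCaptureRecord]) -> List[TradeCaptureRecord]:
    """Toy overnight check: flag trades with the same account on both sides."""
    flagged = [r for r in records if r.buy_account == r.sell_account]
    # Return alerts in execution order so staff can read them chronologically.
    return sorted(flagged, key=lambda r: r.execution_time)


# Made-up sample data for illustration only.
records = [
    TradeCaptureRecord("2010-05-06", "INDEX_FUT", "2010-06",
                       datetime(2010, 5, 6, 14, 45, 28), 1056.00, 10,
                       "electronic outright future", "ACC1", "ACC2"),
    TradeCaptureRecord("2010-05-06", "INDEX_FUT", "2010-06",
                       datetime(2010, 5, 6, 14, 45, 29), 1055.75, 5,
                       "electronic outright future", "ACC3", "ACC3"),
]

alerts = self_match_alerts(records)
for alert in alerts:
    print(alert.execution_time, alert.product, alert.buy_account)
```

A real surveillance system would run many such rules over a full day's records. The point here is only that once every matched trade arrives as a structured record, pattern checks reduce to simple queries over those records.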

However, the flash crash did not put the CFTC’s data handling abilities to the test because most of the action was in the cash equity market and the only action on the derivatives exchanges was in a handful of index futures and options contracts.

Finally, I am puzzled by the SEC's statement quoted above that “calibrating the time stamps from each source to ensure consistency across all the data is consuming a significant amount of SEC staff resources.” Regulators should perhaps require that exchanges synchronize their computer clocks with GPS time, which can achieve accuracy of a few microseconds. With exchange latencies close to a millisecond these days, the normal NTP (internet) accuracy of 10 milliseconds or so is grossly inadequate: two events a millisecond apart on different exchanges can easily appear in the wrong order. I would not be surprised if some exchanges do not even have formal procedures to ensure the accuracy of their system clocks.
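The calibration problem can be sketched in a few lines of Python. This is a toy illustration with made-up numbers, not a description of how the SEC actually reconstructs the market: the `Event` type, the `merge_events` function and the assumed clock offsets are all hypothetical. It shows how a 10 millisecond clock error on one exchange reverses the apparent order of two events that truly occurred 1 millisecond apart.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    source: str        # exchange that reported the event
    reported_us: int   # time stamp from that exchange's own clock, in microseconds
    label: str


def merge_events(events: List[Event], clock_offset_us: Dict[str, int]) -> List[str]:
    """Order events by reported time minus each source's assumed clock offset."""
    ordered = sorted(
        events,
        key=lambda e: e.reported_us - clock_offset_us.get(e.source, 0),
    )
    return [e.label for e in ordered]


# True sequence: the trade on exchange A happens 1 ms before the quote change
# on exchange B, but A's clock runs 10 ms fast (hypothetical numbers).
events = [
    Event("A", 1_000_000 + 10_000, "trade on A"),
    Event("B", 1_001_000, "quote change on B"),
]

# Trusting the raw time stamps puts the events in the wrong order.
naive_order = merge_events(events, {})

# Subtracting each source's known offset restores the true order.
calibrated_order = merge_events(events, {"A": 10_000, "B": 0})

print(naive_order)
print(calibrated_order)
```

The hard part in practice is that the offsets are not known after the fact and have to be estimated source by source, which is presumably what consumes the staff time. If every exchange clock were disciplined to a common reference, the offsets would be negligible and the merge would be a plain sort.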

All of which goes to show that the traditional strategy of securities regulators of not dirtying their hands with high-frequency data is a big mistake. This should be a wake-up call for regulators around the world.