How to High Frequency Trade

Monday, July 16, 2012

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

The first step in any data-centric analysis is to understand the input data, and subsequently assess its quality. The common refrain, "Garbage In, Garbage Out" becomes more than a cliche if you have ever made the mistake of not understanding input data prior to use in whatever model you are building. Often, garbage data will yield results that show great promise, only to be revealed as noise upon further, time-expensive, emotionally draining analysis. Worse, the garbage data may mask the genuine validity of a model, leading to erroneous rejection.

The remainder of this post assumes you have downloaded an ITCH file from TradingPhysics. (Note: They occasionally gift 5 download credits for a signup -- chances are you can follow along without paying anything for one file.) I will be working with the ITCH data for SPY on July 9th, 2012. I have not provided the code used to produce this post, as it is embedded in a larger (proprietary) system I have built over time; however, none of the statistics discussed are difficult to run in your preferred language / environment.

Data Checks

Referential Integrity

A test was conducted to assert that no message refers to an order id prior to the display of the initiating AddBuyOrder or AddSellOrder. This test passed. There are no references to orders not present in the ITCH file.
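For readers who want to reproduce this check, a minimal Python sketch follows. The message layout (a tuple of timestamp in milliseconds, message type, and order id) is an assumption made for illustration, not the TradingPhysics file format, and messages that carry no order id (for example non-displayed executions) are assumed to be represented with None.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical message shape: (timestamp_ms, message_type, order_id).
# order_id is None for messages that do not reference a displayed order.
Message = Tuple[int, str, Optional[int]]

ADD_TYPES = {"AddBuyOrder", "AddSellOrder"}


def find_dangling_references(messages: Iterable[Message]) -> List[Message]:
    """Return messages that reference an order id before any add for that id."""
    seen_ids = set()
    dangling = []
    for msg in messages:
        _, msg_type, order_id = msg
        if order_id is None:
            continue  # nothing to check: the message references no order
        if msg_type in ADD_TYPES:
            seen_ids.add(order_id)
        elif order_id not in seen_ids:
            dangling.append(msg)
    return dangling


sample = [
    (1, "AddBuyOrder", 10),
    (2, "DeleteFull", 10),
    (3, "DeleteFull", 11),  # never added: should be reported
    (4, "ExecuteNonDisplayed", None),
]
print(find_dangling_references(sample))
```

An empty result corresponds to the test passing.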

Ascending Timestamp

A test was conducted to assert that the messages were in chronological order; that is, no message existed in the file that was timestamped before any message previously seen. This test passed.
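The chronological check can be sketched as a single pass over the timestamps. This is only an illustration; it assumes timestamps have already been parsed into comparable numbers (milliseconds since midnight here), and it treats equal timestamps as acceptable.

```python
from typing import Iterable, Optional


def first_out_of_order(timestamps: Iterable[int]) -> Optional[int]:
    """Return the index of the first timestamp earlier than its predecessor.

    Returns None when the sequence is non-decreasing. Because the scan stops
    at the first violation, the predecessor is always the latest time seen.
    """
    latest = None
    for index, ts in enumerate(timestamps):
        if latest is not None and ts < latest:
            return index
        latest = ts
    return None


print(first_out_of_order([100, 100, 250, 240, 300]))
```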

Ascending Order Ids

A test was conducted to assert that all AddBuyOrder and AddSellOrder messages had order ids in ascending order; that is, when an AddBuyOrder or AddSellOrder was printed, the order id was greater than any order id previously seen. This test failed.

Why? The order id is a unique reference within the context of a single day. Although I have found no text explicitly stating that order ids are incremental, the assumption seems reasonable. Looking more closely, there was only one offending order: an AddBuyOrder with a 09:30:00.135 timestamp, an order id of 527,679, and a price of $124.33. The recorded low for the day was $134.70, and simulation of the order book shows that this order would never have been hit. It was deleted at 16:00:01.815 without a fill. Excluding this order id, the minimum observed order id was 764,002, which occurred in the first message in the file.

I'm not sure why this order appeared, but I believe it can safely be ignored.
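A sketch of how such offenders can be listed, rather than just detected, is below. The tuple layout and the sample timestamps are hypothetical; only the two order ids come from the file discussed above. Offending ids do not update the running maximum, so one stray low id is reported once and does not affect later comparisons.

```python
from typing import Iterable, List, Tuple

# Hypothetical message shape: (timestamp_ms, message_type, order_id).
Message = Tuple[int, str, int]

ADD_TYPES = {"AddBuyOrder", "AddSellOrder"}


def out_of_sequence_adds(messages: Iterable[Message]) -> List[Message]:
    """Return add messages whose order id is not above every earlier add id."""
    highest = None
    offenders = []
    for msg in messages:
        _, msg_type, order_id = msg
        if msg_type not in ADD_TYPES:
            continue
        if highest is not None and order_id <= highest:
            offenders.append(msg)
        else:
            highest = order_id
    return offenders


sample = [
    (0, "AddSellOrder", 764002),
    (1, "AddBuyOrder", 764010),
    (2, "AddBuyOrder", 527679),  # lower than ids already seen
    (3, "DeleteFull", 764002),
    (4, "AddSellOrder", 764020),
]
print(out_of_sequence_adds(sample))
```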

Extreme Quotes

After observing the single out-of-order order id, I dug deeper into the distribution of order prices. The minimum quoted bid was $0.01 and the maximum quoted ask was $190,000.00. I'm not entirely sure why these quotes exist, but I suspect they are not errors. Instead, I suspect they represent a form of disinformation, targeting naive traders who rely on summary statistics computed over the quotes.

As a contrived example, assume a trader placed sell orders as a function of the size-weighted average bid price. The maximum size of bid orders existing at once on the book was 1,939,707 shares. If just 100 shares of that size were quoting at $0.01 and the rest were quoting at $135.00, the average would drop by about $0.007 -- almost a full penny. Again, it's contrived, but almost a penny might evoke exploitable action.
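The arithmetic behind the $0.007 figure can be checked with a few lines of Python. The scenario assumed here is 100 shares bid at $0.01 and the remaining 1,939,607 shares bid at $135.00, compared against a book where every share is bid at $135.00; the function name is illustrative.

```python
from typing import List, Tuple


def size_weighted_average(levels: List[Tuple[float, int]]) -> float:
    """Average price over (price, size) pairs, weighted by size."""
    total_size = sum(size for _, size in levels)
    return sum(price * size for price, size in levels) / total_size


TOTAL_BID_SIZE = 1_939_707  # maximum simultaneous bid size from the post

clean_book = [(135.00, TOTAL_BID_SIZE)]
polluted_book = [(135.00, TOTAL_BID_SIZE - 100), (0.01, 100)]

drop = size_weighted_average(clean_book) - size_weighted_average(polluted_book)
print(f"drop in size-weighted average bid: ${drop:.4f}")
```

The drop is 100 x (135.00 - 0.01) / 1,939,707, a little under seven tenths of a cent.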

Delete Full and Cancel Part

Investors who are unaccustomed to viewing order flow may look at the relative frequency of Delete Full messages with skepticism, or at least caution. Nearly fifty percent of all messages are deletes. While it may evoke skepticism, this is a reality in contemporary markets. Traders are continuously jockeying for position. To do so, they are constantly placing and canceling orders. (Although I plan on writing subsequent posts on why this happens, the No BS Trading eBook does an excellent job for the uninitiated.)
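Relative message frequencies like the one quoted above are straightforward to compute once the message types have been extracted. A minimal sketch, assuming the types are available as a sequence of strings (the sample below is made up):

```python
from collections import Counter
from typing import Dict, Iterable


def message_type_shares(message_types: Iterable[str]) -> Dict[str, float]:
    """Return each message type's share of the total message count."""
    counts = Counter(message_types)
    total = sum(counts.values())
    return {msg_type: count / total for msg_type, count in counts.items()}


sample_types = ["AddBuyOrder", "DeleteFull", "AddSellOrder", "DeleteFull"]
for msg_type, share in sorted(message_type_shares(sample_types).items()):
    print(f"{msg_type}: {share:.1%}")
```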

Of the buy orders, 215,433 (29.1%) are canceled within one second; of the sell orders, 216,975 (27.1%) are canceled within one second.
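One way to compute this kind of statistic is to remember when each order was added and measure its lifetime when the matching DeleteFull arrives. The sketch below is an assumption-laden illustration: the tuple layout is hypothetical, only full deletes are counted, and "within one second" is taken to mean a lifetime of at most 1,000 ms.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical message shape: (timestamp_ms, message_type, order_id).
Message = Tuple[int, str, int]


def fraction_deleted_within(
    messages: Iterable[Message], add_type: str, window_ms: int = 1000
) -> float:
    """Fraction of orders of add_type fully deleted within window_ms of entry."""
    add_times: Dict[int, int] = {}
    added = 0
    quick_deletes = 0
    for ts, msg_type, order_id in messages:
        if msg_type == add_type:
            add_times[order_id] = ts
            added += 1
        elif msg_type == "DeleteFull" and order_id in add_times:
            lifetime = ts - add_times.pop(order_id)
            if lifetime <= window_ms:
                quick_deletes += 1
    return quick_deletes / added if added else 0.0


sample = [
    (0, "AddBuyOrder", 1),
    (100, "AddBuyOrder", 2),
    (500, "DeleteFull", 1),   # lived 500 ms
    (5000, "DeleteFull", 2),  # lived 4,900 ms
    (5001, "AddSellOrder", 3),
    (5002, "DeleteFull", 3),  # lived 1 ms
]
print(fraction_deleted_within(sample, "AddBuyOrder"))
print(fraction_deleted_within(sample, "AddSellOrder"))
```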

Add Sell Order and Add Buy Order

The numbers of buy orders and sell orders are roughly in balance, with sell orders dominating slightly.

Execute Full and Partial Execute

Again, in contradiction with an investor's instincts, only 3.31% of all posted messages are executions. Looking from a different perspective, only 6.73% of buy or sell orders result in any form of execution (7.00% with non-displayed executions). Even more impressive, only 0.61% of all AddBuyOrder and AddSellOrder size results in an execution. (The sum of all AddBuyOrder and AddSellOrder size was 1,092,134,379 shares; the number of shares traded was 6,615,069, excluding the opening and closing crosses.)
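The 0.61% figure follows directly from the two share totals quoted above, as this short check shows (the variable names are illustrative):

```python
ADDED_SHARES = 1_092_134_379  # total AddBuyOrder + AddSellOrder size
EXECUTED_SHARES = 6_615_069   # shares traded, excluding the crosses

fill_rate_pct = 100 * EXECUTED_SHARES / ADDED_SHARES
print(f"share of posted size that executed: {fill_rate_pct:.2f}%")
```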

Also note: looking at the total size of all executions, including non-displayed executions and bulk crossings (6,621,094 shares), an SPY trader might notice that this is only a fraction of the daily SPY volume. SPY is listed on NYSE Arca; the Nasdaq ITCH order flow represents only the SPY traded on Nasdaq.

Execute Non-Displayed

Looking at frequency relative to all message types, ExecuteNonDisplayed messages are rare; however, the frequency of non-displayed executions relative to all executions is more impressive. By number of messages, roughly 3.24% of all executions are the result of non-displayed orders; however, by volume, roughly 20% of all executions are the result of non-displayed orders. (5,301,218 shares were executed on displayed orders; 1,313,851 were executed on non-displayed orders.)
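As a quick check on the volume split, using only the two share counts quoted above:

```python
DISPLAYED_SHARES = 5_301_218
NON_DISPLAYED_SHARES = 1_313_851

total_shares = DISPLAYED_SHARES + NON_DISPLAYED_SHARES
non_displayed_pct = 100 * NON_DISPLAYED_SHARES / total_shares

print(f"total executed shares: {total_shares:,}")
print(f"non-displayed share of volume: {non_displayed_pct:.1f}%")
```

The total matches the 6,615,069 shares traded excluding the crosses, and the non-displayed portion comes out just under 20%.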

Non-displayed orders are interacted with indirectly -- that is, by executing against them. A message of the executed order's price and size is printed, but there is no complementary, preexisting AddBuyOrder or AddSellOrder. Additionally, the order may have reserves, obfuscating the underlying order's size.

Bulk Volume Cross

Two bulk volume crossings are present. One occurs at 09:30:00.135 and the other occurs at 16:00:00.540, representing the opening and closing crosses, respectively.

Looking at the (CSV) file positions of the messages, the opening cross is at line 90,697 and the closing cross is at line 3,097,695; that is, the opening and closing crosses do not exist at the head and tail of the file, as might be naively expected. This expectation fails to consider that the TotalView-ITCH stream prints extended hour orders (7:00am to 8:00pm) as well as regular market hours (9:30am to 4:00pm). The first message in the file occurs at 07:00:03.069 (AddSellOrder); the last message in the file occurs at 20:00:00.159 (DeleteFull).

Notice: The only relationship I have with TradingPhysics is as a customer. I receive no benefit from them for this post.

Wednesday, February 2, 2011

I set up this blog a few days ago (July, 2012) and noticed an unusual amount of incoming traffic given that I had nothing out of draft mode yet. It turns out that someone (WK Selph) had this blog registered before me and had a few popular posts. How to Build a Fast Limit Order Book was one of them.