Working with delisted or bought-out companies poses some difficulties in how to organize price data for backtesting. I've been backtesting some investment algorithms on free data from Yahoo! where I don't have the issue of working with delisted companies. I can simply take adjusted closes as authoritative prices. I can keep this data in memory using a (Python) pandas DataFrame.

For delisting, the situation is a bit more complicated, and I wonder how to handle it so that when I start buying data that doesn't have survivor bias, I can put it into a uniform format for purposes of objective backtesting.

The simplest case is that the company simply went broke. For example, LEH (Lehman Bros.) at some point just went to 0. But there are other cases:

Buyouts for shares of buying company (as will presumably happen with TSLA and SCTY). How does non-survivor-biased data deal with this situation? If I'm backtesting, say, since 1996, and such an event had happened in 2000, would the adjusted closes for SCTY-X simply track TSLA-X after the buyout date? Otherwise, one would have to somehow manually groom the data, which would be a pain.

Buyouts for cash (PFZ's buy of MDVN, for example) also aren't really the same as just keeping the adjusted close at a fixed price following the event. The event somehow needs to be tagged so that an investment strategy can now distinguish the cash from a stagnating investment.

My question boils down to the best way to organize a DataFrame with a TimeSeries index for an individual stock so that an algorithm can handle such events in addition to dividends and splits, which the 'Adj Close' column handles for you. The DataFrame returned by pandas gives back as columns ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']. What would need to be added to this data structure to enable backtesting that includes delisted companies of various kinds?

2 Answers
2

The problem you describe isn't trivial. Mainly because once you have it solved for all current known cases someone will figure out a way to do something different and mess up your system.

Here are some approaches that I have seen somewhat successfully used. I won't claim they will give you complete coverage of every case, but they may give you some ideas to start with.

The first suggestion is to not use the stock symbol as the primary ID for a company. You want to have your own form of data ID for every company that you deal with. This would be paired with a system for translating to and from your ID and possible symbols. This system will need to be aware of the symbology used by different data sources and the symbol changes over time.

This addresses the false assumption that a stock symbol never changes and is consistent between data sources. This gives you better control over ticker changes, shifts to pink sheets and many other situations. It is extremely easy to mistake a symbol change for a company getting delisted if you don't track the company separately from the symbol.

My next suggestion is to always maintain a set of RAW data to work from. This would be the actual price as listed at that moment in time. Adjusted data sets can be great to work with, but it is extremely easy to gloss over what events really mean when you have them pre-baked into the data. As a quick example, a trading system that functions on a stock with dividends may not care about the dividend as a factor in the trade, but the dividend event can have a significant tax impact depending on the timing of the exit. It is tricky in pre-adjusted data sets to spot the situations.

The next part is to have a system of storing and retrieving events in your analysis system. The way my group approached this was to have a dataset of company actions linked to the company ID and the date-time of the event. This would include descriptive information about the event, but also an attached action.

The actions were a simplified programmatic description of what happened to the stock at that moment in time. The action contained instructions that the program could use to modify the current holdings. This could include cash disbursements, splits, symbol changes, etc... Each instruction told the program what to do in that specific situation.

As a specific answer to your delisting question, we would have an action that specifically notes the delisting and have our analysis system take actions on the trade based on that situation. These need to be separate for other similar situations such as trading holds that may or may not lead to delistings.

The system would take a query for data from a date range and compare it to actions during that time and adjust everything based on the specific need. This would then output the analysis dataset and any annotations about events needed for the analysis.

Edit - I am giving details to help with the question below.

Here is a simplified example of the data structure used;

Companies are in a table and denoted by a specific ID. Each company has a one to many relationship to StockSeries(Each with a unique ID). There is also a utility table that translates data source symbols to a specific StockSeries based on the point in time referenced. This can be used bidirectionally for lookups.

Each StockSeries has a raw data time series linked to it. (note - We don't use a database for this, but that is irrelevant.)

Each Company has a one to many relationship to an Events table that holds Descriptive, Qualitative, and basic event specific Quantitative information about corporate events. This allows us to group and categorize events, but it isn't specifically used for acting on data.

Each event has a one to many relationship to an Actions table. Each Action is a specific programmatic action to take on any trade holdings for a specific StockSeries.

The way this is set up means that you do need to pay attention to what series the action applies to, but it also allows us to see what is going on from the company level.

Honestly, this is a massive oversimplification. As you build a system like this you will find case after case that doesn't fit or requires some special handling.

I described everything in terms of database structure because your question seemed to infer that is how you are thinking, but much of this was done without a formal database. You should find the data management system that works best for you.

If you have specific questions I can share what I have done in the past. I am sure others will have good suggestions also.

$\begingroup$This makes a lot of sense. Is there any kind of standard format in which providers like Norgate or Quandl (which seems to be a unifying platform for various data sources) handle this? I'd like to get set up in a way that would allow me to convert fairly easily to my own standardized format (assuming there isn't an "industry standard" already in place). Your technique would seem to map to a db with 3 tables: 1 matching an entity to a symbol (multiple symbols allowed for the same entity), another with the price data for entities, and another with events (for which the format is likely tricky).$\endgroup$
– Marshall FarrierSep 4 '16 at 3:35

1

$\begingroup$It ends up being a bit trickier and quite a few more data structures. I was just trying to state the concept above. The edit I just made added a lot more detail, Hopefully, it is helpful. I have never found a "Standard" way of doing this. If you find a well thought out approach please post it back.$\endgroup$
– drobertsonSep 4 '16 at 4:17

1

$\begingroup$You got me exploring standard ways of doing this and I did find one source for a reasonable methodology. The Chicago Booth School does extensive research into financial markets. They also curate research datasets. This is their data methodology; crsp.com/products/documentation/…$\endgroup$
– drobertsonSep 4 '16 at 17:56

$\begingroup$Are you using Mongo, then to store your source data? For my own backtesting, I was hoping to buy a complete historical data set (which in most cases seems to be set up as a database) and just have an in-memory data structure of my own. The latter would need to be able to accommodate just about anything so that I could switch providers if I don't like the first one I try and, most importantly, would need to provide all information needed for my system to take the appropriate actions. I like your idea of mapping an event to a list of actions.$\endgroup$
– Marshall FarrierSep 4 '16 at 18:16

$\begingroup$An action might also be set up as an abstract base class that would then allow complete flexibility in adding a new implementation if something a bit different comes up.$\endgroup$
– Marshall FarrierSep 4 '16 at 18:18

If you have a stock selected by whatever rule and the company is acquired or has merged, one could argue, that the investment case (fundamental / technical rules) is not in place anymore and you do not want to keep the successor stock, but instead sell the old one.
The standard price pattern for these cases would look like this:

Airgas was aquired by Air Liquide. The transaction was completed at 23rd of May this year (see http://investor.shareholder.com/arg/releases.cfm). But the real economic impact of this happened already in November of last year when the agreement between the two companies was announced.

If one had traded this in some kind of quantitative strategy, you usually would check if there is a possibility of a counter bid and if not or unlikely, you would sell the stock very soon afterwards. Because, as one can see clearly in the chart, the price won't do much anymore.

For backtesting purposes only, the easy way would be to sell every stock which has no further price history (e.g. for Airgas at the end of April). More sophisticated would be to check daily/weekly every stock in your portfolio for an abrupt change to an ultra-low volatility regime (e.g. through historic volatility and daily bar size) and sell every stock immediately if this ultra-low volatility criteria is fullfilled, because it indicates with a very high probability an acquisition/merger/etc.

$\begingroup$Anything that just has "very high probability" is inadequate. What I'm looking for is data that describes the market definitively. There is paid data that claims to do this, but since I haven't purchased any yet, I don't know how they organize it. NaNs after a certain point might work, but it isn't clear to me that that really captures all cases.$\endgroup$
– Marshall FarrierSep 3 '16 at 19:08

$\begingroup$In a professional quality database if a stock is delisted there will be on that date a 'delisting code' that gives the reason for the delisting and an 'after delisting value' that represents the value of any cash or securities that you receive. If the security is utterly worthless the 'after delisiting value' would be zero. There may be a 'delisting return' expressed as a percentage (= -100% if you lose everything).$\endgroup$
– Alex CSep 3 '16 at 21:46

$\begingroup$An important feature of such a d.b. is that the stocks are not identified solely by their ticker because tickers can be reused (C used to be chrysler and is now citibank). Therefore there is a need for a Unique Identifier as well as tickers. The Unique Identifier will never be reassigned.$\endgroup$
– Alex CSep 3 '16 at 21:53

$\begingroup$Most professional grade data providers do give historical data about corporate actions. Basically it will tell you which corporate actions happened (delisting, merger, stock split etc...), on which date, the impact on the price etc... and all you have to do is do some sequential treatment on a case by case basis.$\endgroup$
– BS.Sep 4 '16 at 0:34