Quandl Creates a Town Square for Data Access

by Nick Willett
Two Plus Two Magazine, Vol. 14, No. 1

Technology and information retrieval options often move forward and then fall back again. I thought about this a few weeks back when I found a worn handbook with codes for the Dow Jones databases. Over twenty-five years ago, the company had a service called Dow Jones Spreadsheet Link. For thirty bucks a month, you could access Dow Jones data during the evening and the hours before dawn. For a group of companies of your choosing, it was possible to download everything from fundamental information to detailed earnings estimates and analyst revisions. Not much was off limits. You could also use codes to find out which companies were buying back their own stock or raising dividends, along with a number of other investing scenarios.

Then there was the news. I would download the full text of the New York Times, Los Angeles Times, and USA Today each day. I can still remember sitting in front of my CRT monitor, eating cereal each morning and reading articles. The memory is both cutting edge and Stone Age. I had access to the news I wanted in plain text files. I would love instant access to the text of every business section in a major newspaper today. The reason is speed: you can find what you want very quickly instead of picking through a large number of sites or papers. I’m sure it would cost a lot more these days, however.

When you look at the whole package, it was an interesting trove of data at a time when that information was not being used by every investor. Unfortunately, the service was eventually discontinued.

I was reminded of this when I examined Quandl (www.quandl.com) recently. Geared to quants of every stripe, from professional to amateur, it is a data market as well as a warehouse. As such, it represents another leap forward. In the past, and still today, database-oriented investors have had to scrape, purchase, and compile the information they found useful. Often, data could only be analyzed after tracking it for a long period of time. According to a recent article in Canada’s National Post, the Toronto-based Quandl was formed in 2014, so it is of fairly recent vintage. Quandl offers an enormous number of databases, both subscription-based and free, covering a broad range of factors and markets.

Machine learning has been a popular phrase in recent years. Beyond its more newsworthy role in driverless cars, it has been used in the investing world to drive hedge funds. Of course, this is nothing new: each new generation of computing technology brings an adaptation to investing. From black-box-driven mutual funds to the incredible success of D.E. Shaw and James Simons, technology and programmers have been front and center. The key ingredient in machine learning and AI is data, so investors and traders are now parsing much larger databases. A central source of information, accessible to programmers and analysts using a broad spectrum of tools, makes sense, and that is the idea behind Quandl.

Attacking Survivorship Bias

In the past, I have talked about survivorship bias when analyzing equities over multi-year periods. For investors, this can become a huge issue given the pace of change in business. To illustrate the problem, I looked at stocks in the Value Line Investment Survey from five years ago, in January 2013. I filtered for stocks trading on the NYSE and NASDAQ and ranked them by trading volume in the previous month. Taking the top 1,000, I looked up current quotes by ticker symbol. Of those, 82 stocks had no match. Missing results data for 8.2% of a dataset introduces a lot of distortion into any analysis. The problem gets worse the longer your time frame and the broader the group of stocks you wish to analyze.
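The matching check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have the ticker list from the old screen and a list of tickers that return a current quote; the function name and the sample tickers are hypothetical. With the figures above, 82 unmatched tickers out of 1,000 gives 82 / 1,000 = 8.2%.

```python
def unmatched_tickers(past_universe, current_tickers):
    """Return tickers from an old screen with no current quote, and the missing fraction.

    Both arguments are plain lists of ticker strings (hypothetical inputs).
    """
    current = set(current_tickers)  # set lookup keeps this fast for large lists
    missing = [t for t in past_universe if t not in current]
    fraction = len(missing) / len(past_universe) if past_universe else 0.0
    return missing, fraction


# Hypothetical example: five tickers from an old screen, two no longer quoted.
old_screen = ["AAA", "BBB", "CCC", "DDD", "EEE"]
quoted_today = ["AAA", "CCC", "EEE", "ZZZ"]
missing, frac = unmatched_tickers(old_screen, quoted_today)
print(missing, f"{frac:.1%}")  # ['BBB', 'DDD'] 40.0%
```

The unmatched list is the starting point for the harder work of finding out whether each ticker was delisted, acquired, or simply renamed.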

I have often noted that the difficulty in tracking equities over long periods of time is building a proper results database. Unless you are following companies closely, you don’t realize just how many tickers are delisted or changed over multi-year periods. The CRSP (Center for Research in Security Prices) database, of course, solves this, but it is out of the price range of most investors. The best solution I have found is to download quote data on a regular basis and use the last data you have for each ticker. It is not perfect, but it allows you to do analysis. A reasonably priced database with information on changed and delisted tickers greatly improves the process.
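One way to apply the "last data you have" approach is sketched below. It assumes each ticker's history is a date-sorted list of (ISO date, close) pairs collected from your regular downloads; the function name and the data are hypothetical. It ignores dividends and any cash paid out in a buyout, so for a delisted stock it is only an approximation of the true result.

```python
def period_return(history, start_date, end_date):
    """Simple price return between two dates, falling back to the last close on file.

    `history` is a list of (iso_date, close) tuples sorted by date (hypothetical
    layout). A delisted ticker's rows simply stop, so its final recorded close is
    used as the ending price. Dividends and buyout proceeds are ignored.
    """
    # ISO date strings compare correctly as plain strings.
    window = [(d, p) for d, p in history if start_date <= d <= end_date]
    if len(window) < 2:
        return None  # not enough observations to measure a return
    first_close = window[0][1]
    last_close = window[-1][1]  # last price we have, even if trading stopped early
    return last_close / first_close - 1.0


# Hypothetical delisted stock: quotes stop in mid-2015.
delisted = [("2013-01-02", 20.0), ("2014-01-02", 18.0), ("2015-06-30", 15.0)]
print(period_return(delisted, "2013-01-02", "2018-01-02"))  # -0.25
```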

Quandl offers a variety of end-of-day databases to help deal with this issue. A free Wiki EOD database covers over 3,000 United States stocks, some of them delisted. This database is created by the Quandl community and includes prices adjusted for dividends and splits.

A premium database from Sharadar is also available on Quandl. It currently covers over 10,000 tickers with data from the year 2000, and the company is continually expanding coverage. The data includes delisted companies, and Sharadar provides tables with fields for previous ticker symbols. You won’t find every defunct ticker symbol, but the product seems to be moving toward that goal. The cost is $399 a year for individual users, and there is also a cost-efficient bundle that includes fundamental, insider, and 13F databases.

QuoteMedia provides a database of US stocks and ETFs that trade on the NYSE, NASDAQ, AMEX, and ARCA. The history on each ticker goes back to 1996. It includes splits and dividends, along with original price and volume data and adjusted figures. The cost for individuals is $49 per month, or a discounted $449 for yearly subscribers.

If you are really in an investigative mood and building your own dataset, Zacks offers a database of historical daily maintenance for the stocks the company has tracked since 1987 (7,000 US and Canadian stocks). The price is $1,000 per year for individual users. With it, you could track down the tickers you are missing.

The upshot of all this is that creating a quote database to get meaningful return values for the factors and companies you track is now well within reach for individual investors. As with information in every venue, what used to be available only to professionals and academics is quickly appearing in everyone’s front windshield.

For traders with a much shorter time frame, minute-by-minute price data is available for the Nasdaq 100 as well as the S&P 500 on a subscription basis. There are five years of history, with new data added each day. With a never-ending news cycle, traders can gauge the impact of events in the minutes after they happen. You can also see how overnight news affects equities from the open throughout the day. If you want to look for strategies based on the time of the trading day, the data is useful here too. Of course, given the amount of data involved, some database skill is required. In some cases, this may be the point: when you are developing data analysis skills with new tools, it is nice to have a large amount of data to work with.
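As a rough illustration of a time-of-day study, the sketch below averages one-minute returns by clock time across many sessions. It assumes minute bars arrive as time-sorted ("YYYY-MM-DD HH:MM", close) pairs; that layout and the function name are assumptions for illustration, not a vendor format. Returns are computed only within a session, so the overnight gap into the first bar is deliberately left out; it could be measured separately when studying overnight news.

```python
from collections import defaultdict
from statistics import mean


def avg_return_by_minute(bars):
    """Mean one-minute return keyed by clock time ("HH:MM").

    `bars` is a list of ("YYYY-MM-DD HH:MM", close) tuples sorted by time and may
    span many days (hypothetical layout). The first bar of each day has no
    same-day predecessor, so the overnight gap is excluded.
    """
    returns = defaultdict(list)
    prev_day, prev_close = None, None
    for stamp, close in bars:
        day, clock = stamp.split(" ")
        if day == prev_day:
            returns[clock].append(close / prev_close - 1.0)
        prev_day, prev_close = day, close
    return {clock: mean(vals) for clock, vals in sorted(returns.items())}


# Hypothetical bars for two sessions.
bars = [
    ("2018-01-02 09:30", 100.0), ("2018-01-02 09:31", 102.0),
    ("2018-01-03 09:30", 50.0), ("2018-01-03 09:31", 50.0),
]
print(avg_return_by_minute(bars))
```

On five years of real bars, the same loop simply accumulates many more returns per clock time; a database or a dataframe library becomes worthwhile once the data no longer fits comfortably in memory.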

The Value of Your Time

Of course, data is only useful if it is unique enough to offer an edge. Database offerings will often save time here, because the history is already available and, if you have a results database, it can be evaluated quickly. Quite a few databases on Quandl have multi-year histories. This is nice because whether you are scraping, downloading, or purchasing data, you are usually building your dataset over time. Being able to analyze immediately is a huge advantage.

Over the years, I have often found myself building implied volatility databases for option trades. In the beginning, this involved using code with various pricing models and combining it with available quote data. Eventually, implied volatility became a field in option chains available from various vendors and brokerage sites. In each case, you were building your database over a period of time and maintaining it for changes such as stock splits or mergers.
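For readers curious what the do-it-yourself step looked like, here is a minimal sketch that backs implied volatility out of an option price. It assumes the Black-Scholes model for a European call on a non-dividend-paying stock, which is only one of the pricing models one might use, and it solves for volatility by bisection, relying on the fact that the call price rises with volatility. All names and inputs are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, t, rate, vol):
    """Black-Scholes price of a European call, no dividends (assumed model)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)


def implied_vol(price, spot, strike, t, rate, lo=1e-4, hi=5.0, tol=1e-8, max_iter=200):
    """Find the volatility whose model price matches `price`, by bisection."""
    if not bs_call(spot, strike, t, rate, lo) <= price <= bs_call(spot, strike, t, rate, hi):
        raise ValueError("price is outside the range the model produces for this bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, t, rate, mid) < price:
            lo = mid  # model price too low, so volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price an option at 25% volatility, then recover that volatility.
price = bs_call(100.0, 100.0, 0.5, 0.01, 0.25)
print(round(implied_vol(price, 100.0, 100.0, 0.5, 0.01), 4))  # 0.25
```

Bisection is slower than Newton's method but cannot overshoot, which makes it a forgiving choice when running over thousands of noisy option quotes.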

Multiple databases on Quandl have built-in histories containing implied volatility figures for equities. Historical volatility fields are also part of the data. While these databases are subscription-based, they give an analyst immediate access to over a decade’s worth of data. Whether the data proves useful or no edge can be generated, the time it takes to find out is greatly compressed. Ultimately, it opens more avenues of research.
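Historical volatility is commonly computed as the annualized standard deviation of daily log returns. The sketch below assumes close-to-close returns and a 252-trading-day year, which are common conventions; a vendor's field may use a different window or method, so it is worth checking a few values by hand before relying on it. The function name is illustrative.

```python
from math import log, sqrt
from statistics import stdev


def historical_vol(closes, periods_per_year=252):
    """Annualized close-to-close volatility.

    Sample standard deviation of log returns, scaled by the square root of the
    number of periods per year (252 trading days is an assumed convention).
    """
    if len(closes) < 3:
        raise ValueError("need at least three closes for two returns")
    rets = [log(b / a) for a, b in zip(closes, closes[1:])]
    return stdev(rets) * sqrt(periods_per_year)


# Hypothetical closes: up 10%, then back down.
print(historical_vol([100.0, 110.0, 100.0]))
```

Comparing this figure with implied volatility for the same stock is one simple way to see whether options are priced rich or cheap relative to recent movement.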

Finding Your Format

Whether you are old school or new, there is likely a data format you can use. The API offers everything from data export in text files or JSON to customizable calls from your tool of choice. If you want to code it yourself, you can find tools and resources on the website. There are add-ins and libraries for popular tools such as R, Python, and Excel, as well as libraries for other statistical packages and programming languages.
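If you opt for JSON export, turning the payload into rows takes only a few lines of standard-library Python. The sketch below assumes a time-series payload shaped as {"dataset": {"column_names": [...], "data": [[...], ...]}}; treat that layout, the function name, and the sample values as illustrative assumptions, and confirm the actual structure against the API documentation before relying on it.

```python
import json


def rows_from_payload(text):
    """Turn a JSON time-series payload into a list of dicts keyed by column name.

    Assumed layout (illustrative, verify against the API docs):
    {"dataset": {"column_names": [...], "data": [[...], ...]}}
    """
    dataset = json.loads(text)["dataset"]
    columns = dataset["column_names"]
    return [dict(zip(columns, row)) for row in dataset["data"]]


sample = (
    '{"dataset": {"column_names": ["Date", "Close"], '
    '"data": [["2018-01-03", 10.5], ["2018-01-02", 10.0]]}}'
)
rows = rows_from_payload(sample)
print(rows[0])  # {'Date': '2018-01-03', 'Close': 10.5}
```

Keying each row by column name means downstream code keeps working even if the column order in the export changes.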

For people who love data analysis, the site feels like a playground. Even if you just want to sharpen your skills, you will find something useful to test them on. I have a personal bias here: watching the increasing availability of industrial-strength databases for programmers, along with the speed benefits of faster processors and solid-state drives, has been enjoyable. Not every talented investor or trader comes through traditional channels. At their best, financial markets bend toward meritocracy. The next generation of great traders and investors will come from those who find meaning in the oceans of data now being generated.

Data is available for the entire spectrum of financial markets. We have covered just the small piece devoted to equity and option prices. Among the free datasets are many databases of international equity data as well as domestic economic indicators. Futures and cryptocurrency exchange data are also available. One interesting free database from Zillow contains real estate data for different areas. It can be used in a number of ways, from tracking growing areas to finding value in the real estate market.

For equity and options traders, the premium databases track variables that have often been cited as drivers of returns. These can be combined with your own data to see whether a viable advantage can be gained. As noted earlier, deriving an edge is difficult when the variables are well known. Still, access to this information makes judging its value much quicker and more definitive.

Given this, making sense of alternative data has lately been a focus for many hedge funds, investment banks, and well-heeled investors. The Quandl site also has a gateway to alternative data, though vendor pricing for much of it has traditionally been beyond the reach of retail investors.

Summary

In this article, we reviewed Quandl, a site that acts as both a marketplace and a repository for a huge amount of financial data. Much of the data is free, and the subscription-based offerings cover factors of interest to investors as well as short-term traders. Flexible access methods and output formats give most analysts an easy way to obtain the information they need.