Log returns are commonly used in quant research because real return data tends to have fat tails. Fat tails are common in self-reinforcing, autocorrelated systems like financial markets, and the result is that a normal distribution will heavily underestimate the likelihood of rare events. See our full lecture on this here:

Some stock price series are closer to normally distributed after a log transform, so in the course of doing research it can be helpful to log transform your data before fitting models. Remember that at the end of the day prices are still prices, so don't assume that just because log-transformed returns are well behaved you're not vulnerable to tail events.

In this example, you'll see that the asset is not truly log-normal; it only passed the normality test because the sample was small.

This notebook is just a quick piece showing how to log transform prices.
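As a standalone illustration of the transform itself, here is a minimal sketch that runs outside the research environment. The price series is made up for illustration, and the variable names are assumptions rather than anything from the cells below. It shows that log returns can be computed either from simple returns or from differences of log prices, and that they add up across time.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, made up purely for illustration.
prices = pd.Series([100.0, 102.0, 99.96, 101.9592])

# Simple returns: R_t = P_t / P_{t-1} - 1
R = prices.pct_change().dropna()

# Log returns computed two equivalent ways.
log_R_from_returns = np.log(1 + R)
log_R_from_prices = np.log(prices).diff().dropna()

# Log returns add across time: their sum equals the log of the
# total growth factor over the whole period.
total_log_return = log_R_from_returns.sum()
total_growth = prices.iloc[-1] / prices.iloc[0]

print(log_R_from_returns.values)
print(total_log_return, np.log(total_growth))
```

The additivity is one practical reason log returns are convenient: a multi-period return is a plain sum rather than a product.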

In [1]:

import numpy as np
import pandas as pd
# This is a plotting library for pretty pictures.
import matplotlib.pyplot as plt

In [2]:

# Research environment functions
from quantopian.research import returns, log_returns, symbols

# Select a time range to inspect
period_start = '2012-01-01'
period_end = '2012-06-01'

# Query returns data for XLE
# over the selected time range
R = returns(
    assets=symbols('XLE'),
    start=period_start,
    end=period_end,
)
log_R = log_returns(
    assets=symbols('XLE'),
    start=period_start,
    end=period_end,
)

# Display first 10 rows
R.head(10)

Often though, returns will still not be normally distributed even with a log transform. Don't apply this blindly without checking.

Here we see that just by expanding the window and gathering more data, the test gains power and can distinguish the observed returns distribution from a normal one. It would seem that the true process was likely not normal in the first place; we simply had too few samples to detect it.
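The effect of sample size on the test can be reproduced with synthetic data. The sketch below is an assumption-laden illustration, not the notebook's own analysis: it draws fat-tailed "returns" from a Student's t distribution with 5 degrees of freedom (an arbitrary stand-in for a non-normal process) and applies `scipy.stats.normaltest` to the first 100 draws and then to the full 10,000.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(0)
significance_level = 0.05

# Synthetic fat-tailed "returns": Student's t with 5 degrees of freedom.
# This is an assumed stand-in for a non-normal process, not market data.
large_sample = rng.standard_t(df=5, size=10000)

# The small sample is the start of the same series, mimicking a short window.
small_sample = large_sample[:100]

p_small = normaltest(small_sample).pvalue
p_large = normaltest(large_sample).pvalue

print('n=100:   p-value = %.4g' % p_small)
print('n=10000: p-value = %.4g' % p_large)
print('Large sample rejects normality:', p_large < significance_level)
```

The generating process is identical in both cases; only the amount of data changes. A short window may or may not reject normality depending on the draw, while the long window has enough observations in the tails for the test to detect the departure.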

In [12]:

# Select a time range to inspect
period_start = '2012-01-01'
period_end = '2016-01-01'

# Query returns data for XLE
# over the selected time range
R = returns(
    assets=symbols('XLE'),
    start=period_start,
    end=period_end,
)

# Convert simple returns to log returns: log(1 + R)
rational_R = R + 1
log_R = np.log(rational_R)
log_R.tail()

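from scipy.stats import normaltest

# data_output (one column of log returns per asset) and
# significance_level are defined in earlier cells.
num_assets = data_output.shape[1]
num_normal = 0
for i in range(num_assets):
    # Get the series for the asset
    log_R = data_output.iloc[:, i]
    result = normaltest(log_R)
    # Count the asset if the test fails to reject normality
    if result.pvalue >= significance_level:
        num_normal += 1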

In [19]:

print('The percent of stocks which are likely normally distributed: %s%%' % (float(num_normal) / num_assets * 100))

The percent of stocks which are likely normally distributed: 12.2470220886%
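The counting step depends on `data_output` and `significance_level`, which are not defined in the cells shown here. For readers who want to run the same logic on its own, the following is a minimal sketch under stated assumptions: `data_output` is replaced by a hypothetical DataFrame of synthetic fat-tailed log returns (Student's t, 4 degrees of freedom), and the threshold is assumed to be 0.05. The resulting percentage describes the synthetic data only and says nothing about real stocks.

```python
import numpy as np
import pandas as pd
from scipy.stats import normaltest

rng = np.random.default_rng(42)
significance_level = 0.05  # assumed threshold
num_days, num_assets = 1000, 50

# Hypothetical stand-in for data_output: one column of synthetic
# fat-tailed log returns (Student's t, 4 degrees of freedom) per asset.
data_output = pd.DataFrame(
    0.01 * rng.standard_t(df=4, size=(num_days, num_assets)),
    columns=['asset_%d' % i for i in range(num_assets)],
)

# normaltest can run along axis 0 of a 2D array, so a single call
# returns one p-value per column (asset).
pvalues = normaltest(data_output.values, axis=0).pvalue
num_normal = int((pvalues >= significance_level).sum())

print('Percent of synthetic assets not rejected as normal: %s%%'
      % (float(num_normal) / num_assets * 100))
```

Passing the whole array with `axis=0` is a vectorized alternative to looping over columns; it applies the same test to each asset.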

This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.