Removed the SMA applied to the signal, since the signal returned is already SMA data.

Illustrated a different company, at a later date, partly because of AAPL's generally high volatility, and partly to avoid the initial data, which, since the signal is calculated from moving averages, is very noisy.

Dates: this runs with start_date='2015-01-01', end_date='2016-01-01'. This is a full year with a roughly neutral market overall, but it includes enough ups and downs to give us some good environments to test in.

Attached a new backtest via PyFolio that works straight off the signals (removing the SMA factor, and also not using buckets, since we can't currently organize those correctly).

Further issues:

Currently, the quantiles in Quantopian's Alphalens tear sheet (.tears.create_factor_tear_sheet) do not lend themselves to proper sorting: an entire bucket can fill up and never change rank, since many companies may fall into it. Alphalens now has a bins parameter that lets you define bucket edges explicitly (rather than dividing your data evenly). This is absolutely necessary to test this properly, since we do not have an even spread of sentiment signals. That said, with these dates we can at least get away with 3 buckets, as initially shown, but this isn't quite how you'd trade this, and those buckets aren't fairly divided.
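To make the bucket problem concrete, here is a small pandas sketch (with made-up signal values, not the real dataset): even quantiles via pd.qcut fail outright when the signal is heavily concentrated on a few values, while explicit bin edges via pd.cut always yield well-defined buckets, which is the behavior the Alphalens bins parameter gives us.

```python
import pandas as pd

# Hypothetical sentiment signals: heavily concentrated on a few values,
# like our dataset.
signals = pd.Series([2.0] * 6 + [5.0] * 3 + [6.0] * 3)

# Even quantiles (the tear sheet default) fail here: the duplicated values
# make the computed quantile edges collide, so qcut raises a ValueError.
try:
    pd.qcut(signals, q=3)
    quantiles_ok = True
except ValueError:
    quantiles_ok = False

# Explicit bin edges sidestep the problem: every signal lands in a
# well-defined bucket regardless of how the values are distributed.
buckets = pd.cut(signals, bins=[0, 3, 5, 7], labels=['bear', 'neutral', 'bull'])
```

The trade-off is that explicit edges make the bucket populations uneven (here 6/3/3), which is exactly why these buckets "aren't fairly divided" when you trade off them.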

To start out, let's investigate a partner dataset using Blaze. Blaze allows you to define expressions for selecting and transforming data without loading all of the data into memory. This makes it a nice tool for interacting with large amounts of data in research.

aapl_sid = symbols('BAC').sid

# Look at a sample of BAC sentiment data starting from 2013-12-01.
sentiment[(sentiment.sid == aapl_sid) & (sentiment.asof_date >= '2013-12-01')].peek()

Out[14]:

   symbol  sentiment_signal    sid   asof_date   timestamp
0     BAC               5.0  700.0  2013-12-01  2013-12-02
1     BAC               5.0  700.0  2013-12-02  2013-12-03
2     BAC               2.0  700.0  2013-12-03  2013-12-04
3     BAC               6.0  700.0  2013-12-04  2013-12-05
4     BAC               2.0  700.0  2013-12-05  2013-12-06
5     BAC               2.0  700.0  2013-12-06  2013-12-07
6     BAC               6.0  700.0  2013-12-07  2013-12-08
7     BAC               6.0  700.0  2013-12-08  2013-12-09
8     BAC               6.0  700.0  2013-12-09  2013-12-10
9     BAC               2.0  700.0  2013-12-10  2013-12-11
10    BAC               6.0  700.0  2013-12-11  2013-12-12

Let's see how many securities are covered by this dataset between 12/2013 and 12/2014.

In [15]:

num_sids = bz.compute(sentiment.sid.distinct().count())
print 'Number of sids in the data: %d' % num_sids

Number of sids in the data: 586
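Outside of Blaze, the same distinct count is an eager one-liner in pandas. A toy stand-in frame is used here (hypothetical sids and values), since the real dataset is Quantopian-hosted:

```python
import pandas as pd

# Toy stand-in for the sentiment dataset; the real one is partner-hosted.
sentiment_df = pd.DataFrame({
    'sid': [700, 700, 24, 24, 5061],
    'sentiment_signal': [5.0, 2.0, 6.0, 6.0, 2.0],
})

# Eager equivalent of bz.compute(sentiment.sid.distinct().count()).
num_sids = sentiment_df['sid'].nunique()
print('Number of sids in the data: %d' % num_sids)
```

The Blaze version matters for the real dataset because it pushes the distinct-count down to the data store instead of pulling every row into memory first.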

Let's go back to BAC and look at the sentiment signal each day. To do this, we can create one Blaze expression that selects trading days and another for the BAC sid (700).

In [16]:

# Mask for trading days.
date_mask = sentiment.asof_date.isin(
    get_trading_days(pd.Timestamp('2014-06-01'), pd.Timestamp('2014-12-01'))
)

# Mask for our stock.
stock_mask = (sentiment.sid == aapl_sid)

# Blaze expression for sentiment on trading days between 06/2014 and 12/2014.
sentiment_2014_expr = sentiment[date_mask & stock_mask].sort('asof_date')

The sentiment signal tends to jump around quite a bit. Let's try smoothing it by plotting the 5-day mean using the pandas.rolling_mean function. Note that we set the index of the DataFrame to the asof_date so that the x-axis is nicely formatted.
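A minimal sketch of that smoothing, using made-up signal values rather than the real data. Note that pandas.rolling_mean was removed in later pandas versions in favor of the .rolling() accessor, so the modern spelling is shown:

```python
import pandas as pd

# Hypothetical daily sentiment signal, indexed by asof_date so a plot's
# x-axis would be nicely formatted.
dates = pd.date_range('2014-06-02', periods=8, freq='D')
signal = pd.Series([5.0, 5.0, 2.0, 6.0, 2.0, 2.0, 6.0, 6.0], index=dates)

# 5-day rolling mean; the old pandas.rolling_mean(signal, 5) spelling
# does the same thing in the pandas of that era.
smoothed = signal.rolling(window=5).mean()
```

The first four values of the smoothed series are NaN, since a full 5-day window is not yet available; that warm-up is also why the earliest data in the dataset is so noisy.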

def make_pipeline():
    # 5-day sentiment moving average factor.
    sentiment_factor = SimpleMovingAverage(
        inputs=[sentiment.sentiment_signal],
        window_length=5,
    )

    # Filter for stocks that are not within 2 days of an earnings announcement.
    not_near_earnings_announcement = ~(
        (BusinessDaysUntilNextEarnings() <= 2)
        | (BusinessDaysSincePreviousEarnings() <= 2)
    )

    # Filter for stocks that are not announced acquisition targets.
    not_announced_acq_target = ~IsAnnouncedAcqTarget()

    # Filter for stocks that had their sentiment signal updated in the last day.
    new_info = (
        BusinessDaysSincePreviousEvent(inputs=[sentiment.asof_date.latest]) <= 1
    )

    # Our universe is made up of stocks that have a non-null sentiment signal
    # that was updated in the last day, are not within 2 days of an earnings
    # announcement, are not announced acquisition targets, and are in the
    # Q1500US.
    universe = (
        Q1500US()
        & sentiment_factor.notnull()
        & not_near_earnings_announcement
        & not_announced_acq_target
        & new_info
    )

    # Our pipeline has the rank of sentiment_factor as its only column,
    # screened by our universe filter.
    pipe = Pipeline(
        columns={'sentiment': sentiment_factor.rank(mask=universe, method='average')},
        screen=universe,
    )
    return pipe
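The method='average' used in the rank above follows the usual tie-handling convention: tied values share the average of the positions they would occupy. A small pandas illustration with hypothetical factor values:

```python
import pandas as pd

# Hypothetical smoothed sentiment values for five stocks.
factor = pd.Series({'AAPL': 4.2, 'BAC': 4.2, 'MSFT': 2.0, 'XOM': 5.6, 'GE': 3.1})

# AAPL and BAC would occupy positions 3 and 4, so both receive 3.5.
ranks = factor.rank(method='average')
```

Ranking (rather than using the raw signal) matters here because the signal takes only a handful of discrete values, so ties are the common case, not the exception.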

Now we can analyze our sentiment factor with Alphalens. To do this, we need to get pricing data using get_pricing.

In [23]:

# All assets that were returned in the pipeline result.
assets = result.index.levels[1].unique()

# We need to get a little more pricing data than the length of our factor so we
# can compare forward returns. We'll tack on another month in this example.
pricing = get_pricing(
    assets,
    start_date='2015-01-01',
    end_date='2016-02-01',
    fields='open_price',
)

Then we run a factor tear sheet on our factor. We will analyze 3 quantiles, looking at 1-, 5-, and 10-day lookahead periods.
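Under the hood, an N-day lookahead return is just the percent change from today's price to the price N days ahead, aligned back to today. A pandas sketch with made-up prices (this is an illustration of the idea, not Alphalens's internal code):

```python
import pandas as pd

# Hypothetical open prices for one asset.
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 111.0])

def forward_returns(prices, n):
    # N-day forward return: (price[t+n] / price[t]) - 1, aligned to day t.
    # The last n entries are NaN, which is why we fetched an extra month
    # of pricing beyond the factor's date range.
    return prices.shift(-n) / prices - 1.0

one_day = forward_returns(prices, 1)
five_day = forward_returns(prices, 5)
```

This also explains the extra month of pricing requested above: without prices past the end of the factor window, the longest lookahead period would be all NaN at the tail.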