"Trading is statistics and time series analysis." This blog details my progress in developing a systematic trading system for use on the futures and forex markets, with discussion of the various indicators and other inputs used in the creation of the system. Also discussed are some of the issues/problems encountered during this development process. Within the blog posts there are links to other web pages that are/have been useful to me.

Wednesday, 14 December 2011

Sunday, 11 December 2011

Following on from my previous posts, the chart below is a snapshot of the last 150 days of the S&P E-mini continuous futures contract, up to and including the 9th December.

The top pane is a plot of price, with triangles marking the cyclic highs and lows indicated when the Perfect Oscillator (middle pane) crosses its SMA (see previous post for explanation). As explained earlier, the algorithm used in calculating the PO makes use of projected/predicted prices, and since these "forward prices" are available from the .oct function, the bars are coloured according to the 5 and 10 day "forward prices." Based on a very simple if/or statement in the code, we should essentially be long when the bars are blue and short when they are red. The few white bars that are visible are bars where the conditions for being blue or red are not met.

Superimposed over the PO in the middle pane are its full period EMA and lines at plus and minus 1 exponential standard deviation, a sort of Bollinger Bands over the indicator. The bottom pane shows the bandpass indicator, which is used in the "forward prices" algorithm.
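For readers who want to reproduce such bands, a minimal C++ sketch follows. It assumes the conventional EMA smoothing constant alpha = 2/(period + 1) and the standard incremental exponentially weighted variance update; the function name exponential_bands is hypothetical and this is not the .oct code used for the chart.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: EMA plus bands at +/- k exponentially weighted
// standard deviations (a Bollinger-style envelope around an indicator).
struct ExpBands {
    std::vector<double> mid, upper, lower;
};

ExpBands exponential_bands(const std::vector<double>& x, int period, double k = 1.0) {
    assert(period > 0);
    const double alpha = 2.0 / (period + 1.0);  // assumed EMA smoothing constant
    ExpBands out;
    if (x.empty()) return out;
    double ema = x[0];  // seed the EMA with the first value
    double var = 0.0;
    for (double v : x) {
        const double diff = v - ema;
        const double incr = alpha * diff;
        ema += incr;
        // Incremental exponentially weighted variance update
        var = (1.0 - alpha) * (var + diff * incr);
        const double sd = std::sqrt(var);
        out.mid.push_back(ema);
        out.upper.push_back(ema + k * sd);
        out.lower.push_back(ema - k * sd);
    }
    return out;
}
```

With k = 1 and period set to the full cycle period, this gives lines analogous to the plus and minus 1 exponential standard deviation lines described above.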

Based on the logic and theory of the PO, the rules for use are essentially:-
1) go long/short according to the PO SMA crossings
2) confirm the above by the bar colours
3) exit/close up stops when the PO crosses its exponential standard deviation lines

and based on these simple rules a short entry is indicated for 12th December, particularly since the bandpass is also indicating a short trade with its leading function cross. It will be interesting to see how this indicated trade works out.
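The three rules can be expressed as a small decision function. This is a hypothetical sketch, not the code behind the chart: it assumes the bar colours are already encoded as +1 (blue), -1 (red) and 0 (white), that crossings are detected from one bar to the next, and the names po_rule, sma and Action are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Action { None, EnterLong, EnterShort, TightenStops };

// Simple moving average; uses a shorter window for the first n-1 bars
std::vector<double> sma(const std::vector<double>& x, std::size_t n) {
    assert(n > 0);
    std::vector<double> out(x.size(), 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i];
        if (i >= n) sum -= x[i - n];
        out[i] = sum / static_cast<double>(i + 1 < n ? i + 1 : n);
    }
    return out;
}

// colour: +1 = blue, -1 = red, 0 = white (assumed encoding)
Action po_rule(const std::vector<double>& po, const std::vector<double>& po_sma,
               const std::vector<int>& colour, const std::vector<double>& upper,
               const std::vector<double>& lower, std::size_t t) {
    assert(t >= 1 && t < po.size());
    const bool cross_up = po[t - 1] <= po_sma[t - 1] && po[t] > po_sma[t];
    const bool cross_down = po[t - 1] >= po_sma[t - 1] && po[t] < po_sma[t];
    // Rules 1 and 2: PO/SMA crossing confirmed by the bar colour
    if (cross_up && colour[t] == 1) return Action::EnterLong;
    if (cross_down && colour[t] == -1) return Action::EnterShort;
    // Rule 3: strict sign change of PO relative to either band line
    const bool cross_upper = (po[t - 1] - upper[t - 1]) * (po[t] - upper[t]) < 0.0;
    const bool cross_lower = (po[t - 1] - lower[t - 1]) * (po[t] - lower[t]) < 0.0;
    if (cross_upper || cross_lower) return Action::TightenStops;
    return Action::None;
}
```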

Friday, 18 November 2011

Below are screenshots of the PMA and PO based upon the improved price projection algorithm discussed in the previous post, shown on an idealised time series of a sinewave (plus trend) in cyan in all screenshots.

The first 3 are plots of the 10 bar PMA in red with a normal 10 bar EMA in green shown for comparative purposes.

As the theory suggests, it can be seen that the PMA has the same smoothing as the 10 bar EMA but without any lag.

The next series of screenshots are of the PO, where again the "price" is cyan, the various red lines are 2 bar, 3 bar etc. POs up to an 11 bar PO, and the green PO is a simple average of all these POs.

As I had anticipated, these POs lead price by a quarter period, which means the cycle highs and lows are indicated by the green "average PO" crossing its centre line, which in turn is a full cycle period length simple moving average of the average PO. My leading oscillator indicator should give timely warnings of these upcoming turns in price. It is also interesting to note that the crossings of the individual, longer length POs (greater amplitude) also give advance warning of the upcoming turn, and that when the shorter length POs do NOT cross the centre line, staying entirely above or below it, this seems to indicate the strength of any trend.

Wednesday, 16 November 2011

It has been some time since my last post, largely because of my changing over from an Ubuntu to a Kubuntu OS and the resulting teething problems etc. However, all that is over, so now I return to the Perfect Moving Average (PMA) and the Perfect Oscillator (PO).

In my previous post I stated that I would show these on real world data, but before doing this there was one thing I wanted to investigate - the accuracy of the price projection algorithm upon which the average and oscillator calculations are based. Below are four sample screenshots of these investigations.

Sideways Market

Up Trending Market

Down Trending Market

The charts show a simulated "ideal" time series (cyan) with two projected price series of 10 points each (red and green), each plotted concurrently with the actual series points that it projects, or predicts. Both projections/predictions start at the point at which the red "emerges" from the actual series. As can be seen, the green projection/prediction is much more accurate than the red; the green is the improved projection algorithm I have recently been working on.

As a result of this improvement, and due to the nature of the internal calculations, I expect that the PMA will follow prices much more closely and that the PO will lead prices by a quarter cycle, compared with my initial price projection algorithm. More on this in the next post.

Friday, 7 October 2011

Readers of this blog will probably have noticed that I am partial to using concepts from digital signal processing in the development of my trading system. Recently I "rediscovered" this PDF, written by Tim Tillson, on Google Docs, which has a great opening introduction:

" 'Digital filtering includes the process of smoothing, predicting, differentiating, integrating, separation of signals, and removal of noise from a signal. Thus many people who do such things are actually using digital filters without realizing that they are; being unacquainted with the theory, they neither understand what they have done nor the possibilities of what they might have done.'

This quote from R. W. Hamming applies to the vast majority of indicators in technical analysis."

The purpose of this blog post is to outline my recent work in applying some of the principles discussed in the linked PDF.

Long time readers of this blog may remember that back in April this year I abandoned work I was doing on the AFIRMA trend line because I was dissatisfied with the results. The code for this required projecting prices forward using my leading signal code, and I now find that I can reuse the bulk of this code to create a theoretically perfect moving average and oscillator. I have projected prices forward by 10 bars and then taken the average of the forwards and backwards exponential moving averages, as per the linked PDF, to create a series of averages that theoretically are in phase with the underlying price, and then taken the mean of all these averages to produce my "perfect" moving average. Similarly, I have done the same to create a "perfect" oscillator and the results are shown in the images below.
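The forward/backward averaging step can be sketched as follows. This assumes the projected prices are supplied by some external routine (the projection algorithm itself is not reproduced here), uses a conventional alpha = 2/(period + 1), and averages over a caller-chosen set of EMA periods; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward (causal) EMA
std::vector<double> ema_forward(const std::vector<double>& x, int period) {
    const double a = 2.0 / (period + 1.0);
    std::vector<double> out(x.size());
    double e = x.empty() ? 0.0 : x.front();
    for (std::size_t i = 0; i < x.size(); ++i) { e += a * (x[i] - e); out[i] = e; }
    return out;
}

// Backward EMA: the same recursion run from the last point to the first
std::vector<double> ema_backward(const std::vector<double>& x, int period) {
    const double a = 2.0 / (period + 1.0);
    std::vector<double> out(x.size());
    double e = x.empty() ? 0.0 : x.back();
    for (std::size_t i = x.size(); i-- > 0;) { e += a * (x[i] - e); out[i] = e; }
    return out;
}

// history: actual prices up to now; projected: forward prices from some
// (hypothetical) projection routine. Returns, at the last actual bar, the
// mean over `periods` of the average of the forward and backward EMAs.
double zero_phase_average_at_last_bar(const std::vector<double>& history,
                                      const std::vector<double>& projected,
                                      const std::vector<int>& periods) {
    assert(!history.empty() && !periods.empty());
    std::vector<double> ext(history);
    ext.insert(ext.end(), projected.begin(), projected.end());
    const std::size_t last = history.size() - 1;
    double total = 0.0;
    for (int p : periods) {
        assert(p > 0);
        const auto f = ema_forward(ext, p);
        const auto b = ema_backward(ext, p);
        total += 0.5 * (f[last] + b[last]);  // lag and lead roughly cancel
    }
    return total / static_cast<double>(periods.size());
}
```

Calling zero_phase_average_at_last_bar once per bar, each time with that bar's own 10 bar projection, would build up the averaged line bar by bar.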
Sideways Market

Trending up Market

Trending down Market

The upper panes show "price" and its perfect moving average, the middle panes show the perfect oscillator, its one bar leading signal, and exponential standard deviation bands set at 1 standard deviation above and below an exponential moving average of the oscillator. The lower panes show the %B indicator of the oscillator within the bands, offset so that the exponential standard deviation bands are at 1 and -1 levels.
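The offset %B is simply the conventional %B, (value - lower)/(upper - lower), rescaled so that the bands sit at -1 and +1. A minimal sketch, with hypothetical function names and a crossing test for the 1 and -1 levels:

```cpp
#include <cassert>

// Conventional %B maps the lower band to 0 and the upper band to 1;
// this version maps them to -1 and +1 instead.
double offset_percent_b(double value, double lower, double upper) {
    assert(upper >= lower);
    const double width = upper - lower;
    if (width == 0.0) return 0.0;  // degenerate bands: treat as centred
    const double pct_b = (value - lower) / width;
    return 2.0 * pct_b - 1.0;
}

// True when consecutive values lie strictly on opposite sides of `level`
bool crossed_level(double prev, double curr, double level) {
    return (prev - level) * (curr - level) < 0.0;
}
```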

Even cursory examination of many charts such as these shows the efficacy of the signals generated by the crossovers of price with its moving average, the oscillator with its leading signal and the %B indicator crossing the 1 and -1 levels, when applied to idealised data. My next post will discuss the application to real price series.

Sunday, 2 October 2011

I have recently posted a reply to a thread on the TradingBlox forum here which readers of this blog might be interested in. The Octave code below is that used to generate the graphic in the linked forum reply.

I have now incorporated the revised code for my classifier in my production code and a screen shot of the output on recent S & P 500 data, as of last Friday, is shown below.

The most recent yellow candlesticks indicate that the market is classified as a "down with retracement" market, and the oscillator leading signals (downwards pointing triangles) indicate that the retracement is likely to be over. It will be interesting to see how the classifier performs, in real time, over the coming days.

Tuesday, 23 August 2011

It has taken some time, but I have finally been able to incorporate the Trend Vigor indicator into my Naive Bayesian classifier, but with a slight twist. Instead of being purely Bayesian, the classifier has evolved to become a hybrid Bayesian/clustering classifier. The reason for this is that the Trend Vigor indicator has no varying distribution of values but tends to return values that are so close to each other that they can be considered a single value, as mentioned in an earlier post of mine. This can be clearly seen in the short 3D visualisation animation below. The x, y and z axes each represent an input to the classifier, and about 7 seconds into the video you can see the Trend Vigor axis in the foreground with almost straight vertical lines for its "distributions" for each market type. However, it can also be seen that there are spaces in 3D where only combined values for one specific market type appear, particularly evident in the "tails" of the no retracement markets ( the outermost blue and magenta distributions in the video. )

The revised version of the classifier takes advantage of this fact. Through a series of conditional statements each 3D datum point is checked to see if it falls in any of these mutually exclusive spaces and if it does, it is classified as belonging to the market type that has "ownership" of the space in which it lies. If the point cannot be classified via this simple form of clustering then it is assigned a market type through Bayesian analysis.

This Bayesian analysis has also been revised to take into account the value of the Trend Vigor indicator. Since these values have no distribution to speak of, a simple linear model is used. If a point is equidistant between two Trend Vigor classifications it is assigned a 0.5 probability of belonging to each, this probability rising in linear fashion to 1.0 if it falls exactly on one of the vertical lines mentioned above, with a corresponding decrease in probability assigned to the other market type classification. There is also a boundary condition applied where the probability is set to 0.0 for belonging to a particular market type.
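The linear Trend Vigor model can be sketched as below. The clamping to [0, 1] outside the two lines is an assumption about how the boundary condition works; the function name is hypothetical, and 0.829 and 1.329 serve only as example line positions.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Linear probability split between two adjacent Trend Vigor "lines" a and b.
// Returns {P(type at a), P(type at b)}: 0.5 each at the midpoint, 1.0 on a line,
// clamped (assumed boundary condition) beyond either line.
std::pair<double, double> linear_tv_probability(double x, double a, double b) {
    assert(a != b);
    const double p_a = std::clamp((b - x) / (b - a), 0.0, 1.0);
    return {p_a, 1.0 - p_a};
}
```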

The proof of the pudding is in the eating, and this next chart shows the classification error rate when the classifier is applied to my usual "ideal" time series.

The y axis is the percentage of ideal time series runs in which market type was mis-classified, and the x axis is the period of the cyclic component of the time series being tested. In this test I am only concerned with the results for periods greater than 10 as in real data I have never seen extracted periods less than this. As can be seen the sideways market and both the up and down with no retracement markets have zero mis-classification rates, apart from a small blip at period 12, which is within the 5% mis-classification error rate I had set as my target earlier.

Of more concern was the apparently large mis-classification error rate of the retracement markets ( the green and black lines in the chart. ) However, further investigation of these errors revealed them not to be "errors" as such but more a quirk of the classifier, which lends itself to exploitation. Almost all of the "errors" occur consecutively at the same phase of the cyclic component, at all periods, and the "error" appears in the same direction. By this I mean that if the true market type is up with retracement, the "error" indicates an up with no retracement market; if the true market type is down with retracement, the "error" indicates a down with no retracement market. The two charts below show this visually for both the up and down with retracement markets and are typical representations of the "error" being discussed.

The first pane in each chart shows one complete cycle in which the whole cycle, including the most recent datum point, is correctly classified as being an up with retracement market ( upper chart ) and a down with retracement market ( lower chart. ) The second pane shows a snapshot of the cycle after it has progressed in time through its phase, with the last point being the last point that is mis-classified. The "difference" between each chart's respective two panes at the right hand edge shows the portion of the time series that is mis-classified.

It can be seen that the mis-classification occurs at the end of the retracement, immediately prior to the actual turn. This behaviour could easily be exploited via a trading rule. For example, assume that the market has been classified as an up with retracement market and a retracement short trade has been taken. As the retracement proceeds our trade moves into profit but then the market classification changes to up with no retracement. Remember that the classifier (never?) mis-classifies such no retracement markets. What would one want to do in such a situation? Obviously one would want to exit the current short trade and go long, and in so doing would be exiting the short and initiating the possible long at precisely the right time; just before the market turn upwards! This mis-classification "error" could, on real data, turn out to be very serendipitous.
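The reversal idea can be written as a simple state transition. This is a hypothetical sketch of one possible trading rule, with illustrative enum names; it has not been tested on real data.

```cpp
#include <cassert>

enum class Market { Sideways, UpWithRetrace, UpNoRetrace, DownWithRetrace, DownNoRetrace };
enum class Position { Flat, Long, Short };

// When a retracement trade is open and the classification flips from
// "with retracement" to "no retracement" in the same direction, treat it as
// the end of the retracement: exit and reverse into the trend.
Position on_classification_change(Position current, Market previous, Market latest) {
    if (current == Position::Short && previous == Market::UpWithRetrace &&
        latest == Market::UpNoRetrace)
        return Position::Long;   // exit the retracement short, go long
    if (current == Position::Long && previous == Market::DownWithRetrace &&
        latest == Market::DownNoRetrace)
        return Position::Short;  // exit the retracement long, go short
    return current;              // otherwise leave the position unchanged
}
```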

All in all, I think this revised, Mark 2 version of my market classifier is a marked improvement on its predecessor.

Tuesday, 16 August 2011

Some time ago (the file was last edited in July 2010) I wrote an Octave .oct function to create synthetic data for testing and optimisation purposes. I was inspired to do so by the December 2005 issue of The Breakout Bulletin and it has recently come to mind again due to a posting on the StackExchange Quantitative Finance Forum here. I have posted the code for my .oct function in the code box below.

In writing this function I wanted to extend the ideas presented in the Breakout Bulletin and make them more applicable for the purposes I had/have in mind. By randomly scrambling the data any bar to bar dependency is destroyed (by design of course), but what if you want to preserve some bar to bar dependencies? This .oct function is my solution to preserving this dependency and a brief discussion of the theory behind it follows.

Firstly there is an assumption that any single bar and the market forces that caused the bar to be formed the way it did (up bar, down bar, doji etc.) are dependent on the immediately preceding market activity and the "current mode" of the market. Implicit in this assumption is that certain "types" of bars are more likely to be seen depending on market "mode," i.e. the types of bar in an up trend are likely to be distinctly different from those in a down trending or sideways trending market, so what is needed is some way to bin the bars which reflects this.

My solution is to apply a 21 bar moving median of the close, with bands above and below it at multiples of the median absolute deviation (MAD) from this median, similar to Bollinger Bands. There are 3 levels above the median (1 x MAD, 2 x MAD and 3 x MAD) and 3 below, giving a total of 8 "zones," as they are called in the code. A 21 bar moving median of the True Range and a 4 bar WMA of the True Range are also calculated. The first part of the code ("Code Block A Loop"), after all the required declarations, loops over the input time series calculating all the above and assigning each bar to a specific bin based upon the "zone" in which the previous bar resides; the assignment further depends on whether the previous bar is a high or low volatility bar, decided by the True Range 4 bar WMA being above or below the True Range 21 bar moving median. This gives a total of 16 different bins to which a bar can be assigned. On assignment to a bin, the open, high, low and close are recorded in that bin by their relation to the previous close, thus: log10(close/previous_close), log10(open/previous_open)... etc.
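The zone and bin assignment might look like the following C++ sketch. It is not the .oct code from the code box: it assumes zone boundaries at whole multiples of MAD from the median (7 lines giving 8 zones, numbered 0 to 7), a linearly weighted 4 bar WMA with weights 1 to 4, and hypothetical function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a window (copied, then partially sorted)
double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    double m = v[mid];
    if (v.size() % 2 == 0)  // even size: average with the largest lower element
        m = 0.5 * (m + *std::max_element(v.begin(), v.begin() + mid));
    return m;
}

// Median absolute deviation from the median
double mad(const std::vector<double>& v) {
    const double m = median(v);
    std::vector<double> dev;
    dev.reserve(v.size());
    for (double x : v) dev.push_back(std::fabs(x - m));
    return median(dev);
}

// Zone 0..7: below -3 MAD, [-3,-2), [-2,-1), [-1,0), [0,1), [1,2), [2,3), 3 MAD and above
int zone(double close, double med, double mad_value) {
    if (mad_value <= 0.0) return close >= med ? 4 : 3;  // degenerate window
    const double z = std::clamp((close - med) / mad_value, -10.0, 10.0);
    return std::clamp(static_cast<int>(std::floor(z)) + 4, 0, 7);
}

// Bin 0..15: two volatility states per zone of the previous bar
int bin_index(int zone_of_previous_bar, bool previous_bar_high_vol) {
    assert(zone_of_previous_bar >= 0 && zone_of_previous_bar < 8);
    return zone_of_previous_bar * 2 + (previous_bar_high_vol ? 1 : 0);
}

// 4 bar linearly weighted moving average of True Range (newest weighted 4)
double wma4(const std::vector<double>& tr, std::size_t t) {
    assert(t >= 3 && t < tr.size());
    return (1.0 * tr[t - 3] + 2.0 * tr[t - 2] + 3.0 * tr[t - 1] + 4.0 * tr[t]) / 10.0;
}
```

The high volatility flag would then be wma4(tr, t) > median over the last 21 True Range values, and bin_index combines it with the zone.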

The next part of the code ("Code Block B Loop") actually creates the synthetic data by randomly drawing a bar's relationships to its previous close from the "relevant bin" and calculating a "new" bar based upon these relationships. This "relevant bin" is determined by the "zone" position and volatility of the most recently calculated synthetic "new" bar. After each "new bar" has been created, the median, MAD and True Range calculations are updated to include it, and it becomes the previous bar on the next iteration of Code Block B Loop.
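Rebuilding a bar from a randomly drawn set of relationships can be sketched as below. This assumes each stored record holds log10 ratios of open, high, low and close to the previous close, uses std::mt19937 for the random draw, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Bar { double open, high, low, close; };
// log10 ratios of a bar's OHLC to the previous bar's close (assumed layout)
struct Ratios { double open, high, low, close; };

Ratios to_ratios(const Bar& b, double prev_close) {
    return {std::log10(b.open / prev_close), std::log10(b.high / prev_close),
            std::log10(b.low / prev_close), std::log10(b.close / prev_close)};
}

Bar from_ratios(const Ratios& r, double prev_close) {
    return {prev_close * std::pow(10.0, r.open), prev_close * std::pow(10.0, r.high),
            prev_close * std::pow(10.0, r.low), prev_close * std::pow(10.0, r.close)};
}

// Draw one record uniformly at random from the relevant bin and rebuild
// a synthetic bar on top of the most recent synthetic close.
Bar draw_new_bar(const std::vector<Ratios>& bin, double prev_close, std::mt19937& rng) {
    assert(!bin.empty() && prev_close > 0.0);
    std::uniform_int_distribution<std::size_t> pick(0, bin.size() - 1);
    return from_ratios(bin[pick(rng)], prev_close);
}
```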

Finally, a small part of the code adjusts the input data in the case of negative values due to the possible use of continuous back-adjusted futures contracts as the input data. This is necessary to avoid errors in trying to calculate the log10 of a negative number.

The above method of binning the input data and subsequent randomisation is my attempt to ensure that dependencies/characteristics of the original data are preserved. For example, assume a bar is above the upper 3 x MAD level and is determined to be a high volatility bar; then the next synthetically created bar will be drawn only from the binned distribution of bars that, in the real data, also follow a high volatility bar above the upper 3 x MAD level.

This code is offered as is and comes with no warranty whatsoever. However, if you like it and use it I would be interested to hear from you. In particular, if you have any suggestions for the code's improvement, extension, optimisation etc. or see any errors in the code, I would really appreciate your feedback.

A final thought: although not implemented in the above code it would be possible to apply some form of "quality control" to the output. Statistical measures of the input time series could be taken and thresholds established and only those synthetic outputs that fall within these threshold conditions could be accepted as a valid synthetic time series output.

Below is a screenshot of a time series and synthetic data generated from it using the above function code. For the moment I won't say which is the original and which is the synthetic data - perhaps readers would like to post their guesses as comments?

Friday, 29 July 2011

I have now completed some preliminary Monte Carlo testing of the Trend Vigor indicator and I thought this would be a good opportunity to share with readers the general approach I have been taking when talking about my Monte Carlo testing on predefined, "ideal" market types.

Firstly, I create my "ideal" market types by using an Octave .oct function, the code for which is given below. In this code can also be seen my implementation of the code for the Trend Vigor indicator, which I think is slightly different from Ehlers'. The code is commented, so no further description is required here.

In the resulting plots, the top plot shows the distributions of the uwr, unr, dwr and dnr markets, and the lower plot the sideways market. In this particular case, the spread of each distribution is so narrow ( measured differences of the order of thousandths ) that I consider that for practical purposes the distributions can be treated as single values. This simple R bootstrap script gets the average value of the distributions to be used as this single point value.

For readers' interest, the actual values are 0.829, 1.329, -0.829 and -1.329 with 0 for the sideways market.
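For readers without R, a bootstrap estimate of the mean can be sketched in C++ as follows. This is a stand-in for, not a translation of, the R script; the number of resamples and the seed are arbitrary choices and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Resample with replacement `resamples` times and average the resample means.
double bootstrap_mean(const std::vector<double>& data, int resamples, unsigned seed) {
    assert(!data.empty() && resamples > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    double sum_of_means = 0.0;
    for (int r = 0; r < resamples; ++r) {
        double s = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) s += data[pick(rng)];
        sum_of_means += s / static_cast<double>(data.size());
    }
    return sum_of_means / resamples;
}
```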

This final plot is the same Natural Gas plot as in my previous post, but with the above values substituted for Ehlers' default values of 1 and -1.

What I intend to do now is use these values as the means of normal distributions with varying standard deviations as inputs for my Naive Bayes classifier. Further Monte Carlo testing will be done to find values for the standard deviations that keep the classifier's false classifications, when tested using the "ideal" markets code above, within acceptable limits, most probably a 5% classification error rate.
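As a sketch of how these values might enter the classifier, the following computes posterior probabilities for a single Trend Vigor reading under normal likelihoods centred on the values above. It assumes equal priors, one shared standard deviation for every market type (the quantity the Monte Carlo testing is meant to tune) and a class order of sideways, uwr, unr, dwr, dnr; the names are hypothetical. It works in log space so that readings far from every mean do not underflow.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Assumed class order: sideways, uwr, unr, dwr, dnr
constexpr std::array<double, 5> kTrendVigorMeans{0.0, 0.829, 1.329, -0.829, -1.329};

std::array<double, 5> trend_vigor_posterior(double x, double sigma) {
    assert(sigma > 0.0);
    std::array<double, 5> post{};
    // Log-likelihoods up to a shared constant (same sigma, equal priors)
    double max_ll = -std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < post.size(); ++k) {
        const double z = (x - kTrendVigorMeans[k]) / sigma;
        post[k] = -0.5 * z * z;
        max_ll = std::max(max_ll, post[k]);
    }
    double total = 0.0;
    for (double& p : post) { p = std::exp(p - max_ll); total += p; }  // log-sum-exp
    for (double& p : post) p /= total;
    return post;
}
```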

Tuesday, 26 July 2011

I have recently come across another interesting indicator on Ehlers' website, information about which is available for download in the seminars section. It is the Trend Vigor indicator, and below is my Octave .oct function implementation of it, first shown on the continuous, back-adjusted Natural Gas contract

and next the EURUSD spot forex market, both daily bars.

My implementation is slightly different from Ehlers' in that there is no smoothing of the trend slope prior to calculation of the indicator. My reason for this is that smoothing will change the probability distribution of the indicator values, and since my Naive Bayes Classifier uses Kernel Density Estimation of a Mixture model, I prefer not to "corrupt" this density calculation with the subjective choice of a smoothing algorithm. As before, all parameters for this part of the classifier will be determined by using Monte Carlo techniques on "ideal," synthetically modelled market prices.
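For reference, a plain Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth is sketched below. This is a generic KDE, not the Kernel Density Estimation of a Mixture model used in the classifier itself, and the bandwidth rule and function names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Silverman's rule-of-thumb bandwidth: 1.06 * sample sd * n^(-1/5)
double silverman_bandwidth(const std::vector<double>& data) {
    assert(data.size() > 1);
    double mean = 0.0;
    for (double x : data) mean += x;
    mean /= data.size();
    double ss = 0.0;
    for (double x : data) ss += (x - mean) * (x - mean);
    const double sd = std::sqrt(ss / (data.size() - 1));
    return 1.06 * sd * std::pow(static_cast<double>(data.size()), -0.2);
}

// Gaussian kernel density estimate at x with bandwidth h
double kde(const std::vector<double>& data, double x, double h) {
    assert(!data.empty() && h > 0.0);
    const double norm = 1.0 / (data.size() * h * std::sqrt(2.0 * std::acos(-1.0)));
    double sum = 0.0;
    for (double xi : data) {
        const double u = (x - xi) / h;
        sum += std::exp(-0.5 * u * u);
    }
    return norm * sum;
}
```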

In the Natural Gas chart above it can be seen that the latter half of the chart is mostly sideways action, as determined by the Trend Vigor indicator, whilst my current version of the classifier ( the colour coded candlesticks ) does not give a similar determination for the whole of this latter half. This is also the case in the second, EURUSD, chart. It is my hope that including the Trend Vigor indicator as an input to the classifier will improve its classification ability.

Coding of my suite of indicators for the Metatrader 4 platform using C++ DLLs is on-going and below is a screen shot of my installation as it stands at the moment.

The indicators are the Cybercycle, with its 1, 2, 3 and 4 day leads in the sub plot, and the instantaneous trendline, in red, and the Tukey control lines, in blue, in the main chart.

In this coding I have come up against some of the apparent limitations of the Metatrader 4 platform; for instance, there is a limit to how many lines can be drawn, per indicator, in any chart window. Also, I am not yet able to access the values of one indicator, in the sub plot for example, for plotting in the main chart, e.g. plotting arrows above/below the candlesticks to indicate where the leading Cybercycle functions cross each other. Either this requires some substantial "hack" or much more advanced MetaQuotes Language (MQL4) programming skills than I possess at the moment.

Thursday, 7 July 2011

I have finally managed to teach myself, with help from various online sources including this very helpful forum thread, how to code a C++ DLL for the Metatrader 4 trading platform. As practice I have coded a simple moving average indicator using a recursive algorithm, the code for which I make freely available as a PDF download on the Dekalog website here.
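The recursive idea is that each new average is obtained from the previous one by adding the newest price and dropping the oldest, SMA[t] = SMA[t-1] + (x[t] - x[t-n])/n, which costs O(1) per bar. A minimal standalone C++ sketch follows (not the DLL code from the PDF; it indexes the oldest bar first and the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Recursive SMA; the first n-1 outputs are NaN because no full window exists yet.
std::vector<double> recursive_sma(const std::vector<double>& x, std::size_t n) {
    assert(n > 0);
    std::vector<double> out(x.size(), std::numeric_limits<double>::quiet_NaN());
    if (x.size() < n) return out;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += x[i];
    out[n - 1] = sum / n;  // seed with one full-window average
    for (std::size_t t = n; t < x.size(); ++t)
        out[t] = out[t - 1] + (x[t] - x[t - n]) / n;  // O(1) update per bar
    return out;
}
```

For very long series, occasionally recomputing the full-window sum would limit the slow accumulation of floating point error that any running-sum update can suffer.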

Now that I have learned this I can begin, as I had hoped I would be able to, to "drag and drop" my C++ .oct functions into the Metatrader platform for testing and trading using live intraday forex data.

Tuesday, 21 June 2011

I am pleased to say that my .oct coding of the Naive Bayesian classifier is now complete. The purpose of the function is to take as inputs various measurements of the current state of the time series, and then for the classifier to classify the time series as being in one of the five following states:-

trending sideways with a cyclic action

trending upwards with 50% retracements of previous up-legs

trending upwards with no retracements

trending downwards with 50% retracements of previous down-legs

trending downwards with no retracements

These classifications will determine the most appropriate trading approach to take given the current state of the "tradable," and two screen shots of the classifier are given below, rendered as a "paint bar" study in the upper candlestick chart.

This first image just shows three classifications; trending sideways with a cycle in cyan, trending either up or down with 50% retracement in green, and trending up or down without retracement when the bars are blue or red (blue when close > open, red when close < open). Additionally, the coloured triangles show when the Cybercycle leading functions in the first subgraph cross, giving a count down to the predicted cyclic highs and lows, the blue "0" triangle being the predicted actual high or low. The green "fp high" and red "fp low" lines are the respective full period channel highs and lows of the price series, with the dotted yellow line being the midpoint of the two.

The second subgraph predicts turning points based on zero line crosses of the Cybercycle in the first subgraph. Read more about this indicator here.

The third subgraph is a plot of the sine of the phase of the Cybercycle, with leading signals, superimposed over a stochastic of the price bars. I may discuss this particular indicator in more depth in a future post.

This second screen shot is similar to the first, except that the classifications for retracement and no retracement are separated out into upward and downwards (chart key uwr = up with retracement, unr = up no retracement etc.). In this graph the coloured triangles represent the leading function crosses of the sine in the third subgraph.

Personally I am pleased that the effort to produce this indicator (almost six weeks of Monte Carlo simulation and statistical analysis in R, plus C++ coding) has resulted in a useful addition to my stable of indicators. I think there could still be some tweaks/improvements/additions to be made, but for the moment I will leave the indicator as it is. The next thing I have set myself to do is implement my collection of C++ .oct functions as DLLs for Metatrader 4 (maybe an EA?) on a demo account so that I can utilise/test them intraday on forex data. Hopefully this will turn out to be a relatively simple case of "dragging and dropping" the already written C++ code.

Saturday, 14 May 2011

After having abandoned my work on Bayesian analysis late last year I am now working on this again, and have been for a few weeks now. I resumed work on this because a response to a question I asked on a forum here led me to the mixdistR package which has now enabled me to model kernel density estimates for a naive Bayes classifier. For more on using a kernel density estimate in a naive Bayes classifier see the link in the answer here.

I expect that it will be a few weeks yet before all the coding and testing of this is complete.

Sunday, 10 April 2011

Following on from the previous post, the coding of the AFIRMA trend line using the leading function values as a proxy for the "peek into the future" values of price is complete, and I have to say that the results are quite disappointing. Using the rules outlined in the previous post, it soon became obvious from cursory scanning of the equity curves that my version of the AFIRMA trend line did not live up to its early, potential promise. In fact the equity curves were so disappointing that I did not even bother to do the Monte Carlo permutation and bootstrap tests. The equity curves were no better than those produced by my simple benchmark suite of "systems" and the draw downs were such that I would never trade the AFIRMA as a stand alone "system," at least with the rules outlined in the previous post.

Below is a screen shot of the AFIRMA with a window length of 21 (peeks 10 days into the future) shown on the last 150 daily bars of the S&P E-mini. The red line is my version of it, with a Blackman-Harris window, and the yellow and green lines are two original versions, with Blackman and Blackman-Harris windows respectively. As can be seen, the original versions are smooth and accurately pick out major turning points, whilst my version is not as smooth and gives many false signals that result in losses, even during a trending period. Simple analysis of this chart, knowing the reasoning and coding behind it, shows why my version fails as a directional system. Each time the price moves contrary to any immediately prevailing trend, even those of short duration, the leading functions project this small movement as if it were the turning point of a major cycle, and hence one ends up with trades in the opposite direction of the major trend, the result being that one is whipsawed in and out of this major trend. My version of the AFIRMA is simply too sensitive to minor price direction changes, and I do not really have an idea as to how I can dampen this sensitivity. Smoothing it would probably be pointless as one might as well just smooth the prices directly.
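For readers unfamiliar with the windowing, a centred Blackman-Harris weighted average can be sketched as follows. It uses the standard 4-term Blackman-Harris coefficients; for the latest bar, the right half of the window must be filled with projected prices, which is where the leading function values served as a proxy. Function names are hypothetical and this is not the AFIRMA code itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard 4-term Blackman-Harris window of length n (n > 1)
std::vector<double> blackman_harris(std::size_t n) {
    assert(n > 1);
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double r = 2.0 * pi * i / (n - 1);
        w[i] = 0.35875 - 0.48829 * std::cos(r) + 0.14128 * std::cos(2.0 * r)
             - 0.01168 * std::cos(3.0 * r);
    }
    return w;
}

// Centred FIR average at index t using odd-length weights w; needs
// (w.size() - 1) / 2 values either side of t (future values projected if t is the latest bar).
double centred_window_average(const std::vector<double>& x, std::size_t t,
                              const std::vector<double>& w) {
    assert(w.size() % 2 == 1);
    const std::size_t half = w.size() / 2;
    assert(t >= half && t + half < x.size());
    double num = 0.0, den = 0.0;
    for (std::size_t k = 0; k < w.size(); ++k) {
        num += w[k] * x[t - half + k];
        den += w[k];
    }
    return num / den;
}
```

With a window length of 21, half = 10, matching the "peeks 10 days into the future" description above.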

However, all is not completely lost. The fact that my AFIRMA is so sensitive could be useful in identifying pullbacks in trends, acting as a set up to add to positions or for continuation trades. This is something I may investigate in the future, and this idea has been added to my "to do" list. For the moment I do not think that working more on the AFIRMA would be productive.

For interest, this second chart shows AFIRMA trend lines with a window length of nine (peeks four days into the future). It can be seen that my version of AFIRMA is quite robust in that the two trend lines (this chart and the one above) have different length windows but are almost identical.

Sunday, 3 April 2011

As per my previous post, the exploratory tests of the AFIRMA trend line are now complete and the results are amazing. The tests in question were:

a Monte Carlo permutation test to accept or reject the null hypothesis that the results of the AFIRMA "system" are no better than could be expected from a random re-ordering of the system's position vector

a Monte Carlo bootstrap test to accept or reject the null hypothesis that the returns of the AFIRMA "system" are randomly centred around a zero return

Both the above tests are those that are described in Aronson's Evidence Based Technical Analysis, and the AFIRMA passed both tests, with a p-value of zero, on all historical data series it was tested on, with the exception of the Dollar Index contract. I suspect that it failed on the Dollar Index because of errors in the data I have for this contract. The AFIRMA that was tested had a window length of 9, which means that it "peeks into the future" for 4 bars, and a Blackman windowing function was used. The rules were quite simple: if the current bar's AFIRMA value is greater than that for the previous bar, go long at the open of the next bar or remain long if already so; and the reverse logic for shorts. No money management or stops were employed; it is a pure, one contract, always in the market test. The tests were conducted on daily bars covering the period from March 2001 to last week. The number crunching was done in Octave.
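Both tests can be sketched in a few lines of C++. This follows the general approach summarised above, but the details are assumptions: returns are per bar, positions are +1/-1 aligned with the returns they earn, and p-values are the simple fraction of trials at least as good as the observed result (which is how a p-value of zero can arise). Function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

double total_return(const std::vector<int>& pos, const std::vector<double>& ret) {
    double s = 0.0;
    for (std::size_t i = 0; i < pos.size(); ++i) s += pos[i] * ret[i];
    return s;
}

// Permutation test: shuffle the position vector and count how often the
// shuffled total return is at least the system's actual total return.
double permutation_p_value(std::vector<int> pos, const std::vector<double>& ret,
                           int trials, unsigned seed) {
    assert(pos.size() == ret.size() && trials > 0);
    const double actual = total_return(pos, ret);
    std::mt19937 rng(seed);
    int at_least = 0;
    for (int t = 0; t < trials; ++t) {
        std::shuffle(pos.begin(), pos.end(), rng);
        if (total_return(pos, ret) >= actual) ++at_least;
    }
    return static_cast<double>(at_least) / trials;
}

// Bootstrap test: shift the system's returns to zero mean (the null
// hypothesis), resample them, and count how often the resampled mean
// reaches the observed mean.
double bootstrap_p_value(const std::vector<double>& sys_ret, int trials, unsigned seed) {
    assert(!sys_ret.empty() && trials > 0);
    const double n = static_cast<double>(sys_ret.size());
    const double observed = std::accumulate(sys_ret.begin(), sys_ret.end(), 0.0) / n;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, sys_ret.size() - 1);
    int at_least = 0;
    for (int t = 0; t < trials; ++t) {
        double s = 0.0;
        for (std::size_t i = 0; i < sys_ret.size(); ++i) s += sys_ret[pick(rng)] - observed;
        if (s / n >= observed) ++at_least;
    }
    return static_cast<double>(at_least) / trials;
}
```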

The next test was a simple visual check of the tick return equity, a simple plot of the cumulative number of ticks that the "system" would have returned. For this no allowance was made for commissions and slippage and a typical plot is shown below. This happens to be the S&P E-mini contract.

The AFIRMA is the blue line and the other lines are a simple benchmark suite I knocked up for comparative purposes, the benchmarks being:

the equivalent of a buy and hold strategy

price closing above/below the 20 period simple moving average

price closing above/below the 50 period simple moving average

crossovers of the 20 and 50 period moving averages

a Donchian breakout system with a parameter of 20 periods to enter and 10 to exit
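As an example of the benchmark code, the Donchian rule might be sketched as below. It assumes entry on a close beyond the previous 20 bars' highest high/lowest low and exit on a close beyond the previous 10 bars' channel, with positions reported on the signal bar (so they would be lagged one bar before computing returns); names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Positions: +1 long, -1 short, 0 flat
std::vector<int> donchian_positions(const std::vector<double>& high, const std::vector<double>& low,
                                    const std::vector<double>& close,
                                    std::size_t entry_len = 20, std::size_t exit_len = 10) {
    assert(high.size() == low.size() && low.size() == close.size());
    assert(entry_len > 0 && exit_len > 0);
    std::vector<int> pos(close.size(), 0);
    int state = 0;
    for (std::size_t t = std::max(entry_len, exit_len); t < close.size(); ++t) {
        // Channels use the bars before t only
        const double entry_hi = *std::max_element(high.begin() + (t - entry_len), high.begin() + t);
        const double entry_lo = *std::min_element(low.begin() + (t - entry_len), low.begin() + t);
        const double exit_hi = *std::max_element(high.begin() + (t - exit_len), high.begin() + t);
        const double exit_lo = *std::min_element(low.begin() + (t - exit_len), low.begin() + t);
        if (state == 1 && close[t] < exit_lo) state = 0;   // exit long
        if (state == -1 && close[t] > exit_hi) state = 0;  // exit short
        if (close[t] > entry_hi) state = 1;                // long breakout
        else if (close[t] < entry_lo) state = -1;          // short breakout
        pos[t] = state;
    }
    return pos;
}
```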

This second chart was created in R using the PerformanceAnalytics package and shows the log return equity, the daily log returns and the draw downs of the above, the AFIRMA being the dark blue lines, labelled V2 in the legend.

This final shot is a screen capture of the R session, using RStudio, used to create the performance summary chart. This was the first time I had used RStudio, and I am quite impressed with it.

In summary I can say that the AFIRMA has passed the above tests sufficiently well that I am going to code the AFIRMA using the leading functions as described in my previous post.