I was thinking about writing my own backtester and I realize I have to make some assumptions. So I was hoping I could post what I am planning on doing and hopefully some of you can give me some ideas on how to make it better (I'm sure there is a lot that can be improved).

First of all, my strategy usually holds stocks for several days; I will probably not be doing any intra-day trading.

So here is what I was thinking. First, I would buy minute-level OHLC quotes for the stocks I am interested in (I am thinking about buying them from pitrading.com; is their quality acceptable?). Then, if the algorithm triggers a buy or sell at some bar, I would "execute" the order at the high (for a buy) or the low (for a sell) of the very next bar, attempting to be as pessimistic as possible. One thing I am curious about is the bid/ask spread, so I was thinking about adding or subtracting a few cents to account for it when buying or selling. I would look at how far the bid and ask have recently been from the quoted price for these stocks and just use those numbers, as I wouldn't be backtesting very far back. I would then assume that I can buy or sell as much as I want at that price.

Lastly I would include the cost of commission in the trade. I would neglect any effect my trade would have on the market. Is there any rough guideline using volume to estimate how much you would have to buy/sell to have an effect?
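A minimal sketch of the fill rule described above, in Python. The `Bar` type, the function name, and the default half-spread and commission numbers are all placeholders made up for illustration; they are not taken from any data vendor or broker.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: int


def pessimistic_fill(next_bar, side, shares,
                     half_spread=0.02,            # assumed: a few cents for bid/ask
                     commission_per_share=0.005,  # assumed commission schedule
                     min_commission=1.0):
    """Fill an order signalled on the previous bar at the worst price of next_bar.

    Returns (fill_price, commission, cash_flow), where cash_flow is negative
    for buys and positive for sells. Market impact is ignored.
    """
    if side == "buy":
        price = next_bar.high + half_spread   # buy at the high, plus spread
    elif side == "sell":
        price = next_bar.low - half_spread    # sell at the low, minus spread
    else:
        raise ValueError("side must be 'buy' or 'sell'")
    commission = max(min_commission, commission_per_share * shares)
    gross = price * shares
    cash_flow = -gross - commission if side == "buy" else gross - commission
    return price, commission, cash_flow
```

For example, `pessimistic_fill(Bar(10.0, 10.5, 9.8, 10.2, 1000), "buy", 100)` fills at the bar high plus the assumed two cents and charges the assumed minimum commission.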

I would also simulate stop-loss sell orders and they, too, would be executed at the next bar low after the price passed the threshold.

That's it, it will be pretty simple to implement. I am just hoping to make it conservative so it can give me some insight into how well my program works.

Any thoughts or criticisms about this program would be greatly appreciated. I am new at this and I am sure there are some details I am missing.

There are many details you need to take into account in order to do a proper backtest. Besides the correction regarding bar entry/exit prices mentioned earlier, you will also need to correct for the bid/ask spread, which may be wider than the high/low for some stocks at some times. Some other details are:

Splits

Dividends

Ticker changes

Warrants/rights issuances

Merger/acquisition activity

Be careful to include stocks that have been delisted in your backtest (leaving them out introduces survivorship bias), and generally only use criteria that would have been known at the time to construct your sample (see here).

As for volume limits, a good rule of thumb is never to trade more than 5% of the typical volume that takes place in a given period. For example, if you intend to buy $1MM of some stock over, say, a half-hour period, be sure that at least $20MM is typically traded during that half hour of the day. From your question, though, it sounds like you are thinking of a strategy that is executed minute-to-minute, so you are probably better off just looking at the order book and how many shares are typically available to trade. You can get a rough sense of that by looking at minute volume bars and assuming you will not be able to trade more than the 10th percentile of the distribution.
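Both rules of thumb can be written down in a few lines of Python. This is only a sketch: the function names are made up, and the nearest-rank percentile is one of several possible percentile definitions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence (pct in 0..100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def max_order_value(typical_dollar_volume, participation=0.05):
    """Largest order value under the 5%-of-typical-volume rule of thumb."""
    return participation * typical_dollar_volume


def max_shares_per_minute(minute_volumes, pct=10):
    """Cap per-minute size at a low percentile of historical minute volume."""
    return percentile(minute_volumes, pct)


# $20MM typically traded in the half hour allows roughly a $1MM order.
half_hour_cap = max_order_value(20_000_000)

# Hypothetical minute volumes (shares) for one stock.
minute_volumes = [500, 1200, 300, 800, 950, 400, 1500, 700, 600, 1000]
per_minute_cap = max_shares_per_minute(minute_volumes)
```

In a backtest, an order larger than the cap would either be rejected or split across several bars.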

Having developed many custom backtesting programs in the past, I wish I had just started out by purchasing a decent commercial backtester for a few hundred dollars. Buying one that is already complete will save you hours of learning the nuances and problems that come with implementing a backtester. Once you are adept at the ins and outs and limitations of the software, you can move on to more flexible programming of your own. At the very least, you will have a production implementation (with thousands of professional man-hours of effort behind it) against which to compare your own custom software's results and debug issues that surface. The cost of buying an actual professional product to learn the ins and outs is trivial in the long run.

Stop-loss simulation with strictly OHLC data is not possible in general, since you need to assume the ordering of the high and low (which one came first). You can make an assumption when the price moves in a straight line through the bar (open at the low, close at the high, for example); however, in general you will have bars where it is not clear whether the low or the high came first.
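To illustrate the ambiguity, here is a small Python sketch for a long position that has both a stop and a profit target. When a single bar touches both levels, the code assumes the stop was hit first, which is one conservative convention rather than the only valid one. The function name and return format are made up for this example.

```python
def resolve_long_exit(open_, high, low, close, stop, target):
    """Decide how a long position exits within one OHLC bar.

    Returns (reason, price, ambiguous), or None if neither level was touched.
    `ambiguous` is True when the bar touched both the stop and the target,
    so the real intra-bar ordering cannot be known from OHLC alone.
    """
    hit_stop = low <= stop
    hit_target = high >= target
    if hit_stop:
        # If the bar opened below the stop, assume the fill happened at the
        # open, since the stop price itself was never available.
        return ("stop", min(open_, stop), hit_target)
    if hit_target:
        # Pessimistic: no price improvement beyond the target.
        return ("target", target, False)
    return None


# The bar spans both levels: assume the stop came first.
result = resolve_long_exit(10.0, 11.0, 9.0, 10.5, stop=9.5, target=10.8)
```

Counting how often `ambiguous` comes back `True` gives a rough idea of how much the results depend on the ordering assumption.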

You also need to consider the logic for market vs. limit orders. If you use market orders you need some slippage assumptions. If you use limit orders there is a probability your order will not be filled.
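As a rough sketch of the two cases in Python: the market order gets a fixed slippage in basis points, and the limit order only counts as filled if the bar traded through the limit price by at least one tick. Both rules, the function names, and the default numbers are assumptions for illustration; real fill behaviour depends on queue position and liquidity.

```python
def market_fill_price(reference_price, side, slippage_bps=5.0):
    """Apply an assumed fixed slippage (in basis points) to a market order."""
    adjustment = reference_price * slippage_bps / 10_000
    if side == "buy":
        return reference_price + adjustment
    if side == "sell":
        return reference_price - adjustment
    raise ValueError("side must be 'buy' or 'sell'")


def limit_buy_filled(bar_low, limit_price, tick=0.01):
    """Conservative fill test for a buy limit order.

    A bar that merely touches the limit does not count, because the order
    may have been at the back of the queue at that price.
    """
    return bar_low <= limit_price - tick


def limit_sell_filled(bar_high, limit_price, tick=0.01):
    """Mirror image of limit_buy_filled for a sell limit order."""
    return bar_high >= limit_price + tick
```

A buy limit at 10.00 is therefore treated as unfilled on a bar whose low is exactly 10.00, and as filled on a bar whose low is 9.95.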

Also, using OHLC may lead you to design a strategy that trades on the Open. However, the open price is extremely volatile and you may face steep slippage and inability to achieve the quoted open price.

For the stop-loss issue, you should assume the worst-case scenario. Ideally, you should not work at a resolution where the stop loss is so close that the ordering within a bar commonly matters.
– BlueTrin, Jan 9 '14 at 13:57

It depends on what you need. Ideally, a good backtester lets you study the full market microstructure. In other words, you have to simulate a scenario that gives you bid and ask prices, quoted volumes, traded prices, traded volumes, etc.

Two years ago I developed my own simulated market in Python. I used a Poisson process to simulate trades. I also analysed supply and demand to define the aggressiveness of the volumes and the size of the spread.
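The original code is not shown here, so the following is only a generic sketch of Poisson trade arrivals using the standard library: inter-arrival times of a Poisson process are exponentially distributed. The function name and parameters are hypothetical.

```python
import random


def simulate_trade_times(rate_per_second, horizon_seconds, seed=None):
    """Simulate trade arrival times from a homogeneous Poisson process.

    Draws exponential inter-arrival times with the given rate and returns
    the increasing list of arrival times that fall before the horizon.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_second)
    while t < horizon_seconds:
        times.append(t)
        t += rng.expovariate(rate_per_second)
    return times


# Roughly two trades per second over one minute, reproducible via the seed.
trade_times = simulate_trade_times(2.0, 60.0, seed=1)
```

Trade sizes, sides, and spread dynamics would have to be modelled separately on top of these arrival times.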

I recommend checking out tradestation.com. Their EasyLanguage lets you build and backtest much more advanced strategies. They also have pre-built indicators and data going back many years.
It is not cheap, about $100 a month, but most of the time they have a special offer with the first 3 or 6 months free, and the monthly fee is dropped if you reach a certain level of trading.
Here is a brief video: http://www.tradestation.com/education/university/school-of-easylanguage/video-tutorials

I'll answer as if the backtester's design goals were driven by a specific system in development or planning. There's a lot of data to process in the market, and at least for my own system development I like to focus on data that I think may provide value. For that reason I chose to capture data from my broker. They offer so much data through the API that, before even writing my own backtester, I had to write a tool to capture the data that I thought would be relevant to my testing.

One way to justify an EOD backtest is to keep your position sizes and total number of positions such that if one stock ends up in a state where you are forced to exit at a really horrible price (e.g. 0), it won't be a huge blow. How many such cases could be expected to occur? If you limit your symbols to companies whose financial status makes this extremely unlikely, then not often enough to matter.
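As simple arithmetic in Python: if the worst case is a stock going to 0, the whole position value is at risk, so the position cap is just the fraction of equity you are willing to lose on one name. The 2.5% figure and the function names are assumptions chosen only for the example.

```python
def max_position_value(equity, max_loss_fraction=0.025):
    """Largest position value such that a move to 0 costs at most
    max_loss_fraction of total equity."""
    return equity * max_loss_fraction


def max_shares(equity, price, max_loss_fraction=0.025):
    """Whole number of shares that fits under max_position_value."""
    if price <= 0:
        raise ValueError("price must be positive")
    return int(max_position_value(equity, max_loss_fraction) // price)


# With 100,000 of equity and a 2.5% cap, a 40.00 stock allows 62 shares.
shares = max_shares(100_000, 40.0)
```

The same cap implies a minimum number of positions if the account is to be fully invested, here at least 40.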

Then, if you use margin, factor in whether your broker can auto-liquidate you if you borrowed money through them. Also, don't have your system exit at any time other than the close (the last few minutes of RTH). That is when symbols are most liquid, since everyone else uses this approach and intra-day traders are liquidating as well. Except for pre-announced reasons, stocks tend to trade "thick" during the close for the reasons mentioned above. This thickness is just another word for lots of quotes in the book, and those quotes prevent high-volume stocks from spiking, which means that using the daily close price, and possibly the volume during the last few minutes of the day, will be sufficient.

If you allow for exiting at other times, then the suggestion to limit your system's maximum intra-day volume (other than at the close) to, e.g., that 5% of the volume in your data makes sense. And unless you are risking too much or your instrument selection consists of random small caps (for unhedged trading with long holding times), I'd consider not even factoring in the worst cases of intra-day movement, at least not until your strategy is showing the kind of profits with which you'd hope to weather a few disasters over time. Being too pessimistic initially might actually bias the system development toward hitting stops during intra-day anomalies related to market structure (e.g. low liquidity during most of the day -> higher spread -> more volatility due to wide real spreads). You would then be selling at a low that, from an intra-day perspective, might look like the best time to buy, but since you had ignored the intra-day data you could not make that determination.