Question: How to design an algorithm that gets the all trading candles for quant3 research?

Napkin math

Minute candles are the most valuable, followed by day candles. Other time scale candles can be assembled from these without the need for unnecessary storage. Without any other information, the number of daily API calls would be:

120,000 symbols x 2 = 240,000 API calls needed

We are only allowed 15,000 calls per hour, so

240,000 calls / 15,000 calls/hr = 16 hours

It looks like the worst case time frame is 16 hours per day of constantly calling the broker API endpoint to get the candle data for every North American symbol. A trading day goes from 09:30 until 16:00 Toronto time – 6.5 hours of trading4. If we start the API call algorithm right at EOD, then 16:00 plus 16 hours means we would have all the previous day data by 08:00 the next morning. In theory with this approach we will not have a data deficit. BUT, there will only be a scant 1.5 hours to perform technical analysis before placing the new day’s orders.

Sample output of stock API candle calls

Single-threaded or multi-threaded?

Can we get away with a synchronous call algorithm? Let’s see.

15,000 calls/hr ≈ 4.16 calls/s ≈ 240 ms/call

For a single-threaded for-loop to succeed, each API call, including the overhead of DB storage, can take at most 240ms. I can tell you that this is an unreasonable assumption because:

A security with more trades5 has a bigger JSON payload which increases transit time

From experience, some requests take over a second to process. The proposed algorithm must make asynchronous API calls.

Do we really need to make 240,000 API calls?

When I built my production algorithm, I did the math and it seems extreme to make 240,000 API calls per day. So, how to know which symbols to call and which to defer or ignore altogether? The symbols have flags to indicate if they are quotable and/or tradable (and much more). I wrote a helper class to enumerate the Cartesian products of the combination of types of symbols. Here is a high-level breakdown to the types of symbols:

About 61% of the symbols are non-quotable. The breakdown of these symbols are:

Security Type

Count

Currency

4

-Not set-

38

Commodity

66

Right

3351

Mutual Fund

8367

Bond

11077

Index

11891

Stock

40084

All kinds of securities are not tradable. Why? Drilling down into the stocks in the breakdown reveals that all of them are no longer trading! Did they go bust? Maybe. It turns out that there is a lot of symbol turnover through name changes and delistings due to mergers, acquisitions, bankruptcies, and other reasons. At this moment in time, there are 48,753 quotable symbols.

Do all symbols have equal priority?

We’ve established that we can process all the quotable symbols in about 6.5 hours. If the market closes at 16:00 Toronto time, then we can only start technical analysis from 22:30. AWS Labmda7 processing time is least expensive at night, so this is a silver lining. However, the next question to ask is if we need all the symbols before we start TA. If the answer is no, then which symbols can be deferred or excluded?

List of symbols and exchanges

Here is a breakdown of the quotable symbols by exchange:

Quotable Security Exchange

Count

OPRAI

2

OTCBB

34

TSXI

47

NEO

82

S&P

259

⋮

⋮

TSXV

2150

TSX

2332

NYSE

3253

NASDAQ

3393

NASDAQI

15033

PINX

15999

Serious day and swing traders don’t bother with PINX, and the NASDAQI are niche indexes that aren’t as valuable as bigger indexes. Deferring them until the late evening or weekend saves about 31,000 x 2 API calls.

Above is a diagnostics chart (with JFreeChart) that shows the real progress of the API calls and how outdated the symbols are. The green lines are symbols excluding PINX and NASDAQI, and the red lines are those including them. The yellow band is the trading period. The green band is an approximate after-hours trading band. The red band is when the broker servers are notorious for going down. This real acquisition of all the quotable 1-minute and 1-day candles completed in about 6.5 hours as expected.

Conclusion: In under two and a half hours an algorithm should be able acquire the most valuable one-minute and one-day candles. This means by about 18:30 Toronto time we can begin mass technical analysis. While that is happening by another algorithm, the first algorithm can go back and get the Pinx and NASDAQI symbols for completeness.

Next: The next step is to figure out the storage footprint, database architecture, and how to deal with multi-threaded concurrency of the databases: Storing Stock Candle Data Efficiently