Distributed Computing Applications to ATSs

Hello,
I have been interning for nearly a year now at a large investment management shop as a CS graduate student and have since caught the investment bug. I have become very interested in TA and automated trading systems and am working on getting approved for an independent study course building a system all the way through.

My goal is to build a framework that allows for experimentation and learning. (I am aware of the many open source projects.) While I believe that I have a solid enough understanding to build a rudimentary system that allows for this, I am also interested in integrating my own work in distributed systems and making use of a cluster available to me.

From my inexperienced perspective and glances through academic and professional literature, some obvious applications are:

- Backtesting: Testing a single strategy over a very large set of historical data or testing many strategies
- Strategy evolution: Using GAs or similar to evolve strategies
- Real-time evolution of strategies: NNs (or are there other industry approaches?) that are fed data in real time to make decisions. (I am not certain that this will yield anything profitable.)
- Real-time analysis: Having a large set of machines allows for more real-time processing. Would having synchronized machines acting as one be useful to you in your setup? Are you hampered at all by the overhead involved in setting up and maintaining such a thing?

That said, I was wondering if I could gather feedback, ideas/hypotheses on the integration and utility of clusters with automated trading systems. If an idea sucks, I'm glad to be informed.
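To make the backtesting case concrete, here is a minimal sketch of the "many strategies" variant on a single machine, using a thread pool as a stand-in for cluster nodes. Everything in it is an assumption for illustration: the class name `ParallelBacktest`, the toy moving-average rule, and the price series are hypothetical and not taken from any real system. The point is only that each parameter set is an independent job, which is why this workload spreads across machines so easily.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelBacktest {
    // Hypothetical toy rule: hold one unit over the next bar whenever the
    // current price is above its simple moving average; otherwise stay flat.
    static double backtestSmaFilter(double[] prices, int window) {
        double pnl = 0.0;
        double sum = 0.0;
        for (int i = 0; i < prices.length - 1; i++) {
            sum += prices[i];
            if (i >= window) {
                sum -= prices[i - window]; // drop the bar that left the window
            }
            if (i >= window - 1) {
                double sma = sum / window;
                if (prices[i] > sma) {
                    pnl += prices[i + 1] - prices[i];
                }
            }
        }
        return pnl;
    }

    // Runs one independent backtest per window length; results come back
    // in the same order as the input windows.
    static List<Double> runAll(double[] prices, int[] windows, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int w : windows) {
                final int window = w;
                Callable<Double> job = () -> backtestSmaFilter(prices, window);
                futures.add(pool.submit(job));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] prices = {100, 101, 102, 101, 103, 104, 102, 105}; // made-up data
        int[] windows = {2, 3, 4};
        List<Double> pnl = runAll(prices, windows, 3);
        for (int i = 0; i < windows.length; i++) {
            System.out.println("window=" + windows[i] + " pnl=" + pnl.get(i));
        }
    }
}
```

On a cluster the thread pool would be replaced by whatever job distribution is available, but the shape stays the same: the jobs share read-only price data and never talk to each other.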

If you want fault tolerance, you need Tandem NonStop servers (now sold under the HP NonStop brand).

I am using JGroups for basic clustering. Each node works on a fixed number of contracts and puts results into a Replicated Hash Map. Since every node has the same hash map in memory, there is no single point of failure. The only problem so far is a ~2 minute delay to reload analysis tasks from a crashed node...

What processing is each node responsible for?

Is the 2-minute delay due to recalculation, or is it because the task you have allocated to each node is that time-intensive?

Do you know of others working with distributed systems towards this end?

Thanks, I really appreciate hearing about this.

Processing tasks are assigned to each node on a round-robin basis until the maximum number of tasks is reached. There is no head node; each node listens for changes to the cluster group. My objective was load balancing, not so much failover.
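A rough sketch of how round-robin assignment can work without a head node, written in plain Java with no JGroups calls. It assumes that the cap is a per-node maximum and that every node sees the same ordered member list, so each node can compute the same plan independently. Both assumptions, and the names `RoundRobinAssigner`, `assign`, and `maxPerNode`, are illustrative and not taken from the actual setup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoundRobinAssigner {
    // Deals tasks out to nodes in turn. Stops once every node holds
    // maxPerNode tasks; anything beyond that capacity stays unassigned.
    static Map<String, List<String>> assign(List<String> nodes, List<String> tasks, int maxPerNode) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String node : nodes) {
            plan.put(node, new ArrayList<>());
        }
        if (nodes.isEmpty()) {
            return plan;
        }
        int capacity = nodes.size() * maxPerNode;
        for (int i = 0; i < tasks.size() && i < capacity; i++) {
            String node = nodes.get(i % nodes.size());
            plan.get(node).add(tasks.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("t1", "t2", "t3", "t4", "t5");
        // Two nodes in the group: each takes up to two tasks.
        System.out.println(assign(Arrays.asList("node-a", "node-b"), tasks, 2));
        // Group changes to one node: the survivor recomputes the plan.
        System.out.println(assign(Arrays.asList("node-a"), tasks, 2));
    }
}
```

Because the function is deterministic, a node that notices a membership change only has to rerun it with the new member list and pick out its own entry. Tasks that no longer fit under the cap are simply dropped from the plan in this sketch.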

The delay is because I'm using Esper for ESP (event stream processing), and it keeps the engine state only in memory, so resuming analysis requires replaying past events from the database. It's a bit crude at this point. I am evaluating EsperHA ($8,000), which can persist state to disk.
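For anyone unfamiliar with the replay approach, the idea can be sketched in plain Java. This does not use the Esper API; `ReplayRecovery`, `Tick`, `RollingMean`, and the symbols are hypothetical stand-ins, and a list plays the role of the database. The in-memory state here is just a rolling mean per contract, and recovery means pushing the stored events through the same handler the live feed uses.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplayRecovery {
    // One stored market event (hypothetical shape).
    static final class Tick {
        final String symbol;
        final double price;
        Tick(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    // State that lives only in memory: the last `window` prices per symbol.
    static final class RollingMean {
        private final int window;
        private final Map<String, Deque<Double>> recent = new HashMap<>();

        RollingMean(int window) {
            this.window = window;
        }

        void onTick(Tick t) {
            Deque<Double> q = recent.computeIfAbsent(t.symbol, k -> new ArrayDeque<>());
            q.addLast(t.price);
            if (q.size() > window) {
                q.removeFirst(); // keep only the most recent `window` prices
            }
        }

        double mean(String symbol) {
            Deque<Double> q = recent.get(symbol);
            if (q == null || q.isEmpty()) {
                return Double.NaN;
            }
            double sum = 0.0;
            for (double p : q) {
                sum += p;
            }
            return sum / q.size();
        }
    }

    // Recovery: feed the stored events, in their original order, through
    // the same handler that processes live events.
    static RollingMean rebuild(List<Tick> storedTicks, int window) {
        RollingMean state = new RollingMean(window);
        for (Tick t : storedTicks) {
            state.onTick(t);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Tick> stored = Arrays.asList(
            new Tick("ES", 10), new Tick("ES", 20), new Tick("ES", 30), new Tick("NQ", 5));
        RollingMean recovered = rebuild(stored, 2);
        System.out.println("ES mean after replay: " + recovered.mean("ES"));
        System.out.println("NQ mean after replay: " + recovered.mean("NQ"));
    }
}
```

In a sketch like this the recovery time grows with the number of events replayed, so for purely window-based state it may be enough to replay only as far back as the longest window. Whether that applies to a given set of Esper statements is something I have not verified.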