Does anyone know of any prior art for non-SQL data structures for high-frequency accounting, whether client, broker, or exchange-side? I'm thinking specifically of the problem of booking individual trade data into proper transactions, with balanced debits and credits. In my own case, I'll be doing this in or directly adjacent to a fast limit-order book, but I can see other reasons for such a beast existing. And yes, I agree that none of the current raft of popular non-ACID nosql engines are at all right for this job. I'm assuming I'm going to need to write this.

A usable answer to this question might be as simple as a link to a paper on the subject of non-SQL or nosql accounting in a high-volume trading context -- I'm obviously using the wrong combinations of search terms, because I'm not finding much yet.

What I'm working on is a project that includes a limit-order book and accounting on each node in a distributed grid or fabric. In my case, the traded instruments could best be described as real options or real derivatives, including some mild exotics. The vast majority of the orders would be initiated by machines, and the data rate looks like it could easily hit 60k trades/sec on each node. (Without going into a longer dissertation, it might help to explain that I'm in Silicon Valley these days; this is obviously for a new market, not any existing one.) See http://en.wikipedia.org/wiki/Real_options_valuation if you haven't run across real options before.

Partial answers, based in part on feedback to this question so far:

A purpose-built accounting mechanism would probably be log-structured, append-only, probably using a non-SQL API for insertion speed. The engine itself might be a hypergraph database. If running on multiple nodes, it would need a way of providing summary transactions to the other nodes in a peer-to-peer fashion. The more I dig into this, the more it's starting to look like a distributed hypergraph.
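
To make the "log-structured, append-only" idea concrete, here is a minimal sketch, assuming an in-memory list stands in for the on-disk log. The names `Posting`, `Journal`, and `book_trade` are hypothetical, and only the cash leg of a trade is booked; this is an illustration of the invariant, not a proposed engine design.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Posting:
    account: str
    amount: Decimal  # convention assumed here: positive = debit, negative = credit


class Journal:
    """Append-only log of balanced transactions (hypothetical sketch)."""

    def __init__(self) -> None:
        # Entries are only ever appended, never updated or deleted.
        self._log: List[Tuple[int, Tuple[Posting, ...]]] = []

    def append(self, postings: Iterable[Posting]) -> int:
        tx = tuple(postings)
        # Reject anything whose debits and credits do not net to zero.
        if sum((p.amount for p in tx), Decimal(0)) != 0:
            raise ValueError("unbalanced transaction")
        seq = len(self._log)
        self._log.append((seq, tx))
        return seq

    def balance(self, account: str) -> Decimal:
        # No index: a balance is a full scan of the log.
        return sum(
            (p.amount for _, tx in self._log for p in tx if p.account == account),
            Decimal(0),
        )


def book_trade(buyer: str, seller: str, qty: Decimal, price: Decimal) -> List[Posting]:
    """Cash leg only: the seller's cash is debited, the buyer's is credited."""
    notional = qty * price
    return [
        Posting(f"{seller}:cash", notional),
        Posting(f"{buyer}:cash", -notional),
    ]


journal = Journal()
journal.append(book_trade("alice", "bob", Decimal("10"), Decimal("2.50")))
print(journal.balance("bob:cash"))
```

The balance check on insert is the only integrity rule enforced here; durability, replication to peer nodes, and the instrument leg are all left out.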

In the HFT world, it sounds like the standard procedure is still: Log but do not index the trades, do simple arithmetic ignoring debits and credits, and then synthesize balanced summary transactions to the accounting RDBMS periodically. Run MTM in batch. Is there anything anyone can say about how that "simple math and local logging" is done? I know how we did this in the derivatives world 15 years ago, but frankly it and MTM were both slow and ugly, and involved NFS servers, flat files, and shell scripts. Has nothing changed? ;-)
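
As I understand the pattern described above, it could be sketched roughly like this. `NetAccumulator` and its methods are hypothetical names, a Python list stands in for the unindexed local log file, and only net cash per account is tracked.

```python
from collections import defaultdict
from decimal import Decimal


class NetAccumulator:
    """Log each trade, keep running nets, emit balanced summaries on demand."""

    def __init__(self):
        self.trade_log = []                   # stand-in for an append-only log file
        self.net_cash = defaultdict(Decimal)  # account -> net cash since last flush

    def on_trade(self, buyer, seller, qty, price):
        # Hot path: one append plus two additions, no debit/credit bookkeeping.
        self.trade_log.append((buyer, seller, qty, price))
        notional = qty * price
        self.net_cash[buyer] -= notional
        self.net_cash[seller] += notional

    def flush_summary(self):
        # Periodic path: turn the nets into one balanced summary transaction
        # that could be handed to the accounting RDBMS.
        postings = [(acct, amt) for acct, amt in sorted(self.net_cash.items()) if amt != 0]
        if sum(amt for _, amt in postings) != 0:
            raise RuntimeError("summary does not balance")
        self.net_cash.clear()
        return postings


acc = NetAccumulator()
acc.on_trade("a", "b", Decimal("10"), Decimal("2.5"))
acc.on_trade("b", "a", Decimal("4"), Decimal("2.5"))
print(acc.flush_summary())
```

Because every trade adds and subtracts the same notional, the nets always sum to zero, so the summary is balanced by construction even though the hot path never thinks in debits and credits.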

Okay, removing 'accounting' from the search terms just now found me this -- different question at first glance, covering both tick and financial data, but worth reading through -- looks like he had some of the same thoughts: Usage of NoSQL storage in Finance

Looks like it would be worth repeating my searches in google, citeseer, etc., substituting "finance" for "accounting".

Almost all OLTP systems are designed in such a manner that they "synthesize balanced summary transactions to the back office periodically". It's a standard approach.
–
Alexey Kalmykov Mar 14 '12 at 0:46

@AlexeyKalmykov that pretty much sums it up.
–
pyCthon Mar 14 '12 at 2:02

Thanks. Huh -- that could be why I'm not finding much prior art for better ways to do this. I thought for sure that things would have improved in the last 15 years.
–
stevegt Mar 14 '12 at 3:15

On a side note, there are plenty of non-SQL databases that can fulfill the role noted above.
–
pyCthon Mar 20 '12 at 5:34

Given that exchange traded option markets are slow enough for SQL databases, you can color me skeptical that you are spending brainpower on a problem that will really exist. Obviously I don't know anything about the real-options-like trading you are thinking of, so I could easily be wrong on this topic, but that's my first thought.
–
Brian B Mar 22 '12 at 16:22

2 Answers

I know this is probably a naive answer, but when I started doing data analysis for personal trading I looked for something much faster than SQL. I program in C++, and I found that HDF5 was the answer to all my problems.

I have to think that there are a lot of very fast, very optimized special-purpose
accounting engines out there filling this role.

Yes and no. I do not think you are high volume at all - you just need a corporate-level server for the database, not cheap low-end hosting. I do about 2000 transactions per second on a SQL Server with a mid-range database.

The core will be:

Decouple front and back with a message queue anyway.

Take trade executions from a FIX backoffice link that reports from clearing / broker.
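
A minimal sketch of the decoupling, assuming an in-process `queue.Queue` stands in for a real message queue and plain dicts stand in for parsed FIX execution reports (no FIX parsing is shown; `back_office` and `run_demo` are hypothetical names):

```python
import queue
import threading


def back_office(inbox, ledger):
    """Consumer side: drain executions from the queue and book them."""
    while True:
        execution = inbox.get()
        if execution is None:  # sentinel: the front end has finished
            break
        ledger.append(execution)


def run_demo(executions):
    inbox = queue.Queue(maxsize=1024)  # bounded, so a slow back end applies backpressure
    ledger = []
    worker = threading.Thread(target=back_office, args=(inbox, ledger))
    worker.start()
    # Producer side: the front end only enqueues and never touches the ledger.
    for execution in executions:
        inbox.put(execution)
    inbox.put(None)
    worker.join()
    return ledger


fills = [
    {"symbol": "X", "qty": 10, "px": 2.5},
    {"symbol": "Y", "qty": -4, "px": 7.0},
]
print(run_demo(fills))
```

With a single consumer the ledger preserves arrival order, which is what the booking side needs; a production setup would use a durable queue rather than an in-process one.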

it seems like a huge waste of data center horsepower when a more modern purpose-built,
probably non-SQL accounting engine might be orders of magnitude faster.

There is one thing amiss: SQL has data integrity, while NoSQL is often written ignoring data integrity requirements. You can get away with a lack of data integrity for a LOT of stuff, but not with accounting.

You also miss that accounting is a standardized commodity. Large companies run something like SAP - and want all their data to be in there, regardless of costs. It is not a waste of time to upgrade the one central system doing your payroll, all invoices for the organization, etc. on top of trade accounting.

Also, it is a question whether accounting really needs every trade - back office yes, to consolidate and check, but accounting is OK with synthesized balanced summaries. I do not do a lot of trading so far, but I submit monthly PNL totals with the broker statement to my accountant (where it goes straight into my monthly profit / loss and tax calculations). I will never do it differently, even when volume ramps up - I will consolidate daily or hourly and correlate INTERNALLY, but not for accounting.

I didn't mention what sort of volume I'm looking at -- but so far, it looks like about 60k trades per second on each node of a distributed fabric, on commodity hardware. And when I say "non-SQL", I'm not thinking of any of the current raft of nosql non-ACID databases; I agree with you that they weren't written with accounting in mind. I'm assuming I'm going to need to write this myself; publicly-available prior art is what I'm looking for here.
–
stevegt Mar 14 '12 at 7:10

60k trades per second? Are you serious? Not doubting HFT, but if that is more than 2-3 nodes you are talking ECN-level volume, not trading. You would take over half a stock market. In this case I would likely go with summaries and distribute stocks per node, with local logfile + summarization.
–
TomTom Mar 14 '12 at 7:13

Yep, 60k trades per second per node would be about right. I hear you regarding local logging + summarization: that's three in favor of that now, no other suggestions yet. I've updated the question a bit to clarify some of the points you raised. And who said anything about stocks? ;-)
–
stevegt Mar 14 '12 at 7:26

But real-time MTM is NOT ACCOUNTING. This is risk management and trade control, and it is done in memory, WITHOUT a database. You talked about accounting, not about a real-time trade position overview.
–
TomTom Mar 14 '12 at 20:03
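
For illustration, the in-memory, database-free MTM mentioned in the last comment might look something like the sketch below. `PositionBook` is a hypothetical name, marks are passed in as a plain dict, and P&L is taken as net cash flow plus the position valued at the mark.

```python
from decimal import Decimal


class PositionBook:
    """In-memory positions and cash flows per instrument (no database)."""

    def __init__(self):
        self.qty = {}   # instrument -> signed position
        self.cash = {}  # instrument -> net cash received (negative = paid out)

    def on_fill(self, instrument, signed_qty, price):
        # signed_qty > 0 is a buy, signed_qty < 0 is a sell.
        self.qty[instrument] = self.qty.get(instrument, Decimal(0)) + signed_qty
        self.cash[instrument] = self.cash.get(instrument, Decimal(0)) - signed_qty * price

    def mtm(self, marks):
        # P&L = cash flows so far + open position valued at the current mark.
        return sum(
            (self.cash[i] + self.qty[i] * marks[i] for i in self.qty),
            Decimal(0),
        )


book = PositionBook()
book.on_fill("OPT1", Decimal("10"), Decimal("2.50"))  # buy 10 @ 2.50
book.on_fill("OPT1", Decimal("-4"), Decimal("3.00"))  # sell 4 @ 3.00
print(book.mtm({"OPT1": Decimal("3.00")}))
```

In this example the result combines 2.00 realized on the 4 sold with 3.00 unrealized on the 6 still held; nothing here is a balanced accounting entry, which is the distinction the comment is drawing.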