Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community.

I am writing the schema for a simple bank database. Here are the basic specifications:

The database will store transactions against a user and currency.

Every user has one balance per currency, so each balance is simply the sum of all transactions against a given user and currency.

A balance cannot be negative.

The bank application will communicate with its database exclusively through stored procedures.

I expect this database to accept hundreds of thousands of new transactions per day, as well as balance queries on a higher order of magnitude. To serve up balances very quickly I need to pre-aggregate them. At the same time, I need to guarantee that a balance never contradicts its transaction history.

My options are:

Have a separate balances table and do one of the following:

Apply transactions to both the transactions and balances tables. Use TRANSACTION logic in my stored procedure layer to ensure that balances and transactions are always in sync. (Supported by Jack.)

Apply transactions to the transactions table and have a trigger that updates the balances table for me with the transaction amount.

Apply transactions to the balances table and have a trigger that adds a new entry in the transactions table for me with the transaction amount.

I have to rely on security-based approaches to make sure no changes can be made outside of the stored procedures. Otherwise, for example, some process could directly insert a transaction into the transactions table and under scheme 1.3 the relevant balance would be out of sync.
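
To make option 1.1 concrete, here is a rough sketch of what such a stored procedure might look like. All table, column, and procedure names here are assumptions, not part of the spec; error handling and the first-insert race are glossed over:

```sql
-- Hypothetical sketch of option 1.1 (names assumed, error handling omitted).
CREATE PROCEDURE dbo.AddTransaction
    @UserID int, @CurrencyID int, @Amount decimal(19,4)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Updating the balance row first serializes concurrent transactions
    -- against the same user/currency balance.
    UPDATE dbo.Balances WITH (ROWLOCK, XLOCK)
        SET Amount = Amount + @Amount
        WHERE UserID = @UserID AND CurrencyID = @CurrencyID;

    IF @@ROWCOUNT = 0
        INSERT dbo.Balances (UserID, CurrencyID, Amount)
        VALUES (@UserID, @CurrencyID, @Amount);

    -- A CHECK (Amount >= 0) constraint on dbo.Balances rejects overdrafts;
    -- the resulting error rolls back both writes together.
    INSERT dbo.Transactions (UserID, CurrencyID, Amount, TransactionDate)
    VALUES (@UserID, @CurrencyID, @Amount, SYSDATETIME());

    COMMIT TRANSACTION;
END;
```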

Have a balances indexed view that aggregates the transactions appropriately. Balances are guaranteed by the storage engine to stay in sync with their transactions, so I don't need to rely on security-based approaches to guarantee this. On the other hand, I cannot enforce balances be non-negative anymore since views -- even indexed views -- cannot have CHECK constraints. (Supported by Denny.)
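
For reference, option 2 might be sketched as follows in SQL Server (names are assumptions). Note the indexed-view requirements: SCHEMABINDING, and a COUNT_BIG(*) column whenever GROUP BY is used:

```sql
-- Hypothetical sketch of option 2 (names assumed).
CREATE VIEW dbo.Balances
WITH SCHEMABINDING
AS
SELECT UserID, CurrencyID,
       SUM(Amount)  AS Balance,
       COUNT_BIG(*) AS TransactionCount  -- required for an indexed view with GROUP BY
FROM dbo.Transactions
GROUP BY UserID, CurrencyID;
GO
-- Materializes the view; the engine now keeps it in sync automatically.
CREATE UNIQUE CLUSTERED INDEX IX_Balances
    ON dbo.Balances (UserID, CurrencyID);
```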

Have just a transactions table but with an additional column to store the balance effective right after that transaction executed. Thus, the latest transaction record for a user and currency also contains their current balance. (Suggested below by Andrew; variant proposed by garik.)

When I first tackled this problem, I read these two discussions and decided on option 2. For reference, you can see a bare-bones implementation of it here.

Have you designed or managed a database like this with a high load profile? What was your solution to this problem?

Do you think I've made the right design choice? Is there anything I should keep in mind?

For example, I know schema changes to the transactions table will require I rebuild the balances view. Even if I am archiving transactions to keep the database small (e.g. by moving them somewhere else and replacing them with summary transactions), having to rebuild the view off tens of millions of transactions with every schema update will probably mean significantly more downtime per deployment.

If the indexed view is the way to go, how can I guarantee that no balance is negative?

Archiving transactions:

Let me elaborate a bit on archiving transactions and the "summary transactions" I mentioned above. First, regular archiving will be a necessity in a high-load system like this. I want to maintain consistency between balances and their transaction histories while allowing old transactions to be moved somewhere else. To do this I'll replace every batch of archived transactions with a summary of their amounts per user and currency.

If you choose option 2 (which I think is cleaner), take a look at pgcon.org/2008/schedule/attachments/… for how to implement "materialized views" efficiently. For option 1, chapter 11 of de Haan and Koppelaars' Applied Mathematics for Database Professionals (don't worry about the title) would be helpful for getting an idea of how to implement "transition constraints" efficiently. The first link is for PostgreSQL and the second for Oracle, but the techniques should work for any reasonable database system.
– j.p. Sep 15 '11 at 7:37

You know, normally people ask on StackExchange "Is this homework?" because they don't want it to be. But, since you're doing a bank schema from scratch, I really, really hope for your sake that this is homework. :D Either way, it should be a great lesson in proper ACID transaction design.
– Jonathan Van Matre Dec 22 '11 at 19:45

@JonathanVM - Homework shmomework. This is actually the design for BofA's next-gen commercial banking database, which they will be migrating all their customer data to on 2012/04/01.
– Nick Chammas Dec 22 '11 at 20:13

@NickChammas /me heads off to close his account at BofA. ;-)
– Jonathan Van Matre Dec 22 '11 at 20:55

7 Answers

Not allowing customers to have a balance below zero is a business rule (and one that would change quickly, as fees for things like overdrafts are how banks make much of their money). You'll want to handle this in the application processing when rows are inserted into the transaction history, especially as you may end up with some customers having overdraft protection, some getting charged fees, and some not being allowed to go negative at all.

So far I like where you are going with this, but if this is for an actual project (not school) there needs to be a hell of a lot of thought put into business rules, etc. Once you've got a banking system up and running there's not a lot of room for redesign as there are very specific laws about people having access to their money.

I can see why the balances constraint should actually be a business rule. The database is just providing a transaction service, and it's up to the user to decide what to do with it.
– Nick Chammas Sep 13 '11 at 16:10

What do you think about Jack's comments that using the two tables affords the developers more flexibility in changing or implementing business logic? Also, do you have any direct experience with indexed views that validates or challenges these concerns?
– Nick Chammas Sep 13 '11 at 16:17

I wouldn't say that having two tables gives you more flexibility in implementing business logic. It does give you more flexibility in data archiving. However, as a bank (at least in the US) you've got laws which say how much data you need to keep. You'll want to test how the performance looks with the view on top, as well as take into account that if you have an indexed view you can't change the schema of the underlying tables. Just another thing to think about.
– mrdenny♦ Sep 13 '11 at 19:12

All of the items mentioned in the article are valid concerns to think about when using an indexed view.
– mrdenny♦ Sep 13 '11 at 19:14

To clarify: IMO a transactional API gives more flexibility implementing business logic (not having two tables). In this case I'd also be in favour of two tables (at least given the information we have so far) because of the trade-offs proposed with the indexed view approach (eg can't then use DRI to enforce balance>0 business rule)
– Jack Douglas♦ Sep 14 '11 at 8:26

Having read those discussions too, I am not sure why you decided on the DRI solution over the most sensible of the other options you outline:

Apply transactions to both the transactions and balances tables.
Use TRANSACTION logic in my stored procedure layer to ensure that
balances and transactions are always in sync.

This kind of solution has immense practical benefits if you have the luxury of restricting all access to the data through your transactional API. You lose the very important benefit of DRI, which is that integrity is guaranteed by the database, but in any model of sufficient complexity there will be some business rules that cannot be enforced by DRI.

I'd advise using DRI where possible to enforce business rules without bending your model too much to make that possible:

Even if I am archiving transactions (e.g. by moving them somewhere
else and replacing them with summary transactions)

As soon as you start considering polluting your model like this, I think you are moving into the area where the benefit of DRI is outweighed by the difficulties you are introducing. Consider for example that a bug in your archiving process could in theory cause your golden rule (that balances always equal the sum of transactions) to break silently with a DRI solution.

Here is a summary of the advantages of the transactional approach as I see them:

We should be doing this anyway if at all possible. Whatever solution you choose for this particular problem, it gives you more design flexibility and control over your data. All access then becomes "transactional" in terms of business logic, rather than just in terms of database logic.

You can keep your model neat

You can "enforce" a much wider range and complexity of business rules (noting that the concept of "enforce" is a looser one than with DRI)

You can still use DRI wherever practical to give the model a more robust underlying integrity - and this can act as a check on your transactional logic

Most of the performance issues that are troubling you will melt away

Introducing new requirements can be much easier - for example: complex rules for disputed transactions might force you away from a pure DRI approach further down the line, meaning a lot of wasted effort

Partitioning or archiving of historical data becomes much less risky and painful

--edit

To allow archiving without adding complexity or risk, you could choose to keep summary rows in a separate summary table, generated continuously (borrowing from @Andrew and @Garik)

For example, if the summaries are monthly:

each time there is a transaction (through your API) there is a corresponding update or insert into the summary table

the summary table is never archived, but archiving transactions becomes as simple as a delete (or dropping a partition?)

each row in the summary table includes 'opening balance' and 'amount'

check constraints such as 'opening balance'+'amount'>0 and 'opening balance'>0 can be applied to the summary table

summary rows could be inserted in a monthly batch to make locking the latest summary row easier (there would always be a row for the current month)
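
A sketch of such a summary table, expressing those constraints directly (all names here are assumptions):

```sql
-- Hypothetical monthly summary table for the scheme above (names assumed).
CREATE TABLE dbo.BalanceSummaries (
    UserID         int           NOT NULL,
    CurrencyID     int           NOT NULL,
    SummaryMonth   date          NOT NULL,  -- first day of the month
    OpeningBalance decimal(19,4) NOT NULL,
    Amount         decimal(19,4) NOT NULL,  -- net of this month's transactions
    CONSTRAINT PK_BalanceSummaries
        PRIMARY KEY (UserID, CurrencyID, SummaryMonth),
    -- The two check constraints Jack describes:
    CONSTRAINT CK_OpeningNonNegative CHECK (OpeningBalance >= 0),
    CONSTRAINT CK_ClosingNonNegative CHECK (OpeningBalance + Amount >= 0)
);
```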

I've elaborated on my archiving solution in my question. I don't see how it could break my balances-transactions consistency under the DRI approach--the database is guaranteeing that for me. With the stored procedure approach, on the other hand, a badly written archiving job may mess up its transaction summaries and break my b-t consistency. (Of course, in this case the balances would still be correct, but their transaction histories now contradict them.)
– Nick Chammas Sep 12 '11 at 16:38

This is the core issue I have with the non-DRI approach: Though I can restrict my application to use my stored procedures, it's much harder to guarantee that maintenance jobs, one-off scripts, future DBAs/DBDs, and so forth don't break the consistency between balances and transactions.
– Nick Chammas Sep 12 '11 at 16:41

@Nick - The archival process will delete rows and insert summary rows. Whether it gets its sums right or wrong: DRI will not prevent it deleting what it likes and inserting what it likes! That's what I call a false sense of security. With the transactional approach, you need the authority to force the archiving job, one-off scripts, future DBAs to also access the data through your well-defined API - or get written permission from someone in high enough authority - this kind of policy (with auditing) encourages people to be very careful ;-)
– Jack Douglas♦ Sep 12 '11 at 16:51

Great point. Side note: my "golden rule" does not break at all with DRI, as you stated in your answer, since DRI keeps balances always in sync with transactions. Your objection, rather, is that DRI does not make an archiving bug apparent (by letting balances and transactions contradict each other).
– Nick Chammas Sep 15 '11 at 17:55

Regarding your edit: So you propose having this summary table along side the main balances table? Does the balances table then effectively become a summary table that just has the records for the current month (since both will store the same kind of data)? If I've understood correctly, then why not just replace the balances table with the appropriate partition on the summary table?
– Nick Chammas Sep 19 '11 at 18:58

/* For a single user/currency. Transaction is a reserved word, so the
   table name needs brackets. */
SELECT TOP 1 *
FROM dbo.[Transaction]
WHERE UserID = 3 AND CurrencyID = 1
ORDER BY TransactionDate DESC;

/* For multiple records, e.g. to put into a view (which you might want to index) */
SELECT C.*
FROM (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY UserID, CurrencyID
            ORDER BY TransactionDate DESC
        ) AS rnBalance
    FROM dbo.[Transaction]
) AS C
WHERE C.rnBalance = 1
ORDER BY C.UserID, C.CurrencyID;

Cons:

When inserting a transaction out of sequence (ie: to correct an
issue/incorrect starting balance), you may need to cascade updates
for all subsequent transactions.

Transactions for the User/Currency would need to be serialized to maintain an accurate balance.

-- Example of getting the current balance and locking the
-- last record for that user/currency.
-- The lock is released when the stored procedure's transaction completes.
SELECT TOP 1 @OldBalance = OpeningBalance + TransactionAmount
FROM dbo.[Transaction] WITH (ROWLOCK, XLOCK)
WHERE UserID = 3 AND CurrencyID = 1
ORDER BY TransactionDate DESC;

Pros:

You no longer have to maintain two separate tables...

You can easily validate the balance, and when the balance gets out of sync you can identify exactly when it got out of whack, as the transaction history becomes self-documenting.

Edit: Some sample queries on retrieval of current balance and to highlight con (Thanks @Jack Douglas)

The main problem with this approach is the burden on updates, which increases the older the transaction is - the entire subsequent transaction history needs updating each time this happens. Yes it should never happen, but I wouldn't want my model to rely on that assumption.
– Jack Douglas♦ Sep 14 '11 at 8:16

@Jack - Transactions against the same balance will have to be serialized one way or the other. How else would the database guarantee that each transaction sees the latest balance before it is executed? In this approach the database would row lock the latest transaction, which contains the balance, to execute a transaction. In the two tables approach the database would row lock the balance. In both cases a series of transactions will be serialized if it hits an individual balance.
– Nick Chammas Sep 15 '11 at 15:18

@Nick Having a balances indexed view and a transactions table, you would not need to lock any table to update the balance (that is, I think, Jack's point).
– Andrew Bickerton Sep 15 '11 at 15:25

Ah, as in read the latest transaction with an update lock to ensure concurrent transactions are serialized. Otherwise, two transactions may be executed against the same balance at the same time, since a SELECT does not block others from reading the same data. Looks like this guy has had trouble getting SQL Server to hold an update lock on a single row.
– Nick Chammas Sep 15 '11 at 15:31

@Jack - With regards to the update problem from your first comment, we could simply add an "adjustment" transaction to correct any balance errors, as opposed to updating the entire transaction history. I understand this is common practice in banks.
– Nick Chammas Sep 15 '11 at 15:55

I am not familiar with accounting, but I have solved some similar problems in inventory-type environments. I store running totals in the same row as the transaction, and I use constraints so that my data is never wrong even under high concurrency. I wrote up the following solution back in 2009:

Calculating running totals is notoriously slow, whether you do it with a cursor or with a triangular join. It is very tempting to denormalize, to store running totals in a column, especially if you select it frequently. However, as usual when you denormalize, you need to guarantee the integrity of your denormalized data. Fortunately, you can guarantee the integrity of running totals with constraints – as long as all your constraints are trusted, all your running totals are correct. Also this way you can easily ensure that the current balance (running totals) is never negative - enforcing by other methods can also be very slow. The following script demonstrates the technique.
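
The script itself is not reproduced above, so here is a minimal, hedged sketch of the idea with assumed names: each row stores the balance after it, and a self-referencing foreign key forces each row's stored "previous balance" to equal its predecessor's balance, so the chain can never drift:

```sql
-- Sketch only: names and details are assumptions, not the original script.
CREATE TABLE dbo.Ledger (
    UserID          int           NOT NULL,
    CurrencyID      int           NOT NULL,
    SequenceNo      int           NOT NULL,   -- gapless per user/currency
    Amount          decimal(19,4) NOT NULL,
    PreviousBalance decimal(19,4) NOT NULL,   -- balance before this row
    Balance AS (PreviousBalance + Amount) PERSISTED,
    CONSTRAINT PK_Ledger PRIMARY KEY (UserID, CurrencyID, SequenceNo),
    -- The non-negative balance rule, enforced declaratively:
    CONSTRAINT CK_NeverNegative CHECK (PreviousBalance + Amount >= 0),
    -- The first row per user/currency must open at zero:
    CONSTRAINT CK_FirstRowOpensAtZero CHECK (SequenceNo > 1 OR PreviousBalance = 0),
    -- Target for the self-referencing chain below:
    CONSTRAINT UQ_LedgerChain UNIQUE (UserID, CurrencyID, SequenceNo, Balance)
);
-- Each row after the first must point at its predecessor AND agree with it
-- on the balance, so an incorrect running total simply cannot be inserted.
ALTER TABLE dbo.Ledger ADD PreviousSequenceNo AS
    (CASE WHEN SequenceNo = 1 THEN NULL ELSE SequenceNo - 1 END) PERSISTED;
ALTER TABLE dbo.Ledger ADD CONSTRAINT FK_LedgerChain
    FOREIGN KEY (UserID, CurrencyID, PreviousSequenceNo, PreviousBalance)
    REFERENCES dbo.Ledger (UserID, CurrencyID, SequenceNo, Balance);
```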

A better variant is to decrease the number of summary records: we can have one balance record at the end (and/or beginning) of each day. As you know, every bank has an operational day that it opens and then closes to run summary operations for that day. This makes it easy to calculate things like daily interest from each day's balance record.
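
A hedged sketch of that end-of-day summary (all table, column, and variable names here are assumptions): snapshot each balance at close of the operational day, then compute daily interest straight from the snapshots.

```sql
-- Snapshot balances at close of the operational day (names assumed).
INSERT INTO dbo.DailyBalances (UserID, CurrencyID, BalanceDate, Balance)
SELECT UserID, CurrencyID, @CloseOfDay, SUM(Amount)
FROM dbo.Transactions
WHERE TransactionDate < DATEADD(day, 1, @CloseOfDay)
GROUP BY UserID, CurrencyID;

-- Daily interest then falls out of the snapshots directly.
SELECT UserID, CurrencyID, BalanceDate,
       Balance * (@AnnualRate / 365.0) AS DailyInterest
FROM dbo.DailyBalances
WHERE BalanceDate BETWEEN @From AND @To;
```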

In Oracle you could do this using just the transactions table with a fast-refreshable materialized view on it that does the aggregation to form the balance. You then define a trigger on the materialized view. If the materialized view is defined with ON COMMIT, this effectively prevents adding or modifying invalid data in the base table: the trigger detects the invalid data and raises an exception, which rolls back the transaction. A nice example is here: http://www.sqlsnippets.com/en/topic-12896.html
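
A condensed, hedged sketch of that technique in Oracle syntax (object names are assumptions; see the linked example for the full details):

```sql
-- A materialized view log is required for fast refresh.
CREATE MATERIALIZED VIEW LOG ON transactions
  WITH ROWID (user_id, currency_id, amount) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW balances_mv
  REFRESH FAST ON COMMIT
AS
  SELECT user_id, currency_id,
         SUM(amount) AS balance,
         COUNT(*)    AS tx_count
  FROM   transactions
  GROUP  BY user_id, currency_id;

-- The trigger fires during the on-commit refresh; raising an exception
-- rolls back the whole transaction, so no negative balance can commit.
CREATE OR REPLACE TRIGGER balances_mv_check
  BEFORE INSERT OR UPDATE ON balances_mv
  FOR EACH ROW
BEGIN
  IF :new.balance < 0 THEN
    RAISE_APPLICATION_ERROR(-20000, 'Balance cannot be negative');
  END IF;
END;
```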

Based on your requirements, option 1 would appear the best, although I would design it to allow only inserts into the transactions table, and have a trigger on the transactions table keep the real-time balances table up to date. You can use database permissions to control access to these tables.

In this approach, the real-time balance is guaranteed to be in sync with the transactions table, and it does not matter whether stored procedures, psql, or JDBC is used. You can have your negative-balance check if needed. Performance will not be an issue: getting the real-time balance is a singleton query.

Archiving will not affect this approach. You can also have weekly, monthly, and yearly summary tables if needed for things like reports.
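
A rough sketch of the insert-only trigger this answer describes (names are assumptions; error handling and the missing-balance-row case are only noted in comments):

```sql
-- Hypothetical trigger keeping dbo.Balances in sync (names assumed).
CREATE TRIGGER trg_Transactions_Balance
ON dbo.Transactions
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Apply the net of the inserted rows to each affected balance.
    -- Aggregating over "inserted" handles multi-row inserts correctly.
    UPDATE b
       SET b.Amount = b.Amount + i.Net
    FROM dbo.Balances AS b
    JOIN (SELECT UserID, CurrencyID, SUM(Amount) AS Net
          FROM inserted
          GROUP BY UserID, CurrencyID) AS i
      ON i.UserID = b.UserID AND i.CurrencyID = b.CurrencyID;

    -- A real version would also INSERT balance rows that don't exist yet,
    -- and a CHECK (Amount >= 0) on dbo.Balances would reject overdrafts
    -- by rolling back the whole statement.
END;
```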