
I’ve been busy with a project, so I’ve finally got round to writing this a week later than I intended…

In a recent conversation, someone pointed out that people sometimes remove “constraints” from a database in order to improve performance. This made me ask myself:

Is this a good thing, or a bad thing?

I have to admit that this is a technical change that I have considered in the past. Nevertheless, I have mixed feelings about it.

After some thought, my opinion is:

For many situations a constraint is redundant. The fundamental structure of many applications means they are unlikely to create orphan rows.

The cost of the constraint is in the extra processing it causes during update operations. This cost is incurred every time a value in the constrained column is updated.

The benefit of a constraint is that it absolutely protects the constrained column from rogue values. This may be particularly relevant if the system has components (such as load utilities or interfaces with other systems) which by-pass the normal business transactions.

Other benefits of constraints are that they unequivocally state the “intention” of a relationship between tables and they allow diagramming tools which navigate the relationships to “do their thing”. Constraints provide good documentation, which is securely integrated with the database itself.

In short:

The costs of constraints are small, but constant and in the immediate term.

The benefits of constraints are avoiding a potentially large cost, but all in the future.

It’s the old “insurance” argument. Make the decision honestly based on a proper assessment of the real risk and your attitude to taking risks. Be lucky!

More Detailed Argument

For those who don’t just want to take my word for it, here is a more detailed argument.

Let’s take the “business data model” of a pretty normal “selling” application.

When we perform the activities “Take Order” (maybe that should be “Take ORDER”) or “Update Order”:

we create or update the ORDER and ORDER_LINE entities, and

in addition we refer to PRODUCT (to get availability and price) and presumably to the CUSTOMER entity, which isn’t shown on the diagram.

When I translate this into a Logical data model, I impose an additional rule “Every ORDER must contain at least 1 ORDER_LINE”. The original business model doesn’t impose this restriction.

Remember that some people do allow ORDERs with no ORDER_LINEs. They usually do it as part of a “reservation” or “priority” process, which we are not going to try to have here.

When the transaction which creates the ORDER and ORDER_LINE makes its updates, it will already have read CUSTOMER and PRODUCT, so it is unlikely to produce orphan records, with or without constraints.

On the other hand, by having the constraints we can document the relationships in the database, so that a diagramming tool can produce the ERD (really, I suppose, that should be a “Table Relationship Diagram”).

I am left wondering whether it would be possible or desirable to enforce my “Every ORDER must contain at least 1 ORDER_LINE” rule. I’ll think about that further. (Note to self: Can this be represented as a constraint which does not impose unnecessary and unintended restrictions on creating an ORDER?)
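As a first stab at that note-to-self: most engines cannot express “at least one child” as a simple declarative constraint, so a common fallback is to enforce the rule at the transaction boundary instead. Here is a minimal sketch of that idea using SQLite from Python — the schema, table names, and `place_order` function are all invented for illustration, not taken from any real system.

```python
import sqlite3

def place_order(conn, customer_id, lines):
    """Create an ORDER and its ORDER_LINEs atomically; refuse an empty order.

    A sketch only: the "at least 1 ORDER_LINE" rule is enforced at the
    transaction boundary, because it cannot be written as a plain
    column or foreign-key constraint.
    """
    if not lines:
        raise ValueError("Every ORDER must contain at least 1 ORDER_LINE")
    with conn:  # one transaction: all the rows appear together, or none do
        cur = conn.execute(
            "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_line (order_id, product_id, qty) VALUES (?, ?, ?)",
            [(order_id, p, q) for p, q in lines])
    return order_id

# Invented example schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
             "customer_id INTEGER)")
conn.execute("CREATE TABLE order_line (line_id INTEGER PRIMARY KEY, "
             "order_id INTEGER, product_id INTEGER, qty INTEGER)")
```

The catch, of course, is that this only protects orders created through `place_order` — which is exactly the “components that by-pass the normal business transactions” problem again.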

If we don’t have constraints, and something other than our transaction is allowed to create ORDERs and/or ORDER_LINEs (as I said, typically this would be an interface with another system or some kind of bulk load), we have no way of knowing how reliably it does its checking, and we might be allowing things we really do not want into our system. Constraints would reject faulty records, and the errors they created (or “threw”) could be trapped by the interface.
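That trapping pattern can be sketched in a few lines. This example uses SQLite from Python, with an invented schema and invented incoming records; the point is simply that the foreign key constraint rejects the orphan row and the loading loop catches the error rather than letting the rogue record in.

```python
import sqlite3

# Invented example: a bulk load trapping constraint violations.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE order_line (line_id INTEGER PRIMARY KEY, "
             "order_id INTEGER NOT NULL REFERENCES orders(order_id), "
             "qty INTEGER)")
conn.execute("INSERT INTO orders VALUES (1)")

# Incoming batch: (line_id, order_id, qty). Order 999 does not exist.
incoming = [(10, 1, 6), (11, 999, 3), (12, 1, 2)]

loaded, faulty = [], []
for line_id, order_id, qty in incoming:
    try:
        with conn:  # commit each good record, roll back each bad one
            conn.execute("INSERT INTO order_line VALUES (?, ?, ?)",
                         (line_id, order_id, qty))
        loaded.append(line_id)
    except sqlite3.IntegrityError:
        faulty.append(line_id)  # the interface traps the rejected record
```

The faulty record ends up in a reject list for someone to investigate, while the database itself stays free of orphans — which is the whole of the “insurance” benefit in one loop.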

I gave an answer in the forum, but here is an expansion and some ponderings.

First of all, let’s set out the terms of reference. The question asks about a “large unfamiliar” database. I think we can assume that “unfamiliar” is “one that we haven’t encountered before”, but what is “LARGE”? To me “large” could be:

Lots of tables

Many terror-bytes 😉

Lots of transactions

Lots of users

There may be other interpretations

I’m going to go with “Lots of tables”, with the definition of “lots of” being:

“more than I can conveniently hold in my head at one time”

I’ve also assumed that we are working with a “transactional database” rather than a “data warehouse”.

Preparation

Gilian, the questioner, was given some good suggestions, which I summarised as “Collecting Information” or perhaps “Preparation”:

Understand objectives of “The Business”

Understand the objectives of “This Project” (Digging into the Database)

Collect relevant Organisation charts and find out who is responsible for doing what

Collect relevant Process Models for the business processes which use the database

Obtain (or reverse-engineer) an Entity-Relationship Diagram (ERD) of the database

Of these, the one which is specific to working with a Database is the ERD. Having a diagram is an enormous help in visualising how the bits of the database interact.

Chunking

For me, the next step is to divide the model into “chunks” containing groups of entities (or tables). This allows you to:

Focus – on one chunk

Prioritise – one chunk is more important or more interesting than another, or will be done before it

Estimate – chunks are different sizes

Delegate – you do that chunk, I’ll do this one

And generally “manage” the work: do whatever the project objectives require.

I would use several techniques to divide the database or model up into chunks. These techniques work equally well with logical and physical data models. It can be quite a lot of work if you have a large model. None of the techniques are particularly complicated, but they are a little tricky to explain in words.

Here is a list of techniques:

Layering

Group around Focal Entities

Process Impact Groups

Realigning

Organise the Data Model

I cannot over-emphasise how important it is to have a well-laid-out diagram. Some tools do it well, some do it less well. My preference is to have “independent things” at the top.

I’ve invented a business.

We take ORDERs from CUSTOMERs.

Each ORDER consists of one or more ORDER_LINES and each line is for a PRODUCT.

We Deliver what the customer wants as DELIVERY CONSIGNMENTS.

Each CONSIGNMENT contains one or more Batches of product (I haven’t got a snappy name for that).

We know where to take the consignment by magic, because we don’t have an Address for the Customer!

We reconcile quantities delivered against quantities ordered, because we sometimes have to split an order across several deliveries.

That’s it!

Layering

“Layering” involves classifying the entities or groups of entities as being about:

Classifications

Things

Transactions

Reconciliations

Things

Let’s start with “Things”. Things can be concrete or they can be abstract. We usually record a “Thing” because it is useful in doing our business. Examples of Things are:

People

Organisations

Products

Places

Organisation Units (within our organisation, or somebody else’s)

Classifications

Every business has endless ways of classifying “Things” or organising them into hierarchies. I just think of them as fancy attributes of the “Things” unless I’m studying them in their own right.

Note: “Transactions” can have classifications too (in fact almost anything can and does), I’ve just omitted them from the diagram!

Note: The same structure of “Classification” can apply to more than one thing. This makes sense if, for example, the classification is a hierarchy of “geographic area”. Put it in an arbitrary place, note that it belongs in other places as well, and move on!

Transactions

Transactions are what the business is really interested in. They are often the focus of Business Processes.

Order

Delivery

Booking

Where there are parts of Transactions (e.g. Order_Line), keep the child with the parent.

Reconciliations

“Reconciliations” (between Transactions) occur when something is “checked against something else”. In this case we are recording that “6 widgets have been ordered” and that “3 (or 6) have been delivered”.

If you use these “layers”, arranged as in the diagram, you will very likely find that the “one-to-manys” point down the page, from the “one” end at the top to the “many” end below.
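The layering idea can even be sketched mechanically: if we know which table refers to which (its foreign keys), then a table’s layer is one more than the deepest table it refers to, and independent “Things” land in layer 0 at the top. A toy sketch, using the invented ORDER/DELIVERY model from above (the `refers_to` map is my own hand-built stand-in for real FK metadata):

```python
# Invented model: for each table, the tables its foreign keys point at.
refers_to = {
    "CUSTOMER": [],
    "PRODUCT": [],
    "ORDER": ["CUSTOMER"],
    "ORDER_LINE": ["ORDER", "PRODUCT"],
    "DELIVERY": ["CUSTOMER"],
    "DELIVERED_ORDER_LINE": ["DELIVERY", "ORDER_LINE"],  # the reconciliation
}

def layer(table, model=refers_to):
    """Layer 0 = an independent 'Thing'; otherwise one below its deepest parent."""
    parents = model[table]
    if not parents:
        return 0
    return 1 + max(layer(p, model) for p in parents)

layers = {t: layer(t) for t in refers_to}
```

Sorting the diagram by `layers` puts Things at the top, Transactions in the middle, and Reconciliations at the bottom — the same vertical arrangement the layering technique produces by eye.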

Groups around Focal Entities

To do this, pick an entity which is a “Thing” or a “Transaction” then bring together the entities which describe it, or give more detail about it. Draw a line round it, give it a name, even if only in your head!

“Customer and associated classifications” and

“Order and Order_line” are candidate groups.
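The “group around a focal entity” step is really just a bounded walk of the relationship graph: start from the focal table and pull in everything within a given number of relationship hops. A sketch, with an invented relationship list (relationships are treated as undirected here, because a describing entity can sit at either end of the foreign key):

```python
from collections import deque

# Invented model: pairs of related tables.
relationships = [
    ("ORDER", "CUSTOMER"),
    ("ORDER", "ORDER_LINE"),
    ("ORDER_LINE", "PRODUCT"),
    ("CUSTOMER", "CUSTOMER_TYPE"),
    ("DELIVERY", "CUSTOMER"),
]

def focal_group(focus, pairs, hops=1):
    """Breadth-first walk: every table within `hops` relationships of `focus`."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    group, frontier = {focus}, deque([(focus, 0)])
    while frontier:
        table, dist = frontier.popleft()
        if dist == hops:
            continue  # far enough: don't expand this table's neighbours
        for n in neighbours.get(table, ()):
            if n not in group:
                group.add(n)
                frontier.append((n, dist + 1))
    return group
```

With one hop from ORDER you get the “Order and Order_line” group (plus CUSTOMER); widen to two hops and the classifications and PRODUCT join in — which is exactly the “draw a line round it” judgement, made adjustable.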

Process Impact Groups

To create a “Process Impact Group”:

Select a business process

Draw lines around the entities which it creates, updates, and refers to as part of doing its work.

You should get a sort of contour map on the data model.

In my example the processes are:

Place Order

Assemble Delivery Consignment

Confirm Delivery (has taken place)

It is normal for there to be similarities between “Process Impact Groups” and “Focal Entity Groups”. In fact, it would be unusual if there were not similarities!
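One way to see those similarities is to write the process impact groups down as a small CRUD-style matrix and look for overlaps. A sketch below, using my invented processes and entities (the Create/Update/Read assignments are my own reading of the example business, not from any real system):

```python
# Invented CRUD matrix: for each process, the entities it
# Creates, Updates, and Reads.
crud = {
    "Place Order": {
        "C": {"ORDER", "ORDER_LINE"},
        "U": set(),
        "R": {"CUSTOMER", "PRODUCT"},
    },
    "Assemble Delivery Consignment": {
        "C": {"DELIVERY", "DELIVERED_ORDER_LINE"},
        "U": {"ORDER_LINE"},
        "R": {"PRODUCT"},
    },
    "Confirm Delivery": {
        "C": set(),
        "U": {"DELIVERY", "DELIVERED_ORDER_LINE"},
        "R": {"ORDER"},
    },
}

def impact_group(process, matrix=crud):
    """All entities a process touches - the 'contour' drawn on the model."""
    ops = matrix[process]
    return ops["C"] | ops["U"] | ops["R"]

def overlap(p1, p2, matrix=crud):
    """Entities shared by two processes; a big overlap suggests one chunk."""
    return impact_group(p1, matrix) & impact_group(p2, matrix)
```

Processes with large overlaps are candidates for the same chunk; processes with empty overlaps can safely be studied (or delegated) independently.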

Realigning

Try moving parts under headers (so, Order_line under Order) and reconciliations under the transaction which causes them. In the diagram, I’ve moved “Delivered Order Line” under “Delivery”, because it’s created by “Delivery related processes” rather than when the Order is created.

Finally, “Chunking”

Based on the insights you have gained from the above, draw a boundary around your “chunks”.

The various techniques are mutually supportive, not mutually exclusive. The chunks are of arbitrary size. If it is useful, you can:

combine neighbouring chunks together, or

break a chunk down, using the techniques (especially “Focal Entities” and “Process Impact Groups”), until you reach a single table/entity.

Tools

My preferred tools for doing this are: a quiet conference room, late at night; the largest whiteboard I can find; lots of sticky Post-its or file cards (several colours); a pack of whiteboard pens; black coffee on tap and the prospect of beer when I’ve finished. For a large database (several hundred tables) it can take several days!