I would like to address some of the issues raised in the recent debate over whether data belongs in tables or not. I think Honroy hit the nail on the head when he wrote, "It's important to consider the likelihood of change as well." But the issues are subtle and complex, which is the reason it has taken me a full day to get around to writing this response.

So let me begin with one of Jeff's responses, since he's the one I'm mainly picking on here :-)

there are no problems. That was an example. If you need historical reporting or tracking, you simply model it that way. As the article clearly says.

Ah, but the devil is in the details. How exactly are you going to model it? If you actually sit down to work it out you will find that the complications mount very quickly. For example, one way to model historical changes is to add effective-start-date and effective-end-date columns. But now you have to deal with the possibility that the time periods could overlap, or that there could be gaps between the periods. (Actually, you already have a similar problem even in the original simple example because there is nothing to prevent shifts from overlapping or having gaps, but that's another can of worms.) If you have an effective-end-date, how do you model the currently effective shift schedule? Do the current shifts have NULL end dates, or end dates far into the future (creating a Y2K-like problem)? What do you do if, due to an operational error, some incorrect data finds its way into the shifts table? What if that incorrect data was used in subsequent dependent calculations? Do you log all your database updates so that you can tell which calculations used the wrong data? Or do you have to go back and recompute everything from scratch?
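To make the effective-date approach concrete, here is a minimal Python sketch. The `ShiftSchedule` record, its field names, and the helper functions are hypothetical and are not taken from Jeff's article. It uses `None` (the equivalent of a NULL end date) to mark the current schedule, and it shows the extra bookkeeping the approach drags in: a check for overlaps, gaps, and multiple open-ended rows, and a lookup that has to decide what to do when zero or two schedules match a date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ShiftSchedule:
    name: str                       # hypothetical label for a schedule version
    effective_start: date           # first day the schedule applies (inclusive)
    effective_end: Optional[date]   # day it stops applying (exclusive); None = current


def check_periods(rows: List[ShiftSchedule]) -> List[str]:
    """Report overlaps, gaps, and multiple 'current' rows."""
    problems = []
    ordered = sorted(rows, key=lambda r: r.effective_start)
    if sum(r.effective_end is None for r in ordered) > 1:
        problems.append("multiple current schedules")
    for prev, cur in zip(ordered, ordered[1:]):
        # An open-ended row followed by another row is always an overlap.
        if prev.effective_end is None or cur.effective_start < prev.effective_end:
            problems.append(f"overlap: {prev.name} / {cur.name}")
        elif cur.effective_start > prev.effective_end:
            problems.append(f"gap: {prev.effective_end} to {cur.effective_start}")
    return problems


def schedule_on(rows: List[ShiftSchedule], day: date) -> ShiftSchedule:
    """Return the single schedule in effect on `day`, or raise if there isn't exactly one."""
    matches = [
        r for r in rows
        if r.effective_start <= day and (r.effective_end is None or day < r.effective_end)
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} schedules in effect on {day}")
    return matches[0]


good = [
    ShiftSchedule("old", date(2006, 1, 1), date(2007, 1, 1)),
    ShiftSchedule("new", date(2007, 1, 1), None),
]
print(check_periods(good))                         # []
print(schedule_on(good, date(2007, 7, 23)).name)   # new
```

Unless the database itself enforces these rules (for example with triggers or constraints), checks like these have to run somewhere in application code.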

I'm not saying that these problems can't be solved. Obviously they can. But the structure of the shifts table, and in particular the way in which the contents of the shifts table evolve over time, is such that it is not at all clear that storing it in a database is really the best way to do it. The shifts table evolves over time in a very particular way. First, changes are rare. Second, retroactive changes are even rarer, and are generally an indication that some kind of mistake has been made and needs to be dealt with. And third, it is important to know not just what the shift schedule was on date X, but also what the system on date Y thought the shift schedule was on date X. That information might well be better stored in a config file under revision control.
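Keeping track of both "when was this true" and "when did the system know it" is what database people call bitemporal data. Here is a minimal sketch, assuming an append-only log in which every fact carries the date it takes effect (`valid_from`) and the date the system recorded it (`recorded_on`); the class and field names are hypothetical. It only handles corrections that supersede earlier facts; retracting a fact outright would need still more machinery.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ScheduleFact:
    schedule: str      # hypothetical description of the shift schedule
    valid_from: date   # when the schedule takes effect in the real world
    recorded_on: date  # when the system learned about it


class ScheduleHistory:
    """Append-only log: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[ScheduleFact] = []

    def record(self, schedule: str, valid_from: date, recorded_on: date) -> None:
        self._facts.append(ScheduleFact(schedule, valid_from, recorded_on))

    def as_of(self, valid_day: date, known_on: date) -> Optional[str]:
        """What did the system believe, on `known_on`, the schedule was on `valid_day`?"""
        candidates = [
            f for f in self._facts
            if f.recorded_on <= known_on and f.valid_from <= valid_day
        ]
        if not candidates:
            return None
        # The most recently effective schedule wins; if two facts share a
        # valid_from, the later recording (a correction) wins.
        best = max(candidates, key=lambda f: (f.valid_from, f.recorded_on))
        return best.schedule


history = ScheduleHistory()
history.record("three 8-hour shifts", date(2006, 1, 1), date(2005, 12, 15))
history.record("two 12-hour shifts", date(2007, 1, 1), date(2006, 12, 20))
# Discovered later: the new schedule actually started a month earlier.
history.record("two 12-hour shifts", date(2006, 12, 1), date(2007, 3, 1))

print(history.as_of(date(2006, 12, 15), known_on=date(2007, 2, 1)))  # three 8-hour shifts
print(history.as_of(date(2006, 12, 15), known_on=date(2007, 3, 2)))  # two 12-hour shifts
```

The two queries ask about the same day and get different answers, which is exactly the "what did the system on date Y think" question.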

The devil is always in the details, and it gets worse. What if your data volume is huge and performance is an issue? In cases like that it may well be better to hard-code certain rarely-changing parameters directly into the code than to stick them in a table and hope that your database optimizes the join properly. What if you're writing an embedded system and the data consists of parameters that can make something blow up if they are wrong?

There are certainly cases where it is appropriate to put data in tables, and some of Jeffrey's examples (though not all of them) are good examples of such cases. But to extrapolate from there to claim validity for a general rule that "data belongs in your tables, not in your code" is a serious mistake.

Monday, July 23, 2007

The world is full of Really Bad Ideas which look like good ideas at first glance. My favorite example of this is the Schick Slim Twin disposable razor. The Slim Twin is, as the name implies, a twin-blade razor. It has the "innovative" feature of having a small plastic tab between the blades. The tab is attached to a little button that lets you push the tab towards the business end of the blades, thereby (ostensibly) forcing out the razor stubble and other assorted gunk that is trapped in the space between the blades. Schick actually had a series of TV commercials that touted this feature when the Slim Twin first launched many years ago.

It seems like a good if not particularly earth-shattering idea at first glance. That is, until you actually try it. What you find is that the Slim Twin actually collects a lot more gunk than other razors. This is because the plastic tab blocks the space between the blades and actually causes gunk to build up in the first place! When the tab isn't there, any accumulated gunk just falls out of the back of the blades. So this "innovation" actually causes the problem it purports to solve. (And in fact, it makes the problem much worse, because hair can get caught between the tab and the blades, at which point it becomes all but impossible to dislodge. Don't ask me how I know all this.)

Which brings me to this blog entry from Jeff Smith where he asserts that "data belongs in your tables -- not in your code." It seems like a plausible enough assertion on its face, kind of like the idea that having a little tab in the razor to push gunk out ought to be a useful feature. But he doesn't back up this assertion with any actual arguments, only with examples. And in those examples he looks only at the benefits of storing data in tables instead of code and none of the drawbacks.

There are a lot of problems with storing data in tables the way Jeff suggests, including one overriding uber-problem, but I won't spoil the fun by telling you what it is just yet. Instead, consider what happens if you follow Jeff's advice, for example:

Now consider what would happen if the company's shift schedule were to change. Simple, you just update the SHIFTS table to reflect the new schedule and you're done, right?

Except that all your historical data is now wrong because it is based on the old shift schedule. And that old shift schedule is now gone.
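Here is a toy reproduction of the problem using Python's built-in `sqlite3` module. The schema (`shifts`, `punches`) and the shift hours are hypothetical stand-ins, not Jeff's actual tables: a report joins historical clock-in records against the shifts table, and after an in-place UPDATE the same historical record is silently attributed to a different shift.

```python
import sqlite3

# Hypothetical schema: SHIFTS holds the schedule, PUNCHES holds historical clock-ins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shifts (name TEXT PRIMARY KEY, start_hour INTEGER, end_hour INTEGER);
    CREATE TABLE punches (employee TEXT, punch_date TEXT, hour INTEGER);
    INSERT INTO shifts VALUES ('day', 8, 16), ('swing', 16, 24);
    INSERT INTO punches VALUES ('alice', '2007-01-15', 15);
""")

# Which shift did each historical punch fall in?
REPORT = """
    SELECT p.employee, p.punch_date, s.name
    FROM punches p JOIN shifts s
      ON p.hour >= s.start_hour AND p.hour < s.end_hour
"""

before = conn.execute(REPORT).fetchall()

# The company moves to a new schedule; we simply update the table in place.
conn.execute("UPDATE shifts SET start_hour = 6, end_hour = 14 WHERE name = 'day'")
conn.execute("UPDATE shifts SET start_hour = 14, end_hour = 22 WHERE name = 'swing'")

after = conn.execute(REPORT).fetchall()
print(before)  # [('alice', '2007-01-15', 'day')]
print(after)   # [('alice', '2007-01-15', 'swing')]
```

Nothing in the database records that the answer changed, or what the old answer was.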

So the first problem with storing data in tables is that relational databases don't have revision control. Code does. And if you have data that has the kinds of dependencies that revision control systems are good at tracking, then you might well be better off having that data in your code so that your revision control system can track it.

But there is a much more fundamental problem with Jeff's advice, and that is that there is no sharp dividing line between code and data. Look at those SQL queries. They are just strings, and hence they are data. So should we store them in a QUERIES table? For that matter, look at the code itself. That is just data too. Why not store that in a table?

The fact of the matter is that the admonition to store data in tables is completely vacuous because the distinction between code and data is arbitrary. It is therefore, just like the tab in the Slim Twin, worse than useless because it seems like such a good idea but in fact it creates problems rather than solving them.

The right way to decide what to put where is to look at the properties of the data you need to store. If it's large quantities of identically structured data that doesn't change in ways that alter referential integrity, then it probably belongs in the database. If it's small quantities of data whose structure defines the semantics of other data and which doesn't change at run time, then it probably belongs somewhere else: if not actually in the code, then probably in a configuration file under revision control.
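As an illustration of the second case, here is a minimal sketch of a shift schedule kept in code. The module name `shifts_config.py`, the `Shift` type, and the hours are all hypothetical. Because the file lives in version control, every change to the schedule gets a commit, a diff, and an author for free, and the schedule can be validated before any data is computed from it. For simplicity, the sketch assumes no shift crosses midnight.

```python
# shifts_config.py (hypothetical module, kept under version control with the code)
from typing import NamedTuple, Sequence


class Shift(NamedTuple):
    name: str
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24; this sketch assumes no shift crosses midnight


SHIFTS = (
    Shift("night", 0, 8),
    Shift("day", 8, 16),
    Shift("swing", 16, 24),
)


def validate(shifts: Sequence[Shift]) -> None:
    """Raise if the shifts overlap or leave gaps in the 24-hour day."""
    covered = sorted(h for s in shifts for h in range(s.start_hour, s.end_hour))
    if covered != list(range(24)):
        raise ValueError("shifts overlap or leave gaps")


validate(SHIFTS)  # runs at import time, before anything is computed from SHIFTS


def shift_for(hour: int) -> str:
    """Return the name of the shift covering `hour`."""
    for s in SHIFTS:
        if s.start_hour <= hour < s.end_hour:
            return s.name
    raise ValueError(f"no shift covers hour {hour}")
```

With this arrangement, the question "what did the schedule look like last March?" is answered by the file's version-control history (for example `git log -p shifts_config.py`, if the project uses Git) rather than by a hand-built history table.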

But just because something "looks like data" doesn't mean it belongs in a table.

UPDATE: I do not deny that the problems with Jeff's original example can be fixed. But the point is 1) there are problems and 2) they have to be fixed and 3) the process of fixing the problems is, in this example and many others, essentially, re-invention of revision control. There is no magic that automatically accrues unvarnished benefits merely from moving "data" (whatever that means) out of code and into a database, and applying Jeff's advice uncritically is as likely to create problems as solve them.

Sunday, July 22, 2007

Remember Sara M. Taylor's testimony before the Senate Judiciary Committee? The part where she said that she "swore an oath to the president"? Senator Patrick Leahy pounced on that comment, lecturing her about how her oath was to the Constitution, not to the President.

Consider: what if Leahy was wrong? What if that wasn't a Freudian slip? What if Sara Taylor really had (secretly, of course) sworn an oath to the President? Perhaps Leahy should not have been so quick to correct her. Maybe he was on to something and didn't realize it.

The Bush administration has denied a formal request from Congressman Peter DeFazio to see the secret plans for operating the government after a terrorist attack.

WASHINGTON -- Oregonians called [Congressman] Peter DeFazio's office, worried there was a conspiracy buried in the classified portion of a White House plan for operating the government after a terrorist attack.

As a member of the U.S. House on the Homeland Security Committee, DeFazio, D-Ore., is permitted to enter a secure "bubbleroom" in the Capitol and examine classified material. So he asked the White House to see the secret documents.

On Wednesday, DeFazio got his answer: DENIED.

"I just can't believe they're going to deny a member of Congress the right of reviewing how they plan to conduct the government of the United States after a significant terrorist attack," DeFazio says.

Bush has also issued an executive order allowing the Administration to seize the property of anyone who opposes the war in Iraq. And so the fifth amendment bites the dust along with the first, fourth and ninth (to say nothing of Separation of Powers). Four down, six to go. (What, you really think the second amendment is safe just because the emperor calls himself a republican?) Meanwhile, the Democrats fiddle while Democracy burns.

Tuesday, July 17, 2007

My previous post is generating a surprising (to me) amount of controversy, and there were a number of comments that I thought deserved considered responses. But writing those responses in Blogger's tiny little comment window (Google people, are you listening?) was getting really annoying so I decided to escalate.

Do you consider their opinion to be more authoritative than any other obvious sources (say, a local christian bookshop or church committee or the bible), and if so, why?

This is a very good question, and I have three different answers for it:

First, the meanings of symbols have nothing to do with authority. The meanings of symbols derive entirely from the intent of those who employ them, and from the perceptions of those who view them.

Second, it is fairly clear that the ring in this case is a Christian symbol. It is widely recognized as a Christian symbol, and it is inscribed with a reference to the New Testament, which should quell any remaining doubt.

But third, and most important, the ring is a red herring. If the girl had been wearing a crucifix on a chain the school's prohibition on jewelry would (presumably) still have applied. And surely no one would question that a crucifix is a Christian symbol.

So anyone should be allowed to take anything, call it a symbol of some religion

Yes, of course, as long as it is their religion. No one should be allowed to decide what is and is not a symbol of anyone else's religion.

(even though it isn't generally recognised as such)

Yes, of course. Some people have their own private religions with their own private theologies, symbols and rituals. Who are you to tell me that what I choose to be the symbols of my religion are not valid?

it doesn't follow that you're allowed to say what you want, when you want, and where you want.

A straw man. No one disputes that freedom of speech has limits. You can't cry fire in a crowded theatre or commit libel. Clearly neither of those circumstances applies in this case.

it doesn't entitle you to a free audience

It's a ring, for crying out loud. It's not like she's getting up in the middle of class with a bullhorn.

The real problem here is that the underlying prohibition on jewelry is inherently discriminatory against religions like Christianity which tend to render their symbology as jewelry rather than, say, clothing or makeup. Jews have yarmulkes. Sikhs have turbans. Hindus have Tilakas. But the principal symbol of Christianity is the Cross, and the principal means of displaying it on one's person (at least in the U.S.) is as a pendant hanging from a chain. So the issue is not the ring per se, the issue is that any blanket prohibition on jewelry necessarily discriminates against Christians, just as any blanket prohibition on wearing head-coverings indoors inherently discriminates against Muslims (and Jews and Sikhs).

Monday, July 16, 2007

I stand foursquare with Voltaire (or whoever it was who first said it): I may not (and in this case most assuredly do not) agree with what you say, but I will fight to the death for your right to say it. Or at least blog about it.

Sunday, July 08, 2007

Congress has issued subpoenas. The White House has refused, citing executive privilege. The fight will end up in the Supreme Court, which will side with the White House on the grounds that this is not a criminal investigation. (It doesn't really matter. If it were a criminal investigation they would find some other excuse, but this is the most defensible argument for not following the Watergate precedent, so that is the one they will use.)

At that point, Congress's only option to restore separation of powers will be to impeach Bush, Cheney, and at least some of the members of the Supreme Court. Which they could do. But which, of course, they will not do because the Dems are spineless cowards who are afraid of their own shadows.

Wednesday, July 04, 2007

What will happen if you take a ping-pong-ball-sized sphere and fly it in orbit around the earth for eighteen months? The Bible won't tell you, but Albert Einstein predicted what would happen almost 100 years ago and got it right to within 1% (and that is most likely experimental error). Now that is a prophecy fulfilled.