Monday, December 10, 2007

One of the things I've gotten out of one of my business parters in Vsched (that online employee scheduling software I'm working on), is a bunch of best practices relating to managing lots of data. One of these is using start and end dates on critical data, rather than actually deleting this data.

What my partner suggests doing is to use start_date and end_date fields containing effective-as-of and effective-until timestamps. Then, if you need to delete this data, you UPDATE the end_date field with the current timestamp instead of using a DELETE query.

It does make queries over this data more complex: you need to add another WHERE predicate (e.g. start_date >= NOW() AND (end_date IS NULL OR end_date < NOW())). So that might be an issue, but hopefully you're looking at best practices before you implement, and you can always worry about perf later, right ;)

So why would you want to do this? Well here are a few arguments:

This will provide your system a better audit trail. You want to know who did what to your customer data and when they did it. If you rely on the method described in my 5Ws of database design post, you'll lose that trail when you do your DELETE. But if you just UPDATE that row appropriately, you're safe.

This will also allow an undo feature for those nasty delete features users claim to want. I say "claim", because my users claim to want to delete their data. But I know as soon as I add the feature, I'll get a bunch of them accidentally deleting data. And what will they do when they delete it? They'll curse their bad luck. Then they'll email my tech support list. Then they'll send me an email. Then they'll call me. Right when I'm about to sit down with friends and a drink. So much for that evening. But wouldn't it be better if they could undelete their own data? Yes. It would be better.

Well, as a matter of fact, I've got such a delete feature. But I didn't implement it with DELETE. I used the above end_date scheme. And it works great. Except I forgot the front end :). So when that accidental delete happened, I still got the call. But I could check the end_date and last_updated_by fields. Long story short, I found those 300 deleted rows. And with a single UPDATE query we were back in business.

Disclaimer

To quote a fellow Wisconsinite: All opinions expressed on this, our personal blog, are well-reasoned and insightful. Needless to say, they are not those of our employers. (...Whose opinions may also be well-reasoned and insightful; but, well, you know how it is.)