Pages

Thursday, January 17, 2008

The client I currently work for has quite a few special tables containing flexible attributes. These tables have one foreign key to another table, the base table. The attributes in the table containing the flexibal attributes conceptually belong to the base table, but their attribute values can change over time - the table contains a startdate and an enddate. And it is important that one can see the value of those attributes at any given time.

Today I was working on optimizing a large query that involved (among many others...) a base table of 4M rows, and a flexible attributes table of 40M rows containing approximately 20 flexible attributes. The query needed the current value of 7 of those flexible attributes. I did some testing about what would be the best approach to query all those values. And because I think it is a case that is certainly not rare, it is now here on my blog.

So in this testcase, the base table contains 10,000 rows, and the flex_attributes_of_base table contains 430,001 rows. There are 23 flexible attributes ('attribute0' until 'attribute22'). Every record in the base table has two records for each of the first 20 attributes (0-19). One for 2007-01-01 until 2008-01-01 and one for 2008-01-01 and upwards. The next three attributes are slightly different - you may even call it buggy - just to see if the queries output the right data in case:

- an attribute does not have a current value because earlier records all have an enddate (attribute20)

- an attribute has more than one current value, in which case the query should display the one with the most recent startdate (attribute21)

- an attribute that only base_id 1 has a value for (attribute22)

Let's issue a very simple but typical query against the two tables to retrieve the base record with id 1234 and its current value for attribute 11:

But here the same table/index is accessed 10 times. One developer noticed that and wrote another query to address this problem. He was able to get rid of the exists-subquery using the row_number analytic function, like this: