Pages

Monday, January 23, 2017

Temporal validity and open/closed intervals

A short while ago I created a SQL quiz for Oracle Dev Gym (PL/SQL Challenge) demonstrating the use of temporal validity and VERSIONS PERIOD FOR syntax to create a "change report" for changes in item prices. Use of temporal validity makes this easier, shorter and more readable code, but even though I thought I had it all covered, player Iudith Mentzel pointed out a tiny quirk I'd overlooked.

You see, the "change in price" wasn't really happening at one specific point in time - the old price was valid up to (but not including) a point in time (open-ended interval) and the new price became valid from (and including) that point in time (close-ended interval). If my "change report" happened to match those "ends" of the validity intervals exactly, it would never catch the "change" happening.
All of the SQL shown here can be found in the demo script here.

So for this demonstration I'll have a price table with a temporal validity period valid_price defined on it. I'll assume in this case each item will always have exactly one valid price at a given point in time - that is, that the periods for one item never overlaps and never has gaps in them.

VALID_FROM is included in the valid_price period interval (closed interval), VALID_UP_TO is excluded from valid_price period interval (open interval), and NULL in either date column means "from/to infinity".

We can use analytic LEAD function to have the VALID_UP_TO and PRICE of the "old" row and the VALID_FROM and PRICE of the "new" row together

select item
, valid_up_to as old_valid_up_to
, price as old_price
, lead(valid_from) over (
partition by item order by valid_up_to
) as new_valid_from
, lead(price) over (
partition by item order by valid_up_to
) as new_price
from prices versions period for valid_price
between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
order by item, valid_from nulls first, valid_up_to nulls last
/

select item, old_valid_up_to, old_price, new_valid_from, new_price
from (
select item
, valid_up_to as old_valid_up_to
, price as old_price
, lead(valid_from) over (
partition by item order by valid_up_to
) as new_valid_from
, lead(price) over (
partition by item order by valid_up_to
) as new_price
from prices versions period for valid_price
between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
)
where new_price is not null
order by item, old_valid_up_to
/

select item, old_valid_up_to, old_price, new_valid_from, new_price
from (
select item
, valid_up_to as old_valid_up_to
, price as old_price
, lead(valid_from) over (
partition by item order by valid_up_to
) as new_valid_from
, lead(price) over (
partition by item order by valid_up_to
) as new_price
from prices versions period for valid_price
between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2016-11-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
)
where new_price is not null
order by item, old_valid_up_to
/

OK, we've now shown the change report for the entirety of November, which reports a single price change for Beachball. Before we had the change report for the entirety of December, which reports no price changes for Beachball.

But aren't we missing something? How about if we make a change report for November+December together?

select item, old_valid_up_to, old_price, new_valid_from, new_price
from (
select item
, valid_up_to as old_valid_up_to
, price as old_price
, lead(valid_from) over (
partition by item order by valid_up_to
) as new_valid_from
, lead(price) over (
partition by item order by valid_up_to
) as new_price
from prices versions period for valid_price
between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
)
where new_price is not null
order by item, old_valid_up_to
/

This time we spot that there was a change for Beachball from a price of 12 valid up to (not including) 2016-12-01 to a price of 15 valid from 2016-12-01. Even though our two previous change reports covered the same time period, neither reported this change.

The "problem" is the price interval "ends" match completely the intervals we are using in our November and December price change report, thus we won't get "old" and "new" row in the same set of data.

If we want such a "price change" report, we'll have to use a "closed" interval for the month we are reporting for, so that when we do next months report, we'll actually have a one second "overlap" with the report from this month:

select item, old_valid_up_to, old_price, new_valid_from, new_price
from (
select item
, valid_up_to as old_valid_up_to
, price as old_price
, lead(valid_from) over (
partition by item order by valid_up_to
) as new_valid_from
, lead(price) over (
partition by item order by valid_up_to
) as new_price
from prices versions period for valid_price
between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
)
where new_price is not null
order by item, old_valid_up_to
/

This isn't demonstrating any problem with temporal validity. The interval definitions are as they have to be and temporal validity is simply an easier way to deal with these things than doing it yourself with predicates involving proper combinations of AND, OR, >=, < and IS NULL.

But no matter if you do it yourself or you do it with temporal validity, you should beware of how "closed" and "open" intervals work and why the "semi-closed" intervals in temporal validity sometimes can fool you, like missing that I'd actually need "overlap" when doing an otherwise seemingly straight-forward "price change report".

Thank $deity for sharp Dev Gym players - I can always count on them to discover the little quirks I miss myself.