Oracle Data Warehouse Design and Architecture

Menu

Generating Lists of Dates in Oracle the DBMS_Scheduler Way

A rather common question in the Oracle community goes along the lines of this: “How do I generate a list of dates?”, with variations such as needing only working days, excluding specific dates, etc..

If there is a standard method for this it tends to go along the lines of this:

Generate a rather vanilla list of dates using a CONNECT BY clause on the DUAL table

Filter the results based on day of week or other criteria.

Other more sophisticated variations are also out there in the wild, and for the most part there is nothing technically wrong with them if they are fast and accurate and easily understood.

At the upper end of complexity it can be rather tricky to develop an accurate query. Here then is an alternative method that uses DBMS_Scheduler to define a calendar. DBMS_Scheduler has a sophisticated calendaring syntax described in the 11gR2 documentation here (with some interesting examples that are definately worth browsing here), that would do a very good job of implementing some complex list of date requirements. Although it would probably be my first choice for very complex list generation it also does such a decent job of simple list generation that I’d be inclined to consider it for those as well.

The method is built around the DBMS_Scheduler subprogram Evaluate_Calendar_String. This subprogram returns as one of its parameters a single date (well actually a TIMESTAMP WITH TIME ZONE) which is the “next” date in a calendar calculated from the following three input parameters::

A calendar string that defines the repeating interval of the list of dates. Here is the point where you would specify that you are interested only in the last-but-one day of every month as “FREQ=MONTHLY; BYMONTHDAY=-2;” for example.

A start date that gives the earliest date from which the calendar string is to be evaluated. For example, from today’s date, or from the beginning of this year. The start date may be redundant depending on how complete the specification of the calendar string is, and to me it seems like a hangover from the days of DBMS_Job in which you specify a start point and a method for calculating the next date based on it. Anyway, with a complete calendar specification the start date can be ignored. The example above is not complete because it does not specify the time of day for the calendar dates. This could be inferred from the start date if it were something like “2010-01-01 00:00:00”, but if you did not specify a start date then the calendar will inherit the time of day that you executed the program. It’s safer in my opinion to specify a complete calendar string such as ‘FREQ=MONTHLY; BYMONTHDAY=-2;BYHOUR=0;BYMINUTE=0;BYSECOND=0’

A date from which the next date in the calendar will be calculated. Typically this date would be the previous date in the list of calendar dates.

So with a complete calendar specification the Evaluate_Calendar_String program comes down to two elements: a calendar string, and a date to return the next calendar date after. Pretty simple.

We can therefore use the program as the basis for a pipe-lined function that accepts a calendar string and an initial date to start calculating subsequent calendar dates from, and which returns the desired list of dates.

The optimiser will think that the function will return 8,168 rows. Possibly there is a way around this.

You can reference existing schedules as part of the calendaring syntax.

If you really do stick to using the TIMESTAMP WITH TIME ZONE data type then you can be daylight savings time sensitive (see docs).

Anyway, there you go. I quite like this method because it’s very easy to tackle date lists of just about any complexity with it, especially when you can just copy-paste an existing method and modify it. If you are used to using it then as soon as you see this code you immediately recognise that a list of dates is being generated, and the definition of the list probably leaps out at you in the way that my last example hopefully did.

I expect that whatever the complexity of the calendar people generally do have a reasonable idea of the cardinality that they’re expecting, so the extensible optimiser would be an interesting choice. For example, if one was generating a list of all the Mondays and Thursdays in a single year then 52*2 would be a close estimate. Given particular start and end dates you could just estimate the cardinality for that calendar as “days between start and end dates multiplied by 2/7”. Without the fractions, of course.

Cheers Dom — it’s not to everyone’s taste I’m sure, and the instinctive reaction is probably “Neat trick but it’s not for me at the moment”.

The truth is that most people are going to get by with a simple pure SQL technique for generating lists of dates. However, if you have struggled to define a more complex list using SQL and have then implemented this solution, I think there would be a temptation to go back and refactor previous lists to use the same technique. It’s definitely one of those issues where it looks like too much effort for the first simple implementation, but once in use it’s very simple to re-use.

I really ought to add in the cardinality estimation based on the Extensible Optimizer technique.

One sticking point on my system is when to ship products to customers. Several algorithms have been used over time (“if it’s before noon on Thursday…”), eventually to be replaced by “make the order entry people put in a date.” Now with EDI, the customer is saying what the date is they expect it at their door, so without order entry people input… At one point I unthinkingly suggested using a calendar, so we could account for holidays and such, and of course that brings out the fact that customers have different locations with different holidays, some of which may or may not be celebrated, and our sites have the same… makes me want to roll a Mayan calendar at someone.

Hmmm, I was accidentally thinking about this while gardening this weekend. My first thought had been that a requirement to find a date prior to a particular date, as your requirement seems to be, exposes a weakness of this dbms_scheduler method because it only does “ascending” calendar calculations. It’s inherent to the way that the DBMS_Scheduler subprogram works that it finds the next date, not the previous date. But in fact you could restate your requirement as needing to find the date for which the next date in the sequence is the required delivery date.

It’s interesting to wonder how most efficiently to cope with different customers having different calendars of work and non-work days. One method would be to define general calendars to be referenced by all customers, and then maintain lists of included/excluded days on a per customer/customer-location basis that can be string-aggregated into the scheduler calendaring syntax for use in INCLUDE and EXCLUDE clauses. sounds fraught with problems though.

Not resolved, waiting on business requirements to be clarified. Since EDI tends to have a lot of per-customer customization, I expect a non-global solution, and probably several go-arounds as people try to force a global solution. The people who actually put stuff on the truck are the ones who know what the shipping conditions are (like, Colorado Springs in February), but the information is putatively needed at order time, which could be months earlier, or could be JIT. Good times.

My friends once had a band called “Big Food.” They had a song circa 1984 called Gardening, which IIRC started with the lines: I really like gardening
I really like to stand in sh*t