Finding the Most Recent and Consecutive Data

Question: [Joe Celko's Pension Plan Puzzle]

Luke Tymowski, a Canadian programmer, posted an
interesting problem on the MSAccess forum on
CompuServe in 1994 November. He was working on
a pension fund problem. In SQL-92, the table
involved would look like this:

The SIN column is the Social Insurance Number, something
like the SSN Social Security Number used in the United
States to identify taxpayers. The pyear column is the
calendar year, the service column is the number of months in
that year the person worked, and earnings is their total
earnings for year.

The problem is to find the total earnings of each
employee for the most recent consecutive 60 months of
service. This number is used to compute their pension.
The shortest period going back could be five years
with 12 months of service each. The longest period
could be 60 years with 1 month of service each.
Some people might work four years and not the fifth,
and thus not qualify for a pension at all.

The reason this is a beast to solve is that
"most recent" and "consecutive" are hard to write
in SQL. HINT: For each employee in each year,
insert a row even in the years they did not work.
It not only makes the query easier, but you also
have a record to update when you get in new information.

Answer: This query will get me the starting and ending years
of consecutive periods where (1) the employee
worked (i.e. service greater than 0 months) and
(2) the service totaled 60 or more consecutive months:

CREATE VIEW
PenPeriods (sin, startyear, endyear, totearnings)
AS SELECT P0.sin, P0.pyear, P1.pyear,
(SELECT SUM (earnings) -- total earnings for period
FROM Pensions AS P2
WHERE P2.sin = P0.sin
AND P2.pyear BETWEEN P0.pyear AND P1.pyear)
FROM Pensions AS P0, Pensions AS P1
WHERE P1.sin = P0.sin -- self join to make intervals
AND P1.pyear >= (P0.pyear - 4) -- why look sooner?
AND 0 < ALL (SELECT service -- consecutive service
FROM Pensions AS P3
WHERE P3.sin = P0.sin
AND P3.pyear BETWEEN P0.pyear AND P1.pyear)
AND 60 <= (SELECT SUM (service) -- more than 60 months
FROM Pensions AS P4
WHERE P4.sin = P0.sin
AND P4.pyear BETWEEN P0.pyear AND P1.pyear);

The subquery expression in the SELECT list is a
SQL-92 trick, but a number of products already have it.

The gimmick is that this will give you ALL the
periods of 60 months or more. What we really want
is the most recent endyear. I would handle this
with the Pension Period VIEW I just defined and
a MAX(endyear) predicate:

SELECT *
FROM PenPeriod AS P0
WHERE endyear = (SELECT MAX(endyear)
FROM PenPeriod AS P1
WHERE P1.sin = P0.sin);

I could handle that with some ugly HAVING clauses in SQL-92,
I could combine both those subquery predicates with
an EXISTS clause, etc. The trouble is that these features
would not be as portable right now.

As an exercise, you can try to add another predicate
to the final subquery which says there does not exists
a year between P0.pyear and P1.pyear which is greater
than the P4.pyear and still gives a total of 60 or
more consecutive months.