Yesterday on my favorite IRC channel fooqux asked interesting question. I took some more questions, and here is problem description:

We have a system which, every 5 minutes, takes a number of tasks to be done. Tasks are uniform. Within 5 minutes we can handle at most 100 tasks. Given the history of number of tasks added every 5 minutes, calculate backlog at any given moment.

Did you understand the problem? Well – I didn't. So, let's see the data, and expected output.

So. Let's see the data. I generated some random data, and here is how it looks:

Nothing really fancy here (side info: check those nice, Unicode frames in new psql will be writing about them some time in future).

So, the idea is that every 5 minutes (every, there are no gaps) new number arrives. And this is number of items to be processed. And every 5 minutes we can process 100 tasks, which is fine when we get 15 new tasks, but when system gets 150 tasks, and it will do only 100 of them in its 5 minute period of tim – it will backlog.

Every backlog calculation has to start from some specific point in time, where we know it was 0. In here – we know that on ‘2000-01-01 00:05:00′ it was 0 (because there were no events before).

Also, on every time frame we have to calculate how many tasks could be processed, and how many will have to be added or subtracted from backlog.

Just to remind: greatest( X, Y ) will return greater of these numbers. So if calculated backlog will be > 0 – it will be returned. If it will be < 0 – 0 will be greater, and backlog will be 0. Hope it's clear

Now. We have our first row ready. What about others?

In next rows we should reference previous one. Luckily it's very easy:

@Efgé:
That’s what fooqux wanted, and that’s what I originally tried to write, but it seems that I can’t write it with window functions. If you can – great. Show us – I mean – it’s perfectly possible that I overlooked something.