I have a set of call detail records, and from those records, I'm supposed to determine the average concurrent active calls per system, per hour (at a precision of one minute). If I query 7pm to 8pm, I should see the average concurrent calls for the hour (averaging the concurrent calls for each minute) within that hour (for each system).

So, I need a way to check for a count of active calls for 7:00-7:01, 7:01-7:02, etc then average those numbers. A call is considered active if the call's time and duration fall within the current minute being checked.

What makes this even more difficult is that it needs to span SQL 7.0 and SQL 2000 (some functions in 2000 aren't available in 7.0, such as GetUTCTime()), if I can just get 2000 working I'll be happy.

What approaches to this problem can I take?

I thought about looping through minutes (60) in the hour being checked and adding the count of calls that fall between that minute and then somehow cross referencing the duration to make sure that a call that starts at 7:00 pm and has a duration of 300 seconds shows active at 7:04, but I can't imagine how to approach the problem. I tried to figure out a way to weight each call against particular minute that would tell me if the call was active during that minute or not, but couldn't come up with an effective solution.

The data types here are the same as I have to query against. I don't have any control over the schema (other than possibly converting the data and inserting into another table with more appropriate data types). I've provided some example data that I know has concurrent active calls.

CREATE TABLE Records(
seconds char(10),
time char(4),
date char(8),
dur int,
system int,
port int,
)
--seconds is an stime value. It's the difference of seconds from UTC 1/1/1970 00:00:00 to the current UTC time, we use it as an identifier (like epoch).
--time is the time the call was made.
--date is the day the call was made.
--dur is the duration of the call in seconds.
--system is the system number.
--port is the port on the system (not particularly relevant for this question).
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924228','1923','20090416',105,2,2)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923455','1910','20090416',884,1,97)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924221','1923','20090416',116,2,15)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924259','1924','20090416',90,1,102)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923458','1910','20090416',891,2,1)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924255','1924','20090416',99,2,42)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924336','1925','20090416',20,2,58)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924293','1924','20090416',64,2,41)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923472','1911','20090416',888,2,27)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924347','1925','20090416',25,1,100)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924301','1925','20090416',77,2,55)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924332','1925','20090416',52,2,43)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924240','1924','20090416',151,1,17)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924313','1925','20090416',96,2,62)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924094','1921','20090416',315,2,16)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923643','1914','20090416',788,2,34)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924447','1927','20090416',6,2,27)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924342','1925','20090416',119,2,15)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924397','1926','20090416',76,2,41)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924457','1927','20090416',23,2,27)

6 Answers
6

I think MarkusQ has the answer, but let me develop an alternative that you may find easier to use. I'll use my customary method of developing this as a series of simple transformations in views, an analogue of functional decomposition in a procedural language.

First, let's put everything in common units. Recall that record's column s is seconds since the epoch, midnight 1 January 1970. We can find the number of seconds since midnight of the day of the call, that call occurred, by just taking s modulus the number of seconds in a day: s % (60 * 60 * 24).

This just cross products/multiplies record rows, so we have one record row for each whole minute over which the call was active.

Note that I'm doing this by defining active as "(part of) the call occurred during a minute". That is, a two second call that starts at 12:00:59 and ends at 12:01:01 by this definition occurs during two different minutes, but a two second call that starts at 12:00:58 and ends at 12:00:59 occurs during one minute.

I did that because you specified "So, I need a way to check for a count of active calls for 7:00-7:01, 7:01-7:02". If you prefer to consider only calls lasting more than sixty seconds to occur in more than one minute, you'll need to adjust the join.

Now if we want to find the number of active records for any granularity equal to or larger than minute granularity, we just group on that last view. To find average calls per hour we divide by 60 to turn minutes to hours:

Note that in the last two examples, I'm using a SQL date (to which the functions month and year can be applied), not the char(4) date in your record table.

Which brings up another point: both the date and time in your record table are superfluous and denormalized, as each can be derived from your column s. Leaving them in the table allows the possibility of inconsistent rows, in which date(s) <> date or time(s) <> time. I'd prefer to do it like this:

If I understand you correctly, you want to get a count of all records for which the start time is less then t+60 seconds and the start time plus the duration is less than or equal to t, for each t in the interval of interest (e.g., t=7:00, 7:01, 7:02...etc.).

Then it's just a matter of averaging these counts.

But what is an average? It's just the sum divided by the number of items, right? In this case, the number of items will always be equal to the time range in minutes, and the sum will be equal to the sum of the durations-minutes that fall within the interval, which you can compute in one go off the data given.

Sound less impossible now? In pseudo SQL:

select sum(
((time+duration rounded up to next minute, capped at end of period)
- (time rounded down, bottom-capped at start of period) - 1)
/(1 minute) )
from Records
where date is right

Then just divide that by the number of minutes in the period of interest.

That's more or less what I'm trying to do. Except, I need if stime > t < stime + duration. The easy part is averaging the counts; the hard part is getting the them.
–
scottmApr 17 '09 at 19:09

No, I think you need to compare it the way I said; I'm assuming you made a typo and meant "stime < t < stime+duration" but this is still wrong; otherwise (for example) a call starting at 7:00:01 and lasting 50 seconds would not show up in any of the one minute intervals.
–
MarkusQApr 17 '09 at 19:33

My first bit of advice would be, if you ever find yourself saying (when using SQL) "I can create a loop..." then you should immediately start looking for a set-based approach. Get out of the procedural mindset when using SQL.

There are still a few fuzzy parts to your logic. Does a call count as being during a minute period if it simply has any portion of the call during that minute? For example, if a call starts at 1923 and lasts 62 seconds is it considered to overlap with all calls starting at 1924? I'm going to assume yes on that one, but you can adjust the code below if that's not true. It should be a minor tweak.

For the breakdown to minutes I would usually use a table with time spans - a start time and end time for each of the slices in which I'm interested. In your case since you are dealing with exact minutes and your start times are in minutes (even if the data type is screwed up) I'm just going to use a single column with the exact minute in question.

You'll need to either use that as a subquery to get the averages over a given time. Since it's late on Friday I'll leave that step up to you ;)

EDIT: One last caveat: I didn't account for time spans that cross day boundaries (i.e., go past midnight). Hopefully the code points you in the right direction for that. A better approach might be to create a view that turns all of those nasty strings into actual DATETIME values, then this becomes really trivial with the Minutes table.

I think its funny that you scolded me for suggesting a loop, but your first code example has one (although I understand the difference). I also like the suggestion of using a view to "change" the schema into a usable format, I'll definitely be using that in the future. This approach does get me the data I am looking for.
–
scottmApr 17 '09 at 20:10

The important fact here (and why I bothered reply after MarkusQ posted) is that the duration of a call itself isn't enough to calculate how many minute intervals are covered. In my example, both calls only last 2 seconds...

You need the following info:
- the "start time", rounded down to the minute
- the "end time", rounded down to the minute
=> covered intervals = number of minutes difference + 1

SUM that value for the range you're looking at, then divde by the number of minutes in that range, and you have your answer.

If you're only looking for calls that are truely concurrent you can't use such tricks, but it's still possible (I've had to do something similar). But for your definition of concurrent, this should do it...

I don't think that's going to work for me. Casting, say 12:00 (the time field), to datetime ends up as Jan 1 1900 12:00PM. I think the datediff functions are getting thrown off with that.
–
scottmApr 17 '09 at 19:52

No, that's fine. TSQL uses 01/01/1900 as the "zero base" for it's dates. So a "time" without a "date" part always comes up as "Jan 01 1900 ??:??:??.???"
–
MatBailieApr 17 '09 at 19:54

Oh right, because you are specifying the "minute" part, not the whole date.
–
scottmApr 17 '09 at 19:57

However, your query still returns 0 results when I specify date='2009 Apr 16', start ='19:00' and end='20:00' (which is the date range of the set of data I provided)
–
scottmApr 17 '09 at 20:02

Apparently "hhmm" is not appreciated as a time format. But "hh:mm" is fine. In place of "time" use......CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME)
–
MatBailieApr 17 '09 at 20:15

I approached the problem by converting the data into an easier format. I created a table where each row represents one minute of a call. Once you have that the average per minute by hour is simple. There are multiple selects in there to show the intermediate results. As long as the time range being queried and the durations aren't extremely large, it should be ok...?

I can see only one approach that extracts the data as specified from the call records:

Create a list of events, where event is defined as the beginning of a call or the end of a call. (Thus each call record will generate two events.) Each event item should contain: system, datetime, and the boolean begin/end. Datetime should be rounded down to the nearest minute.

Sort this list by (system, datetime) and scan it. For each call begin, increment CURCNT by one. For each call end, decrement CURCNT by one.

If the datetime value is different from the previous record, add CURCNT to HOURSUM. If the datetime value indicates the start of a new hour, divide HOURSUM by 60, write a new result record (system, date, hour, average), and reset HOURSUM to zero.

It should be obvious when to initialize CURCNT and HOURSUM, what to do when the system value is different from the previous record, etc.