Business Intelligence, SQL Server, and other assorted IT miscellany

Recently, I had the need to analyze phone call data to answer questions such as how many phone calls were received in a given day and how frequently voicemail answered instead of a live person. In this scenario, I was fortunate enough to have a fairly accessible phone system to work with — a Cisco UC520. While this guide is specific to working with a Cisco UC520 device, most Cisco phone systems (the UC series or anything utilizing CUE/CME) should be pretty comparable.

So, you want to be able to analyze Cisco call data? Well, you’re in luck! There are three major steps to this process:

Extract raw call data from the phone system

Capture the exported data and interpret it

Insert it into SQL Server and perform reporting

This won’t be a complete step-by-step guide, but I’ll try to hit all the high points and am always open to questions.

Extracting Call Data from the Phone System

Cisco phone systems have the ability to export call data, known as Call Detail Reports (or CDRs) via one of three major transport mechanisms:

File-based (sent to a remote device via FTP)

RADIUS-based (sent to a remote RADIUS server)

SYSLOG-based (sent to a remote SYSLOG server)

You’ll notice none of these three steps is a native SQL server, so we’re not getting off that easy. That being said, the File-based FTP mechanism makes for a fairly simple method of retrieving the data from the phone system. With the file-based FTP mechanism, the phone system will send CDRs (which are simply CSV text files — but more on that later) at a specified interval to a specified FTP server. Configuring this will require “enable” (level 15) configuration level access to the phone system. Cisco provides the following configuration guide, which we’ll use with some slight modifications, to configure the system to extract call data to an FTP server.

First, configure an FTP server somewhere where you’ll be able to later access the FTP data via a UNC network path. Make a note of the username and password and ensure the account has the ability to read, write, delete, and append to files in the directory you specify for the FTP account.

Next, connect to the phone system via a Telnet/SSH connection and perform the following commands (as always, take a backup first):

The previously mentioned configuration guide provides detail regarding what each command does, but a couple of notes I wanted to make regarding a couple of the commands:

“maximum cdrflush-timer 55” / “maximum fileclose-timer 60” – Configures the system to export data about hourly (the two numbers need to be at least 5 minutes apart). It’ll upload and append more frequently if the buffer fills.

“cdr-format detailed” – This is important as the output CSV file can be in either the “compact” or “detailed” format. The detailed format has 130 columns, identified here. The “compact” format is supposed to have only 23 columns of a specific subset, however, I found that it had a few more than 23 and the column subset wasn’t exactly what is specified in the documentation — which is difficult as the output file has no headers, only data. Regardless, we need at least one field not included, so “detailed” is the way to go. We will later skip columns which contain data we don’t need.

“acct-template callhistory-detail” – Putting the CSV file into “detailed” mode above is a start, but many columns will contain no data without the callhistory-detail template selected.

At this point, the phone system is configured to collect call data (CDR files) and export them to an FTP server. With the above settings, it’s set to do this about once an hour or so. For testing, if you don’t want to wait, you can force it to export right now with the following command:

1

2

file-acct flush with-close

file-acct reset

Capturing the Exported Data and Interpreting It

Now that we’re able to retrieve detailed call data from the phone system on a regular basis, we need to capture it into SQL Server and interpret it. I’m actually going to cover this in the opposite order, since interpreting it makes everything easier.

The file-based method configured above outputs CDR data in CSV files about once an hour. These CSV files, in the “detailed” format, have 130 columns and no header row. Fortunately, Cisco publishes this guide with descriptions for the 130 columns. Between this guide and looking at sample date, with a little work, you can figure out which columns are the most important for what you’d like to do. I found the following fields most helpful:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

call-id - A unique ID assigned to the call. A call can have multiple branches (bouncing between an auto-attendant and various extensions, so multiple rows in the dataset may have this same ID)

h323-conf-id - A unique ID assigned to the call. A call can have multiple branches (bouncing between an auto-attendant and various extensions, so multiple rows in the dataset may have this same ID)

h323-connect-time - The time the call was connected, in "NTP format" and with a leading period

h323-disconnect-time - The time the call was disconnected, in "NTP format" and with a leading period

h323-call-origin - From my experience, this has always been answer or originate and helps to determine the call direction

clid - The calling number

dnis - The number being called

gw-collected-cdn - "Destination number collected by the gateway used to route calls." In my scenario, this holds the number that was dialed for outbound calls, since outbound calls are broken up into two rows and the dnis field in this case is either blank of the same as the extension making the call.

feature-id-field1 - One of the most useful fields, this tells the type of row entry. For instance, TWC is a two-way-call and what we are mostly interested in.

feature-id-field12 - For calls which go to the group voicemail box, this field contains the extension of the voicemail box and is otherwise empty.

Other fields may be useful in other scenarios, but from my interpretation of the data output by my system and what I wanted to know about activity, these seemed the most useful. As a simple example, an outside call may be placed and immediately answered by an auto-attendant. From there, the caller may be forwarded to a handful of phones which ring simultaneously, then ultimately forwarded to voicemail. Even though this is only one call, since it bounces around in the system, it’ll have a handful of log entries in the CDR. This is where determining what you want to know will allow you to use the above fields to filter out the unnecessary rows. Generally speaking, I was interested in knowing about:

Incoming Calls – The feature-id-field1 value will be “TWC” and the h323-call-origin will be “answer” — additionally, the dnis should be the main incoming phone number for the system.

Incoming Calls (reaching group voicemail) – The h323-call-origin field will be “answer” and the feature-id-field12 field will be the extension of the group voicemail box

Outgoing Calls (external) – The feature-id-field1 value will be “TWC” and the h323-call-origin field will be “answer” — additionally, the clid will be a 3-digit extension and the gw-collected-cdn will be 7 or more digits (as phone numbers are 7 digits + one digit to dial an outside line)

Outgoing Calls (internal) – The feature-id-field1 value will be “TWC” and the h323-call-origin field will be “originate” — additionally, both the clid and dnis will be 3-digit extensions

With an understanding of which columns are most interesting and how to use the values in various columns to determine interesting rows, it’s time to finally start importing the data into SQL Server. Initially, I explored three potential methods of importing data into SQL Server.

Write a custom Windows Service with C# – While this method is incredibly flexible and I could do this fairly quickly, it wouldn’t be easy to maintain over time.

Use SSIS – SSIS is designed for this type of work, so it’s a solution which is reasonable to implement and maintain over time. This made SSIS an easy choice for my solution.

The SSIS package can be broken up into three parts:

An initial preparation step which creates a staging table to hold all of the interesting columns of the CSV as well as a directory to store the “processed” CDR files in.

The transform step which parses through each CDR, cleans up various fields, and then imports them into the staging table.

The load step which loads only the relevant rows from the transformed data into a final table which can later be used as a source for analysis.

The overall package for Cisco UC SSIS Import

For the staging table which is created (where raw data is loaded from the CDR files) I chose only to load a subset of the columns in the CSV. In actuality, I am loading a number of columns which seemed interesting, but I’m not currently using. This’ll make future expansion a little easier should I need them for additional functionality.

Following the staging table setup is a Foreach loop which takes the name of the input directory as a parameter and then loops through the step for each file in the input directory. For each file that is present, the data flow task reads in the data, cleans it up, then moves the file to a “Processed” sub-directory so it isn’t imported again on the next run.

The primary data flow for the Cisco SSIS UC Import Package

The first step of the data flow task, “Flat File Source – Import CSV” is where the physical columns of the CSV are mapped to a table which will be stored in memory. This is also where the fact that CDR files do not contain a header row becomes REALLY annoying. Adding a header row to the file isn’t useful, since future files won’t contain the header. I ended up taking a CSV file to use as a sample, adding a header row to it, then adding a second header to it, numbered 0 through 129. I used this as my map to determine which columns to import.

Selecting the unnamed columns in the primary data flow task of the Cisco SSIS package

So, other than not having column names in the source files, importing data doesn’t seem too bad! There are three caveats to consider:

The Cisco is not shy about inserting rows that contain no data into the source CDR files. These rows will have the unix time in the first column and nothing else in any other column. The default behavior for SSIS is to import numeric columns which are null as 0, so the call-id column (aka Column 1) will be 0 for any invalid rows. So, a valid row has a call-id greater than 0. A split filters these out nicely.

The second caveat is our time fields. Cisco uses “NTP format” for their time fields. Additionally, they lead each field with a “.” and the day number in the date may be either 1 or 2 digits in length (not padded). So, the entire time field is going to be either a 32 or 33 character string. Since I am familiar with C#, I found creating a function within a Script Component to be the best way to convert these variable length strings into standard DateTime objects.

The final caveat to be concerned with is duplicate data. It’s possible, though unlikely, that data may be read in and inserted, but then be placed in the CDR a second time. When this happens, we do not want to import it a second time as it’ll skew our data. Fortunately, between the unix-time, call-id, h323-conf-id, h323-setup-time, and feature-id-field1, unique rows can be distinguished. An SSIS lookup can be done to compare these rows in the existing table vs. what has already been imported and only import the unique rows.

Once the invalid rows have been filtered out, the date strings convert to proper datetimes, and the duplicates removed, the data can be imported into a temporary staging table in SQL Server and we can move on to the last task in capturing the data.

Insert Into SQL Server and Perform Reporting

The final task is pretty simple. We need to move data from the staging table into our permanent storage table. We could just select all the data and import it, but there is a lot of intermediate call data There are a couple of ways to do this. For simplicity (and future expansion), I created a basic data flow task for each type of call I wanted to import (Incoming, Outgoing, Internal, and Incoming (Voicemail))

Data flow for final data imports with Cisco SSIS Import

Within the source for each data flow, a simple select statement is used to pull data from the staging table with two minor twists. The first twist is that rather than selecting directly from the table, I’m selecting from a view which does a few minor joins (joins to a table which translates extensions into internal usernames), renames a few fields, and for each of the 4 types of calls, it saves an integer, 1-4 describing the call type (Incoming, Outgoing, Internal, Voicemail). This makes the forthcoming reporting queries much simpler.

The completed SSIS package can now be scheduled to run periodically via a SQL Agent job. This completes the data process. The phone system outputs data to a file at least hourly. The import package runs periodically (at most once an hour) and inputs and exported data into a SQL Server table. This process repeats forever.

Now, it’s time to start seeing the fruits of our labor. One of the things I wanted to see was the average number of incoming calls by type and hour over a date range. With the data in our table, this was now a task that could be reasonably accomplished with the following sample query (which has been heavily modified to remove some business logic):

This query starts by creating a table variable which serves as a time dimension, with hours 0 – 24 and whether or not it’s considered a business hour. Doing this, instead of just using the values in the data table allows for every hour to always show up on the chart, even if it has 0 calls. This is important for showing the shape of an entire day.

Sampling from SSRS report of calls by hour and type (with some components removed)

With good, clean data now flowing to SQL Server on a regular basis, we can further make use of SSRS or other tools to create reports and visualizations for any number of other phone-related metrics — such as number of voicemails received, top callers, longest calls, etc. Additionally, if you have access to a caller ID database (internally or externally), phone numbers can be joined to this caller ID database for more intuitive display of phone numbers on reports.