Thanks to Hetal Rupani for the suggestion for this…at the 2012 Tableau Customer Conference Healthcare Meetup, issues with utilization calculations made up one of the biggest breakout groups. Hopefully this thread can serve as an aggregator of various ideas on how to work with this.

What are utilization problems that you have solved with Tableau? What are useful views to you? What are issues you ran into with the data?

Or, what kinds of reporting and analysis do you want to do for a unit census, or tracking patient throughput in your OR, or average # of waiting patients in your office, etc.? What are the questions you are trying to answer, Excel reports you're trying to duplicate, etc.?

To get things started: this is a tricky area because Tableau doesn't have a built-in set of duration & bucketing calculations for looking at this kind of data over time. Depending on the source data, the desired level of aggregation and level of detail in the output, and even the chosen mark type (bars vs. lines in particular), different techniques are needed, and they are currently not well documented. For example, while answering this forum post http://community.tableau.com/message/191354#191354 I tried a few different methods before figuring out that the final result (at least as far as I know) needed a fairly straightforward table calc - WINDOW_AVG(SUM([Number of Records])).

Here's a list of forum posts (and one blog post) that deal with these kinds of calculations:

For utilization, I've used two methods. For stays that span multiple days, I restructure my data using custom SQL (method 1). For stays that do not span multiple days, I've created custom calculations in Tableau (method 2).

However, we would like to switch to Level of Detail functionality, since you mention this should be possible as well and much less prone to error. Would it be possible for you or another bright mind on this forum to elaborate on how your original example could also be done with LOD expressions? Or point to newer but similar cases where this is already used? I am unaware of any such cases after searching the forum for a while now.

This issue has been stumping me for a while now. Admittedly, my advanced analytical skills (including SQL) aren't at the level of many commenters in this thread. With that said, has anyone been able to identify a way to calculate, for example, an hourly "census" using start date and end date that can be replicated by the less "code-savvy" users that Tableau is so well known for attracting?

Here's my particular use case:

Need an hourly census by day of week (heatmap) that spans time periods up to 2 years. This will give users an idea of how to staff their departments. For Acute Care units, a patient can remain in the unit for days at a time. They need to be counted for every hour that they're in the unit up to and including the hour of discharge. The users also need to be able to drill down into any point on the heatmap for detailed patient level data.
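The hour-by-hour counting described above (count a patient for every hour from arrival through the discharge hour, bucketed by day of week and hour) can be sketched in Python. This is a hypothetical illustration, not code from this thread; the stay data is made up:

```python
from datetime import datetime, timedelta
from collections import Counter

# Hypothetical stays: (patient_id, arrival, discharge). Made-up data.
stays = [
    ("A", datetime(2024, 1, 1, 22, 15), datetime(2024, 1, 3, 9, 40)),
    ("B", datetime(2024, 1, 2, 8, 5), datetime(2024, 1, 2, 8, 50)),
]

census = Counter()  # (day_of_week, hour) -> patient-hour count
detail = []         # one row per patient per hour, for drill-down

for pid, arrive, discharge in stays:
    # Truncate to the top of the arrival hour, then step one hour at a
    # time up to and including the discharge hour.
    t = arrive.replace(minute=0, second=0, microsecond=0)
    while t <= discharge:
        census[(t.weekday(), t.hour)] += 1  # weekday: 0 = Monday
        detail.append((pid, t))
        t += timedelta(hours=1)

print(census[(1, 8)])  # 2 -> both patients present Tuesday at 08:00
```

For a two-year heatmap you would divide each cell by the number of times that weekday occurs in the date range to get an average rather than a total, and the `detail` rows give the patient-level drill-down.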

After researching this for a few days, I'm convinced that it can be done, but I'm also convinced that a fair portion of it may be over my head. Since this has been an issue since Tableau's inception - have there been any advancements that will appeal to users such as myself? If not, which of the methods in this thread will be the best suited for my case?

Thanks - this is the number one issue holding us back from creating dashboards for a number of departments.

Is your data in Oracle by any chance? If so, I wrote a package in Oracle to do this very thing that I could share. We use it for dozens of census calculations that we run for Emergency/Inpatient/Long Term Care patients, beds, and staffing levels. It can average at the hourly or daily level and slice by facility/unit/location group/staff type/etc...

It works quite well. It would take some time for me to write up, but if you are using Oracle, it's quite a slick piece of code that runs faster than the other Cartesian join methods I have used in the past.

I really will write this up better some day, but for now here is a brief explanation of the core of the code. Essentially I do a self-join on a small chunk of the source table with events. The keys are (1) calculating the average census over a given hour (no snapshot census values, please!) and (2) limiting the size of the self-join for speed/efficiency's sake.

SELECT s.data_source, s.inst, s.ptn_status, s.start_dt,
       (SUM(LEAST(d.finish_dt, s.start_DT + 1/pwindow_resolution) - GREATEST(d.start_dt, s.start_dt)) * pwindow_resolution) avg_COUNT
FROM (SELECT * FROM pdst_table
      WHERE start_DT >= :2 AND start_DT < :3
        AND data_source = pdst_data_source
        AND ptn_status = pdst_patient_status) s
LEFT OUTER JOIN pdst_table_TEMP d
  ON (s.start_DT + 1/pwindow_resolution) >= d.start_dt
 AND s.start_dt < d.finish_dt
 AND s.inst = d.inst
 AND s.data_source = d.data_source
 AND s.ptn_status = d.ptn_status
GROUP BY s.data_source, s.inst, s.ptn_status, s.start_dt
HAVING (SUM(LEAST(d.finish_dt, s.start_DT + 1/pwindow_resolution) - GREATEST(d.start_dt, s.start_dt)) * pwindow_resolution) IS NOT NULL

Brief explanation of each line:

(SUM(LEAST(d.finish_dt,s.start_DT + 1/pwindow_resolution) - GREATEST(d.start_dt,s.start_dt))*pwindow_resolution) avg_COUNT
This bit of code calculates the average census over a given time window. Find the earlier of the two endings (either the patient left mid-hour, or the hour ended) and the later of the two starts (either the patient arrived mid-hour, or the hour began). pwindow_resolution defines the granularity of the time window; in my case this is usually 24, which divides the day into hours (Oracle's default date arithmetic works in units of days). LEAST and GREATEST compare values across columns within each row, whereas MIN/MAX only aggregate within a single column of data.
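To make the overlap arithmetic concrete, here is a tiny worked check in Python, with times expressed as fractional days the way Oracle's date arithmetic does. The numbers are made up for illustration:

```python
# Oracle date arithmetic works in days; mirror that with fractions of a day.
pwindow_resolution = 24              # 24 windows per day -> hourly
win_start = 10 / 24                  # the 10:00-11:00 window
win_end = win_start + 1 / pwindow_resolution

# Patient in the unit from 10:15 to 10:45, as fractional days.
start_dt = 10.25 / 24
finish_dt = 10.75 / 24

# min/max stand in for Oracle's LEAST/GREATEST.
overlap_days = min(finish_dt, win_end) - max(start_dt, win_start)
avg_census = overlap_days * pwindow_resolution  # fraction of the window occupied

print(avg_census)  # ~0.5: this patient contributes half a patient to that hour
```

Summing that quantity over all overlapping patients gives the average census for the window, which is exactly what the SUM(...) in the query does.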

(SELECT * FROM pdst_table WHERE start_DT >= :2 AND start_DT < :3 AND data_source = pdst_data_source AND ptn_status = pdst_patient_status) s
This bit gets the date range and dimensions from the destination table. In my case this returns the exact same fields as the very first select statement (i.e. data source, institution, patient status, start date, and a blank avg_count (census count)). :2 and :3 are Oracle bind variables; you can hard-code these for testing purposes, but since I run this through a package, I change them dynamically.

LEFT OUTER JOIN pdst_table_TEMP d ON (s.start_DT + 1/pwindow_resolution) >= d.start_dt AND s.start_dt < d.finish_dt AND s.inst = d.inst AND s.data_source = d.data_source AND s.ptn_status = d.ptn_status
This is essentially a self-join. It is really inefficient if you run it against your whole table; in my case we have tens of millions of records, and a full self-join is impractical (it used to take a week - the new code takes 1 hour:). Instead, I wrap this code in a loop that processes a 7-day chunk at a time: I create a temporary table containing one week of data and then join to that.

GROUP BY s.data_source, s.inst, s.ptn_status, s.start_dt
This rolls everything up. start_dt represents the starting point of the time window, so each window runs from its start_dt to the next record's start_dt.

HAVING (SUM(LEAST(d.finish_dt,s.start_DT + 1/pwindow_resolution) - GREATEST(d.start_dt,s.start_dt))*pwindow_resolution) IS NOT NULL
I believe this gets rid of the blanks so I don't waste time joining nulls (the destination table already has zeros filled in across all records).
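For non-Oracle readers, the arithmetic the query performs can be sketched in Python. This is a hypothetical toy, not the package itself; the event data is made up:

```python
from datetime import datetime, timedelta

# Hypothetical events: (unit, start, finish). One chunk of data, as in the
# weekly loop described above.
events = [
    ("ICU", datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 12, 15)),
    ("ICU", datetime(2024, 1, 1, 10, 45), datetime(2024, 1, 1, 11, 10)),
]

def avg_census(events, chunk_start, chunk_end, window=timedelta(hours=1)):
    """Average census per (unit, window start): sum each event's overlap
    with the window, divided by the window length."""
    out = {}
    t = chunk_start
    while t < chunk_end:
        win_end = t + window
        for unit, s, f in events:
            if f > t and s < win_end:  # join condition: event overlaps window
                overlap = min(f, win_end) - max(s, t)  # LEAST/GREATEST
                key = (unit, t)
                out[key] = out.get(key, 0.0) + overlap / window
        t = win_end
    return out  # empty windows are simply absent (the HAVING ... IS NOT NULL)

c = avg_census(events, datetime(2024, 1, 1), datetime(2024, 1, 2))
print(c[("ICU", datetime(2024, 1, 1, 10))])  # 1.25: one full patient + a quarter
```

One patient spans the whole 10:00-11:00 window (1.0) and the other is present for 15 minutes of it (0.25), so the average census for that hour is 1.25 rather than a snapshot count of 2.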

I'm attempting to use Mark Jackson's code from above with my data. I think it may work, but I'm trying to understand a few things. For your report, do you need the patient detail available for each hour, so the user can drill in and see all patients there that hour? This is what I need, and it looks like the only way to accomplish this is to create a Cartesian product between my detail data and the temporary table that stores each date. So essentially I'm multiplying each detail row by each day in my date range. This might be manageable, but it's quite unwieldy. I can't think, though, of any other way to do it if I need to maintain detail.
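To illustrate what that Cartesian product looks like, here is a hypothetical Python sketch (not Mark's actual code; the detail rows and date range are made up):

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical detail rows: (patient_id, admit, discharge). Made-up data.
detail = [
    ("A", date(2024, 1, 1), date(2024, 1, 4)),
    ("B", date(2024, 1, 2), date(2024, 1, 2)),
]

# Date scaffold: every day in the reporting range.
start, end = date(2024, 1, 1), date(2024, 1, 7)
scaffold = [start + timedelta(days=n) for n in range((end - start).days + 1)]

# Cartesian product of detail x scaffold, then keep only rows where the
# date falls within the stay. Each patient now has one row per day present,
# so patient-level detail survives for drill-down.
census_rows = [(pid, d) for (pid, admit, dis), d in product(detail, scaffold)
               if admit <= d <= dis]

print(len([r for r in census_rows if r[1] == date(2024, 1, 2)]))  # 2 patients
```

The product is rows x days before filtering, which is exactly the "unwieldy" blow-up described above; keeping the scaffold to the reporting range (or chunking it, as in the Oracle approach earlier in the thread) is what keeps it manageable.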

Just brainstorming here. Not sure if Mark is still around. Not able to @Mention him.