What sources do you primarily connect to? How do you connect to them (extract vs. live), and why? How do you structure your data sources? Do you use a dedicated analytics database? Column storage?

I'll start. We use AllScripts EPSi for cost accounting / budgeting at all of our hospitals. All of our patient level accounting information ends up here from our various EHR systems, so it acts as a bit of a data warehouse for us. We build complex custom SQL statements to create the following distinct sources:

1. Encounter Level

2. Charge Level

3. Payments

4. ICD9 Diagnoses

5. ICD9 Procedures
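
As a rough illustration of what one of these sources does (the table and column names here are hypothetical, not the actual EPSi schema), an encounter-level source typically rolls the charge detail up to one row per encounter before Tableau ever sees it:

```python
import sqlite3

# Hypothetical, simplified stand-in for an EPSi-style charge detail table;
# real table and column names will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE charge_detail (
    encounter_id TEXT,
    service_date TEXT,
    charge_amount REAL
);
INSERT INTO charge_detail VALUES
    ('E1001', '2013-01-05', 250.00),
    ('E1001', '2013-01-06', 125.50),
    ('E1002', '2013-01-07', 980.00);
""")

# Encounter-level source: one row per encounter, with charges pre-summarized
# so Tableau doesn't have to aggregate millions of charge rows at view time.
rows = conn.execute("""
    SELECT encounter_id,
           MIN(service_date) AS first_service_date,
           SUM(charge_amount) AS total_charges
    FROM charge_detail
    GROUP BY encounter_id
    ORDER BY encounter_id
""").fetchall()

for r in rows:
    print(r)
# ('E1001', '2013-01-05', 375.5)
# ('E1002', '2013-01-07', 980.0)
```

The charge-level and payment sources would follow the same pattern at their own grain, which is why each ends up as its own distinct published source.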

For our physician practices, we use NextGen. I have one data source that has CPTs, RVUs, payments and charges. This is also a complex SQL query.

EPSi also houses labor productivity, general ledger, and A/P data that we tap into. GL and A/P data is fed from PeopleSoft, which we do not currently attach to.

We create Tableau data extracts from these queries that are published to Tableau Data Server and updated daily. We are in the process of going live across the system with Epic. Most of the important data will continue to flow into EPSi.

I'm curious to know how everyone else approaches data connectivity. Some things are pretty challenging because they require getting data from multiple data sources. When each source contains millions of records, blending can really slow things down. And creating custom extracts for every analysis is impractical. Going with a column-store analytics database seems like it would be a challenge too, given that patients are always being reclassed and encounters are always being re-summarized.

Thanks for sharing, Mark. My situation is a mixed bag as well, having to go to different data sources depending on the need. I use subscription market share data which gets segmented down to the code level (DRG, ICD-9 dx and px) for service lines. I also get extracts from our admissions and appointment systems to compare the operational data with market data. I'm part of our Marketing and Communications department, so there are other communications-related data analyzed along with operational data to track marketing ROI. Being on the marketing side of the organization, however, I've not yet been approached about including detailed financial analysis. Our finance people have similar data sources to what you describe, but they use OBIEE.

Data sets that get appended regularly I've added to SQL Server because of their size. I've only encountered one performance problem with Tableau accessing SQL Server, and that was solved by running an extract. It can extract a prodigious amount of data. However, I don't (yet) have Tableau Server, so I can't provide any insight about blending data extract connections within Tableau Server.

Thanks for posting, Bob, I'd meant to reply to this and it got left in the queue.

We (Southern Maine Medical Center, a member of MaineHealth) use an older version of Meditech for the hospital EHR and billing, and eClinicalWorks for the physician practices. Medisolv is in place as an OLAP layer on top of Meditech. We also have a few vendors who support core measures, CAHPS, event tracking (patient complaints, near misses, etc.), and other functions that I'm sure I'm forgetting. We're implementing Epic & Lawson to go live for our particular hospital in early 2014, and there's also an enterprise data warehouse with a limited set of data (mostly around ACO reporting) from Recombinant. Finally, there's an enterprise BI tool selection process underway as the various hospitals and physician groups in MaineHealth break down silos to work in a more organized fashion.

I work in the Quality Management department so although I can see charge data, we only really look at it in those places where quality and $$ are intersecting, such as Value Based Purchasing and Medicare Spending per Beneficiary. We scrape data from a variety of reports and downloads from the different systems via Excel, CSV, some VBA code, and even manual abstraction to put it into a set of MS Access databases. Where we're manually abstracting, that data is entered real-time or within days, everything else is monthly or quarterly updates (because of the effort involved in the ETL process), with a couple of annual ones where we're getting data from the state or Medicare. We don't plan on getting more real-time until we move to Epic/Lawson/etc.

The data includes encounters, diagnoses, procedures, and several hundred measures as either raw data or aggregates (core measures, registry measures, falls, physician certifications, infection prevention measures, pediatric immunization rates, lots of performance metrics for nursing and per specialty for providers, etc.). The data comes out via queries or live-built Access views that are then exported into Excel for building charts, Tableau for more advanced analysis & dashboards, and several different reports, including Ongoing Physician Performance Evaluation (OPPE) and Physician Compensation (bonuses that depend on meeting quality metrics).

MS Access is definitely stumbling with our largest database of encounter information (which has a few million records total across several tables), and anytime we have to move lots of data from the file servers to the machines things can get really slow, so mostly what I'll do is build queries in Access to use as data sources for extracts. Since I'm using extracts, I'll generally do all the joins in the query and avoid using blends unless there's some real advantage, like this one that I blogged about: http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/. Another advantage to doing the heavy lifting in the query is that I have fewer steps to document.
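
The "joins in the query, not blends" approach can be sketched like this (the tables and columns here are hypothetical stand-ins, not our actual Access schema): join the related tables once in the source query so the extract is a single flat table, rather than blending two live connections in Tableau.

```python
import sqlite3

# Hypothetical stand-in for two Access tables; real database and
# column names will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encounters (encounter_id TEXT, provider TEXT);
CREATE TABLE diagnoses (encounter_id TEXT, icd9_code TEXT);
INSERT INTO encounters VALUES ('E1', 'Smith'), ('E2', 'Jones');
INSERT INTO diagnoses VALUES ('E1', '250.00'), ('E1', '401.9'), ('E2', '486');
""")

# One flat result that Tableau can extract directly; the LEFT JOIN keeps
# encounters even if they have no coded diagnoses yet.
rows = conn.execute("""
    SELECT e.encounter_id, e.provider, d.icd9_code
    FROM encounters e
    LEFT JOIN diagnoses d ON d.encounter_id = e.encounter_id
    ORDER BY e.encounter_id, d.icd9_code
""").fetchall()
```

Note the trade-off: the join duplicates encounter-level fields across diagnosis rows, which is exactly the granularity issue where a blend can sometimes be the better tool.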

I'm definitely running into the issue Mark mentioned about the overhead of having different queries for different analyses, and the overhead of trying to maintain common calculated field formulas across different workbooks. I've been trying out using published datasources and keep on running into oddities where something works in Tableau Desktop with a live connection that then fails when moving to an extract and/or fails when the datasource is published. I don't think that area of Tableau has been exercised enough since it was introduced in v7.

On the positive side, a couple of things I've noticed in working with v8: it can be incredibly fast at building extracts (one that took 8 minutes to do every month in v7 is down to a minute), and the table calculation improvements are also quite helpful, as they're often 2x to 10x faster.

re: OLAP: I haven't, I've been so busy with other projects that I haven't gone down that route yet. An issue I've had with Medisolv is that the cubes are built with specific views of the data, and very often I'm wanting/needing a different view that requires fields that aren't available in that particular cube.

For OPPE, it's all home-built and each service line/specialty has chosen their metrics and targets. Also, because of the volume of information (rates, numerators, denominators, timeframes, etc. for 1yr and 2yr measures, also showing group comparisons) we're using the dashboard oil light metaphor and the KPIs only come up when performance is poor. We use a couple of targets for the KPIs, the 2yr group mean and a chosen target. Here are the KPIs:

- 2yr performance worse than 3 standard deviations from the 2yr group mean is grounds for loss of credentialing, and a performance improvement plan is definitely initiated. This one gets a red stoplight on the report; all the other KPIs get grey down arrows.

- 1yr performance worse than 2yr group mean

- 2yr performance worse than 2yr group mean

- 1yr performance worse than chosen target for the measure.
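
The four "oil light" rules above can be sketched as a small function (this is a hypothetical simplification of our actual OPPE logic, which also handles measure direction, small denominators, and so on):

```python
import statistics

def kpi_flags(perf_1yr, perf_2yr, group_2yr, target, higher_is_better=True):
    """Return the KPI lights that come on; an empty list means no light.

    Hypothetical sketch of the OPPE rules: red stoplight for >3 SD worse
    than the 2yr group mean, grey down arrows for the other three rules.
    """
    mean = statistics.mean(group_2yr)
    stdev = statistics.stdev(group_2yr)
    worse = (lambda a, b: a < b) if higher_is_better else (lambda a, b: a > b)
    flags = []
    # Red stoplight: 2yr performance more than 3 SD worse than the group mean
    red_line = mean - 3 * stdev if higher_is_better else mean + 3 * stdev
    if worse(perf_2yr, red_line):
        flags.append("red: 2yr >3 SD worse than 2yr group mean")
    # Grey down arrows for the remaining rules
    if worse(perf_1yr, mean):
        flags.append("grey: 1yr worse than 2yr group mean")
    if worse(perf_2yr, mean):
        flags.append("grey: 2yr worse than 2yr group mean")
    if worse(perf_1yr, target):
        flags.append("grey: 1yr worse than chosen target")
    return flags

group = [0.92, 0.95, 0.90, 0.93]          # made-up peer-group rates
print(kpi_flags(0.85, 0.91, group, 0.90))  # three grey arrows
print(kpi_flags(0.96, 0.96, group, 0.90))  # no lights, nothing shown
```

Since the KPIs only appear when performance is poor, a provider meeting all targets gets an empty list, which is what keeps the dashboard quiet by default.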

The group mean is "everyone in the system who has that measure" for almost every measure; there are only a few risk-adjusted measures that are broken down by specialty. At one point we also had national benchmarks, but we've given up on them in favor of targets because they don't seem to add much value in improving quality. It's a lot clearer when we're just comparing a provider's performance to their peer group and the system target.

The chosen targets are very often 100% (e.g. core measures) or 0% (sentinel events, etc.). A lot of the outpatient registry measures (cardiovascular health, diabetes, etc.) will use the NCQA benchmarks, or a higher one if our group performance is exceeding the benchmark. If we need to look further for a benchmark, the general rule is to be at 90th %ile of the national mean, for example, our HCAHPS and CGCAHPS targets are 90th %ile. Measures that are new (like the IMM core measures) will use a national or state mean as a target, once those get established and more data is available then the target will be made higher. There are also a number of binary pass/fail measures (meaningful use certification, etc.).

We use Comparion Analytics (formerly the Delta Group) for our Core Measures and provider profiling, they do the work to identify expected & actual complications, readmissions, and mortality and then calculate the rate. Last year I worked out their math so we could take the data and accurately recalculate it in different ways, but we still depend on them for the initial calcs.