Building our report on the opioid crisis for the Gartner bakeoff

Introduction

You might have read in our previous blog post that Microsoft wa invited to take part in the Gartner Data and Analytics summit’s “Bakeoff” competition. We wanted to give you all a feel for the demo we presented at the bakeoff and describe some of the steps we went through in preparing the data and report for it.

First up, here’s a video showing the different parts of the solution we created:

You can interact with the report and download the .pbix file from the Data Stories Gallery here! Now, onto some of the steps for building the report.

Data preparation and modeling

Gartner provided the data we were to use for the demo. It came from a variety of sources (CDC, Medicare and Medicaid programs, and Census bureau) and in a variety of formats (CSV, Excel, etc.). Understanding, cleaning and prepping the data model was, as is so often the case in BI projects, the bulk of the work!

Drug overdose data

The drug overdoses information was in a common structure, where indicators (measures) were all included in one column and distinguished by the values of another column. Typically, Power BI data should have columns for each metric or measure you want to analyze. This is easy to rearrange in Power BI using the ‘Pivot’ transformation:

You’ll also notice that not every state reports the detail of which drugs were involved. This meant that in some visuals we could only show the overall values across all states, or pick certain drug types only for some states.

The drug overdoses information was also, unusually, reported for a 12-month rolling period. To help report on this we created a ‘LastDayOfMonth’ column based on the year and month columns. This made it easy to baseline everything at a low grain and use the built-in date hierarchies and time intelligence functions that Power BI provides.

Once loaded, we also created a measure that ensured we’d only report the 12-month-ending value and not aggregate over any time periods:

This sums the Drug Overdose Deaths column filtered to just the last date for whichever period is being viewed in the visual. The SUM() is required because it operates on the [Drug Overdose Deaths] column but should always only include one date anyway.

At the visual layer, we added this LastDayOfMonth hierarchy to the columns

One of the advantages of this DAX formula is that it returns only the last value from the date hierarchy but allows aggregation over other attributes, e.g. State. The DOD value is correct even when drilling down through the date hierarchy, but the total row is calculated correctly at the bottom of the table, and other aggregations such as Region still work correctly.

The ISFILTERED function here ensures that the measure is being filtered by the date hierarchy, creates a variable (__PREV_YEAR) for the drug overdose deaths a year before the current value, and divides the difference by the previous value to give percentage change. It’s a common pattern and a good example of using DAX’s VAR feature, which allows variables to be defined and reused across a calculation to aid readability.

Census data

The Census data presented another interesting structure:

The different values (population estimate and deaths) are over multiple columns this time, one for each year. To join this with other data we needed ‘Year’ to appear in a column of its own. Note that the population estimates are only available for 2016 and 2017, but deaths are available for 2015, too. To extract the year we applied the Unpivot transformation to the columns and used Split by Number of Characters to extract the year. This is the result:

And then pivoted the data:

This left some rows with blanks for the population estimate, which we managed through the reporting layer.

De-normalizing and joining data

We built several de-normalized tables to help with joining data from the various healthcare sources and the Bing News and Twitter data. This helped eliminate some unmatched rows such as the “YC” and “US” values in the Overdose table that correspond to New York City and the overall US, meaning we wouldn’t be double counting in any totals. In the Labelers table, we also removed the variants of “Inc.”, “LLC” etc. to ease readability in the report.

This let us use the semantic model by creating relationships across the tables rather than having to merge the queries together.

We also added some ‘helper’ tables that we used in slicers and demographic data for the Census and state representatives. With more time we had wanted to further de-normalize some of the facts out into other tables and set up relationships on more fields into the Bing and Twitter data. There were still several ‘dirty’ elements in the data such as products that only seemed valid for certain time periods, and labelers that seemed to have changed name at some point in time. We'd have needed more time to fully understand the structure of the data for these to really know how best to model it.

When you set one up, it takes you through a set of configuration steps that set up Azure SQL DB, Azure machine learning, Power BI Desktop files and other components. For the Bing News and Twitter templates, the resulting .PBIX file contained all the information we needed for the report pages. Out of the box, the report looked like this:

The bottom right hand visuals already show some of the drug names and news articles we wanted to match up with the structured data. These were the only visuals we needed for the final bakeoff report.

Integrating this data with my existing PBIX file was easy – in the Query Editor we could select the queries and choose Copy:

And then paste them into the Query Editor in the other file. This copies all the dependent queries (which includes parameters for the database and server name). We had to then create a mapping table and relationships to the other data and add the right visuals to the report.

With more time we would have used more of the Sentiment Scoring capabilities in other parts of the report and fed more of the specific drug manufacturer and prescriber names into the Bing News template to pull articles that referred to them.

You can read more about data preparation and modelling in our documentation:

Visualizations and reports

Theme and layout

Beautiful layout and styling of a report is important to consider and easy to achieve in Power BI. Once we’d settled on a color scheme and rough layout for the report pages, we used 2 key features to apply them: themes and shapes.

The theme determines various styling options across visuals in your report. We used it to set the color of data points, the styling of the title of visuals, and other formatting properties such as data labels. We chose colors based on the Power BI and Gartner brands (yellow-gold and dark blue) that also happened to work well together. We chose a few shades of each and some greys to use as secondary colors. This is good best practice – pick some muted tones and one or two accent colors to highlight key information. The theme file itself then contains the default colors to use along with the various other format options:

We also made extensive use of shapes on the canvas to create the blocks of color to separate the navigation area, titles and main report area. If you enable the Selection Pane you’ll see all the different elements on the pages. Some have quite a few shapes there! We just used the out-of-the-box line shape but gave the line itself 0pt weight – all we wanted was the background of the shape to block out areas:

State details report page tooltip

We were fortunate with the timing of our March release to include the report page tooltip feature that we used on the maps in my report and creating these is simple.

You create a page in your report to use as the tooltip; you can even use the pre-defined ‘Tooltip’ size to make something that’s a common size. We chose a Custom size as we wanted a little more space to fit the callouts and bar chart together:

Then you set the page to be available as a tooltip in the Page Information card (you can see this in the image above). Finally, add the field that you want to link it on to the Tooltip area of the filter pane – in this case, State:

Natural language query with Q&A

Part of the script Gartner provided was to build two or three visuals that had been used in the initial dashboard. We started here, and tried out the specific questions through Q&A. These were the results basically without any further tuning! All we did was name the columns in the data appropriately and tag the state field as being geographic data.

You’ll also see that Q&A prompts you as you type with the fields and values that match your partial word. Ohio was identified as a state in several columns ([State], [State Name], [Entity Value], [User Location]) and as a partial match to the Prescriber name.

For demo purposes we left the Q&A model there, but we could have improved it by adding synonyms, normalizing some of the repeated state fields, etc.

Navigation and Storytelling

Navigation bookmarks

We used Power BI’s bookmarks feature to produce the callout and navigation controls on the left of the reports:

Bookmarks let you save the visibility state of any elements on your report and can be bound to a shape. Look at the Selection pane and you’ll see, for example:

The selected shapes, textboxes and callouts correspond to a particular bookmark.

There are lots of items in the selection pane because we’ve layered a few visuals together for each of the two navigation ‘tabs’. There is a background shape, the text (both selected and deselected), and a transparent clickable object placed at the top of the Z order that is used to link to the bookmark. For example the ‘(0) BM1 – Click’ shape is just a transparent shape. e found the Link card in the formatting pane and set it to direct to the appropriate bookmark:

Storytelling with bookmarks

Once you’ve uncovered some insights and created a report, you’ll want to share and communicate with others. In this report we found some surprising data points about one labeler, Purdue Pharmaceutical. The data show that Purdue had the highest reimbursement amount through the Medicaid program by a significant amount. From investigating the Bing News articles that mention Purdue Pharmaceutical we saw that they were being sued by two states for mis-marketing the capabilities of OxyContin, their oxycodone-based painkiller. We added a textbox and two lines to call out those two states as they are significant outliers in the chart:

They only showed up as outliers when cross-filtering on Purdue’s data, so we stored that as a bookmark too, and used the Selection pane to control the visibility of the textbox and lines for the last bookmark.

Real-time Twitter dashboard

We wanted to include some real time data in the demo and pulling tweets in to a dashboard is an easy way to monitor this for any keywords.

You first need to create a Streaming Dataset in Power BI to receive the tweets. From the workspace that you want to add the dataset to, click the Create button at the top right of the page and choose Streaming dataset:

On the pane that appears choose API and click Next.

Give your dataset a name and set up the values that you’ll capture. We used:

datetime (DateTime)

tweet (Text)

username (Text)

name (Text)

countOfTweets (Number)

retweets (Number)

isPowerBI (Number)

timeMinute (DateTime)

timeHour (DateTime)

location (Text)

Make sure you enable the Historic Data Analysis option too, so that a Power BI dataset is created for your data to be stored in.

We used Microsoft Flow to monitor Twitter and pump data into Power BI. Once you sign in to Flow, click My Flows and choose Create from blank:

You’ll want to search for the Twitter connector:

And choose the Twitter – When a new tweet is posted trigger. You’ll have to enter search terms you want to monitor (in my case we used ‘opioid crisis’). Then you add an Action and choose the Power BI – Add rows to a dataset action:

Here’s what mine looked like:

We used some custom expressions for some fields to see if the tweet included the #PowerBI hashtag, and to extract certain parts of the time.

indexOf(triggerBody()?['TweetText'],'#PowerBI')

addMinutes(startOfHour(utcNow()), int(formatDateTime(utcNow(),'mm')))

startOfHour(utcNow())

Save your flow and you should see tweets start to flow through to Power BI. You can then use all the normal report creation tools to build visuals and pin them to your dashboard.

If you want to write any further measures against the model, you can use Power BI Desktop. Connect to the model in the Power BI service and use the Report Measures feature to add the calculation at the report layer. For example, we created: