Using Power BI to Report on Rich Text Data Fields in SharePoint

This post is the fourth in a series exploring Power BI and complex data types in SharePoint. It discusses working with using Power BI to report on rich text fields from SharePoint. The previous posts are:

A rich text field in SharePoint is a special instance of the multi-line column type which contains formatting attributes. The column becomes “rich” when either the “Rich text” or “Enhanced rich text” options are selected in the field’s definition.

The List

Consider the following list that contains a rich text field named “Description”:

This view displays the value of the rich text field and retains its formatting. Internally all of the formatting commands are stored as HTML, rendering is a simple task for SharePoint. However, None of the default Power BI visuals support HTML rendering. We are left then with two options. We must either retrieve the raw text from the rich text field, and lose the formatting, or find a visual that supports HTML rendering. Happily, both options are possible. As with any SharePoint data, we must first start with Power BI Desktop.

Loading the Data

We first launch Power BI Desktop, select “Get Data” and then choose SharePoint Online list (if connecting to SharePoint Online) or SharePoint List (if using SharePoint Server). We are then prompted for the URL of the SharePoint Site. The dialog is titled SharePoint lists, but the value is the URL of the site, NOT the list itself. Once this is entered we are prompted for credentials if we haven’t connected to this site before. After entering credentials, we can select the list that we want to report on. In our case, it’s named “Listings”. We select it, and then click on the Edit button.

Once the data loads in, one of the first things that you’ll notice is that there are a lot of columns to choose from, and it’s a good idea to remove the columns that you don’t need. We can do this by right clicking on the desired column titles and selecting “Remove”. In this case, we can remove the ContentTypeId column and everything to the right of it, with two important exceptions. We want to keep the “FieldValuesAsText” column, as we’ll be needing that to extract the text values.

Extracting Plain Text from Rich Text

One thing that you will notice right away is that he more simple column types like “Title” show their value directly in the Query editor. Rich text fields also show their values directly in the editor, but they include both the text and the html formatting commands. The rich text field in our example is named “Description”, and each entry begins with “<div class=…..”.

Given that none of the standard Power BI visuals support HTML rendering, this is clearly not what we need for our report.

We could perform a series of text substitutions to strip out all the HTML formatting from the column, but that process would be highly tedious, not to mention messy. Happily, The Power BI SharePoint connector can do this for us automatically through the FieldValuesAsText column.

First, we can remove the Description column altogether. Next, with our example in the Query Editor, we scroll right and select the expander icon for the “FieldValuesAsText” column. We then then deselect all available fields except the “Description” column. In addition, we want to deselect the “Use original column name as prefix” option.

We then select OK, and we once again have a Description column, but this time it is free of all HTML formatting tags.

At this point, we can click on “Close & Apply” in the ribbon, add a table to the design surface, and add a number of dimensions, including our “Description” field. It is displayed in the visual free of the HTML formatting.

Showing Rich Text in Full Fidelity

There may be cases where we want to use the rich text formatting, and not remove it. As mentioned above, that’s not possible using the out-of-box visuals. We are therefore left to find a custom visual to do this, and at the moment, there is only one such visual to the best of my knowledge. This is the HTML Viewer visual by Pragmatic Works, and its purpose is to do exactly as the name suggests.

To begin with, we need to start at the beginning of the previous section, prior to the removal of the Description column. In this case, we want to use the HTML codes, so the initial “Description” column is perfectly adequate as-is. All that we need to do is to select “Close and Apply” in the ribbon to load the data into the data model.

We must now get the HTML Viewer from the custom visuals marketplace. We click on the ellipsis in the visuals pane, select “marketplace” and then search for the visual using the search box. Once located, we select it, and click “Add”.

A new icon for the visual then appears in the gallery.

The HTML viewer displays and renders the values as a list. Only one dimension is allowed, so it is not possible to use it as a replacement for a table. The best way to use this visual is to make it the target of a selection. For example, to see the description of our listings, we can add a slicer on the page using “Title” and the HTML Viewer using “Description”.

While it is limited, it is possible to render rich text fields from SharePoint in full fidelity. However, if only the text is necessary (as is likely the case for reporting), Power BI gives us a rich set of tools to make this process relatively painless.

11 thoughts on “Using Power BI to Report on Rich Text Data Fields in SharePoint”

“In this case, we can remove the ContentTypeId column and everything to the right of it, with two important exceptions. We want to keep the “FieldValuesAsText” column, as we’ll be needing that to extract the text values.”