Category: Business Intelligence

Power BI V2 workspaces recently (May 2019) entered into general availability. The biggest difference between a V1 and V2 Power BI workspace is the fact that a V2 workspace is not backed by an Office 365 group, and a V1 workspace is. One area that this change affects a great deal is the “Get data” experience in the Power BI service (browser). This post outlines the differences, and describes the configuration options.

Data connections to files stored in SharePoint and OneDrive have certain unique characteristics when they are created in the browser. For example, these connections are automatically refreshed hourly unless that option is disabled.

V1 workspaces automatically offer the connection to the Documents library in the underlying SharePoint site. V2 workspaces do not automatically offer this option, as there is no underpinning SharePoint site. However, any V2 workspace can be connected to any Modern SharePoint site, and in this way, the option is more flexible. For the sake of clarity, a Modern SharePoint site is one that is backed by an Office 365 Group, and has an email address.

The “Get Data” experience, after choosing “Files” in the Power BI service, takes one of 4 forms, depending on the type and configuration of the workspace:

Personal workspace

V1 workspace

V2 workspace not connected to a site

V2 workspace connected to a site

In each example below, the options are reached by selecting “Get Data” and then choosing “Files”. The types of files that can be imported are CSV, Excel, PBIX (Power BI Desktop files) and RDL (paginated reports).

Personal workspace

The personal workspace is the only workspace available using the free Power BI license. It is not connected to any SharePoint sites, and provides 4 options for importing.

“Local File” can be used for importing files from a local file store. Files imported in this manner are not automatically refreshed, and without the use of a gateway, cannot be. This option is available for every workspace type and will not be discussed further. “Learn about importing files” is a simple help link, likewise available to all workspace types.

OneDrive – Business connects to the currently logged in user’s OneDrive for Business storage. This is the OneDrive that is associated with a “School or Organization” account, which is stored in Azure Active Directory.

OneDrive – Personal connects to a user’s personal, or consumer OneDrive account. This is the type of OneDrive that is accessed using a “personal” account (otherwise known as a Microsoft account, or MSA). The personal workspace is the only type of workspace that allows a connection to personal OneDrive content.

SharePoint – Team Sites allows files stored in any SharePoint Online library to be loaded. Files stored in SharePoint on-premises can be loaded into Power BI, but only through Power BI Desktop. This method is online only.

Data imported in this fashion will be updated hourly with the exception of “Local File”. This will also be true of any OneDrive or SharePoint source referenced below.

V1 workspace

A Power BI V1 workspace is connected to an Office 365 Group, and therefore backed by a SharePoint site. This is reflected in the Files experience in the service.

Here we see 3 import options. Local File, SharePoint – Team Sites, and “Learn about..” are exactly the same as with personal workspaces. However, neither of the OneDrive options from the personal workspace is available. The “OneDrive – XXXX” option is different, and bears some explanation.

In the image above, “Demos” is the name of the V1 workspace. Selecting this option will open the SharePoint library named “Documents” in the SharePoint site that is associated with this workspace and Office 365 group.

In my opinion, this option is poorly named, which leads to confusion. This container truly has nothing to do with OneDrive – it is a SharePoint library. We already have enough different “OneDrives” to keep track of, but I digress.

V2 workspace (not connected to a site)

The V2 workspace is not associated with a SharePoint site, and therefore, there is no Documents library to connect to. The option is instead replaced with the ability to connect to the user’s OneDrive for Business (OneDrive – Business) storage, as in the personal workspace. In essence, this experience is identical to the personal workspace experience minus the ability to connect to personal OneDrives.

V2 workspace (connected to a site)

Although a V2 workspace is not inherently connected to a SharePoint site, it can be manually connected to one. This restores the capability that V1 workspaces provide, while being more flexible. The workspace is no longer bound to a specific site, but can be configured to work with any Modern SharePoint site. In addition, the same site can be bound to multiple workspaces.

The “Modern” distinction above is important. The SharePoint site itself must be backed by an Office 365 group, as that is how it is identified in Power BI.

Associating a workspace with a SharePoint site

With V2 workspaces, site connection is now a property of the workspace. To edit workspace properties, select either the workspace settings button in the ribbon, or the ellipsis beside the workspace in the workspace list.

The connection setting is in the advanced section, and is identified as the “Workspace OneDrive”.

The important thing to note here is that you do NOT enter the URL of the SharePoint site in this field. This field expects the address of the site in email format (e.g. demos@xxxx.com). All Modern SharePoint sites are bound to an Office 365 group, and the email address is the address of that group.

Get Data – File options for a V2 connected site

Once connected, the “Get Data” – “File” options will be much the same as with an unconnected workspace, but with the “OneDrive – SiteName” option added.

I still take exception to the name presented above. In my opinion, it should be “Site – SiteName” or “SharePoint – SiteName site” and use a SharePoint option. However, once connected, files in the connected site can be imported easily into the Power BI service.

Usage

It is important to understand what the connected site is used for in Power BI. Connecting a site allows for files stored in a SharePoint library to be either imported into the service (all supported file types), or connected to (Excel files). This feature does NOT allow Power BI content to be stored in a SharePoint library.

Power BI datasets and dataflows are the two native data sources for Power BI reports. Connecting to a dataset allows a report to be built against an existing Power BI dataset in place, and dataflows represent a source of data that has had transformations applied to it. When connecting to Power BI dataflows, data is imported into a data model, but the connection to a Power BI dataset is a direct connection.

The two sources handle identity in drastically different ways, and this can lead to confusion when dealing with multiple accounts and tenants. This post is an attempt to help clarify this confusion.

Connecting to a dataset

To connect to a Power BI dataset, select the “Get Data” button from the ribbon, select the “Power BI” tab, select Power BI dataset, and finally the “Connect” button.

Next, if a user is signed into Power BI Desktop, a list of workspaces and their datasets is presented. If not, the user is prompted to sign in. The dataset can then be connected.

The important thing to notice here is the list of workspaces itself. The list presented is a list of the workspaces available in the tenant belonging to the currently signed in user. It is the same list of workspaces that can be chosen as a publishing target. It should also be noted that the identity of the user is displayed in the upper right corner of the dialog box, and the identity can be changed directly from there.

Connecting to a dataset in a different tenant

The signed in user can be changed by selecting “Sign in” at the upper right of the Power BI Desktop client, or within the connection dialog itself. If the user signs into a different account (in a different tenant), a different list of workspaces and datasets will then be offered. The dataset source is hard linked to the currently signed in user. In this way, the Power BI dataset source behaves differently than all other data sources, which maintain connection credentials separately.

Connecting to a Dataflow

Connecting to a dataflow follows the same steps as a dataset, with the exception that the “Power BI dataflows” option is chosen.

At this point, a Power BI data connection dialog will be shown.

There is only one authentication option
because Power BI dataflows only support one authentication option.

Unlike datasets, dataflows are NOT linked to the currently signed in user. The connection is authenticated, not the current user. The “Sign In” button must be selected, and authentication completed to connect to a Power BI dataflow.

Once signed in, selecting the “Connect” button will display a list of workspaces that contain dataflows. Expanding the workspace and then the dataflow will expose a list of entities that can be imported into the Power BI data model.

The connection information for the dataflow is cached with Power BI Desktop, and subsequent connections to dataflows will not require the user to sign in. The same authentication credentials will be used.

It should be noted that unlike the dataset connection dialog, this one does not show the current credentials and does not allow those credentials to be changed. This makes the process of changing credentials to use dataflows in multiple tenants somewhat less than intuitive.

Connecting to a dataflow in a different tenant

With datasets, changing the currently
signed in user will result in a different set of datasets being presented when
the dataset option is chosen. This is different with dataflows. No matter what
user is currently logged in, the cached credentials will be used.

This behaviour can be confusing when
multiple tenants need to be accessed. With most other data sources, the cached
credentials are linked with the specific data source. For example, when two
different SQL databases are connected, Power BI caches two different sets of
credentials.

To connect to dataflows in a different
tenant, the current connection information needs to be cleared. This can be
done with any data source, but it is particularly important to dataflows as it
is the only way to switch connection credentials.

To clear the credentials for dataflows, select “File”, “Options and Settings” and then “Data Source Settings”. The Data source settings dialog will then be presented.

Unlike most other data sources, which can have multiple entries in the list, one for each unique data source, there will only be one source for dataflows. It is named “Power BI dataflows”. For example, if the current instance of Power BI Desktop has authenticated to 3 different SQL servers, there will be three SQL Server connections in this list, but there will only be one for dataflows, no matter how many tenants have been connected.

To switch tenants, the current credentials must be either cleared, or edited. The cached credentials can be fully removed by selecting “Clear Permissions” or they can be changed by selecting “Edit Permissions”. If cleared, the user will be prompted for credentials the next time the dataflow option is selected. If edited, the new credentials will be stored.

Conclusions and recommendations

It is possible to work with multiple tenants for both connected datasets and dataflows. However, the methods for doing so are completely different for each option. This can obviously lead to some confusion.

It is my opinion that this behaviour should be changed, and that the behaviour of connected datasets is the more intuitive. If the credentials for the currently logged in user were used for both types of connection, it would be much more intuitive, and also easier to use for report designers.

Whether you are new to Power BI or have worked with Power BI Desktop for some time, you have likely encountered the concept of refreshing data. By default, Power BI caches data, which needs to be refreshed on a periodic basis. Reports that use Direct Query datasets do not need to have their caches refreshed, but to see data changes, the report pages themselves need to be refreshed. If the requirement is to have visuals on screen refreshed without any user intervention at all, it is necessary to use a streaming dataset.

Unlike regular datasets, data is not “pulled” into a streaming dataset, rather it is “pushed” in through the Power BI API, Microsoft Flow, Azure Stream Analytics, or third party services such as PubNub. This article aims to explore the various ways of working with streaming datasets.

Creating and populating a streaming dataset

Streaming datasets are created directly in the Power BI
service itself or through the Power BI API. Unlike with other dataset types,
there is no schema to read in from an external source.
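
For the API route, the schema has to be supplied by the caller. The following is a minimal sketch of what such a definition might look like, assuming the Power BI REST API's dataset creation endpoint accepts a body of this shape. The dataset, table and column names are hypothetical, and the mapping from the type names shown in the service to the API's type strings is an assumption that should be checked against the API reference.

```python
import json

# Assumed mapping from the type names shown in the service UI to the
# dataType strings used by the REST API (verify against the API reference).
UI_TO_API_TYPE = {"Text": "String", "Number": "Double", "DateTime": "DateTime"}


def build_dataset_definition(name, table, fields, keep_history=True):
    """Build a JSON-serializable body describing a streaming or push dataset.

    fields is a list of (column_name, ui_type) tuples.
    """
    columns = [
        {"name": column, "dataType": UI_TO_API_TYPE[ui_type]}
        for column, ui_type in fields
    ]
    return {
        "name": name,
        # Assumption: "PushStreaming" retains data, "Streaming" does not.
        "defaultMode": "PushStreaming" if keep_history else "Streaming",
        "tables": [{"name": table, "columns": columns}],
    }


definition = build_dataset_definition(
    "SensorReadings",  # hypothetical dataset name
    "RealTimeData",    # hypothetical table name
    [("sensor", "Text"), ("temperature", "Number"), ("timestamp", "DateTime")],
)
print(json.dumps(definition, indent=2))
```

The resulting body would be sent with an authenticated POST request; acquiring the access token is outside the scope of this sketch.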

To create a streaming dataset, choose “create” from a workspace menu in the upper right, and then select “Streaming dataset”.

Select the type of the streaming dataset. For both API and Flow, choose the API option, then select “Next”.

Streaming datasets contain only a small subset of the data types supported by regular datasets. These types are Text, Number and DateTime. Give the dataset a name and then create all the necessary fields.

It is important to choose the data types correctly, as there is no opportunity to transform fields into different types within reports or dashboards.

As fields are added, the JSON definition of the dataset is available. This can be copied and used by the data source that is pushing the data into the dataset. Note that Microsoft Flow can read the schema directly, so copying is not necessary in that case.

In order to use the techniques outlined below, it is critical to turn on the “Historic Data Analysis” switch. This switch changes the dataset from a streaming dataset to a push dataset.

With a streaming dataset, data is stored in a cache long enough to display in a dashboard tile, and it expires very quickly. A push dataset retains the data permanently, up to a limit of 15 MB. In order to create more complex visuals, a report must be created, and a report requires a push dataset.

A push dataset is identified as “Hybrid” in the Power BI dataset list.

The most important option to select in the definition of the dataset is “retain historical data”. If this option is not selected, dashboard tiles will be able to display current data, but will not be able to display it over any significant time period. Data will be loaded into the dashboard cache for use with dashboard tiles, but when the cache expires, so does the data. In order to use reports of any kind with a streaming dataset, this option must be selected.

Once created, data can be added through the API, Microsoft Flow,
Stream Analytics or PubNub.
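
To illustrate the API option, here is a minimal sketch that pushes rows to a dataset using only the Python standard library. It assumes the push URL has been copied from the dataset's API information in the service, and that the dataset has three hypothetical fields named sensor, temperature and timestamp. The payload is a JSON array of row objects, matching the shape of the JSON definition shown in the service.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_rows_payload(readings):
    """Convert (sensor, temperature) tuples into a JSON array of row objects."""
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        {"sensor": sensor, "temperature": float(temperature), "timestamp": now}
        for sensor, temperature in readings
    ]
    return json.dumps(rows).encode("utf-8")


def push_rows(push_url, payload):
    """POST the payload to the dataset's push URL and return the HTTP status."""
    request = urllib.request.Request(
        push_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_rows_payload([("boiler-1", 71.5), ("boiler-2", 68)])
# push_rows("<push URL copied from the service>", payload)
```

Calling push_rows on a timer, or whenever a new reading arrives, is enough to keep dashboard tiles updating.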

It should be noted that data stored in this way will only be available to Power BI, and only until the limits are reached. As the dataset fills, the oldest data will drop off. If there is any requirement to analyze the data over any significant amount of time, it is highly recommended that it also be stored in another location.

Adding a tile

Dashboard tiles can be created directly by opening a dashboard and selecting “Add tile” from the ribbon menu. Select the Real-Time Data tile, and then select the dataset to use.

Tiles created in this way are limited to several visual types. These types are:

Card

Line chart

Clustered bar chart

Clustered column chart

Gauge

There are a limited number of configuration options available for these tiles, depending on the tile type. Tiles created in this way will display data from the point of creation forward, according to the settings for the visual itself. These values will update in real time as data is added to the dataset, with no user intervention or refreshing required.

In order to display different types of visuals, or to customize them beyond what is available directly in the dashboard, it is necessary to create a report.

Adding a report in the service

Once created, the streaming dataset will appear in the service like any other. As with other datasets, selecting it from the dataset menu will open a new report canvas that can be saved. The report canvas in the service allows any of the Power BI visuals to be used with the streaming dataset.

Visuals on a report do not update automatically as data is pushed into the dataset, but these visuals can be pinned to a dashboard. Once pinned, the dashboard tile will update automatically, so in this way, practically any visual can be added to a dashboard and updated in real time. All that is necessary is to first create a report.

Creating a report in the service allows full fidelity access to the report canvas and all of the available visual types, but it does not allow for any editing of the data model. If things like calculated measures and columns are needed, it is necessary to create a report using Power BI Desktop.

Adding a report with Power BI Desktop

Power BI Desktop is able to connect directly to datasets in the Power BI service, and push datasets are no exception. To connect to a streaming dataset (or any other), select the “Get Data” button, select “More”, then select the Power BI tab. Finally, select the Power BI dataset option, then select “Connect”.

Next, select the workspace that contains the real-time dataset, and select the dataset itself. Selecting Load will establish a connection between the report and the dataset.

Because the report uses a direct connection to the dataset in the service, there is no opportunity for data transformation, and Power Query cannot be used. Additionally, several DAX functions are not available. For example, most of the functions on the “Modeling” tab are unavailable. It is also not possible to create calculated columns, but calculated measures can be created. Using Power BI Desktop, some relatively complex visuals can be created.

Once the report is published to the service, the visuals can be pinned to a dashboard, and once pinned, they will update automatically in real time.

Purging data

From time to time, it may be necessary to purge the data from the push dataset to reset the dashboard. To do this, the dataset can be temporarily changed from “push” mode to “streaming” mode. This will purge the stored data. Setting it back to “push” will start storing the data again.

To change the mode of the dataset, select the “Datasets” tab from the workspace menu, and then select the “edit” icon for the dataset that is to be changed.

The option that changes the mode is “Historic data analysis”. Switching it off changes it to a streaming dataset, and switching it on changes it to a push dataset.

At first, it may seem that visualizing real-time data in Power BI is quite limited due to the limited nature of tiles created in dashboards. However, using push datasets along with reports built in Power BI Desktop allows relatively complex visuals to be viewed in real time.

Microsoft Graph data connect is a new technology for extracting data in bulk from the Microsoft Graph. This article outlines how this data can be transformed with Databricks, and loaded into Power BI dataflows.

Microsoft Graph data connect (GDC) is a connector technology that allows an organization to extract data in bulk from the Microsoft Graph. Using Azure Data Factory, extraction jobs can be scheduled that can securely extract Graph data while respecting an organization’s data control policies. On a scheduled basis, GDC stages the data behind the scenes, and stores it in an Azure storage account. The storage can be Azure Blob storage, Azure Data Lake Gen 1, or Azure Data Lake Gen 2. This article describes a procedure to process the output from GDC and store it in a Power BI dataflow.

Details on how to configure GDC can be found here,
and an excellent video tutorial here.

Azure Data Lake Gen 2 Storage

Azure Data Lake Gen 2 (ADLG2) brings a hierarchical namespace to Azure Blob storage. This storage system is designed for big data analytics and is highly cost effective. It is one of the three data sink (destination) options for GDC, and it is the required storage system for the “bring your own”, or external storage option of Power BI Dataflows. Given that an ADLG2 account is required for the Power BI Dataflows, it is logical to use the same account as the GDC data sink, but it is not required.

In order to use an ADLG2 account for external storage with Power BI dataflows, it must be in the same data center as the Power BI tenant. The data center for a tenant can be determined by navigating to the Power BI web application, selecting the “?” icon in the upper right, and then selecting “About Power BI”.

In order to be able to use an external storage account for
Power BI dataflows, it MUST be created in the data center listed in “Your data
is stored in”.

Connecting Power BI to ADLGen2 Storage

When a dataflow is created in Power BI, it is stored in an
ADLG2 storage system managed by Microsoft. If Power BI is the only platform
that will access the data, this is perfectly adequate, but an organization may
wish to use the data with other tools. If this is the case, a Power BI tenant
can be connected to an ADLG2 account that is accessible to other tools. A
workspace administrator can then decide to have all the dataflows in that
workspace store their data in the custom storage account. These are known as
“external dataflows”. Dataflows are all stored in Common Data Model (CDM)
folders which are described in detail here.

Detailed instructions on configuring external dataflow storage for Power BI can be found here. The process consists of several steps. It should be noted that as of this writing, external dataflows are in preview, and these steps could change.

1. If one does not already exist, create an ADLG2 account in the same tenant as Power BI

2. In Azure, grant the Reader role to the Power BI service identity for the account in #1

3. Create a file system for Power BI. The file system MUST be named “powerbi”

4. Using Azure Storage Explorer, grant file system access to three Power BI service principals, Power BI Premium, Power BI Service, and Power Query Online (see the above link for details)

As of this writing, step #5 above is irreversible. Care
should be taken with its name.

Once configured, a workspace administrator can assign their
workspace to their external storage. This setting is a property of the
workspace, and can be accessed via its settings with the “Storage” tab.

Once this setting has been enabled, all dataflows will be stored in external storage. A folder is created within the file system created in step #3 above with the name of the workspace. Each dataflow in the workspace will be added within that folder, and each entity of the dataflow has a folder of its own. The dataflow folder will contain a file named model.json which describes the entities, and the entity folders contain multiple csv files which house the data itself. Within Azure Storage Explorer, the structure appears as below.

Azure Data Lake Gen 2 account (connected to the Power BI tenant)

File system created for Power BI dataflows (always named powerbi)

Workspace folder

Dataflow folder

JSON file describing the dataflow

Entity folder containing entity data

Once configured, Power Query Online (part of the process of
creating a dataflow) can be used to acquire and transform data. The data will
be stored in these folders according to the Common Data Model specification and
can be accessed by other applications. However, the reverse of this is also
true. Any CDM folder that is stored in the Power BI connected file system can
be connected to Power BI as an external dataflow. The process for doing this is
described here.
The order of operations is important. The user that will make the connection
needs to be granted access to the CDM folder before it is populated with data.

An external dataflow is read only with respect to Power BI
(Power BI only sees the data; it does not transform it). The goal is therefore
to transform the data created by Graph data connect into the CDM format. Azure
Databricks provides support for doing so.

Azure Databricks

Azure Databricks is a suite of serverless big data technologies that encompass Hadoop, Apache Spark, SQL, Python and Scala technologies. Databricks clusters can be created and used when needed, and discarded or suspended when not needed. A discussion of how to create and use Databricks is beyond the scope of this post, but there is a great deal of documentation on it here. In addition, Microsoft provides a free 14-day trial of Azure Databricks.

Databricks is particularly useful in this scenario, as it
has libraries that support Azure Data Lake Gen 2, and libraries that support
the Common Data Model. Databricks notebooks can be called from Azure Data
Factory, so that when a GDC extraction job is completed, the resulting files
can be processed with Databricks to populate the CDM folders.

An excellent tutorial on using Databricks with dataflows and
CDM folders can be found on GitHub here.
The scenario in the tutorial involves using dataflows to produce data instead
of consuming it, but it does cover off several important concepts. The tutorial
is part of the project that includes the CDM
library for Databricks which is used to transform GDC data into CDM
folders.

As of this writing, the CDM library requires a Databricks 4.3.x-scala2.11
cluster. This is an older configuration that is not available to the standard user
interface when creating a Databricks cluster. Subsequent versions of the CDM
library will most likely support newer clusters, but at present, it is
necessary to take a few additional steps during cluster creation.

From the cluster creation UI, run window.prefs.set("enableCustomSparkVersions", true) in the browser debug console, and then navigate to the cluster page, and specify the image tag below. Refresh the browser and then 4.3.x-scala2.11 will be listed as a custom version.

Once a cluster has been created, and the CDM library loaded into it, a notebook can be created to process the GDC data. Processing consists of four main steps: connecting Databricks to ADLG2, reading the JSON files from GDC, extracting the desired data into dataframes, and writing the data out to CDM folders.

Connecting Databricks to ADLG2

The recommended way to connect Databricks to ADLG2 storage
is through a Service Principal. The same principal that GDC itself uses can be
used, and if the same ADLG2 account is being used, no further configuration is
necessary.

Databricks will need to read from the file system that
houses the GDC data. Several lines of code (Python) in a Databricks notebook
will establish the required connections:
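
As a minimal sketch of what such a connection might look like, assuming service principal (OAuth) authentication through the Hadoop ABFS driver: the setting names below follow that driver's conventions, and whether a given Databricks runtime accepts them should be verified. The account name, file system name and placeholder credentials are hypothetical.

```python
def adls_oauth_conf(account, app_id, app_key, tenant_id):
    """Return Spark settings for service principal access to an ADLG2 account."""
    suffix = account + ".dfs.core.windows.net"
    return {
        "fs.azure.account.auth.type." + suffix: "OAuth",
        "fs.azure.account.oauth.provider.type." + suffix:
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id." + suffix: app_id,
        "fs.azure.account.oauth2.client.secret." + suffix: app_key,
        "fs.azure.account.oauth2.client.endpoint." + suffix:
            "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token",
    }


def filesystem_uri(file_system, account):
    """Build the abfss:// root URI for a file system in an ADLG2 account."""
    return "abfss://" + file_system + "@" + account + ".dfs.core.windows.net"


# Hypothetical account and file system names, placeholder credentials.
conf = adls_oauth_conf("mystorageaccount", "<app-id>", "<app-key>", "<tenant-id>")
filesystem = filesystem_uri("gdcdata", "mystorageaccount")

# Inside a Databricks notebook, where `spark` is predefined:
# for key, value in conf.items():
#     spark.conf.set(key, value)
```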

Once connected, files in the GDC folder can be listed using
the built in dbutils library:

dbutils.fs.ls(filesystem + "/GDCFolderName")

While the above and below examples show account names and keys being explicitly defined in the notebook, this is not recommended beyond testing or demonstration environments. Instead, it is recommended to store such secure strings in Azure Key Vault and retrieve them at runtime. For instructions on how this is done, see the document Secret Scopes.

Reading JSON Files

Databricks can read all JSON files in a folder (as well as other text-based formats) into a dataframe. A dataframe is an in-memory table that can be hierarchical and queried via standard SQL commands. The schema of the dataframe will be implied through the structure of the JSON files contained within. To load all of the GDC JSON files from a particular folder into a dataframe, the following line of Python can be used:

contactbasedf = spark.read.json(filesystem + "/Contacts Folder")

The read is recursive, which means that subfolders are interrogated as well. GDC folders typically contain a metadata folder with files whose schemas differ from those of the data files themselves. For this reason, it is a good idea to move the data files to a dedicated folder before reading them into a dataframe. This can be done with the dbutils.fs.mv command.

Extracting the desired data

Once the files have been read into a dataframe, the dataframe can be saved to a temporary table. This table can be queried through standard SQL commands. For example, the query creates a temporary table from the initial dataframe (contactbasedf) that was created by reading JSON files created by GDC for organizational contacts. The relevant details are then queried and saved into another dataframe, named df1 in this case.
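
A minimal sketch of this pattern follows. The view name and the column names (displayName, companyName, jobTitle) are hypothetical and depend on the schema of the extracted files. The function takes the Spark session and the dataframe as parameters so that it can be defined outside of a notebook.

```python
# Hypothetical query: the column names depend on the extracted data set.
CONTACT_QUERY = """
    SELECT displayName, companyName, jobTitle
    FROM contactbase
    WHERE displayName IS NOT NULL
"""


def extract_contacts(spark, base_df):
    """Register base_df as a temporary view and return the queried dataframe."""
    base_df.createOrReplaceTempView("contactbase")
    return spark.sql(CONTACT_QUERY)


# Inside a Databricks notebook, where `spark` and `contactbasedf` exist:
# df1 = extract_contacts(spark, contactbasedf)
```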

Writing to CDM folders

Once the CDM libraries are loaded into a Databricks cluster, writing data to CDM folders is a relatively simple method call from a dataframe. The call itself requires several parameters, and those parameters are:

cdmModelName – The name of the Model (dataflow) that houses all entities

entity – the name of the entity within the dataflow (a dataflow can contain multiple entities or tables)

cdmFolder – The folder in ADLG2 to save the model.

appId – The service principal ID of an application with Blob Contributor access to the ADLG2 account

appKey – The secret key for the appId specified above

tenantId – The tenant ID for the ADLG2 account

Using the dataframe defined above, the contents of the dataframe can be written out to the CDM folder with the following (Python) code:

The above code will output the contents of the dataframe to an entity named “Contacts” in a model named “AllContacts” stored in a folder named “AllContacts” within the workspace folder specified in the “Workspace” variable.
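
A sketch consistent with that description is shown here, using the parameter names listed earlier. The data source name "com.microsoft.cdm" and the shape of the folder URL are assumptions based on the CDM library's samples and should be checked against the version in use. The account and workspace names are hypothetical.

```python
def cdm_write_options(account, workspace, model, entity, app_id, app_key, tenant_id):
    """Collect the CDM writer parameters into a dictionary of options."""
    return {
        "cdmModelName": model,
        "entity": entity,
        # Assumed layout: model folder inside the workspace folder of the
        # "powerbi" file system.
        "cdmFolder": "https://{0}.dfs.core.windows.net/powerbi/{1}/{2}".format(
            account, workspace, model
        ),
        "appId": app_id,
        "appKey": app_key,
        "tenantId": tenant_id,
    }


def write_entity(df, options, source_format="com.microsoft.cdm"):
    """Write a dataframe out through the standard DataFrameWriter interface."""
    writer = df.write.format(source_format)  # format name is an assumption
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.save()


options = cdm_write_options(
    "mystorageaccount", "MyWorkspace", "AllContacts", "Contacts",
    "<app-id>", "<app-key>", "<tenant-id>",
)
# Inside a Databricks notebook with the CDM library attached:
# write_entity(df1, options)
```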

Creating an external Dataflow

Once the GDC data has been written to a CDM folder, it can
be connected to Power BI as an external dataflow. In order to do so, as
mentioned above, the user making the connection must have explicit access to
the model folders.

From a Power BI V2 workspace (V1 workspaces are not
supported), go to the dataflows tab, and select Create – Dataflow from the
toolbar. If Power BI has been connected to the ADLG2 storage, and the workspace
has been configured for external storage, the “Attach a Common Data Model
folder” option should appear.

Selecting “Create and attach” brings up the Attach Common
Data Model folder dialog box, where two items must be entered.

The Name of the dataflow is the name with respect to Power
BI. It can be completely different than the name of the model folder, or the
internal name of the model created above, but it’s likely a good idea to keep
it consistent. The CDM folder path is actually the absolute path to the
model.json file that describes the model, and it’s vital that model.json be
included at the end of the path. Failing to do so will result in an error.
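
Because a missing model.json is an easy mistake to make, it can help to build the path programmatically. This is a small sketch, assuming the folder layout described earlier (the powerbi file system, then the workspace folder, then the dataflow folder). The account and folder names are hypothetical, and names containing spaces or special characters may need URL encoding.

```python
def model_json_path(account, workspace, dataflow_folder):
    """Return the absolute path to model.json for a dataflow folder."""
    parts = [
        "https://" + account + ".dfs.core.windows.net",
        "powerbi",        # file system name required by Power BI
        workspace,
        dataflow_folder,
        "model.json",     # the path must end with the model file itself
    ]
    return "/".join(part.strip("/") for part in parts)


path = model_json_path("mystorageaccount", "MyWorkspace", "AllContacts")
print(path)
```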

Finishing Up

Once completed, Power
BI Desktop can be used to connect to the external dataflow, just like any other
dataflow. The only difference is that external dataflows are not refreshed in
the Power BI service, but will be updated by Databricks. The same Azure Data
Factory jobs that extract data from Graph data connect can be used to call into
the Databricks notebooks when the data has been extracted.

If you are interested in a product that leverages the data
produced by Graph data connect, I would be remiss if I did not suggest our
tyGraph for Exchange, which is currently in preview. It combines all of the
technology listed above with a rich set of reports in concert with other Office
365 workloads. If you are interested, please contact me directly, or email sales@tygraph.com .

At the end of 2018, SharePoint received something that we haven’t seen for a long time – a new column type, Location. Location columns will look up an address and geocode it as it is being entered in a form. It will also separate all the constituent parts of the address as well as the latitude and longitude into separate display only columns. These columns are used primarily in views but can also be used in reports. Given that I put together a series of posts recently on using Power BI to work with complex SharePoint report types, I was interested in how to report on this new column type. As it turns out, it is relatively straightforward.

This post will delve into the nuances involved with reporting on this new SharePoint Location column in Power BI.

The Location Column

To begin with, the Location column is a “modern” SharePoint column. This means that it can be added to a list via the Add column button in the list view, but NOT through the list settings page as other column types are.

List view creation

List settings creation

If the Add column button does not appear for you, you may be using a “classic” SharePoint site, or the view may contain one or more column types that are not supported in “modern”, which causes a classic view to be used. Removing these columns from the view is often enough to light up the Add column button.

Once created, you will have the option to add any or all of the address components to the view. These are display elements only and will be available to reports (or other views) whether or not they are added to the view at creation time.

Once created, entering data is as simple as typing in an address, or the name of a location into the column. The typeahead feature will attempt to find the location and fill in the details.

Once selected, the full address will be filled in, and all the constituent address properties will be populated. If they are on the view, the list can be sorted, filtered, etc. by these elements.

Reporting on the Location Column

Internally, the location is saved as a BLOB of JSON content within a column. When the column itself is used in the view, its friendly display format is shown. When constituent items are displayed (City for example), their values are extracted from the column and displayed as discrete elements. With other SharePoint column types, this sort of storage can cause complications for reporting, but the developers of the location feature seem to have had reporting in mind when it was built. Consider the following list that contains a Location column named Location:

Loading the Data

We first launch Power BI Desktop, select “Get Data” and then choose SharePoint Online list. We are then prompted for the URL of the SharePoint Site. The dialog is titled SharePoint lists, but the value is the URL of the site, NOT the list itself. Once this is entered, we are prompted for credentials if we haven’t connected to this site before. After entering credentials, we can select the list that we want to report on. In our case, it’s named Properties. We select it, and then click on the Edit button.

Once the data loads in, one of the first things that you’ll notice is that there are a lot of columns to choose from, and it’s a good idea to remove the columns that you don’t need. We can do this by right clicking on the desired column titles and selecting Remove.

Using FieldValuesAsText

With all other complex SharePoint column types, the FieldValuesAsText column will retrieve the textual representation of required column values. This is the way that the column value appears in a view. However, it appears that the Location column type is an exception to this rule. When the Location column is used, the JSON value itself is returned, which renders FieldValuesAsText relatively useless. This value is also available using the Location column value itself. The steps for extracting FieldValuesAsText are covered in previous posts in this series. Given that ultimately this will not be a good approach for the Location column, we won’t go into it further here.

Field value and value extracted from FieldValuesAsText

Using DispName

The text value of the location column is instead available through the derived DispName column.

With Power BI, it is possible to transform the JSON data contained in the original column, or in the extracted FieldValuesAsText column. However, all of the extracted properties are available through more efficient means, so the FieldValuesAsText column can be ignored for the purposes of reporting on Location columns. In addition, in most cases the original column (Location in this case) can be removed, and the DispName column renamed to take its place.
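For anyone curious what transforming the JSON by hand would involve, here is a minimal Python sketch of flattening one Location value. The sample value and its key names (DisplayName, Address, Coordinates) are assumptions made for illustration and may not match the JSON in your list exactly. The function name is hypothetical as well.

```python
import json

# Hypothetical sample value. The key names are assumptions for illustration.
sample = json.dumps({
    "DisplayName": "Example Place",
    "Address": {"City": "Example City", "CountryOrRegion": "Example Country"},
    "Coordinates": {"Latitude": 45.0, "Longitude": -75.0},
})


def flatten_location(raw: str) -> dict:
    """Parse one Location JSON value into a flat dictionary of columns."""
    data = json.loads(raw)
    # Fall back to empty dictionaries so missing sections yield None values.
    address = data.get("Address") or {}
    coords = data.get("Coordinates") or {}
    return {
        "DispName": data.get("DisplayName"),
        "City": address.get("City"),
        "CountryOrRegion": address.get("CountryOrRegion"),
        "Latitude": coords.get("Latitude"),
        "Longitude": coords.get("Longitude"),
    }


row = flatten_location(sample)
```

Every list would need this kind of parsing step applied to each row, which is exactly the work that the automatically provided columns make unnecessary.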

This behaviour is inconsistent with the behaviour of other complex SharePoint fields. It does not affect capability, but in the interests of consistency, my strong suggestion would be for the SharePoint team to eliminate the DispName field, and leverage FieldValuesAsText for the text conversion in the data feed.

Using Location Components

All the text components of the location column are separated out automatically as columns in Power Query. They can be used as any other column, and no additional action is necessary.

Automatically extracted location components

Using GeoLoc

Power BI will automatically geocode data at the time the report is rendered. The text components can therefore be used by the reporting engine to place data on a map. However, geocoding is a relatively computationally expensive operation, especially if there is a lot of data or a poor internet connection. In addition, some visuals may require the use of specific latitude and longitude co-ordinates. These co-ordinates are available through the GeoLoc column if they are needed, but they do need to be extracted.

Within Power Query, locate the GeoLoc column, and click on the Expand icon in the right of the column header.

Select both the Latitude and Longitude columns and deselect Use original column name as prefix. In my testing, neither Altitude nor Measure returned any meaningful data, so they can be safely ignored. However, this could change in the future.
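Conceptually, the expand step replaces the single GeoLoc record with one column per selected field. The sketch below mimics that in Python on a couple of made-up rows. It is not what Power Query runs internally, and the function name and sample data are hypothetical.

```python
def expand_geoloc(rows, keep=("Latitude", "Longitude")):
    """Replace each row's GeoLoc record with the selected fields as columns."""
    expanded = []
    for row in rows:
        # A row with no location has no record to expand.
        geo = row.get("GeoLoc") or {}
        # Copy every column except the original record column.
        new_row = {key: value for key, value in row.items() if key != "GeoLoc"}
        # No prefix is added, mirroring the deselected prefix option.
        for field in keep:
            new_row[field] = geo.get(field)
        expanded.append(new_row)
    return expanded


# Made-up rows for illustration only.
rows = [
    {"Title": "Listing A",
     "GeoLoc": {"Latitude": 45.0, "Longitude": -75.0, "Altitude": None, "Measure": None}},
    {"Title": "Listing B", "GeoLoc": None},
]

result = expand_geoloc(rows)
```

Altitude and Measure are simply dropped because they are not in the keep list, and a row with no location ends up with empty Latitude and Longitude values.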

At this point, we are almost ready to do some reporting. Once all the required columns have been shaped, and their data types set, select the Close and Apply button from the ribbon.

Reporting

Before using the location data on a map, it is important to categorize each of the components so that Power BI knows how to use it on a map. To categorize a data field, select it from the fields list. Then select the Modeling tab from the ribbon and click the Data Category dropdown.

The category for most of the fields is obvious, but below is a table of recommended choices. In addition, both the longitude and latitude fields need to be set to the Decimal Number type.

Field              Category
City               City
CountryOrRegion    Country/Region
Latitude           Latitude
Location           Place
Longitude          Longitude
PostalCode         Postal Code
State              State or Province
Street             Address
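If you find yourself repeating this setup across several reports, it may help to keep the recommendations above in one place as a checklist. The Python sketch below simply records the table as a dictionary and adds a small helper that converts co-ordinate values to decimal numbers. The helper names are hypothetical, and none of this talks to Power BI itself.

```python
# Recommended data category for each field, taken from the table above.
DATA_CATEGORIES = {
    "City": "City",
    "CountryOrRegion": "Country/Region",
    "Latitude": "Latitude",
    "Location": "Place",
    "Longitude": "Longitude",
    "PostalCode": "Postal Code",
    "State": "State or Province",
    "Street": "Address",
}


def to_decimal_number(value):
    """Convert a co-ordinate to a float, leaving blank values empty."""
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def uncategorized(fields):
    """List the fields that have no recommended category."""
    return [field for field in fields if field not in DATA_CATEGORIES]


missing = uncategorized(["City", "Price", "Street"])
latitude = to_decimal_number("45.5")
```

In this example, missing contains only Price, which is a reminder that fields outside the table are left without a category.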

Once categorized, the data can be placed on a map according to any desired parameters. In this case, the image below shows a map of listings colour coded by asking price range.

The resulting report can then be published to the Power BI service, and then embedded into a SharePoint page through either the Power BI web part, or secure embedding if so desired.