Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how you and your organization use Power BI.

Power BI dataflows and the new shared and certified datasets fall into the latter category. Both of these capabilities enable sharing data across workspace boundaries. When building a data model in Power BI Desktop you can connect to entities from dataflows in multiple workspaces, and publish the dataset you create into a different workspace altogether. With shared datasets you can create reports and dashboards in one workspace using a dataset in another.

The ability to have a single data resource – dataflow or dataset – shared across workspaces is a significant change in how the Power BI service has traditionally worked. Before these new capabilities, each workspace was largely self-contained. Dashboards could only get data from a dataset in the same workspace, and the tables in the dataset each contained the queries that extracted, transformed, and loaded their data. This workspace-centric design encouraged approaches where assets were grouped into workspaces because of platform constraints, and not because it was the best way to meet the business requirements.

Now that we’re no longer bound by these constraints, it’s time to start thinking about having workspaces in Power BI whose function is to contain data artifacts (dataflows and/or datasets) that are used by visualization artifacts (dashboards and reports) in other workspaces. It’s time to start thinking about approaches that may look something like this:

Please keep in mind these two things when looking at the diagram:

This is an arbitrary collection of boxes and arrows that illustrate a concept, and not a reference architecture.

I do not have any formal art training.

Partitioning workspaces in this way encourages reuse and can reduce redundancy. It can also help enable greater separation of duties during development and maintenance of Power BI solutions. If you have one team that is responsible for making data available, and another team that is responsible for visualizing and presenting that data to solve business problems, this approach can give each team a natural space for its work. Work space. Workspace. Yeah.

Many of the large enterprise customers I work with are already evaluating or adopting this approach. Like any big change, it’s safer to approach this effort incrementally. The customers I’ve spoken to are planning to apply this pattern to new solutions before they think about retrofitting any existing solutions.

If you use dataflows with Power BI Premium, you probably use linked and computed entities. There’s an overview post here, and an example of how to use these tools for data profiling here, but in case you don’t want to click through[1], here’s a quick summary:

When adding entities to a dataflow, you use another dataflow as a data source

This adds linked entities to your new dataflow, which are basically pointers to the entities in the source dataflow

You then use these linked entities as building blocks for new entities, using union or merge or similar approaches
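In Power Query M terms, this pattern might look something like the sketch below. The entity names are hypothetical: Orders and OrdersHistory stand in for linked entities that point at entities in another dataflow, and the query defines a new computed entity that unions them.

```
// Orders and OrdersHistory are linked entities (hypothetical names)
// whose definitions point at entities in the source dataflow.
// Referencing them from a new query is what makes the result a
// computed entity.
let
    // Union the two linked entities into a single combined entity
    Source = Table.Combine({Orders, OrdersHistory})
in
    Source
```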

This approach is simple and powerful, but[2] it may not always give you exactly what you want. For example, what if you don’t want the users who have access to your new computed entities to also have access to the linked entities your new dataflow references?

Let’s see what this looks like in practice. I’m using the dataflow I built for that older post on data profiling as the starting point, so if you’re a regular reader this may look familiar.

This is a simple dataflow that contains three linked entities and three computed entities. The computed entities use Table.Profile to generate profiles for the data in the linked entities. When you connect to the dataflow using Power BI Desktop, it looks like this:

As you can see, all six entities are available to load into Power BI Desktop.
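For reference, each of the computed entities in a dataflow like this one is essentially a single-step query over its linked entity. A sketch, with Customers standing in for one of the hypothetical linked entity names:

```
// Computed entity "Customers Profile" -- profiles the data in the
// linked entity Customers (hypothetical name)
let
    Source = Table.Profile(Customers)
in
    Source
```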

What if you only wanted users to be able to read the profiles, without also granting them access to the entities being profiled? Why do dataflows give access to both?

The answer is equally simple, and obvious once you see it:

As with other dataflow entities, the linked entities are enabled for load by default. Removing these entities from the dataflow is as simple as clearing this setting.

Once this option is cleared for the linked entities, the dataflow will look like this, with only the three computed entities being included:

And as desired, only these entities are accessible to users in Power BI Desktop:

Hopefully this quick tip is helpful. If this is something that has been making you wonder, please know you’re in excellent company. And if you have other questions about using dataflows in Power BI, please don’t hesitate to ask!

[1] Don’t feel bad – I didn’t want to click through either, and wrote this summary mainly so I didn’t need to read through those older posts to see what I said last year.

[2] As I’ve recently learned by having multiple people ask me about this behavior.

Last week Microsoft held its annual Microsoft Business Applications Summit (MBAS) event in Atlanta. This two-day technical conference covers the whole Business Applications platform – including Dynamics, PowerApps, and Flow – and not just Power BI, but there was a ton of great Power BI content to be had. Now that the event is over, the session recordings and resources are available to everyone.

This session is probably my favorite dataflows session from any conference. This is a deep dive into the dataflows architecture, including the brand-new-in-preview compute engine for performance and scale.

Common Data Model sessions

As you know, Power BI dataflows build on CDM and CDM folders. As you probably know, CDM isn’t just about Power BI – it’s a major area of investment across Azure data services as well. The session lineup at MBAS reflected this importance with three dedicated CDM sessions.

This session covers how CDM and CDM folders are used in Power BI and Azure data services. If you’ve been following dataflows and CDM closely over the past six months much of this session might be review, but it’s an excellent “deep overview” nonetheless.

This session is probably the single best resource on CDM available today. The presenters are the key technical team behind CDM, and the session goes into details and concepts that aren’t available in any other presentation I’ve found. I’ve been following CDM pretty closely for the past year or more, and I learned a lot from this session. You probably will too.

[1] I have a list of a dozen or more sessions that I want to watch, and only a few of them are dataflows-centric. If you look through the catalog you’ll likely find some unexpected gems.

[2] If this is all you need to know, why do we have these other two sessions?

[3] Including Jeff Bernhardt, the architect behind CDM. Jeff doesn’t have the rock star reputation he deserves, but he’s been instrumental in the design and implementation of many of the products and services on which I’ve built my career. Any time Jeff is talking, I make a point to listen closely.

AUTOMATION & LIFE-CYCLE MANAGEMENT

With the ‘Refresh now’ API, the limit on the number of refreshes you can schedule per day is removed; instead, an unlimited number of refreshes can be triggered for each dataset. By combining the refresh now API with incremental refresh, you can build a near real-time dataset that performs small updates of fresh data very often.

Note: Individual refreshes are not expected to be any faster, and a new refresh of a dataset cannot start before the previous one finishes. Remember that your resource limitations do not change with the introduction of this API, so use these unlimited refreshes with caution and be careful not to overload your resources with unnecessary refreshes.

Although the blog post only explicitly mentions datasets, the same “as many refreshes as you want” capability applies to Power BI dataflows in workspaces assigned to dedicated (Power BI Embedded or Power BI Premium) capacity.

It’s important to note that this is an API-only feature[1]. If you’re setting up a refresh schedule via the UI, you’ll still see the same daily limits, but the dataflows API gives you full control over the refresh schedule for your dataflows.
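To make this concrete, here’s a minimal Python sketch of preparing a “refresh now” call for a dataflow. The endpoint shape follows the documented Power BI REST API, but the workspace and dataflow IDs are placeholders, acquiring the Azure AD access token (for example with MSAL) is out of scope here, and the notifyOption body is an assumption to verify against the API reference.

```python
# Sketch: trigger an on-demand refresh for a Power BI dataflow via the
# REST API. IDs and the token are placeholders, not real values.
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def dataflow_refresh_url(workspace_id: str, dataflow_id: str) -> str:
    """Build the 'refresh now' endpoint for a dataflow in a workspace."""
    return f"{API_ROOT}/groups/{workspace_id}/dataflows/{dataflow_id}/refreshes"


def build_refresh_request(workspace_id: str, dataflow_id: str,
                          access_token: str) -> urllib.request.Request:
    """Prepare the POST request; the caller sends it with urlopen()."""
    # notifyOption controls failure notifications (assumed body shape)
    body = json.dumps({"notifyOption": "MailOnFailure"}).encode("utf-8")
    return urllib.request.Request(
        dataflow_refresh_url(workspace_id, dataflow_id),
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the prepared request with urllib.request.urlopen (or the equivalent requests call) triggers the refresh; the service responds with a success status code rather than a result body.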

[1] This is by design, and is unlikely to change. A high-frequency refresh schedule can place a significant load on the capacity resources, and is a configuration that should only be made after careful consideration of the implications.

Last week I delivered two online sessions on the topic of integrating Power BI dataflows with an organizational Azure Data Lake Storage Gen2 storage account. I’ve blogged on this topic before (link | link | link | link) but sometimes a presentation and demo is worth a thousand words.

On April 30th I delivered a “Power BI dataflows 201 – beyond the basics” session for the PASS Business Intelligence Virtual Chapter. The session recording is online here, and you can download the slides here.

On May 4th I delivered an “Integrating Power BI and Azure Data Lake with dataflows and CDM Folders” session for the SQL Saturday community event in Stockholm, Sweden. I was originally planning to deliver the Stockholm session in person, but due to circumstances beyond my control[1] I ended up presenting remotely, which meant that I could more easily record the session. The session recording is online here, and you can download the slides here.

Each of these sessions covers much of the same material. The Stockholm presentation got off to a bit of a rocky start[2], but it gets smoother after the first few minutes.

Please feel free to use these slides for your own presentations if they’re useful. And please let me know if you have any questions!

[1] I forgot to book flights. Seriously, I thought I had booked flights in February when I committed to speaking, and by the time I realized that I had not booked them, they were way out of my budget. This was not my finest hour.

[2] The presentation was scheduled to start at 6:00 AM, so I got up at 4:00 and came into the office to review and prepare. Instead I spent the 90 minutes before the session start time fighting with PC issues and got everything working less than a minute before 6:00. I can’t remember ever coming in quite this hot…

Power BI dataflows have been available in public preview since November 2018. For almost five months, customers around the world have been kicking the tires, testing and providing feedback, and building production capabilities using dataflows.

When Microsoft published the latest Business Applications Release Notes, the “new and planned features” list included dataflows general availability with a target date of April 2019, which could typically mean anything before May 1st.

But… April has just arrived, and so has the dataflows GA!

The full details are on the official Power BI blog, so be sure to check it out. Also keep in mind that although dataflows are now generally available, some specific capabilities are still in preview.