Month: December 2018

I’m on vacation for the holidays, and plan to spend much of the next few weeks in the kitchen. This will result in more posts about food and fewer posts about Power BI. You’ve been warned.

This is my favorite cheese sauce. The recipe is adapted from one originally published by Modernist Cuisine, and made much better by the application of chilies.

Please note that the recipe uses proportions by weight, rather than specific amounts. You can easily scale the recipe up or down so long as the proportion is the same, but you will need an accurate kitchen scale for this one.

If you really love spicy food, you may be tempted to use habaneros or another spicier chili to start with. Don’t do that. Start with jalapenos and scale the heat up as appropriate by adding a few spicier chilies to the mix. Trust me on this one.

Alternatively, you can scale down the heat by replacing some of the liquid with water. Do what is right for you.

So long as you keep the same proportions, you can use any liquid and any cheese you want. The possibilities are limitless.

You can also use jalapeno juice in other recipes where you want to add flavor and heat. My other favorite application is in yeast breads, where you can replace the water with chili juice. I’ve made this recipe multiple times with great results using this technique.

This is my favorite caramel. The recipe is adapted from one originally published in Dessert Professional Magazine. As with any caramel you want to be careful with this one – don’t take your eyes off hot caramel!

Ingredients

123 grams glucose syrup

821 grams granulated sugar

10 grams Fleur de Sel sea salt

657 grams unsalted butter

388 grams heavy cream

Procedure

Line a 9×13″ baking/casserole dish with parchment paper. Have the prepared dish ready by the stove.

Combine the glucose, sugar and salt in a large pot over medium-high heat and cook until it reaches 145 C / 293 F, stirring frequently.

Meanwhile, combine the butter and cream in a medium pot and bring to a boil.

Once the sugar mixture reaches 145 C / 293 F, slowly and carefully pour the cream and butter into the pot with the sugar, whisking constantly.

Bring the mixture back up to 122 C / 252 F, stirring constantly.

Pour carefully into the prepared dish.

Cool the caramel to room temperature, 3-4 hours or overnight.

Cut the cooled caramels into squares.

Storage

Store in an airtight container at room temperature.

Vacuum seal and freeze indefinitely.

Applications

Eat them as is

Enrobe with tempered chocolate, and top with a few flakes of sea salt

Bake peanut butter cookies with a small thumbprint depression in their tops, and fill the depression with a generous lump of caramel
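Because every ingredient above is weighed in grams, changing the batch size is just multiplication. Here is a minimal Python sketch of that arithmetic, assuming the by-weight scaling idea from the cheese sauce applies to this caramel too. The function name is my own, and the pan size and cooling time will not scale along with the weights.

```python
# Caramel ingredient weights in grams, copied from the ingredient list above.
CARAMEL_GRAMS = {
    "glucose syrup": 123,
    "granulated sugar": 821,
    "fleur de sel": 10,
    "unsalted butter": 657,
    "heavy cream": 388,
}


def scale_recipe(grams_by_ingredient, factor):
    """Multiply every weight by the same factor so the proportions stay the same."""
    return {
        name: round(grams * factor, 1)
        for name, grams in grams_by_ingredient.items()
    }


# Example: a half batch.
half_batch = scale_recipe(CARAMEL_GRAMS, 0.5)
for name, grams in half_batch.items():
    print(f"{grams:>6} g  {name}")
```

Every weight is multiplied by the same factor, so the ratio of butter to sugar (or of anything to anything else) is unchanged.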

I have a theory that everyone has a gift. Not a gift that they have been given, but a gift that they can give to others.

Your gift is something you’re good at, and which enables you to expend minimal effort to achieve a disproportionately large benefit in return.

If you’re an IT professional, you’ve probably encountered situations where you could spend a few minutes configuring a loved one’s hardware or software. For you, this was a trivial task, done in minutes or seconds without any real effort, but for your father or aunt it would have been a stressful ordeal, with no guarantee of success.

Regardless of the context, there’s something that you’re better at. Cooking, baking, drawing, painting, photography, singing, rhyming, writing, carpentry, plumbing… something. As you read this list, hopefully something leaped to mind. For me, this list includes a few things where I have serious skills, and a bunch of things where I feel hopelessly challenged[1].

Once you recognize and acknowledge your gift, you just need to keep your eyes open for opportunities to give it. Each day, look for the places where you could invest a few minutes to spare someone a few hours. Look for the chance to invest an hour to save someone days or weeks. And once you identify the opportunities, choose which of them to act upon. It’s not your responsibility to solve every problem you can, but your gift is only valuable when you share it.

What is your gift?

Who will you share it with?

[1] One day I’ll encounter a situation where someone desperately needs a skilled swordsman. But it is not likely to be this day.

Not every data source that is supported in Power BI Desktop is supported today in dataflows, but these data sources should work[1]:

SAP Business Warehouse

SAP HANA

Azure Analysis Services

Google Analytics

Adobe Analytics

ODBC

OLE DB

Folder

SharePoint Online folder

SharePoint folder

Hadoop HDFS

Azure HDInsight (HDFS)

Hadoop file HDFS

Informix (beta)

For any of these data sources, you should be able to build a query in Power BI Desktop, copy it and paste it into Power Query Online and have it work, even though the data sources are not yet listed in the UX.

Let’s look at an example using the Folder source, both because the Folder data source is not yet supported in the Power Query Online user experience, and because of how it relates to yesterday’s post on using custom functions[2]. We’ll begin by choosing a folder that contains our data files.

Choosing the Folder data source

Entering the folder path

Reviewing the files in the folder

Once we’ve connected to the folder, we’ll edit the query to load the data from all CSV files that the folder contains, and then combine the contents of the files.

Filtering to only include CSV files

Ready to go!
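If it helps to see the logic of those steps outside the Power Query UI, here is a rough sketch in Python. To be clear, this is not the M script that Power BI Desktop generates. It is just the same three ideas (list a folder, keep only the CSV files, append their rows) written in a general-purpose language. The function name and the sample files are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def combine_csv_files(folder):
    """Return the rows of every CSV file in `folder` as one list of dicts."""
    combined = []
    # Filter step: keep only files whose extension is .csv (case-insensitive).
    csv_paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
    for path in csv_paths:
        with path.open(newline="", encoding="utf-8") as handle:
            # Combine step: the first row of each file is treated as its header.
            combined.extend(csv.DictReader(handle))
    return combined


# Hypothetical sample data so the sketch runs on its own.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "jan.csv").write_text("region,sales\nEast,10\nWest,20\n", encoding="utf-8")
    Path(folder, "feb.csv").write_text("region,sales\nEast,30\n", encoding="utf-8")
    Path(folder, "notes.txt").write_text("not a CSV, so it is skipped\n", encoding="utf-8")
    rows = combine_csv_files(folder)

print(len(rows), "rows combined")
```

The sketch assumes every CSV file shares the same header row, which is also what makes combining the files meaningful in the first place.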

Once we’re done, Power Query in Power BI Desktop will give us this:

There are three points worth mentioning in this image:

The data itself – this is probably what grabs your attention, because it’s big and data

The set of queries built by Power BI Desktop includes multiple queries and a custom function

The main query loaded into the PBIX data model references the other query and the custom function

Now let’s take it all from Power BI Desktop and use it in a dataflow. As covered in the post on authoring Power BI dataflows in Power BI Desktop, right-clicking on the query and choosing “Copy” will copy all of the script we need. Like this:

We can then paste each of these queries into the “Blank query” template in Power Query Online. Power Query Online is smart enough to automatically prompt for a gateway and credentials. The only edit we’ll need to make is to remove the one line that’s highlighted in red[3], and then… it just works.

There’s certainly a little irony involved in using the phrase “it just works” after going through all of these steps… but it does work.

While Power BI dataflows and Power Query Online are in preview, there are likely to be some rough edges. But dataflows are built on the same Power Query function language and connectivity platform as Power BI Desktop, which means that in many scenarios you can use dataflows for tasks that are not yet fully supported in the interface.

During preview there are still functions and data sources that aren’t yet supported in dataflows, but this technique can unlock data sources and capabilities that aren’t yet exposed through the UI. Check it out.

You can use custom Power Query “M” functions in Power BI dataflows, even though they’re not exposed and supported in the preview Power Query Online editor to the same extent they are supported in Power BI Desktop.[1]

As mentioned in a recent post on Authoring Power BI Dataflows in Power BI Desktop, the Power Query “M” queries that define your dataflow entities can contain a lot more than what can be created in Power Query Online. One example of this is support for custom functions in a dataflow. Functions work the same way in dataflows as they work in Power BI Desktop – there’s just not the same UX support.

Let’s see how this works. Specifically, let’s build a dataflow that contains a custom function and which invokes it in one of the dataflow entities. Here’s what we’ll do:

We’ll pull in sales order data from the SalesOrderHeader table in the AdventureWorks sample database to be an “Order” entity in the dataflow.

We’ll use the min and max of the various date columns in the SalesOrderHeader table to get the parameter values to pass into the custom function. We’ll then call the custom function to build a Date entity in the dataflow.

We’ll close our eyes and imagine doing the rest of the work to load other entities in the dataflow to make what we’d need to build a full star schema in a Power BI dataset, but we won’t actually do the work.

Let’s go. Since we’re just copying the code from Matt’s blog, we’ll skip the code here, but the result in Power Query Online is worth looking at.

Even though Power Query Online doesn’t have a dedicated “create function” option, it does recognize when a query is a function, and does include a familiar UX for working with a function. You will, however, need to clear the “Enable load” option for the query, since a function can’t be loaded directly.

The Order entity is super simple – we’re just pulling in a table and removing the complex columns that Power Query adds to represent related tables in the database. Here’s the script:

Now we need to put the two of them together. Let’s begin by duplicating the Order entity. If we referenced the Order entity instead of duplicating it, we would end up with a computed entity, which would require Power BI Premium capacity to refresh.

This is what the query looks like before we invoke the custom function. With all of the awesome options on the “Add Column” tab in Power BI Desktop, implementing this logic was surprisingly easy.

Most of the complexity in this approach is in the work required to get min and max values from three columns in a single table. The topic of the post – calling a custom function inside a dataflow entity definition – is trivial.
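To make the min/max step concrete, here is a small Python sketch of the same idea. It is not the M query from this post. The column names (OrderDate, DueDate, ShipDate) are my assumption about which three date columns are involved, the rows are made-up sample data, and build_date_table is a hypothetical stand-in for the custom function.

```python
from datetime import date, timedelta

# Assumed names for the three date columns; adjust to match your table.
DATE_COLUMNS = ("OrderDate", "DueDate", "ShipDate")

# Hypothetical sample rows standing in for the sales order table.
orders = [
    {"OrderDate": date(2018, 12, 1), "DueDate": date(2018, 12, 13), "ShipDate": date(2018, 12, 8)},
    {"OrderDate": date(2018, 11, 28), "DueDate": date(2018, 12, 10), "ShipDate": date(2018, 12, 5)},
]


def date_bounds(rows, columns):
    """Find the earliest and latest date across several columns of one table."""
    values = [row[col] for row in rows for col in columns if row[col] is not None]
    return min(values), max(values)


def build_date_table(start, end):
    """Stand-in for the custom function: one row per day, start to end inclusive."""
    day_count = (end - start).days + 1
    table = []
    for offset in range(day_count):
        day = start + timedelta(days=offset)
        table.append({"Date": day, "Year": day.year, "Month": day.month})
    return table


start_date, end_date = date_bounds(orders, DATE_COLUMNS)
date_table = build_date_table(start_date, end_date)
print(start_date, end_date, len(date_table))
```

Flattening the three columns into one list of values before taking the min and max is the whole trick. Once you have the two boundary dates, calling the function is the easy part.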

When we’re done, the list of entities only shows Order and Date, because these are the only two queries that are being loaded into the dataflow’s CDM folder storage. But the definition of the Date query includes the use of a custom function, which allows us to have rich and possibly complex functionality included in the dataflow code, and referenced by one or more entities as necessary.

[1] I was inspired to write this post when I saw this idea on ideas.powerbi.com. If this capability is obscure enough to get its own feature request and over a dozen votes, it probably justifies a blog post.

One key aspect of Power BI dataflows is that they store their data in CDM folders in Azure Data Lake Storage gen2.[1] When a dataflow is refreshed, the queries that define the dataflow entities are executed, and their results are stored in the underlying CDM folders in the data lake storage that’s managed by the Power BI service.

By default the Power BI service hides the details of the underlying storage. Only the Power BI service can write to the CDM folders, and only the Power BI service can read from them.

NARRATOR:

But Matthew knew that there are other options beyond the default…

Please note: At the time this post is published, the capabilities it describes are being rolled out to Power BI customers around the world. If you do not yet see these capabilities in your Power BI tenant, please understand that the deployment process may take several days to reach all regions.

In addition to writing to the data lake storage that is included with Power BI, you can also configure Power BI to write to an Azure Data Lake Storage gen2 resource in your own Azure subscription. This configuration opens up powerful capabilities for using data created in Power BI as the source for other Azure services. This means that data produced by analysts in a low-code/no-code Power BI experience can be used by data scientists in Azure Machine Learning, or by data engineers in Azure Data Factory or Azure Databricks.

Let that sink in for a minute, because it’s more important than it seemed when you just read it. Business data experts – the people who may not know professional data tools and advanced concepts in depth, but who are intimately involved with how the data is used to support business processes – can now use Power BI to produce data sets that can be easily used by data professionals in their tools of choice. This is a Big Deal. Not only does this capability deliver the power of Azure Data Lake Storage gen2 for scale and computing capability, it enables seamless collaboration between business and IT.

The challenge of operationalization/industrialization that has been part of self-service BI since self-service BI has been around has typically been solved by business handing off to IT the solution that they created. Ten years ago the artifact being handed off may have been an Excel workbook full of macros and VLOOKUP. IT would then need to reverse-engineer and re-implement the logic to reproduce it in a different tool and different language. Power Query and dataflows have made this story simpler – an analyst can develop a query that can be re-used directly by IT. But now an analyst can easily produce data that can be used – directly and seamlessly – by IT projects. Bam.

Before I move on, let me add a quick sanity check here. You can’t build a production data integration process on non-production data sources and expect it to deliver a stable and reliable solution, and that last paragraph glossed over this fact. When IT starts using a business-developed CDM folder as a data source, this needs to happen in the context of a managed process that eventually includes the ownership of the data source transitioning to IT. The integration of Power BI dataflows and CDM folders in Azure Data Lake Storage gen2 will make this process much simpler, but the process will still be essential.

Now let’s take a look at how this works.

I’m not going to go into details about the data lake configuration requirements here – but there are specific steps that need to be taken on the Azure side of things before Power BI can write to the lake. For information on setting up Azure Data Lake Storage gen2 to work with Power BI, check the documentation.

The details are in the documentation, but once the setup is complete, there will be a filesystem[2] named powerbi, and the Power BI service will be authorized to read it and write to it. As the Power BI service refreshes dataflows, it writes entity data in a folder structure that matches the content structure in Power BI. This approach – which has folders named after workspaces, dataflows, and entities, and files named after entities – makes it easier for all parties to understand what data is stored where, and how the file storage in the data lake relates to the objects in Power BI.

To enable this feature, a Power BI administrator first needs to use the Power BI admin portal to connect Power BI to Azure Data Lake Storage gen2. This is a tenant-level setting. The administrator must enter the Subscription ID, the Resource Group ID, and the Storage Account name for the Azure Data Lake Storage gen2 resource that Power BI will use. The administrator then needs to turn on the admin portal option labeled “Allow workspace admins to assign workspaces to this storage account.” Once this is turned on, we’re ready to go.

And of course, by “we” I mean “workspace admins” and by “go” I mean “configure our workspaces’ storage settings.”

When creating a new app workspace, in the “Advanced” portion of the UI, you can see the “Dataflow storage (Preview)” option. When this option is enabled, any dataflow in the workspace will be created in the ADLSg2 resource configured by the Power BI admin, rather than in the default internal ADLSg2 storage that is managed by the Power BI service.

There are a few things worth mentioning about this screen shot:

This is not a Premium-only feature. Although the example above shows a workspace being created in dedicated Premium capacity, this is not required to use your own data lake storage account.

If no Power BI administrator has configured an organizational data lake storage account, this option will not be visible.

Apparently I need to go back and fix every blog post I’ve made up until now to replace “gen2” with “Gen2” because we’re using an upper-case G now.

There are a few limitations mentioned in the screen shot, and a few that aren’t, that are worth pointing out as well:

You can’t change this setting for a workspace that already has dataflows in it. This option is always available when creating a new workspace, and will also be available in existing workspaces without dataflows, but if you have defined dataflows in a workspace you cannot change its storage location.

Permissions… get a little complicated.

…so let’s look at permissions a little[3].

When you’re using the default Power BI storage, the Power BI service manages data access through the workspace permissions. The Power BI service is the only reader and the only writer for the underlying CDM folders, and the Power BI service controls any access to the data the CDM folders contain.

When you’re using your organization’s data lake resource, ADLSg2 manages data access through the ACLs set on the folders and files. The Power BI service will grant permissions to the dataflow creator, but any additional permissions must be manually set on the files and folders in ADLSg2[4]. This means that for any user to access the dataflow through Power BI or the CDM folder through ADLSg2, they need to be granted permissions on all files and folders in ADLSg2.

Between the ability to store dataflow data in your organization’s Azure Data Lake Storage gen2 resource, and the ability to attach external CDM folders as dataflows, Power BI now enables a wide range of collaboration scenarios.

[1] This time I just copied the opening sentence from the last blog post. Since I was writing them at the same time, that was much easier.

I’m not the only one who’s been busy sharing news and content this weekend about the integration of Power BI dataflows and Azure data services. Check out these additional resources and share the news.

Power BI Blog: This is the main Power BI announcement for the availability of Power BI dataflows integration with Azure Data Lake Storage Gen2.

Azure SQL Data Warehouse Blog: This is the main Azure announcement for the new integration capabilities, with lots of links to additional information for data professionals.

End-to-end CDM Tutorial on GitHub: This is the big one! Microsoft has published an end-to-end tutorial that includes Azure Data Factory, Azure Databricks, Azure SQL Data Warehouse, Azure SQL Database, and Azure Machine Learning.

CDM Documentation for ADLSg2: This is the official documentation for the Common Data Model including the model.json metadata file created for Power BI dataflows.
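If you want a feel for what working with model.json looks like from the data professional’s side, here is a minimal Python sketch. Treat it as an illustration rather than a reference. The sample metadata is a trimmed, hypothetical stand-in, the field names (entities, attributes, partitions) reflect my understanding of the format and should be checked against the official documentation, and I’m assuming the entity data files are CSV without a header row.

```python
import csv
import io
import json

# Trimmed, hypothetical stand-in for a dataflow's model.json file.
MODEL_JSON = """
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Order",
      "attributes": [
        {"name": "SalesOrderID", "dataType": "int64"},
        {"name": "OrderDate", "dataType": "dateTime"}
      ],
      "partitions": [
        {"name": "Part001", "location": "https://example.invalid/Order.csv"}
      ]
    }
  ]
}
"""


def entity_columns(model_text):
    """Map each entity name to the ordered list of its attribute names."""
    model = json.loads(model_text)
    return {
        entity["name"]: [attr["name"] for attr in entity.get("attributes", [])]
        for entity in model.get("entities", [])
    }


columns = entity_columns(MODEL_JSON)

# Stand-in for one partition file: CSV data with no header row (an assumption),
# so the column names have to come from the metadata.
partition_csv = "43659,2018-12-01\n43660,2018-12-02\n"
reader = csv.DictReader(io.StringIO(partition_csv), fieldnames=columns["Order"])
orders = list(reader)
print(orders[0])
```

The point of the sketch is the division of labor: the data files hold only values, and the metadata file tells any consuming tool what those values mean.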

If you’re as excited as I am about today’s announcements, you’ll want to take the time to read all of these posts and to work through the tutorial as well. And probably do a happy dance of some sort.