vrijdag 29 mei 2015

Power Query

Introduction

Powerquery is an Excel add-in that can be used for examining data, transforming the data and combine data from multiple sources. Powerquery is the ETL tool for Excel. You can extract data from multiple sources like Oracle, DB2, Hadoop, online locations and many other sources. You can transform the data, such as conversions, splitting columns, add columns, refine and merging. PowerQuery helps you to connect to sources accross your organization or public data sources.

Power query is one of the Excel Add-ins family and you have to download and install it and use this together with Excel 2013. I'm using version 2.22.4007.242. In this blogpost I'll investigate PowerQuery in to more detail. I've borrowed this example of the blogpost "Tutorial Introduction to Power Query".

If we investigate the Get External Data in more detail, we have the following options:

From the Web

From File

From Excel

From CSV

From XML

From Text

From Folder

From Database

From SQL Server Database

From Access Database

From SQL Server Analysis Database

From Oracle Database

From IBM DB2 Database

From MySQL Database

From PostgressSQL Database

From Sybase Database

From TerData Database

From Azure

From SQL Azure Database

FromAzure Marketplace

From Microsoft Azure HDInsight.

From Microsoft Azure Blob Storage.

From Microsoft Azure Table Sorage

From Other Sources

From Sgarepoint List

From Odata Feed

From Hadoop File (HDFS)

From Active Directory

From Microft Exchange

From Dynamics CRM online

From Facebook

From SAP BusinessObjects BI Universe

Fro Salesforce objcts

From SalesForce Objects

From ODBC

BlankQuery

Recent Sources

This is quite an extensive list.

Now for this blogpost I'll use the example of the following blogpost "Tutorial Introduction to PowerQuery". In this tutorial the Top 250 movies of the IMDB.com site is combined with the revenue of the top 50 Revenue movies on Wikipedia.

Load the Movie data from IMDB Top 250

1. Open an Empty WorkBook in Excel, Go the PowerQuery Tab and click on From Web.

2. Enter the URL and Press on OK.

3. Click on Table 0 and examine the data and press Load

4. The data is loaded in Excel and now we have to delete some columns and clean some stuff.

5. Click on the PowerQuery Tab and click on the Launch editor.

5. Remove some columns (Column0, rating and the last column). Right click on the column "Rank & Title" and press Split Column and By Delimiter.

The Results of this transformation looks like this screenshot:

7. Right click again on the column Rank& Title.2 and split the column again on the year by using the delimiter (and the option Split on the Right most delimiter. Then replace the ) by nothing. The result should look look like this.

8. Rename the columns to Rank, Title, Year and IMDB Rating and Trim the Title column. The resulting window should look like the screenshot as shown below.

On the right you see the steps I've taken to come to the results.This is very handy. If you take some steps that you may regret, just delete the step and the action is rolled back.

9. Press on Close& Load.

Load the Revenue data from Wikipdia

10. The next thing is to load the revenue per movie data from the Wiki page. Press From Web, Press Ok and press Load.

11. The data is loaded into Excel and it should like the screenshot below.

12. Press on Launch Editor and let's edit some stuff. I've renamed the two queries to IMDB Top250 and Revenue per movie. I've also deleted the last column

Merge the two datasets into one

13. Now press on Merge on the Toolbar, and configure the Merge operation as follows: select the IMDBTop250 and the Revenue per Movie, Click on the columns on how to merge the two columns and press OK.

14. Click on the button on the Column "NewColumn" and select "WorldWide gross" and press OK.

The final result should look like below.

The two datasets are combined and now you can visualise this in multiple ways.

Conclusion

Pretty impressive Self Service BI tooling of Microsoft. I've investigated PowerQuery for a limited time and I've just scratched the surface. The ease on investigating and combining data is really great. There are far more options of PowerQuery I've to investigate;-) IT seems that the PowerQuery language M is also integrated in SQL Server 2016 SSIS. Worthwhile looking into this...