WEBVTT
NOTE
duration:"00:43:32.2900000"
language:en-us
NOTE Confidence: 0.825931012630463
00:00:00.050 --> 00:00:06.930
Good afternoon, everyone. I'm very happy to be here.
NOTE Confidence: 0.878051102161407
00:00:07.890 --> 00:00:27.900
I'm ready to talk about Power Query, about Power BI, and about some transformations that can save your ETL in Power BI. I want to thank the SQLBits organization for the event itself, and I want to thank the sponsors.
NOTE Confidence: 0.894150912761688
00:00:28.690 --> 00:00:45.540
First, I want to thank all the volunteers for their help. OK. I'm a business intelligence consultant, a Microsoft partner for Power BI and the Microsoft data platform, an MVP, and a LinkedIn Learning trainer. That's all about me.
NOTE Confidence: 0.924660503864288
00:00:46.530 --> 00:01:02.520
Getting to the point: today we will talk briefly about the definition of an ETL process in general, and about the ETL process in Power BI Desktop in particular.
NOTE Confidence: 0.89591521024704
00:01:04.320 --> 00:01:22.340
The Power BI ETL process goes through the three classic steps: extracting data, transforming it (data quality transformations and other data transformations), and then loading the data into the beautiful tabular data model. OK.
NOTE Confidence: 0.911986827850342
00:01:23.550 --> 00:01:43.560
The ETL process is the first and definitely the most important layer of a business intelligence system. It's a hidden layer: the final user only looks at the beautiful analytics reports and dashboards.
NOTE Confidence: 0.896899521350861
00:01:44.350 --> 00:02:04.360
They don't see all this stuff we build inside our Power Query Editor for obtaining data, for transforming and cleansing it, and for preparing everything before loading it into the model. The ETL process is as
NOTE Confidence: 0.883021891117096
00:02:05.150 --> 00:02:25.160
hidden as it is necessary for the success of a Power BI project. We could say that the ETL step is the most important one, because the success of a Power BI
NOTE Confidence: 0.891836106777191
00:02:25.950 --> 00:02:45.960
deployment depends on a good or a bad ETL. OK, let's talk about Power BI. With Power Query we extract data from different sources: files, folders, databases, scripts, cloud services, and many others.
NOTE Confidence: 0.913379907608032
00:02:46.750 --> 00:03:01.220
We then transform, doing some transformations for data quality: filtering rows and columns, formatting our data as a table, and then
NOTE Confidence: 0.885103940963745
00:03:02.160 --> 00:03:22.170
combining queries, merging queries, doing a lot of tasks for modeling purposes, like Pivot, Unpivot, Transpose, and many others. And then we need to decide which of the queries will be loaded to the model and which will not.
NOTE Confidence: 0.899959802627563
00:03:22.960 --> 00:03:40.330
We load our data to the model, and then we need to fix the model: we need to check the relationships, we need to hide columns, and we need to improve our model by creating DAX calculations and so on.
NOTE Confidence: 0.882000684738159
00:03:41.250 --> 00:04:01.260
After that, we have everything necessary for creating, publishing, and sharing reports and dashboards using Power BI Desktop, Power BI Server, or Power BI Mobile. OK, let's go to the demo.
NOTE Confidence: 0.936405718326569
00:04:06.810 --> 00:04:08.920
OK, we have here.
NOTE Confidence: 0.954305112361908
00:04:10.220 --> 00:04:11.840
Different sources.
NOTE Confidence: 0.88617742061615
00:04:13.950 --> 00:04:33.960
OK, we have here different sources. Our purpose is to create a star schema tabular model using different sources, combining them and performing several tasks along the way.
NOTE Confidence: 0.871218979358673
00:04:34.970 --> 00:04:54.980
OK, the first one will be Sales. Here I have a Power BI Desktop file, empty, totally empty. OK, let's obtain the first file. Our sales are stored in a file with
NOTE Confidence: 0.865435481071472
00:04:55.770 --> 00:05:15.780
an .rpt extension. We have nothing here for connecting to .rpt, but for an .rpt file we can connect using the Text connector. OK, let's see: here in the data source dialog we are not able to see .rpt files, so we need
NOTE Confidence: 0.87615293264389
00:05:16.570 --> 00:05:24.240
to select All Files, and select the sales for the last year, for 2018. OK.
NOTE Confidence: 0.892774641513824
00:05:27.160 --> 00:05:32.230
Power BI will look at the data, and here we have our Navigator.
NOTE Confidence: 0.767207324504852
00:05:33.680 --> 00:05:34.310
Looks.
NOTE Confidence: 0.8838170170784
00:05:35.300 --> 00:05:55.310
Looks good. OK, let's edit. Now we are inside the Power Query Editor; we are in the correct place for making all the transformations we need. We need to use the first rows as headers, and we need to
NOTE Confidence: 0.818403124809265
00:05:56.100 --> 00:06:00.920
remove the top rows, the very first rows.
NOTE Confidence: 0.884224355220795
00:06:02.660 --> 00:06:09.310
Well, we can see that we also need to remove the three bottom rows.
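These first cleanup steps (Remove Top Rows, Remove Bottom Rows, Use First Row as Headers) can be sketched with Python's standard library. The file layout below is made up for illustration; the real .rpt export from the demo is not shown in the session.

```python
import csv
import io

# Hypothetical raw export: two title lines on top, a header row,
# data rows, and three footer lines at the bottom (as in the demo).
raw = """Sales Export
Generated 2018-12-31
OrderID;Product;Amount
1;Bike;250
2;Helmet;40
End of report
Rows: 2
Signed off
"""

def clean_report(text, skip_top=2, skip_bottom=3, delimiter=";"):
    """Drop leading/trailing rows, then promote the first remaining
    row to column headers -- mirroring 'Remove Top Rows',
    'Remove Bottom Rows' and 'Use First Row as Headers'."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows = rows[skip_top:len(rows) - skip_bottom]
    header, *data = rows
    return [dict(zip(header, r)) for r in data]

table = clean_report(raw)
```

The counts (two top rows, three bottom rows) match the demo, but the column names are an assumption.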
NOTE Confidence: 0.911489069461823
00:06:10.770 --> 00:06:15.310
OK, I will get rid of the Changed Type
NOTE Confidence: 0.878773808479309
00:06:16.270 --> 00:06:31.520
transformation here, because I prefer doing it again at the end. OK, that's all. We have our data, it's clean. Here we
NOTE Confidence: 0.895184278488159
00:06:33.380 --> 00:06:53.390
have everything. Now it's OK, we have all our columns. OK, great. We have sales. But my boss said: OK, now I want to see the information from the other years, from 2017, because I want to have all the data
NOTE Confidence: 0.805933892726898
00:06:54.180 --> 00:06:59.250
in just one table. OK, let's look
NOTE Confidence: 0.904402017593384
00:07:00.220 --> 00:07:16.140
at my data source. Here we have the data for the other year in another file. We could connect to it and do all the transformations again and again, but
NOTE Confidence: 0.849292576313019
00:07:18.000 --> 00:07:24.120
we are not lazy — well, you know, we are not lazy, we are efficient, as said by our friend
NOTE Confidence: 0.916501462459564
00:07:25.100 --> 00:07:45.110
Patrick. OK, so we could duplicate — that means copy and paste the query — because the files are really, really similar. We can duplicate here. We have just four transformations, but imagine that we had 20 or something like that; we would still be able to do it.
NOTE Confidence: 0.873447239398956
00:07:45.900 --> 00:07:55.730
Then we have here the last year, and I just need to select the correct
NOTE Confidence: 0.782884061336517
00:07:57.220 --> 00:08:04.690
path to our other .rpt file with the data.
NOTE Confidence: 0.91160374879837
00:08:06.300 --> 00:08:26.310
That's perfect. Now all the transformations are done. It's easy. The only thing I need to do here is to get rid of these four columns; the files were similar, but not identical. And now it's necessary to join them in just one
NOTE Confidence: 0.856490790843964
00:08:27.230 --> 00:08:37.580
query. Here in Power Query we have the possibility of appending queries. OK, you can see here: Append Queries, and Append Queries as New.
NOTE Confidence: 0.856811761856079
00:08:39.550 --> 00:08:47.850
We will use Append Queries as New, and we will append both our queries. That's all: just one query.
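Conceptually, appending two nearly identical tables after dropping the columns they don't share can be sketched like this in plain Python. The column and table names below are invented for the example.

```python
def append_queries(*tables):
    """Union several row lists, keeping only the columns common to
    all of them -- similar in spirit to dropping the extra columns
    and then using 'Append Queries as New'."""
    common = set.intersection(*(set(t[0]) for t in tables))
    return [{k: row[k] for k in sorted(common)}
            for t in tables for row in t]

# Hypothetical yearly extracts; "Note" exists only in one of them.
sales_2018 = [{"OrderID": 1, "Amount": 250, "Note": "promo"}]
sales_2017 = [{"OrderID": 9, "Amount": 90}]
all_sales = append_queries(sales_2018, sales_2017)
```

Power Query's Append keeps all columns and fills gaps with null; dropping the non-shared columns first, as in the demo, avoids those nulls.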
NOTE Confidence: 0.871293365955353
00:08:49.610 --> 00:09:09.620
All our sales are here. OK, that's OK. And now — I prefer doing it now — I select all the columns, go to Transform, and Detect Data Type for each of my columns. OK. Sales is the query we need;
NOTE Confidence: 0.917357802391052
00:09:10.440 --> 00:09:16.040
now we don't need to load to the model the sales for each
NOTE Confidence: 0.888727307319641
00:09:17.260 --> 00:09:21.730
year independently. It's not necessary, so we
NOTE Confidence: 0.724517524242401
00:09:22.850 --> 00:09:23.800
only enable
NOTE Confidence: 0.848341524600983
00:09:25.020 --> 00:09:29.710
the load for Sales. OK, let's connect to the other
NOTE Confidence: 0.91911369562149
00:09:30.760 --> 00:09:38.320
sources. Our customers: the data about our customers is stored in a SQL Server database.
NOTE Confidence: 0.874616503715515
00:09:39.710 --> 00:09:41.390
We are connecting.
NOTE Confidence: 0.847498416900635
00:09:42.600 --> 00:09:49.410
I will use Import — I will not use DirectQuery here — Import.
NOTE Confidence: 0.840090453624725
00:09:50.370 --> 00:09:58.760
I will connect to AdventureWorks — sorry, one more time — DimCustomer, and OK.
NOTE Confidence: 0.88055557012558
00:10:02.070 --> 00:10:09.170
OK, we have the columns from DimCustomer. I will change the name. OK.
NOTE Confidence: 0.833618581295013
00:10:10.620 --> 00:10:30.630
We need to choose which columns we need here: we need CustomerKey and GeographyKey, we need BirthDate, first and last name, maybe yearly income, for example. But the interesting
NOTE Confidence: 0.871056795120239
00:10:31.420 --> 00:10:37.590
thing here is that Power BI is able to see that,
NOTE Confidence: 0.893798887729645
00:10:39.510 --> 00:10:59.520
because of the relationship between DimCustomer and DimGeography, we are able to select the columns from DimGeography without creating any specific query here. OK, let's add DimGeography: select, select, OK.
NOTE Confidence: 0.904830157756805
00:11:00.310 --> 00:11:20.320
DimGeography — now we have the columns from Customer plus one column that contains a table in each row, a table with all the information from DimGeography for each one of our customers. The only thing we need to do
NOTE Confidence: 0.781341075897217
00:11:21.110 --> 00:11:22.530
is to expand it,
NOTE Confidence: 0.881992638111115
00:11:23.550 --> 00:11:30.230
selecting just the columns we need, in this case City and Country.
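Expanding a table-valued column like this — keeping only City and Country from the nested geography record — can be sketched in Python. The customer rows below are invented; only the shape mirrors the demo.

```python
# Hypothetical rows: each customer carries a nested geography record,
# like the table-valued column Power BI shows for DimGeography.
customers = [
    {"CustomerKey": 1, "Name": "Ana",
     "Geography": {"City": "Madrid", "Country": "Spain", "PostalCode": "28001"}},
    {"CustomerKey": 2, "Name": "Luca",
     "Geography": {"City": "Milan", "Country": "Italy", "PostalCode": "20121"}},
]

def expand(rows, column, keep):
    """Flatten a nested record column, keeping only selected fields --
    the equivalent of clicking the expand icon and ticking City/Country."""
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != column}
        flat.update({k: row[column][k] for k in keep})
        out.append(flat)
    return out

flat_customers = expand(customers, "Geography", ["City", "Country"])
```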
NOTE Confidence: 0.921923339366913
00:11:31.350 --> 00:11:34.010
And that's all. Here we have
NOTE Confidence: 0.867033421993256
00:11:36.420 --> 00:11:56.430
two columns from Geography added to our Customer table. That's all. OK — not totally, because here we have BirthDate, and BirthDate is a very, very bad column for a tabular model:
NOTE Confidence: 0.884682595729828
00:11:57.220 --> 00:12:17.230
enormous granularity, very high granularity in this column. And also, it doesn't make much sense to work with the birth date; we probably need the age of each person. OK, working on this case is really easy, because
NOTE Confidence: 0.907764077186584
00:12:18.020 --> 00:12:32.640
we have transformations for the whole table, for any column, and then we have transformations for text columns, number columns, and date and time columns. Here, under the Date transformations, we have
NOTE Confidence: 0.862914204597473
00:12:33.620 --> 00:12:53.630
an Age option. It returns the number of days this person has been alive — as granular as before, I know, but in this case we get a Duration type, and from a Duration we are able
NOTE Confidence: 0.850178182125092
00:12:54.420 --> 00:12:56.050
to obtain Total Years.
NOTE Confidence: 0.862339973449707
00:12:57.240 --> 00:13:06.630
So now it's better — it's still as granular as before, OK, but now we will round
NOTE Confidence: 0.888821661472321
00:13:07.760 --> 00:13:22.660
this data, because now we have a number. We will round, of course — for age we round down, always down, because nobody wants to be given extra years. OK, now it's pretty good.
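The idea of BirthDate → age, always rounded down, can be sketched in Python. Note this is a calendar-exact variant, not the demo's Duration → Total Years → Round Down sequence, and the dates are illustrative, not from the session's dataset.

```python
from datetime import date

def age_in_years(birth, today):
    """Whole years lived, always rounded down -- nobody gets
    credited a birthday before it actually happens."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

# Illustrative dates only.
print(age_in_years(date(1975, 6, 15), date(2019, 6, 14)))  # 43
print(age_in_years(date(1975, 6, 15), date(2019, 6, 15)))  # 44
```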
NOTE Confidence: 0.899470567703247
00:13:23.820 --> 00:13:43.830
This is data for a tabular model; it's not too big. But normally we need to create several groups of ages, age ranges. For this purpose, here in Power Query we have the possibility of adding a column
NOTE Confidence: 0.821499824523926
00:13:44.620 --> 00:13:50.020
using a condition. OK, let's create the conditional column here.
NOTE Confidence: 0.91352254152298
00:13:51.070 --> 00:13:57.870
This is OK. Let's change the name here: we have the Age.
NOTE Confidence: 0.86389946937561
00:14:01.180 --> 00:14:07.610
Now create the conditional column; the conditional column will be Age
NOTE Confidence: 0.875943243503571
00:14:08.620 --> 00:14:17.660
Range, and then: if Age is, for example, less than or equal to 50,
NOTE Confidence: 0.962970018386841
00:14:19.290 --> 00:14:23.040
This person is very young.
NOTE Confidence: 0.812023997306824
00:14:25.640 --> 00:14:28.980
otherwise, they are young.
NOTE Confidence: 0.656558752059937
00:14:32.890 --> 00:14:33.690
Perfect.
NOTE Confidence: 0.844822943210602
00:14:35.190 --> 00:14:48.040
OK, now we have two values, two ranges. Perfect — two or three values like these are fine from the tabular model point of view. OK.
NOTE Confidence: 0.911354720592499
00:14:49.050 --> 00:14:59.330
Now I will change the type here to Text. OK. You know, it's good, but the problem is: if we have to change
NOTE Confidence: 0.911519587039948
00:15:00.680 --> 00:15:20.690
this number at any time to any other number — or suppose that we have several conditions here with different numbers — it is not a very good idea to go to the
NOTE Confidence: 0.911884486675262
00:15:21.480 --> 00:15:40.780
transformation here and change it, and change it again all the time. No: for this reason it is really useful to work with parameters. They are very easy to create: we go to Manage Parameters and create a new one. This will be Age
NOTE Confidence: 0.891193270683289
00:15:41.690 --> 00:15:59.660
Level — for example, Age Level One, in case we have different conditions. OK. Type: number, and we could put here our 50. Great, 50. And now we need to change
NOTE Confidence: 0.884481012821198
00:16:00.610 --> 00:16:02.340
The value in the condition.
NOTE Confidence: 0.871682167053223
00:16:04.290 --> 00:16:10.030
You can see here: we can use a value, a column, or a parameter.
NOTE Confidence: 0.909843921661377
00:16:11.010 --> 00:16:11.980
And that's all.
NOTE Confidence: 0.875623524188995
00:16:13.080 --> 00:16:33.090
From now on, if we have a person who is 46 or 47 years old, they are very young. But if we change our mind and we say — oh no, very young is just up to 45 — then this person is no longer very young,
NOTE Confidence: 0.875623047351837
00:16:33.880 --> 00:16:53.890
just young. OK. That's why we use parameters: they are very, very easy to use, and they are really, really good for the maintenance of our code. OK — we have sales, we have customers.
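The parameter-driven conditional column can be sketched like this; the labels and the AGE_LEVEL_1 name are assumptions echoing the demo, not its actual values.

```python
AGE_LEVEL_1 = 50  # one tweakable "parameter", like Manage Parameters

def age_range(age, level1=AGE_LEVEL_1):
    """Conditional column: a single threshold drives the labels, so
    changing the parameter updates every classification at once."""
    return "Very Young" if age <= level1 else "Young"

print(age_range(46))             # Very Young (threshold 50)
print(age_range(46, level1=45))  # Young (threshold moved to 45)
```

This is the same maintenance win the talk describes: edit one value instead of hunting through every condition.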
NOTE Confidence: 0.897358536720276
00:16:54.680 --> 00:17:14.690
Now we are looking at the products. Our products are in a folder: we have three files here, Excel files, in a folder. But wait — we are not going to do the same as we did before with Sales, because now we have three files,
NOTE Confidence: 0.883643567562103
00:17:15.480 --> 00:17:33.260
and the idea is that each day we will have one more, and one more file — one for each day. So we need to select the folder and analyze the folder itself. So: New Source,
NOTE Confidence: 0.825022041797638
00:17:35.590 --> 00:17:38.680
Folder, Connect.
NOTE Confidence: 0.861039161682129
00:17:41.850 --> 00:17:43.900
Select OK.
NOTE Confidence: 0.920240998268127
00:17:45.060 --> 00:18:05.070
In this case we have a pretty, pretty clean folder. That's not real life — it's just for the sake of time. Normally you need to check, check, check the folder all the time because of quality issues. OK, great, so we have our three
NOTE Confidence: 0.898715138435364
00:18:05.860 --> 00:18:21.300
files here, and we will combine the files. You see, before we did Append Queries; now we will combine files. This button here is for combining files.
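Conceptually, combining every file in a folder works like the sketch below, written against two throwaway CSV files (the real demo uses Excel files, and Power BI also generates a sample query, a parameter, and a function behind the scenes). File and column names are invented.

```python
import csv
import tempfile
from pathlib import Path

def combine_folder(folder, pattern="*.csv"):
    """Read every matching file in a folder and append the rows,
    tagging each with its source file -- roughly what the
    Combine Files button automates."""
    out = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                row["Source.Name"] = path.name
                out.append(row)
    return out

# Demo with two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "p_mountain.csv").write_text("ProductKey,Color\n1,Red\n")
    Path(tmp, "p_road.csv").write_text("ProductKey,Color\n2,Black\n")
    rows = combine_folder(tmp)
```

Dropping two more files into the folder and re-running (the demo's Refresh) picks them up with no change to the code.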
NOTE Confidence: 0.877644598484039
00:18:24.710 --> 00:18:44.720
OK, perfect. The point here is: we have three files, three Excel files. The structure is the same, but the problem is the table name: for the first one it is P Mountain, for the second one it is P Road, for the other it is P-something-else. So we are
NOTE Confidence: 0.890923023223877
00:18:45.510 --> 00:19:03.680
not able to use this data directly. The secret here is that if we edit the folder query, we can obtain the metadata from the Excel files, and Power BI then does the magic:
NOTE Confidence: 0.891165733337402
00:19:04.810 --> 00:19:20.140
it obtains all the information from those three files. Here we have two objects from each Excel file. I will get rid of this.
NOTE Confidence: 0.831932306289673
00:19:21.270 --> 00:19:24.510
And also, Power BI creates
NOTE Confidence: 0.828960537910461
00:19:25.470 --> 00:19:26.500
A parameter.
NOTE Confidence: 0.898931086063385
00:19:27.530 --> 00:19:31.270
a value for this parameter — the first file
NOTE Confidence: 0.790178596973419
00:19:32.320 --> 00:19:35.840
it found — and then
NOTE Confidence: 0.931002914905548
00:19:37.270 --> 00:19:53.000
a place for making all the transformations before combining the files. And this is a beautiful place, because first we need to select just the table —
NOTE Confidence: 0.865746378898621
00:19:54.290 --> 00:20:00.620
just the table, not the two objects, just the table — and then we can
NOTE Confidence: 0.840020477771759
00:20:01.740 --> 00:20:21.750
get rid of the hidden items and so on. OK, that's great. Now, if we come back to our Products query, we could see that we have just three elements here, not six. Now we have
NOTE Confidence: 0.919390082359314
00:20:21.780 --> 00:20:29.820
just the three. OK — we have another problem here: we have a requirement from our customer
NOTE Confidence: 0.892712533473969
00:20:30.760 --> 00:20:34.690
that we need this text:
NOTE Confidence: 0.869402945041656
00:20:36.020 --> 00:20:56.030
we need the words that are here. OK, we could do it using M code — but suppose, unfortunately, that we are not able to write M code. Power BI is so clever that we can add a Column From Examples, in this case from the selection.
NOTE Confidence: 0.86592710018158
00:20:56.820 --> 00:21:00.410
With the column selected, if I write here
NOTE Confidence: 0.798827469348907
00:21:01.790 --> 00:21:02.720
"Mountain" —
NOTE Confidence: 0.830532908439636
00:21:05.870 --> 00:21:25.880
OK, we could see that Power BI wrote — created — the expression I need. OK, so I need the text after the delimiter. Oh, it's really good.
NOTE Confidence: 0.776900112628937
00:21:27.140 --> 00:21:32.360
OK — this is for the product line.
NOTE Confidence: 0.536173164844513
00:21:38.840 --> 00:21:42.150
Product Line, OK.
NOTE Confidence: 0.841548562049866
00:21:43.390 --> 00:21:54.760
Great. Now we will have the product line for Mountain, for Road, for Touring. Fantastic. Now
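The expression Column From Examples infers here is essentially "text after delimiter". A minimal Python sketch, assuming the source values look like "P Mountain" with a space delimiter (an assumption; the session doesn't show the exact strings):

```python
def text_after_delimiter(value, delimiter=" "):
    """Everything after the first delimiter -- the kind of expression
    Column From Examples infers (e.g. 'P Mountain' -> 'Mountain')."""
    _, _, rest = value.partition(delimiter)
    return rest

print(text_after_delimiter("P Mountain"))  # Mountain
print(text_after_delimiter("P Road"))      # Road
```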
NOTE Confidence: 0.935442090034485
00:21:55.670 --> 00:22:01.620
We are able to obtain the data itself.
NOTE Confidence: 0.88005256652832
00:22:03.370 --> 00:22:13.300
OK: select ProductKey, in this case ProductSubcategory, for example Color, and ListPrice. That's all.
NOTE Confidence: 0.897648215293884
00:22:15.960 --> 00:22:35.970
Here we have all the columns we selected from the first file, from the Mountain one. OK — and then we will have all the columns for all the files. We could see here: we have Mountains, we have Roads, we have Tourings here.
NOTE Confidence: 0.871419727802277
00:22:36.760 --> 00:22:45.940
All the columns we need, from all the files. It's magic. OK, now we can
NOTE Confidence: 0.885344207286835
00:22:47.630 --> 00:22:58.470
select the columns and do Transform, Detect Data Type. OK. The good news here is that if we change
NOTE Confidence: 0.564834594726563
00:22:59.470 --> 00:23:00.460
Our.
NOTE Confidence: 0.857614934444427
00:23:02.500 --> 00:23:06.530
our folder, for example —
NOTE Confidence: 0.859392285346985
00:23:07.910 --> 00:23:16.330
say we select two more files, five files, and we put them here — now we will have five
NOTE Confidence: 0.891974866390228
00:23:17.330 --> 00:23:23.780
different Excel files. The only thing we need to do to save our ETL is Refresh.
NOTE Confidence: 0.887803077697754
00:23:24.950 --> 00:23:29.830
That's all. Now we will have here five
NOTE Confidence: 0.855044424533844
00:23:30.890 --> 00:23:39.050
product lines, depending on all the values we have in our files.
NOTE Confidence: 0.905664324760437
00:23:40.180 --> 00:23:40.940
Great.
NOTE Confidence: 0.901605725288391
00:23:42.830 --> 00:23:44.020
That's great.
NOTE Confidence: 0.884689092636108
00:23:46.520 --> 00:24:06.530
Now we need to extract some information about subcategories from a Word file. Power BI is just not able to connect directly to a Word file, but Power BI is able to paste the content that we have on the clipboard.
NOTE Confidence: 0.836592376232147
00:24:07.320 --> 00:24:25.420
I copy this information from the Word document, and I paste it here in the Power Query Editor using this option here, Enter Data. OK, so: Enter Data,
NOTE Confidence: 0.817738711833954
00:24:27.430 --> 00:24:28.550
Paste.
NOTE Confidence: 0.863955140113831
00:24:29.560 --> 00:24:32.550
That's all: subcategories.
NOTE Confidence: 0.881320893764496
00:24:39.010 --> 00:24:40.660
Fantastic.
NOTE Confidence: 0.747172296047211
00:24:41.830 --> 00:24:42.990
Great.
NOTE Confidence: 0.890613079071045
00:24:43.910 --> 00:25:03.920
That simple. Now, another challenge we have: the information about the categories is stored in a compressed ZIP file. Again, Power Query is not able to look inside the ZIP file, at the compressed file, but
NOTE Confidence: 0.824915945529938
00:25:04.710 --> 00:25:06.100
Power Query is able
NOTE Confidence: 0.84013295173645
00:25:08.610 --> 00:25:26.460
to run an R script. So we have our script here, very, very simple: R is able to unzip the file and return a data frame
NOTE Confidence: 0.881949245929718
00:25:27.710 --> 00:25:29.350
to Power BI.
NOTE Confidence: 0.907336592674255
00:25:30.910 --> 00:25:34.630
Let's do it. Now we will copy this code,
NOTE Confidence: 0.922728538513184
00:25:37.190 --> 00:25:40.440
go to our Power BI: New Source,
NOTE Confidence: 0.899688005447388
00:25:41.910 --> 00:25:48.120
from all the other sources here — R script.
NOTE Confidence: 0.853968977928162
00:25:49.650 --> 00:25:55.340
Connect. Just paste the code; nothing more will be
NOTE Confidence: 0.833402335643768
00:25:56.410 --> 00:25:58.850
needed, nothing else will be necessary.
NOTE Confidence: 0.911595940589905
00:26:00.500 --> 00:26:07.090
OK, our code is executing, and we obtain the information about
NOTE Confidence: 0.892530918121338
00:26:08.080 --> 00:26:09.940
The categories.
NOTE Confidence: 0.894139051437378
00:26:13.130 --> 00:26:26.230
The only thing we need to do for the quality of the data is Split Column by Delimiter; Power BI realizes which delimiter it is.
NOTE Confidence: 0.824397385120392
00:26:27.270 --> 00:26:34.340
This is where we get the category ID, and just here, the category.
NOTE Confidence: 0.685847818851471
00:26:37.690 --> 00:26:38.380
OK.
NOTE Confidence: 0.649600207805634
00:26:39.500 --> 00:26:40.010
OK.
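The job delegated to the R script here — unzip an archive and return a table — can be sketched in Python's standard library (this is an analogy, not the speaker's R code; the archive contents and delimiter are invented).

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path

def table_from_zip(zip_path, delimiter=";"):
    """Unzip the first file in the archive and parse it into rows --
    the workaround for a source Power Query cannot open directly."""
    with zipfile.ZipFile(zip_path) as zf:
        inner = zf.namelist()[0]
        text = zf.read(inner).decode("utf-8")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Build a throwaway archive; the column layout is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    zpath = Path(tmp, "categories.zip")
    with zipfile.ZipFile(zpath, "w") as zf:
        zf.writestr("categories.txt", "1;Bikes\n2;Components\n")
    rows = table_from_zip(zpath)
```

The split into ID and name, done afterwards in the demo with Split Column by Delimiter, is what the `csv.reader` delimiter handles here.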
NOTE Confidence: 0.857806324958801
00:26:41.580 --> 00:26:52.010
That's all. Now we have products, we have subcategories, and we have categories. But the point is: we need a star schema.
NOTE Confidence: 0.859508156776428
00:26:52.910 --> 00:27:12.920
If we relate our sales to the product, the product to the subcategory, and the subcategory to the category, we will have a snowflake schema. So that means we need to merge all three queries. OK, let's do it. We can merge,
NOTE Confidence: 0.829803586006165
00:27:13.710 --> 00:27:33.720
going from Product to Subcategories. OK, now we need to select ProductSubcategoryKey from here, the subcategory key from there, and then the
NOTE Confidence: 0.843354046344757
00:27:34.510 --> 00:27:54.520
join condition and so on. We are doing something like a VLOOKUP in Excel. Now I'd like to change the name — like this; here we have Product. OK, now we have all the columns from the Product table,
NOTE Confidence: 0.899185359477997
00:27:55.310 --> 00:28:15.320
and we have a column with a table in each row, depending on the values of the subcategories, and we are in the same position as we were before, when we were working with Customer and Geography. We will fix it:
NOTE Confidence: 0.83489602804184
00:28:16.110 --> 00:28:27.260
expand here the category ID, for connecting to Category later, plus the subcategory column, because we need it. And we can repeat the process
NOTE Confidence: 0.791212379932404
00:28:28.270 --> 00:28:32.550
from Product, now going to Category. OK,
NOTE Confidence: 0.777755558490753
00:28:33.640 --> 00:28:39.370
and select the category ID — IDCat.
NOTE Confidence: 0.932734549045563
00:28:40.680 --> 00:28:46.300
Again, a table column that contains the information from the Category table.
NOTE Confidence: 0.895326256752014
00:28:47.370 --> 00:29:06.590
OK, select Category here, and that's fine. So in just one query we have the information about the customer — sorry, about the product — about the subcategories, and about the categories. That's fine. I will change this name here.
NOTE Confidence: 0.872546136379242
00:29:08.090 --> 00:29:28.100
OK. And we need to do some transformations here for cleansing our data. First, we don't need Source.Name, and we don't need the ProductSubcategoryKey, and we don't need the category ID.
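The Merge-then-Expand chain (Product → Subcategory → Category, a VLOOKUP-style left join) can be sketched like this. The tables and key names below only echo the demo and are not the real AdventureWorks schema.

```python
def merge_expand(rows, lookup, key, fields):
    """Left-join a lookup table on `key` and keep only `fields` from
    it -- Merge Queries followed by expanding the table column."""
    index = {r[key]: r for r in lookup}
    merged = []
    for row in rows:
        match = index.get(row[key], {})
        merged.append({**row, **{f: match.get(f) for f in fields}})
    return merged

# Illustrative tables; names only echo the demo, not the real schema.
products = [{"ProductKey": 1, "SubcatKey": 10}]
subcats = [{"SubcatKey": 10, "Subcategory": "Mountain Bikes", "CatKey": 100}]
cats = [{"CatKey": 100, "Category": "Bikes"}]

step1 = merge_expand(products, subcats, "SubcatKey", ["Subcategory", "CatKey"])
product = merge_expand(step1, cats, "CatKey", ["Category"])
```

Chaining the two joins flattens the snowflake into one denormalized Product table, which is what makes the star schema possible.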
NOTE Confidence: 0.457387924194336
00:29:29.040 --> 00:29:30.090
OK.
NOTE Confidence: 0.84976452589035
00:29:32.760 --> 00:29:36.700
We need to load into our model as few columns
NOTE Confidence: 0.832171618938446
00:29:37.700 --> 00:29:42.630
as possible. Great. Now we will
NOTE Confidence: 0.887929916381836
00:29:44.060 --> 00:29:50.520
disable the load for this query, the original Product query. That means that we don't need to load
NOTE Confidence: 0.877520382404327
00:29:52.260 --> 00:30:02.620
the original Products, we don't need to load the Subcategories, and we don't need to load this query either.
NOTE Confidence: 0.92375510931015
00:30:04.040 --> 00:30:09.900
By now we have Sales, we have Customers, and we have Products —
NOTE Confidence: 0.879560530185699
00:30:10.850 --> 00:30:30.860
OK, just the three queries that we need to load to the model. OK. We have another source: here is an Excel file that contains information about promotions and sales. The point here is that this is an Excel file. Let's connect to the
NOTE Confidence: 0.878232479095459
00:30:30.880 --> 00:30:31.960
Excel file.
NOTE Confidence: 0.9346804022789
00:30:32.920 --> 00:30:39.310
But we have a challenge here, because here we have the information
NOTE Confidence: 0.845602810382843
00:30:40.650 --> 00:30:47.800
about the promotion — OK, here we have the information
NOTE Confidence: 0.885993838310242
00:30:48.760 --> 00:30:54.250
about sales: a lot of columns here with sales information.
NOTE Confidence: 0.916511714458466
00:30:55.330 --> 00:31:15.340
For some reason we are not able to decrease the granularity of this table, and we are not able to delete any columns, because of some business requirement; this query will be useful for some other purpose.
NOTE Confidence: 0.519530892372131
00:31:16.130 --> 00:31:17.240
And so,
NOTE Confidence: 0.876402378082275
00:31:18.250 --> 00:31:38.260
if we have a query where we are not able to change the transformations, the thing that can save our ETL is to reference it. So we will keep the first one, and we will reference
NOTE Confidence: 0.838569164276123
00:31:39.660 --> 00:31:43.710
it for the Promotions query.
NOTE Confidence: 0.490465492010117
00:31:50.290 --> 00:31:51.280
OK.
NOTE Confidence: 0.895416259765625
00:31:52.320 --> 00:32:12.330
Now we have another query where we are able to do any transformation we need: for example, keep PromotionKey, EnglishPromotionName, PromotionType, and PromotionCategory. That's all that we need. Great. Of course, we
NOTE Confidence: 0.911481738090515
00:32:13.460 --> 00:32:22.460
are seeing a lot of repeated values here, and we have a transformation for cleansing data: Remove Duplicates.
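Remove Duplicates keeps the first occurrence of each key combination; a small Python sketch (the promotion rows are invented):

```python
def remove_duplicates(rows, keys):
    """Keep the first occurrence of each key combination -- what
    Power Query's Remove Duplicates does on the selected columns."""
    seen = set()
    out = []
    for row in rows:
        mark = tuple(row[k] for k in keys)
        if mark not in seen:
            seen.add(mark)
            out.append(row)
    return out

promos = [
    {"PromotionKey": 1, "Name": "No Discount"},
    {"PromotionKey": 1, "Name": "No Discount"},
    {"PromotionKey": 2, "Name": "Volume Discount"},
]
unique = remove_duplicates(promos, ["PromotionKey", "Name"])
```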
NOTE Confidence: 0.873739778995514
00:32:24.490 --> 00:32:43.450
That's all; this is our data. We will disable the load — that means this first query will not be loaded. OK. Can you see all the transformations? OK, let's —
NOTE Confidence: 0.914978504180908
00:32:45.650 --> 00:32:48.630
let's apply all the changes here.
NOTE Confidence: 0.78915399312973
00:32:59.380 --> 00:33:09.310
OK, and now let's have a look at the model. OK.
NOTE Confidence: 0.877493679523468
00:33:10.290 --> 00:33:11.780
Well — a beautiful star
NOTE Confidence: 0.853958368301392
00:33:14.140 --> 00:33:34.150
schema tabular model. But we have nothing about calendar dates; we need our calendar table here. OK, there are a lot of ways to do this, but
NOTE Confidence: 0.895945250988007
00:33:34.940 --> 00:33:39.180
just to show you: here we have
NOTE Confidence: 0.892693638801575
00:33:40.170 --> 00:33:53.770
a great and beautiful script from the master of the universe of M code, Chris Webb, and we can use it because he kindly published it.
NOTE Confidence: 0.934851229190826
00:33:55.700 --> 00:33:57.170
He's really, really.
NOTE Confidence: 0.821725130081177
00:33:59.240 --> 00:34:08.510
generous. So we can just copy this code — this is great — and paste it into
NOTE Confidence: 0.871609330177307
00:34:09.440 --> 00:34:18.730
our Power Query. How do we do that? We will create a new source, OK —
NOTE Confidence: 0.876528024673462
00:34:19.970 --> 00:34:25.350
another kind of data — and we will use Blank Query.
NOTE Confidence: 0.771097004413605
00:34:26.540 --> 00:34:28.140
Connect.
NOTE Confidence: 0.903458118438721
00:34:31.350 --> 00:34:41.130
It's a Blank Query: it looks like it's blank, but it is not blank. As soon as we go into the Advanced Editor,
NOTE Confidence: 0.851822733879089
00:34:42.130 --> 00:34:47.600
there is some content; we get rid of it and just paste the
NOTE Confidence: 0.685034394264221
00:34:48.640 --> 00:34:49.390
Query.
NOTE Confidence: 0.854775071144104
00:34:51.190 --> 00:34:59.090
OK — this is just because of the copy and paste, that's all. OK, here is the code.
NOTE Confidence: 0.884570002555847
00:35:00.100 --> 00:35:12.780
Done. And now we have a function, written by Chris Webb, for creating a calendar table.
NOTE Confidence: 0.902063310146332
00:35:13.860 --> 00:35:26.020
The only thing we need to do is to define the values for the start date and for the
NOTE Confidence: 0.800605475902557
00:35:27.370 --> 00:35:32.060
end date. OK, let's do it.
NOTE Confidence: 0.866471588611603
00:35:35.870 --> 00:35:42.990
Great. And now we just invoke the function with those parameters.
NOTE Confidence: 0.90904426574707
00:35:44.110 --> 00:35:45.100
And that's all.
NOTE Confidence: 0.924780607223511
00:35:47.830 --> 00:35:52.320
Fantastic. The only thing we need to do now is to
NOTE Confidence: 0.899147093296051
00:35:55.810 --> 00:36:02.820
select all the columns and do Transform, Detect Data Type. It's really, really important to
NOTE Confidence: 0.89971923828125
00:36:04.190 --> 00:36:10.850
do this transformation, just to select the correct data type for each column. Now this is our calendar —
NOTE Confidence: 0.824990510940552
00:36:13.310 --> 00:36:19.390
our calendar table. OK, we can load it; I will apply.
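The idea of a calendar function — pass a start and an end date, get one row per day with derived columns — can be sketched in Python. This is only in the spirit of Chris Webb's M function, not his code, and the column set is an assumption.

```python
from datetime import date, timedelta

def calendar_table(start, end):
    """One row per day between start and end inclusive, with a few
    derived columns, like a date-dimension generator function."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "Date": current,
            "Year": current.year,
            "MonthNumber": current.month,
            "DayName": current.strftime("%A"),
        })
        current += timedelta(days=1)
    return rows

cal = calendar_table(date(2018, 1, 1), date(2018, 1, 3))
```

Invoking the function with two parameters, as in the demo, is all a consumer ever has to do; the derivation logic lives in one place.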
NOTE Confidence: 0.909125149250031
00:36:20.560 --> 00:36:22.960
Let's let's.
NOTE Confidence: 0.844350516796112
00:36:24.080 --> 00:36:29.640
go here. Well, let's go here, and let's
NOTE Confidence: 0.887316644191742
00:36:32.420 --> 00:36:52.430
fix the relationship here between the two tables, and that's all. Now we really have our schema — our star schema tabular model, totally prepared. Oh, sorry —
NOTE Confidence: 0.907442808151245
00:36:53.220 --> 00:36:56.490
I want to — I want to show you
NOTE Confidence: 0.872998237609863
00:37:00.180 --> 00:37:12.230
the structure of all the steps we did. We can see here, in the View tab, we have Query Dependencies.
NOTE Confidence: 0.944851100444794
00:37:13.630 --> 00:37:18.200
If we look at the query dependencies we will see.
NOTE Confidence: 0.910834491252899
00:37:19.280 --> 00:37:32.620
that for our model we have here a lot of things. For example, if we are looking at Product, we can see that
NOTE Confidence: 0.876546680927277
00:37:34.170 --> 00:37:35.010
We have
NOTE Confidence: 0.742405712604523
00:37:35.990 --> 00:37:37.250
Here.
NOTE Confidence: 0.896081924438477
00:37:40.050 --> 00:37:43.350
We have here our folder.
NOTE Confidence: 0.923134982585907
00:37:44.730 --> 00:37:48.070
the sample file created by Power BI,
NOTE Confidence: 0.886755347251892
00:37:49.140 --> 00:38:09.150
the parameter and the function created by Power BI, and then Products — the Products query. Then we have also Subcategories here, the one we just copied and pasted into Power BI, and then Categories:
NOTE Confidence: 0.863500535488129
00:38:10.140 --> 00:38:30.150
Our copied, you could say, our copied Categories. And Categories, Subcategories and Product make up our Product, OK. That's fine, and we can see that load is disabled for all of those 3 and enabled just for the
NOTE Confidence: 0.931206285953522
00:38:31.730 --> 00:38:33.070
Here we have.
NOTE Confidence: 0.866538643836975
00:38:35.620 --> 00:38:55.630
Well, the sample files, also this one. And for Sales: 2 files at the beginning, 2 files, 2 origins, then 2 independent queries here. And now we have a link between those,
NOTE Confidence: 0.893713474273682
00:38:56.420 --> 00:39:16.430
joined together with the union, the append queries, using those 2. Here we have Customer; inside Customer we have the information about Geography that we needed. We don't need the 2 queries here because of the SQL
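[Editor's note: the union of the two independent Sales queries described here is, in M, a single `Table.Combine` call. The query names below are assumed, not from the demo.]

```m
// Append Queries: combine the two independent Sales queries into one table.
// Sales2019 and Sales2020 are invented names for the two source queries.
let
    AppendedSales = Table.Combine({Sales2019, Sales2020})
in
    AppendedSales
```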
NOTE Confidence: 0.874868273735046
00:39:17.220 --> 00:39:37.230
Yeah, origin, because Power Query realized that there is this relationship between both. And we have a parameter here for the age range, OK. Then we have here the information, the calendar table, from
NOTE Confidence: 0.894972443580627
00:39:38.020 --> 00:39:54.730
a function, from Chris Webb's function. And for Promotion we have the original file, then the original query, and then the other query that we reference, so here we have a link.
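[Editor's note: a blank-query calendar function like the Chris Webb one mentioned here can be sketched roughly as below. This is not his actual code, just a minimal stand-in.]

```m
// Minimal calendar-generating function: one row per date between the two
// parameters, with a couple of illustrative derived columns.
let
    CreateCalendar = (StartDate as date, EndDate as date) as table =>
        let
            DayCount = Duration.Days(EndDate - StartDate) + 1,
            DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
            AsTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
            Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
            WithYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
            WithMonthName = Table.AddColumn(WithYear, "Month", each Date.MonthName([Date]), type text)
        in
            WithMonthName
in
    CreateCalendar
```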
NOTE Confidence: 0.904422402381897
00:39:56.070 --> 00:40:05.900
So, looking at the query dependencies, we are able to learn a lot about our ETL process.
NOTE Confidence: 0.930593609809875
00:40:06.880 --> 00:40:15.090
OK. Here we could make several different improvements.
NOTE Confidence: 0.912210464477539
00:40:16.270 --> 00:40:36.280
But I prefer just to open the final version here, because we can group the queries, we can group the functions, we can create comments, we can improve the labels for all the applied changes,
NOTE Confidence: 0.887494027614594
00:40:36.300 --> 00:40:49.640
for the applied steps, and so on. In this case, for example, it is the same, but just with more time invested, and here we have,
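[Editor's note: the improved labels the speaker applies show up in the Advanced Editor as descriptive step names, optionally with comments. A hedged sketch with invented step names, path, sheet and columns:]

```m
// Tidied query: each applied step renamed to say what it does,
// instead of the default "Changed Type" / "Removed Columns" labels.
let
    // Extract: read the raw promotions workbook (path and sheet are assumed)
    SourcePromotions = Excel.Workbook(File.Contents("C:\data\promotions.xlsx"), true),
    PromotionsSheet = SourcePromotions{[Item = "Promotions", Kind = "Sheet"]}[Data],
    // Transform: keep only the columns the model needs
    KeptModelColumns = Table.SelectColumns(PromotionsSheet, {"PromotionKey", "DiscountPct"}),
    // Data quality: set an explicit type on every column
    TypedColumns = Table.TransformColumnTypes(KeptModelColumns,
        {{"PromotionKey", Int64.Type}, {"DiscountPct", type number}})
in
    TypedColumns
```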
NOTE Confidence: 0.876262784004211
00:40:53.560 --> 00:41:09.160
OK: the information for our model, the parameters here, the Power BI combining functions, and all the queries we have in our ETL. So,
NOTE Confidence: 0.869247794151306
00:41:10.210 --> 00:41:14.540
coming back to the queries, to the slides,
NOTE Confidence: 0.759299218654633
00:41:15.530 --> 00:41:16.760
And.
NOTE Confidence: 0.864113032817841
00:41:17.930 --> 00:41:18.660
Here.
NOTE Confidence: 0.841966807842255
00:41:19.640 --> 00:41:21.020
We extract.
NOTE Confidence: 0.830381333827972
00:41:22.190 --> 00:41:25.760
our data from 2 RPT files.
NOTE Confidence: 0.863978981971741
00:41:27.530 --> 00:41:45.950
Then we extract from a folder the product files: for Sales, for Product, from a folder, remember, 3 files. And then 2 more: from the clipboard, for Subcategories; from a Word file;
NOTE Confidence: 0.867306351661682
00:41:46.960 --> 00:42:06.970
and an R script for Categories, OK. Then Excel files for Promotions, and then SQL Server tables for Customer and Geography, all together, and a blank query for the data for the Calendar,
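[Editor's note: each source in this recap corresponds to its own M extraction function. The paths, server and database names below are invented for illustration.]

```m
// One M source function per extraction the talk lists (all names assumed):
let
    SalesFiles  = Folder.Files("C:\data\sales"),                            // folder of sales files
    Promotions  = Excel.Workbook(File.Contents("C:\data\promotions.xlsx")), // Excel workbook
    CustomerDb  = Sql.Database("myserver", "AdventureWorksDW"),             // SQL Server tables
    CategoriesR = R.Execute("categories <- read.csv('categories.csv')")     // R script source
in
    SalesFiles
```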
NOTE Confidence: 0.865429282188416
00:42:07.760 --> 00:42:27.770
from Chris Webb, OK. That means we extracted information from several different data sources. And then we transformed: we did several transformations, just for obtaining a good data model. A good data model: we need to
NOTE Confidence: 0.891668558120728
00:42:28.740 --> 00:42:48.750
create a good tabular data model, and we obtained the queries for the model. The additional queries, they're also important, but we are not loading them; and some parameters, some functions, and
NOTE Confidence: 0.893599689006805
00:42:49.560 --> 00:43:09.570
OK. And we have here our beautiful, beautiful star schema tabular model. That's all. The Query Editor in Power BI is a really, really powerful tool for ETL purposes. Data can be extracted from
NOTE Confidence: 0.898769736289978
00:43:10.360 --> 00:43:28.980
many different sources, transformed from different formats to a model, for cleansing, for improving the model, taking into account the tabular model that we need, and loaded into the tabular model. And that's all. Happy querying!
NOTE Confidence: 0.760713934898376
00:43:29.980 --> 00:43:32.260
That's all.