WEBVTT
NOTE
duration:"00:49:58.2100000"
language:en-us
NOTE Confidence: 0.908754765987396
00:00:00.740 --> 00:00:20.750
My name is Krishna Mamidipaka. I'm a principal program manager on the Azure Stream Analytics team. Today we're going to talk about how Azure Stream Analytics can help you gather real-time insights from the various data sources that you might be working with. So, quick show of hands: how many of
NOTE Confidence: 0.819793879985809
00:00:21.540 --> 00:00:23.370
you have used Azure Stream Analytics in some capacity?
NOTE Confidence: 0.826182723045349
00:00:24.520 --> 00:00:26.480
OK, very few. So.
NOTE Confidence: 0.918285846710205
00:00:29.180 --> 00:00:33.370
Here is our rough agenda for the day. I'm going to talk a little bit about
NOTE Confidence: 0.925711512565613
00:00:34.310 --> 00:00:54.320
why: what is the need for real-time analytics, and what are the different scenarios in the world that leverage real-time analytics. I'm going to give you a quick overview of Stream Analytics, and also a few customer examples that are very representative of how our customers are using Stream
NOTE Confidence: 0.923883199691772
00:00:56.080 --> 00:01:13.110
Analytics. I'm going to introduce the concept of temporal functions, or time windows, that we use in Stream Analytics, and I'm also going to introduce some of the built-in analytics that we've been adding to Stream Analytics over time.
NOTE Confidence: 0.899091482162476
00:01:14.180 --> 00:01:34.190
And I will spend a good amount of time doing demos. I've got some devices here: a Raspberry Pi, a Texas Instruments SensorTag, a bunch of data sources, a bunch of gadgets, and I will be introducing various concepts to you with these different devices sending data.
NOTE Confidence: 0.887977004051209
00:01:34.980 --> 00:01:54.260
So then we will look at all the different kinds of projects that you can take on as consultants and practitioners here, and hopefully we will have some time left in the session for Q&A. If not, I'm definitely going to spend as much time as you need outside of the classroom.
NOTE Confidence: 0.922973692417145
00:01:55.910 --> 00:02:03.760
So what are real-time analytics? Real-time analytics is a class of analytics that is very different from traditional analytics.
NOTE Confidence: 0.912434875965118
00:02:05.330 --> 00:02:25.340
I'm sure the audience here is very, very familiar with traditional analytics, where we bring the data, we put the data in a database, and then we take the query to the data; we analyze the data with the query. So in essence, what we're doing is taking the query to the data. Flip it around, and that is real-time analytics: rather than taking the query
NOTE Confidence: 0.903246760368347
00:02:26.130 --> 00:02:46.140
to the data, you're bringing the data to the query. The query is always ready and enabled on the wire, so to say, and as the data pushes through it, we are analyzing it. And why is it important? Because there are many, many scenarios our customers have where they need to react
NOTE Confidence: 0.90906035900116
00:02:46.930 --> 00:03:06.940
to things that are happening within a matter of a few seconds. The simplest example: we've all been at the point-of-sale terminal, we've swiped our credit card, and within a second you get a response back on whether your transaction has gone through or not. Do you really believe that the credit card gateway company has the time to take that event, persist it in a database, check it,
NOTE Confidence: 0.906782746315002
00:03:07.730 --> 00:03:27.740
run some ML models on it to see if it could be a fraudulent transaction, and give you the response back? Chances are no. Chances are the credit card gateway company is using some kind of real-time analytics in the background, so that it can give you a response back within a matter of a second or two. So we have got a lot of customers
NOTE Confidence: 0.907039999961853
00:03:28.530 --> 00:03:48.540
who want to understand real-time status, whether it is of a sensor, whether it is vibration on a generator, and things like that, so that they can proactively stop it or slow it down within a matter of a few seconds. There are many, many scenarios like that. Here are some
NOTE Confidence: 0.92236590385437
00:03:49.330 --> 00:04:09.340
real-life examples of how our customers are using Stream Analytics. The first example I have here is very near and dear to my heart, because Stream Analytics is in a scenario where it is saving lives. Sky Alert is a Mexican company; what they do is they provide
NOTE Confidence: 0.904809474945068
00:04:10.130 --> 00:04:25.550
early warning systems about earthquakes happening on the western coast of Mexico, which is very prone to seismic activity. So what they do is they have sensors laid all across the western coast of Mexico, and
NOTE Confidence: 0.919521629810333
00:04:26.580 --> 00:04:46.590
their unique value proposition is that they provide early warnings 40 seconds before the government sirens go off. How are they doing it? They are constantly collecting the telemetry from these devices, running it through Stream Analytics, and whenever they find an anomaly, they immediately understand: around the location
NOTE Confidence: 0.901694655418396
00:04:47.380 --> 00:05:07.390
of this sensor, how many subscribers do I have? Let me send a notification to all these subscribers. And they do it by stitching together various services on Azure, with Stream Analytics being the real-time processing engine there. There are really two things here that are very interesting. One is that they are able to provide responses
NOTE Confidence: 0.911786019802094
00:05:08.180 --> 00:05:28.190
to their subscribers 40 seconds before the government notifications go out, and number two is that they wanted a system, a service, a cloud that they can think of as 100% reliable, that is up and running all the time. So we worked very closely with them to ensure that the pipelines they have set up for real-time analytics are up 100
NOTE Confidence: 0.902424037456512
00:05:28.980 --> 00:05:32.680
percent of the time, with all the necessary reliability and redundancies.
NOTE Confidence: 0.903943240642548
00:05:34.760 --> 00:05:54.770
TransAlta is a large power production company mainly based out of Canada. They operate these large wind farms, and they use Stream Analytics to continuously monitor the state of their assets, which are essentially the turbines inside these large wind farms. They
NOTE Confidence: 0.889818012714386
00:05:55.560 --> 00:06:15.570
monitor the vibration in those turbines, and as and when they find some anomalous activity, they can proactively dispatch crew members to take a look at what is going on. Or, even better, they have the capability, using the cloud, to command and control these turbines: they can detect which
NOTE Confidence: 0.898975849151611
00:06:16.360 --> 00:06:21.520
turbine is defective, and they can slow it down or stop it completely before any catastrophic damage occurs.
NOTE Confidence: 0.901011109352112
00:06:23.530 --> 00:06:43.540
Southern Company is a large power and utilities company in the South of the United States, and they use Stream Analytics for various purposes. Number one is lone worker safety. They use the geospatial analytics we have built in; just like the geospatial functions that we have in
NOTE Confidence: 0.863892257213593
00:06:44.330 --> 00:07:04.340
SQL DB, we also have geospatial analytics as part of Stream Analytics. In fact, we have inherited the same set of functions into Stream Analytics, and we have enabled them in a real-time fashion on real-time data sources. So whenever there are hurricanes or storms and tornadoes, these crew
NOTE Confidence: 0.920224189758301
00:07:04.360 --> 00:07:24.370
members are moving in the opposite direction of the traffic, right? So they want to ensure that lone worker safety is always there. They are constantly monitoring the deviation of these workers from their projected path, and if that deviation is more than a particular number of meters, they
NOTE Confidence: 0.88550728559494
00:07:25.160 --> 00:07:34.500
immediately send more help to find out if they are OK, or call them and take the necessary corrective actions. They also
NOTE Confidence: 0.915293097496033
00:07:35.510 --> 00:07:55.520
have sensors on these large trucks that these workers are driving. These trucks are fitted with sensors that are constantly measuring the acceleration and so on and so forth, so they can measure if there is any sudden braking or rapid acceleration, because the generators and other equipment that these trucks are
NOTE Confidence: 0.913633167743683
00:07:56.310 --> 00:08:16.320
hauling are really expensive. So they can go and take a look at that data and train the drivers, or ask them to drive safely, and things like that. And then there is the pretty common scenario in power and utilities where these utility companies have sensors all across the distribution network, so that if a particular sensor fails,
NOTE Confidence: 0.890044808387756
00:08:17.110 --> 00:08:37.120
they immediately know that a particular part of the grid has failed, and before the subscribers even call the power and utilities company, they can dispatch the crews to take a look at what is going on, while also understanding exactly the location where the problem occurred.
NOTE Confidence: 0.915621042251587
00:08:38.610 --> 00:08:58.620
So today, it's been about two years that Stream Analytics has been in the market, and we have multiple thousands of customers who are using Stream Analytics in production, in mission-critical scenarios. These are just a few examples that are fully referenceable; their case studies completely document what they're doing with Stream Analytics. Go
NOTE Confidence: 0.875343561172485
00:08:59.410 --> 00:09:19.420
to azure.com case studies, filter for Stream Analytics, and you can very easily understand what each of these customers is doing with Stream Analytics. In fact, Europe is a hotbed of activity for Stream Analytics as we speak; I was actually visiting customers and also participating at the Embedded World conference in Nuremberg for the first two days,
NOTE Confidence: 0.888886034488678
00:09:20.210 --> 00:09:21.940
and then I happened to come here.
NOTE Confidence: 0.476572483778
00:09:24.160 --> 00:09:24.980
So.
NOTE Confidence: 0.882827639579773
00:09:25.930 --> 00:09:45.940
Let's slowly get into the core of Stream Analytics. So why do customers love Stream Analytics? First and foremost, it is all about the performance. Once your data is in the cloud, we analyze it with a latency of a few milliseconds; we don't persist any data as it is moving, coming into Azure,
NOTE Confidence: 0.908434331417084
00:09:46.730 --> 00:10:06.740
so we fully support millisecond latencies for data analytics, which is really not possible with any other solution in Azure today. The second point is all about developer productivity. There are a few things here to note. Stream Analytics integrates with about 15 different services in Azure, and all of these
NOTE Confidence: 0.884707391262054
00:10:06.780 --> 00:10:26.790
integration points are just point-and-click, provide-configuration-parameters. You don't need to write even a single line of code to make these integrations happen, whether it is to Power BI for real-time dashboarding, whether it is to SQL DB or SQL Data Warehouse for data storage and persistence on a longer-term basis, or whether it is to Azure
NOTE Confidence: 0.90046626329422
00:10:27.580 --> 00:10:45.050
Functions to kick off a workflow for a notification; all these integrations happen with just a few simple configuration parameters. The second one is probably really interesting to this crowd too. Two and a half years ago, when we were at the drawing board trying to figure out what the
NOTE Confidence: 0.876208305358887
00:10:45.970 --> 00:11:05.980
authoring surface, the language head, that we should provide to this really powerful Stream Analytics engine should be, we looked around. Whether it was a competitor from the open-source side or a competitor from the proprietary offering side, everybody required users to use some kind of imperative language that looked
NOTE Confidence: 0.855211496353149
00:11:27.570 --> 00:11:47.580
to make it relevant in a real-time world. So we were the first entrant in the market to provide a SQL language head to this powerful real-time analytics engine, and now, when we turn around, whether it is the competition on the open-source side or on the proprietary side,
NOTE Confidence: 0.900740623474121
00:11:48.370 --> 00:12:08.380
Spark included, Kafka included, everybody offers a SQL language head, and everybody is trying to look like us and get the kind of success that we've had in this domain as Microsoft. The third point is intelligent cloud and edge. Stream Analytics uses the same exact language whether you want to deploy your analytics in the cloud or on a
NOTE Confidence: 0.870636522769928
00:12:09.170 --> 00:12:29.180
device on the Azure IoT Edge runtime; it is the same exact language. Maybe initially you have been running your analytics in the cloud, but then you bought a more powerful device, with more memory, a bigger footprint, on which you can put the Azure IoT Edge runtime. With the same language, you now stop your job that is running in the cloud, say
NOTE Confidence: 0.914329886436462
00:12:29.970 --> 00:12:42.580
I want to run it on the edge, provide the device ID and configuration parameters, and now your job is running on the edge. In the future, we want to go in a direction where the system is going to be smart enough to
NOTE Confidence: 0.869448244571686
00:12:43.740 --> 00:13:03.750
split the processing between the edge and the cloud, depending on the throughput and the kind of analytics. If the analytics or the throughput is really high, then we will automatically move the processing to the cloud. If the throughput is low, maybe at night, maybe during the not-so-busy times of the day, we will automatically transition the processing onto the edge. We are
NOTE Confidence: 0.928981602191925
00:13:04.540 --> 00:13:06.590
kind of moving in that direction as we speak.
NOTE Confidence: 0.861397385597229
00:13:07.500 --> 00:13:27.510
We are the cheapest service in the market today for real-time analytics. We start at 11 US cents per hour, and that is really, really cost-attractive for our customers. And we have a financially backed three nines of SLA on the service, so if at any point the service is not
NOTE Confidence: 0.92439466714859
00:13:28.300 --> 00:13:39.720
available 99.9 percent of the time during the course of any given hour, hour by hour or minute by minute, then for the time we could not meet the SLA, we're going to automatically refund the customer.
NOTE Confidence: 0.898716509342194
00:13:41.590 --> 00:13:58.240
And in fact, we are going towards four nines of SLA; that's what we are trying to aim for. Any guesses what three nines of SLA, 99.9 percent, translates to? How many minutes of downtime do you think a service can afford to have during the course of a month?
NOTE Confidence: 0.92730724811554
00:14:01.020 --> 00:14:13.970
Three nines translates to about 40 minutes of downtime, but four nines, which is 99.99 percent availability, translates to only about 4 minutes of downtime. So we are moving from three nines to four nines.
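NOTE
As a quick check of those numbers over a 30-day month: $30 \times 24 \times 60 \times (1 - 0.999) \approx 43.2$ minutes of allowable downtime at three nines, and $30 \times 24 \times 60 \times (1 - 0.9999) \approx 4.3$ minutes at four nines, which matches the rounded figures of about 40 and 4 minutes.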
NOTE Confidence: 0.899452388286591
00:14:17.960 --> 00:14:37.970
So this is how our language looks: very simple to write, but it can do really powerful things. Data manipulation, aggregation functions, date and time functions. You are probably 100 times better at SQL than I am, and you are probably very familiar with a lot of these functions that you see on the screen. But then there are
NOTE Confidence: 0.87344878911972
00:14:38.760 --> 00:14:58.770
new functions that you might not have seen, like the windowing extensions: tumbling window, hopping window, sliding window, session window. What we did was we took the basic SQL compiler and added more windowing functions to it, temporal query capabilities, to make it more relevant in a streaming world. Also,
NOTE Confidence: 0.910302698612213
00:14:59.560 --> 00:15:13.080
machine learning functions: we have built-in machine learning capabilities in Stream Analytics, and I'm going to talk about that and do some demos. Also geospatial functions, something that, again, we inherited from the legacy of the SQL world.
NOTE Confidence: 0.871638596057892
00:15:15.250 --> 00:15:35.260
Less code equals more developer productivity. A while back, one of our really smart developers took a piece of logic and tried to implement that logic in stream SQL versus an open-source equivalent, which was Apache Storm at that time; Spark is also not very far behind. So something that they could accomplish with a few lines of SQL
NOTE Confidence: 0.87061995267868
00:15:36.050 --> 00:15:49.940
in Stream Analytics took them almost 2,000 lines of code in an open-source equivalent. That really shows the time to value and the ROI that customers can get when they deploy Stream Analytics for their real-time needs.
NOTE Confidence: 0.897468328475952
00:15:52.140 --> 00:16:12.150
This is another very interesting direction in which we are heading. Like I said, some of our competitors started from imperative code and are moving towards declarative. What we did is we started with declarative, and we had very good success with it. But then customers who wanted to implement some very complex mathematical
NOTE Confidence: 0.879698157310486
00:16:12.940 --> 00:16:32.950
routines were saying that with SQL they were hitting a ceiling. So we started in a direction to provide UDF support from within the SQL language in Stream Analytics. Today, we support user-defined functions written in JavaScript,
NOTE Confidence: 0.909770607948303
00:16:33.740 --> 00:16:35.990
both in the cloud and on the edge.
NOTE Confidence: 0.904195606708527
00:16:36.960 --> 00:16:50.290
And we have plans to support more languages. We're thinking about Python, Java, etc. We haven't decided, but you shouldn't be surprised to see us supporting more UDF languages in the future.
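NOTE Example (sketch)
Calling a JavaScript UDF from a Stream Analytics query. This assumes a UDF has been registered under the alias toFahrenheit (the alias and field names are illustrative, not from the talk); in the query, UDFs are always invoked with the udf. prefix.
SELECT
    deviceId,
    udf.toFahrenheit(temperature) AS temperatureF  -- call into the JavaScript UDF
INTO output
FROM input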
NOTE Confidence: 0.913778901100159
00:16:54.070 --> 00:17:01.060
Like I said, we have a very compelling price point. It starts at 11 cents an hour; I think in euro terms it is 0.09.
NOTE Confidence: 0.913413226604462
00:17:04.190 --> 00:17:24.200
Here are some other very important features that customers care about. Customers care about exactly-once processing, especially if they're processing financial data; they cannot have a particular event processed multiple times. So we provide exactly-once processing. We also promise no event
NOTE Confidence: 0.885236918926239
00:17:25.420 --> 00:17:40.010
loss: even if the throughput is increasing for a given footprint capacity, we might delay the processing, but we will never drop an event. We will process every event that comes our way, and we will process every event only once.
NOTE Confidence: 0.892907023429871
00:17:41.350 --> 00:18:01.360
Repeatability guarantees: given a particular query and a particular event, you will always get the same result, no matter what. These things are really important for a lot of customers for auditability, and also if they are, let's say, streaming stock quotes for high-frequency trading, for financial purposes,
NOTE Confidence: 0.904400706291199
00:18:02.150 --> 00:18:11.070
balancing books and things like that; they want to make sure that there is repeatability, there is no event loss, and there is exactly-once processing, and we assure all of those.
NOTE Confidence: 0.868451535701752
00:18:12.330 --> 00:18:18.550
The next point is again very, very powerful; we worked so much on it.
NOTE Confidence: 0.917422115802765
00:18:19.660 --> 00:18:35.730
Within the same query, you can process all the events. Let's say you're monitoring the status of a thousand cell phone towers spread across the entire country, and you are looking for any issues with the health of those cell phone towers.
NOTE Confidence: 0.914522230625153
00:18:36.630 --> 00:18:56.640
You can do that in two different ways. Let's say some of the cell phone towers are connected by very fast 4G LTE; some of those cell phone towers are down where the connectivity is not very good, probably connected with some satellite connectivity; and some cell phone towers might be just sending over 2G. Right, you might have a mix of
NOTE Confidence: 0.905233979225159
00:18:59.850 --> 00:19:19.270
connectivity types. Using very simple configuration parameters, you can tell the query to process all of these events coming from all these cell phone towers using a single global timestamp; or you change the query a little bit, add one more keyword to it, and then the query will process
NOTE Confidence: 0.901859760284424
00:19:20.910 --> 00:19:40.920
each cell phone tower independently of the others. In other words, there will not be any offset that the query puts in to ensure that all the events from all the cell phone towers line up before it starts processing. With just simple configuration parameters, you can ensure that all the events are either processed using one
NOTE Confidence: 0.86242413520813
00:19:41.710 --> 00:19:45.290
big timestamp, or that each tower is processed independently.
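NOTE Example (sketch)
The "one more keyword" for processing each sender on its own timeline is the OVER clause on TIMESTAMP BY. The field names (eventTime, towerId) are illustrative.
SELECT towerId, AVG(signalStrength) AS avgSignal
INTO output
FROM input TIMESTAMP BY eventTime OVER towerId  -- a separate timeline per tower
GROUP BY towerId, TumblingWindow(minute, 1)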
NOTE Confidence: 0.891776025295258
00:19:47.760 --> 00:20:00.270
You can also set out-of-order policies and late-arrival policies. You can say, I want to put an offset of so many minutes, so that all my systems catch up before I start processing; it kind of goes with the previous point.
NOTE Confidence: 0.868190824985504
00:20:03.420 --> 00:20:23.430
So how does a real-time analytics pipeline look? On the very left of the screen, you have devices and gateways that are constantly generating and sending data into Azure. Their landing spot on Azure is either an Event Hub or an IoT
NOTE Confidence: 0.835773468017578
00:20:23.520 --> 00:20:43.530
Hub. Both Event Hubs and IoT Hub are message brokers. IoT Hub is very much geared towards IoT scenarios where customers are managing the health of their assets and devices, and the primary difference between IoT Hub and Event Hubs is that IoT Hub is a bidirectional queue: it can command,
NOTE Confidence: 0.866442978382111
00:20:44.320 --> 00:21:04.330
yes, it can write back to the source system, whereas Event Hubs can only ingress messages from the sources. IoT Hub can command and control the devices back, and there are other things: it can help secure the devices, you can update the firmware very easily, and all of that with IoT Hub. So Stream Analytics reads
NOTE Confidence: 0.888447046279907
00:21:05.120 --> 00:21:25.130
primarily from IoT Hub and Event Hubs. We also have the capability to read from Blob storage. You might ask why; storing in Blob and then reading from Blob doesn't seem very real-time. You're right, it probably adds a few tens of seconds of latency, but we did that because there was a
NOTE Confidence: 0.872528731822968
00:21:25.920 --> 00:21:45.930
message-size limitation in IoT Hub and Event Hubs, and we have many customers in the gaming industry where the event size tends to be higher than the maximum event size supported in Event Hubs. So for those customers, where the event size is high and the
NOTE Confidence: 0.886286497116089
00:21:45.960 --> 00:22:05.970
tolerance for latency is also high, we provide Blob storage as a source. Now, what we're seeing is that some customers, even though they're using IoT Hub and Event Hubs for real-time processing, still use Blob storage to read from, because if they want to replay the data and understand the behavior
NOTE Confidence: 0.886225640773773
00:22:06.760 --> 00:22:14.450
under some conditions, they can very easily replay the data from Blob storage and do some analysis for themselves.
NOTE Confidence: 0.919918477535248
00:22:18.330 --> 00:22:38.340
Stream Analytics is not a signal processing solution; in a signal processing solution, you have one signal coming in and you're constantly processing that signal. We are a truly complex event processing solution. What does that mean? We can join multiple fast-moving streams of data. At any given point of time, if you have three or four streams of data coming in
NOTE Confidence: 0.904620051383972
00:22:39.130 --> 00:22:59.140
and you want to put joins on these data sources to get some kind of analytics going, we can do that for you. We can join multiple streams of data, and we can also join multiple streams of data with what we call reference data. Let me explain the concept of reference data a little bit, with the first example that I gave you, from
NOTE Confidence: 0.874633967876434
00:23:00.060 --> 00:23:20.070
Sky Alert, the Mexican company that does earthquake alerting systems. The event payload is just an event ID and the seismic activity around that event ID; they are not sending the longitude and latitude information every time, because that doesn't
NOTE Confidence: 0.875133872032166
00:23:20.860 --> 00:23:40.870
change. So they keep that information for every sensor (what is the latitude and longitude, when was it last serviced, who is the vendor they bought it from, etc.) in a reference database in Azure, and just before the processing by Stream Analytics, they can join the reference data with the fast-moving
NOTE Confidence: 0.905298948287964
00:23:41.660 --> 00:24:01.670
streams of data to add that level of metadata to the information, so that if any anomalies are found downstream, they know exactly who the subscribers in that particular area are, and they can do a lot of logic based on that.
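NOTE Example (sketch)
A stream-to-reference-data join as just described. Reference inputs are joined with an ordinary JOIN (no temporal condition is needed, unlike stream-to-stream joins); the aliases and field names are illustrative.
SELECT e.eventId, e.seismicReading, r.latitude, r.longitude, r.vendor
INTO output
FROM events e                  -- fast-moving stream input
JOIN sensorInfo r              -- reference data input
  ON e.sensorId = r.sensorId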
NOTE Confidence: 0.902115225791931
00:24:02.460 --> 00:24:22.470
And once we process the data, we also have the ability to do real-time scoring on it with models that you might have trained in Azure Machine Learning; we integrate with Azure Machine Learning, so if you have a trained model, you just need to expose the REST API endpoint of that model, and we can do constant scoring and then push the data out so that you can act on it.
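NOTE Example (sketch)
Scoring events against an Azure Machine Learning model. This assumes a function alias (here scoreReading) has been configured against the model's REST endpoint; the alias, field names, and threshold are illustrative.
WITH scored AS (
    SELECT sensorId, reading,
           scoreReading(reading) AS score   -- call out to the ML web service
    FROM input
)
SELECT sensorId, reading, score
INTO output
FROM scored
WHERE score > 0.9                           -- forward only likely anomalies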
NOTE Confidence: 0.88982218503952
00:24:24.170 --> 00:24:27.390
On the output side, there are typically three core
NOTE Confidence: 0.873991787433624
00:24:28.500 --> 00:24:48.510
scenarios. One: customers store the data for long-term retention, to run their machine learning algorithms, to do some what-if analysis, and so on. For that we integrate very well with Cosmos DB, SQL DB, Azure Data Lake, Azure Blob storage, etc. We also see a lot of customers doing
NOTE Confidence: 0.860718309879303
00:24:48.650 --> 00:25:08.660
what we call real-time dashboarding, so we integrate beautifully with Power BI; in fact, we were the first service that Power BI supported for their streaming push APIs, and I will show you some of that very soon. And then the third set of activities that customers do, once the data is processed
NOTE Confidence: 0.876596450805664
00:25:09.450 --> 00:25:29.460
by Stream Analytics, is running some custom code downstream, whether it is to send notifications to people who are affected, because of these early warning systems, or to create a ServiceNow ticket and route it to a technician, and things like that. For that purpose we integrate with
NOTE Confidence: 0.878842890262604
00:25:30.250 --> 00:25:38.570
Service Bus topics, Service Bus queues, etc. So that's how an end-to-end real-time analytics pipeline looks on Azure.
NOTE Confidence: 0.892000317573547
00:25:41.180 --> 00:25:57.310
Now, like I said, Stream Analytics is available both in the cloud and on Azure IoT Edge, so the first little blue circle that popped up actually says that you can run this whole thing in the cloud, or on the devices and gateways as well.
NOTE Confidence: 0.874553084373474
00:26:00.460 --> 00:26:20.470
Given that this is a SQL audience: we recently started supporting SQL data in the reference-data category for Stream Analytics. Previously it was only Blob storage; now we also support data in SQL joining with fast-moving streams of data. Some of the highlights here:
NOTE Confidence: 0.883106350898743
00:26:21.710 --> 00:26:41.720
we support a refresh frequency down to one minute: you can have your source data in the reference table change, and at one minute and beyond, with that refresh frequency, the changes are picked up. And you can also write delta queries: if you don't want the whole snapshot of the data from your
NOTE Confidence: 0.88452672958374
00:26:42.510 --> 00:26:54.490
reference database updated every time, you can say, I want only the changed elements in my source data to be reflected in my reference data.
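NOTE Example (sketch)
SQL Database reference data is defined by a snapshot query, plus an optional delta query that can use the @deltaStartTime parameter supplied by Stream Analytics to fetch only changed rows. Table and column names are illustrative.
-- Snapshot query (full refresh):
SELECT sensorId, latitude, longitude, vendor FROM dbo.SensorInfo
-- Delta query (only rows changed since the last refresh):
SELECT sensorId, latitude, longitude, vendor
FROM dbo.SensorInfo
WHERE LastUpdated > @deltaStartTime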
NOTE Confidence: 0.877577245235443
00:26:56.910 --> 00:27:16.920
We also support, and this is also a relatively new feature we announced at Ignite 2018, the ability to write in fully parallel topologies to SQL, and this again came from some of our European customers in the automotive industry.
NOTE Confidence: 0.849594235420227
00:27:17.710 --> 00:27:32.010
The velocity at which we can write, the throughput to downstream SQL once data is processed in Stream Analytics, is about 500,000 events a second... sorry, a minute, I take it back.
NOTE Confidence: 0.881078720092773
00:27:34.660 --> 00:27:54.670
Very quickly: temporal functions, or time windows, form the foundational concept on which Stream Analytics works. That is actually what makes it different from the regular SQL that we're familiar with. Like I said, we took the same SQL compiler and extended it to make it relevant in a
NOTE Confidence: 0.857516825199127
00:27:55.460 --> 00:28:15.470
streaming world. So the first window I want to introduce is the tumbling window. A tumbling window is one where the window size is the same as the refresh frequency. So you can have a report that updates every 5 seconds and tells you
NOTE Confidence: 0.877410054206848
00:28:16.260 --> 00:28:22.140
what cars crossed this bridge in the last 5 seconds; that is an example of a tumbling window.
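NOTE Example (sketch)
A tumbling-window count in the spirit of the cars-on-a-bridge example; the input alias and field names are illustrative.
SELECT COUNT(*) AS carsLast5s, System.Timestamp() AS windowEnd
INTO output
FROM cars
GROUP BY TumblingWindow(second, 5)   -- 5-second windows, no overlap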
NOTE Confidence: 0.888508796691895
00:28:23.070 --> 00:28:43.080
The next window is very similar to the tumbling window, but with a hop. Same example: think of a scenario where, every second, you need to answer how many red cars passed this bridge in the last 5 seconds. The refresh is one second, but the window size is 5 seconds.
NOTE Confidence: 0.87507152557373
00:28:43.870 --> 00:28:46.160
So that is the concept of a hopping window.
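NOTE Example (sketch)
The same count as a hopping window: 5 seconds wide, refreshed every second, so consecutive windows overlap. Names are illustrative.
SELECT COUNT(*) AS redCarsLast5s, System.Timestamp() AS windowEnd
INTO output
FROM cars
WHERE color = 'red'
GROUP BY HoppingWindow(second, 5, 1)  -- window size 5s, hop 1s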
NOTE Confidence: 0.883560001850128
00:28:48.030 --> 00:28:53.700
These are the queries that use them; I'm not going too deep into them, so that I can cover a lot of ground.
NOTE Confidence: 0.918388843536377
00:28:55.600 --> 00:29:14.260
A sliding window is also a fixed-time window; the only difference with a sliding window is that it produces an output whenever an event either enters the window or leaves the window. We see customers use sliding windows a lot whenever they have machine-learning kinds of scenarios.
NOTE Confidence: 0.872507572174072
00:29:15.250 --> 00:29:22.540
So the first three windows, whether tumbling, hopping, or sliding, are all fixed-time windows.
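NOTE Example (sketch)
A sliding window emits a result whenever an event enters or leaves the window, rather than on a fixed schedule; names are illustrative.
SELECT sensorId, MAX(reading) AS maxLastMinute
INTO output
FROM input
GROUP BY sensorId, SlidingWindow(second, 60)  -- output on every change in the last 60s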
NOTE Confidence: 0.883050680160522
00:29:24.640 --> 00:29:44.650
Now, connected cars and autonomous driving: it is a huge area for us, and we are seeing customers come to Stream Analytics in large numbers for these purposes. If you think about a bus driving, and the driver telemetry that these companies need to provide on the drivers, they cannot have a fixed
NOTE Confidence: 0.870378077030182
00:29:45.440 --> 00:30:05.450
session from ignition start to ignition end, because every session could have a different duration. So, in order to satisfy those needs for connected cars, we introduced a window that is purely based on data density. The window starts whenever the data
NOTE Confidence: 0.913849472999573
00:30:06.240 --> 00:30:26.250
starts coming in, and then you can define how to close that window. You can say you want to close the window after 5 seconds of not seeing any more data come in; it is purely based on data density. Or you can say, I want to close the window 60 minutes after I see the last event.
NOTE Confidence: 0.899826645851135
00:30:26.300 --> 00:30:46.310
Or you can also say, I want to set a maximum size of 60 seconds, so that even if data points are still coming in, the window closes at 60 seconds. So this is for scenarios like connected cars, and there are possibly more; a very good example is clickstream analytics
NOTE Confidence: 0.899449229240417
00:30:47.100 --> 00:31:07.110
and online advertising. From the time a person is in a browsing session to the time a pop-up ad or some kind of ad appears on the web page, the ad-serving platforms have less than one or two seconds to give you relevant recommendations. So for those kinds of scenarios, where you don't know how long the session is going
NOTE Confidence: 0.866102159023285
00:31:07.900 --> 00:31:14.780
to last once the user has launched a particular site, a session window can be of a lot of use.
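NOTE Example (sketch)
A session window with the parameters described above: a 5-second timeout after the last event and a 60-second maximum duration. Names are illustrative.
SELECT COUNT(*) AS eventsInSession, System.Timestamp() AS sessionEnd
INTO output
FROM telemetry
GROUP BY SessionWindow(second, 5, 60)  -- timeout 5s, max window 60s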
NOTE Confidence: 0.838253557682037
00:31:22.340 --> 00:31:26.550
Like I said, a session window starts when the first event comes in, yeah.
NOTE Confidence: 0.898871421813965
00:31:27.920 --> 00:31:37.250
Now, one thing that we're working on, it is kind of on the roadmap, is to have an event characterize
NOTE Confidence: 0.879291892051697
00:31:38.640 --> 00:31:57.580
whether a session window starts or not: the user can specify, start this window only when the event value in this particular category is X or Y, and then the same thing to close the window as well. So that is another thing we're working on right now.
NOTE Confidence: 0.773877084255219
00:32:06.780 --> 00:32:10.220
I don't know the answer to that, sorry.
NOTE Confidence: 0.875031352043152
00:32:15.650 --> 00:32:35.660
There are different ways in which developers can build and deploy Stream Analytics queries: the Azure Portal, PowerShell, the list is on the screen there. And very recently we put VS Code support into private preview as well, because we see a lot of developers in the IoT world
NOTE Confidence: 0.849565327167511
00:32:36.450 --> 00:32:45.610
who use Macs and non-Windows machines; so, for the purposes of catering to those customers, we are starting to offer VS Code support.
NOTE Confidence: 0.862323343753815
00:32:48.260 --> 00:32:53.910
OK, so enough talking; let me do a quick demo. This is the first demo.
NOTE Confidence: 0.909183382987976
00:32:58.710 --> 00:33:03.800
OK. So, what you see on the screen here: any guesses what it is?
NOTE Confidence: 0.81225711107254
00:33:08.690 --> 00:33:10.030
Don't feel shy, it's OK.
NOTE Confidence: 0.900939285755157
00:33:11.430 --> 00:33:26.510
It is the real-time temperature and humidity in the room, right here, right now. And how am I measuring it? I have a small little device, a TI SensorTag; you can buy it for 15 to 20 euros on Amazon.
NOTE Confidence: 0.901035487651825
00:33:28.360 --> 00:33:41.040
What I want to show you, before I even show you how simple a query is to write in order to power dashboards like this: first of all, this is dynamic dashboarding. It is a streaming dashboard that we worked very closely with the Power BI team to help enable.
NOTE Confidence: 0.914479076862335
00:33:42.650 --> 00:33:47.350
What I will do is I will just blow on it, and then you can see the latency with which
NOTE Confidence: 0.822780132293701
00:33:48.570 --> 00:33:53.030
the Power BI dashboards can pick it up.
NOTE Confidence: 0.889608144760132
00:33:54.420 --> 00:34:14.430
There's a little bit of network delay because everybody is using the same thing. The way this is connected to the cloud: this is a Bluetooth-enabled device, it is sending Bluetooth data to my machine, and my machine has a little utility that is sending it to an Event Hub in Azure; Stream Analytics reads from the Event Hub and then processes it.
NOTE Confidence: 0.838441729545593
00:34:15.220 --> 00:34:21.150
This sends about 10 data points every second. What I am going to have is,
NOTE Confidence: 0.838505804538727
00:34:23.350 --> 00:34:40.330
I'm going to compute the maximum of temperature and humidity every second, out of those 10 values, and then I will put it in a Power BI dashboard. So I'm just going to blow on it.
NOTE Confidence: 0.888513803482056
00:34:46.060 --> 00:35:06.070
So you will see that very soon the Power BI dashboards calibrate themselves, and within 2 or 3 seconds of me doing that, you are seeing a jump in temperature and humidity. This is not as good as it could be, because the data center is somewhere in the West US and we're here; despite that, you've seen very decent
NOTE Confidence: 0.884418070316315
00:35:06.860 --> 00:35:18.530
latencies. Typically we see customers put their jobs in the same region that they're in, to avoid even the 2-or-3-second latencies that you have seen here.
NOTE Confidence: 0.923536241054535
00:35:19.650 --> 00:35:25.840
So, like I promised, I will show you the job that I had to write in order to get a dashboard like this to work.
NOTE Confidence: 0.84902548789978
00:35:32.570 --> 00:35:39.170
This is the job. Sorry, because the job is running, it is not very dark.
NOTE Confidence: 0.887173891067505
00:35:40.130 --> 00:36:00.140
So all I'm doing is: select the maximum of humidity as HMDT, and the maximum of temperature as TMP; HMDT and TMP are the variables coming in from the device. I'm also taking the timestamp associated with that humidity and temperature, and I'm putting it into the output; output is just an alias I have defined for the Power BI dashboard here, and the
NOTE Confidence: 0.872485518455505
00:36:00.930 --> 00:36:20.940
name of the event hub is, again, an alias for the Event Hub I'm pulling the data points from. And I'm using the tumbling window, like I said: every one second, I want to get the maximum humidity and maximum temperature out of the 10 or so events that are coming into the cloud.
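NOTE Example (sketch)
The demo job reconstructed as described: hmdt and tmp are the field names coming from the SensorTag, and output and eventhub are the aliases defined for the Power BI sink and the Event Hub source.
SELECT
    MAX(hmdt) AS hmdt,               -- maximum humidity in the window
    MAX(tmp)  AS tmp,                -- maximum temperature in the window
    System.Timestamp() AS time       -- window end timestamp
INTO output                          -- Power BI output alias
FROM eventhub                        -- Event Hub input alias
GROUP BY TumblingWindow(second, 1)   -- one result per second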
NOTE Confidence: 0.90365082025528
00:36:21.730 --> 00:36:41.740
A lot of customers also use Stream Analytics for filtration of the data. If they want to keep their storage costs down in the cloud, they may not want to store every little piece of information coming their way; in some cases, customers want to do an aggregation every 5 or 10 seconds and then put a smaller amount of data in storage, and in some cases customers just want to weed out a lot of noise, especially
NOTE Confidence: 0.887389779090881
00:36:42.530 --> 00:36:51.780
if they are training their ML models and things like that. So we see a lot of practical uses of Stream Analytics, even when it is not in the sense-and-respond category.
NOTE Confidence: 0.924579381942749
00:36:56.920 --> 00:36:58.320
Going back to
NOTE Confidence: 0.859121084213257
00:37:01.160 --> 00:37:02.100
the slide deck.
NOTE Confidence: 0.914673209190369
00:37:06.560 --> 00:37:07.900
Geospatial Analytics.
NOTE Confidence: 0.90303772687912
00:37:08.880 --> 00:37:28.890
We have a lot of interesting scenarios with customers using Stream Analytics for geospatial work: like I said, the whole connected-car and autonomous-driving space, ride sharing, asset tracking. There are multiple scenarios in the real-time analytics world which require us to provide geospatial functions
NOTE Confidence: 0.876913249492645
00:37:29.680 --> 00:37:49.690
and analytics on the geospatial information. Our North Star is to provide these geospatial capabilities both in the cloud and on the edge, in order to enable scenarios like ride sharing for Uber-like companies, and we're very close to that. In fact,
NOTE Confidence: 0.91242241859436
00:37:50.480 --> 00:37:55.380
we recently announced geospatial indexing, in the cloud and on the edge,
NOTE Confidence: 0.855215311050415
00:37:56.410 --> 00:38:16.420
in order to make the computations a lot faster. Without geospatial indexing, let's say you have N assets being tracked across M geofences: we had to do N-cross-M computations. But with indexing, that is now limited to N log N computations,
NOTE Confidence: 0.856258511543274
00:38:17.210 --> 00:38:37.220
and the latency is a lot more palatable for our end users. So, do these geospatial functions seem familiar to you from the SQL world? These are the same geospatial functions that we inherited from what exists in SQL, but enabled in real-time
NOTE Confidence: 0.809500277042389
00:38:38.230 --> 00:38:58.240
scenarios. We support both GeoJSON and WKT formats. Anybody familiar with WKT? Again, one or two of you. WKT stands for well-known text; it is a very common, very well-known text format in the geospatial world, in which
NOTE Confidence: 0.871486186981201
00:38:59.030 --> 00:39:19.040
you define, let's say, geospatial polygons. Let's say there is a city with a little lake in the middle: using the WKT format, it's very easy to create that donut-hole-shaped region on which you can run your geospatial code, and also to figure out what is the distance from point A to
NOTE Confidence: 0.883594989776611
00:39:19.830 --> 00:39:29.270
point B; sometimes you cannot just go across the lake, you will have to go around the lake, and those kinds of computations can be done very easily by using WKT formats.
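NOTE Example (sketch)
The built-in geospatial functions in a query, for instance flagging a device that has left its geofence. The geofence is a WKT polygon held as reference data; all names are illustrative.
SELECT d.deviceId, System.Timestamp() AS time
INTO alerts
FROM devices d
JOIN fences f                        -- reference data holding WKT polygons
  ON d.fenceId = f.fenceId
WHERE ST_WITHIN(CreatePoint(d.latitude, d.longitude), f.polygon) = 0  -- outside the fence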
NOTE Confidence: 0.902772784233093
00:39:32.790 --> 00:39:52.800
We have customers using geospatial functions in many different scenarios. For example, there is a large telecom company in Australia that sells these little devices that can track the location of kids or pets. You can put one in a dog collar and define a geofence around your house, saying that the area
NOTE Confidence: 0.860932528972626
00:39:53.590 --> 00:40:13.600
10 or 15 meters around my house is my geofence, and you will get an automatic alert if your dog happens to cross that 15-meter boundary, because the little geospatial tracker is in the dog's collar. Same thing with kids: you can set up a geofence around the kid's school, and
NOTE Confidence: 0.891114413738251
00:40:14.390 --> 00:40:23.640
when the kid leaves the school, you're notified that, all right, the school bus has left, and you know that you need to anticipate the kid coming in, you know, in the next 15 or 20 minutes.
NOTE Confidence: 0.830175936222076
00:40:28.550 --> 00:40:30.730
OK, let's do a geospatial demo.
NOTE Confidence: 0.893624305725098
00:40:32.240 --> 00:40:52.250
I don't have a device for this, but I have a data source that I'm leveraging: the New York City taxicab data, which has the coordinate information for the pickup point and the drop-off point, the number of people in the car, and so on. What
NOTE Confidence: 0.912627398967743
00:40:53.040 --> 00:41:10.140
we did is we took the geospatial coordinates of the Microsoft Store in Times Square, and the dashboard that you will see, and then the query I will show you, actually shows how many drop-offs have happened around the Microsoft Store in Times Square in the last 5 minutes or so.
NOTE Confidence: 0.919862687587738
00:41:11.460 --> 00:41:22.070
So let's start with the dashboard; let's understand the dashboard, and then let's go into the query to understand how simply a query can be written in order to power dashboards like this. I'm hoping everything works.
NOTE Confidence: 0.246902093291283
00:41:27.780 --> 00:41:28.960
No.
NOTE Confidence: 0.459807068109512
00:41:34.240 --> 00:41:34.640
OK.
NOTE Confidence: 0.589036166667938
00:41:38.860 --> 00:41:40.020
Let me refresh.
NOTE Confidence: 0.802278935909271
00:41:46.960 --> 00:41:50.010
Data generation failed for some reason. OK.
NOTE Confidence: 0.813847541809082
00:41:53.200 --> 00:41:55.380
I wish it was working.
NOTE Confidence: 0.923114895820618
00:41:56.370 --> 00:42:02.710
So essentially you would get a real-time dashboard like this; for some reason, I still have a view of what was happening before the data generation failed.
NOTE Confidence: 0.88664972782135
00:42:04.020 --> 00:42:24.030
In the first tile, I'm seeing pickups by different regions in Manhattan, and as they're happening, you can see the graphs move and slide. We have defined the different regions in Manhattan through the WKT format: Midtown, for example. There are third-party providers who will give you these geospatial polygons that you can buy and then track
NOTE Confidence: 0.859341144561768
00:42:24.820 --> 00:42:36.580
in your queries. Midtown, SoHo, Harlem, Chinatown, Carnegie Hill, etc.: these are all different regions in Manhattan, and this is showing how many pickups are happening by the cabs at different times.
NOTE Confidence: 0.899292945861816
00:42:38.380 --> 00:42:57.810
Then I'm also joining this stream of data with historical data, which helps you compare the total business that this cab company is having right now versus the historical average: is it good, is it bad? You can do that by joining a real-time stream of data with what I called reference data.
NOTE Confidence: 0.915756225585938
00:43:01.410 --> 00:43:13.500
Now, this is the number of drop-offs that happened around Times Square; I think we have defined 50 meters around the Microsoft Store in Times Square in Manhattan.
NOTE Confidence: 0.895263969898224
00:43:15.090 --> 00:43:19.490
Again, we used the Microsoft Store's coordinates and defined a geofence around it.
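NOTE Example (sketch)
Counting drop-offs within 50 meters of a fixed point, in the spirit of the Times Square tile. The coordinates and field names are illustrative, not the store's actual location.
SELECT COUNT(*) AS dropoffsNearStore, System.Timestamp() AS windowEnd
INTO output
FROM trips
WHERE ST_DISTANCE(CreatePoint(dropoffLat, dropoffLong),
                  CreatePoint(40.7565, -73.9863)) < 50  -- within 50 meters
GROUP BY TumblingWindow(minute, 5)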
NOTE Confidence: 0.884520530700684
00:43:21.780 --> 00:43:40.670
And then, for every trip that has a pickup and drop-off time, we are able to calculate the average ride length: how many minutes does a taxicab ride in Manhattan typically last? This is just to show the power of Stream Analytics.
NOTE Confidence: 0.892090618610382
00:43:41.610 --> 00:43:44.590
And then the average number of passengers per trip.
NOTE Confidence: 0.859128415584564
00:43:54.950 --> 00:44:05.790
So, the question was: what is required to send this data into Power BI? How about I just show you how to integrate with Power BI?
NOTE Confidence: 0.874880969524384
00:44:06.890 --> 00:44:26.900
All you need to do is provide a name for the table and the dataset in Power BI, and Stream Analytics will automatically cast that into Power BI as a streaming dataset. You don't even need to have the dataset ready in Power BI; if the dataset you have defined in the query doesn't exist, we will automatically create it for you.
NOTE Confidence: 0.774451076984406
00:44:27.690 --> 00:44:32.290
Then you use Power BI to create those streaming tiles.
NOTE Confidence: 0.858577847480774
00:44:37.520 --> 00:44:38.930
Sorry, can you talk a little louder?
NOTE Confidence: 0.812812685966492
00:44:42.160 --> 00:44:50.300
I'm not a licensing expert on Power BI, I'm sorry. OK, so I have 5 more minutes.
NOTE Confidence: 0.885076403617859
00:44:54.000 --> 00:45:01.880
I have one more very interesting scenario in the demo to cover; let me just do that very quickly.
NOTE Confidence: 0.918038010597229
00:45:08.000 --> 00:45:28.010
We understand that building and training machine learning models can be difficult. So, like I said, we have an integration with Azure Machine Learning where you can bring your own model, so that you can score your data in real time. But a lot of customers told us that it's time-consuming to integrate with Azure Machine Learning, and also that, because it involves
NOTE Confidence: 0.895924925804138
00:45:28.800 --> 00:45:48.810
interaction between multiple services in Azure, the latencies are typically not in the range that customers want. Customers probably want one or two seconds of latency, and if that stretches to 4 or 5 seconds, for some customers that is just not enough. So we then looked at what kind of machine learning capabilities most of our customers need,
NOTE Confidence: 0.890946686267853
00:45:48.840 --> 00:46:08.850
and the answer was anomaly detection. So we worked with the data science team in Microsoft, took the anomaly detection functions that are now open-sourced in ML.NET, and baked them into the code base of Stream Analytics. So for anomaly detection right now, you don't need to train and,
NOTE Confidence: 0.862854897975922
00:46:10.470 --> 00:46:30.480
excuse me, build your models. You can just make simple function calls, and we can detect anomalies in your data streams very easily, automatically. There are two very broad categories of anomalies that we detect. One is spikes and dips; these are, you know, very short-lived anomalies.
NOTE Confidence: 0.889888226985931
00:46:31.270 --> 00:46:51.280
Then we also use a completely different algorithm to detect slow-increasing or slow-decreasing kinds of anomalies. Let's say there is a VM that you are monitoring, and we have customers actually doing that, for, let's say, a memory leak: it is not something that is very evident to the naked eye, it slowly sort of creeps up on you. So we have a
NOTE Confidence: 0.867332398891449
00:46:52.070 --> 00:47:12.080
separate set of ML algorithms to help you understand that you have slow-increasing, slow-decreasing, or bi-level-change anomalies as well. And all you need to do in order to define an anomaly, let's just go with the spike-and-dip function for now: the first
NOTE Confidence: 0.896017491817474
00:47:12.870 --> 00:47:32.880
function parameter that you provide is a scalar expression, which is just the attribute on which you want to run the machine learning function. It could be something coming from your source data, or something that you compute within your query. The next one is the confidence level; the confidence level sets the sensitivity of the model:
NOTE Confidence: 0.919738352298737
00:47:33.770 --> 00:47:41.650
the higher the confidence level, the lower the sensitivity, and the fewer the events that will be flagged as anomalies.
NOTE Confidence: 0.907211303710938
00:47:42.580 --> 00:48:02.590
History size. One important thing I forgot to mention, guys: these are unsupervised learning models. In other words, the models are not pre-trained; the models learn from the data that they are seeing. And why are they not pre-trained? Because every system has a naturally different distribution from every other system, so we did not want to come
NOTE Confidence: 0.893504500389099
00:48:03.380 --> 00:48:23.390
with any preconceived notion of what the data distribution these models see will look like; the model learns from the data it is seeing. And the PARTITION BY clause: within the same query, using the PARTITION BY clause, if you have a million assets that you are monitoring, in the background we will spawn a million
NOTE Confidence: 0.89464008808136
00:48:24.180 --> 00:48:44.190
different models that are constantly looking for anomalies across the million devices that you have. And this is available both in the cloud and on the edge as we speak. So what I have with me here is a Raspberry Pi, and I'm running anomaly detection on the Raspberry Pi.
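NOTE Example (sketch)
The spike-and-dip function with the parameters just described: a scalar expression, a confidence level, a history size in events, and a history window; the slow-trend category uses AnomalyDetection_ChangePoint with the same shape. Field names and parameter values are illustrative.
WITH scoredStep AS (
    SELECT deviceId, reading,
           AnomalyDetection_SpikeAndDip(reading, 95, 120, 'spikesanddips')
               OVER (PARTITION BY deviceId LIMIT DURATION(second, 120)) AS scores
    FROM input
)
SELECT deviceId, reading,
       CAST(GetRecordPropertyValue(scores, 'Score') AS float) AS score,
       CAST(GetRecordPropertyValue(scores, 'IsAnomaly') AS bigint) AS isAnomaly
INTO output
FROM scoredStep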
NOTE Confidence: 0.890303790569305
00:48:50.890 --> 00:49:10.900
Like I said, Stream Analytics can run both in the cloud and on the edge. I have an edge device here, a Raspberry Pi, that is connected to the Wi-Fi of the event, and I'm calculating the anomaly on the vibration, or the tilt, of this particular device. So what you see on the screen here is, because I started moving it,
NOTE Confidence: 0.871207773685455
00:49:11.690 --> 00:49:31.700
you're seeing an anomaly. I have roll and I have tilt; roll is in this direction, tilt is in the other direction, and I'm calculating the anomaly on the tilt axis. You can see that there were anomalies when I started moving the device, and we also expose the score for every event,
NOTE Confidence: 0.890332520008087
00:49:32.490 --> 00:49:52.500
and when the score goes lower, in the case of spike-and-dip, you see an anomaly come up. If we had more time I would have gone a little bit deeper, but I think we are at the end of our time. I'll be more than happy to take any questions. I have a few stickers for Stream Analytics here as well, if you are interested, for laptops and whatnot. Thank
NOTE Confidence: 0.886247932910919
00:49:53.290 --> 00:49:55.200
you very much; you've been a great audience. Thanks.