Thank you for the introduction. Good morning, everybody; as you may have heard, I recently changed my

00:39

affiliation: after many years living in Italy I have co-founded the mundialis company here in Bonn. We focus on geospatial data analysis, massive data analysis, and obviously also Sentinel data processing. My experience comes from years of open-source GIS use and from my involvement in GRASS GIS development since 1997, and in my research years in Italy I spent a lot of time on HPC processing, especially on MODIS land surface temperature time series. So from there

01:17

to Sentinel-2. What's the point? We have to deal with some issues, and the first is the sheer size of the data. This is the latest mission report, which you can retrieve from the Copernicus web page. If you look at the details, you see that 62 thousand scenes have already been produced, and Sentinel-2A was only brought into space last year. A scene is quite big, about 5 gigabytes in size on average, and you can calculate that the overall volume is now 310 terabytes of raw data. It keeps going like this, and this is only one of the many Sentinel satellites being brought into space. The user uptake is 1.2 petabytes so far, and this is something where a system starts to struggle.
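The quoted volumes are easy to sanity-check; a back-of-the-envelope calculation in Python (decimal units assumed) reproduces the 310 TB figure:

```python
# Back-of-the-envelope check of the Sentinel-2 volume figures quoted above.
scenes = 62_000       # scenes produced so far (from the mission report)
gb_per_scene = 5      # average scene size in GB (quoted average)
total_tb = scenes * gb_per_scene / 1000   # decimal units: 1 TB = 1000 GB
print(total_tb)       # -> 310.0
```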

02:13

How do you get into such data? My colleague Carmen presented the web interface the other day; here is just a screenshot to give you an idea. We have been developing a web interface with a cloud-coverage selector, a time-range selector, and so forth; from there you can pick your scene and have it computed. For the scenes which are already present you see previews, like RGB and infrared false-color combinations, to get an idea of how a specific scene looks

02:48

like. So from the front end you may pick your scene. Last night I checked the earthquake region here in Italy; I wanted to know whether any new dataset was already available, for example for disaster-management analysis, and there was indeed one processed from the 24th of August. From there

03:13

you pick it; this is a preview image which is already delivered. From there you can register yourself, and you will receive a notification e-mail when the data have been processed. So far we have implemented two things: a false-color and a true-color composite option, and we are adding some vegetation indices. This is obviously evolving; it is just a prototype which we got running for the conference,

03:44

and from there we are going on to add further functionality.

03:48

So again, the software: not surprisingly, GRASS GIS and also GDAL hold a prominent position here. This is basically a completely open-source stack that we are running, as I will show later, in a Linux environment, everything on high-performance computing systems. The beauty is, I mean, we know some of these systems quite well, being users and developers, but in the end it is more or less standard software, and this enables you to set up these

04:24

interesting applications. So, what are we experimenting with? The issue is that big data takes either big investments or clever solutions, or a combination of both. We need to find a provider who supplies us with computational resources, and at this data size you not only need a lot of CPU power and disk space but also a decent Internet connection, or a network connection to the hub in this case, or to Amazon or other systems if you use data from there; in any case some data transfer may be unavoidable, even if it is internal. Of course, today's message is: take the algorithms to the data. If you can get as close as possible to the data, that is best; otherwise you lose a lot of time shifting data around. We are experimenting with both kinds of setups: either you get access to bare systems and set everything up yourself, or you get higher-level, virtualized systems where you can bring up your instances and so forth.

05:40

So, the bigger picture of what we have implemented in the past months looks like this. We currently fetch the data from the data hub; with the national mirror being built up right now here in Germany, we would be able to get the scenes directly from object storage, and then we are much closer to the data because we can stay within the intranet. From there we run the atmospheric correction. We are using the sen2cor software; other institutions are developing alternative software, and it is not yet clear to me if there will be an alternative for us. sen2cor sometimes struggles, probably when metadata are not available, but this is also an ongoing development. From there we put our processed data into the HPC storage; this can be a distributed storage system, and we are using different file systems for that. And from there we basically split things into two parts: the Web-dedicated processing and the GIS-dedicated processing, which are definitely not the same thing. In the Web part we are interested in delivering data which are visually interesting, let's say attractive, of course representing things in a nice way, and it must be performant: if you zoom in and out, you don't want to wait until the tiles render properly. So there is no need to deliver the full 16-bit data to the Web; 8 bit is enough for the WMS, for example. On the other hand, if we stick to the GIS world, we do not want to lose any precision; we want to keep everything as it is. We deliver, for example, RGB stacks, all organized by the different resolutions provided by Sentinel-2, which are 10 meter, 20 meter and 60 meter. Using especially GDAL 2 you can then build VRT files, which we make available additionally; we then push it out through the OGC protocols or through our REST API, which we are currently developing, but I will come to that.

In terms of space, to my surprise (I mean, I knew it, but we discovered it the hard way), the temporary space consumption can be quite intensive. I believed I had learned a bit about compression in my years of MODIS land surface temperature processing, which ended up at some terabytes of final data size; but still, a single scene which goes through here can end up at 250 gigabytes of temporary space consumption. For one scene this is OK, of course, but since we are speaking about parallel processing, you can imagine: if you launch 50 jobs in parallel, you really have to plan ahead where to put your data. HPC nodes are usually equipped with local disks, and those are not of unlimited size, so you need some clever algorithm which checks the available space and puts the calculations onto the right node.

Some more details. As already mentioned, atmospheric correction happens either with sen2cor or in GRASS: i.atcorr implements atmospheric correction for a series of satellites, like Landsat, WorldView, RapidEye and others, so we were not too keen on implementing yet another one for Sentinel-2. The problem is that sometimes tiles, at least in our system, just disappear, and the log suggests that some water vapor metadata is missing, or something else; I could not trace it, because it is outside of our part of the development, and reprocessing doesn't change it, so probably this is still a work in progress. Well, and then we are generating mosaics.
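The "clever algorithm which checks the available space" is not spelled out in the talk; a minimal greedy-placement sketch, with made-up node names and disk capacities, could look like this:

```python
# Hypothetical greedy placement of scene-processing jobs onto HPC nodes,
# respecting each node's remaining local scratch space (all sizes in GB).
TMP_PER_JOB = 250  # temporary space one Sentinel-2 scene may need (from the talk)

def place_jobs(n_jobs, node_free_gb):
    """Assign jobs to nodes greedily; returns one node name per job (None if no node fits)."""
    free = dict(node_free_gb)
    placement = []
    for _ in range(n_jobs):
        # pick the node with the most free scratch space
        node = max(free, key=free.get)
        if free[node] < TMP_PER_JOB:
            placement.append(None)      # no node can host this job right now
        else:
            free[node] -= TMP_PER_JOB
            placement.append(node)
    return placement

# 50 parallel jobs would need up to 12.5 TB of scratch in total
nodes = {"node1": 2000, "node2": 2000, "node3": 1000}  # invented capacities
plan = place_jobs(50, nodes)
print(sum(p is not None for p in plan))  # -> 20 (only 20 of 50 jobs fit at once)
```

In a real scheduler the remaining jobs would be queued until scratch space frees up again; the point is only that placement must be capacity-aware.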

10:09

So this is one: the zip file is between 5 and 7 gigabytes; this is one Sentinel-2 scene, actually, and it comes as different tiles. These tiles you want to process, or probably you

10:22

want to mosaic. Given that with VRT files, which are basically a container for sticking different images together into a mosaic, you can do this nicely, it is quite easy. The issue is that sometimes different UTM zones are hit by one zip file: some tiles are in one zone and the others in another one. This is of course a bit unfortunate if you want to process the entire scene in one go. But what you can do is stretch it a bit and say: OK, I reproject the minority of the tiles, those in the neighboring UTM zone, on the fly to the current one. In GRASS there is a nice tool for this, r.import: it figures out what the current projection is and, if it doesn't match, reprojects on the fly. This takes a few minutes, depending on how many gigabytes you have to process, but that's fine. From there we go into the Web GIS and then, as I mentioned, we decide whether to go with Web-dedicated processing or GIS-dedicated processing.
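The UTM-zone decision just described (keep the zone of the majority of tiles, reproject the minority on the fly) can be sketched abstractly; the tile identifiers and zones below are invented:

```python
# Toy version of the UTM-zone decision described above: keep the zone that
# most tiles share and mark the minority for on-the-fly reprojection.
from collections import Counter

def plan_mosaic(tile_zones):
    """tile_zones: dict of tile name -> UTM zone. Returns (target_zone, tiles_to_reproject)."""
    target = Counter(tile_zones.values()).most_common(1)[0][0]
    to_reproject = [t for t, z in tile_zones.items() if z != target]
    return target, to_reproject

tiles = {"T32UMA": 32, "T32UNA": 32, "T33UUU": 33}   # made-up tile ids
target, redo = plan_mosaic(tiles)
print(target, redo)   # -> 32 ['T33UUU']
```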

11:39

The Web processor, as already mentioned, tries to reduce the amount of data: we want to shrink the data to 8 bit, and r.mapcalc offers a rather unknown operator, the hash sign (#), which does exactly that job for you; there is also the r.rescale tool, which works as well. Then we have to deal with the black border, which is quite common in remote sensing: besides no-data you have a black border around the scene, and you have to get rid of it somehow. GDAL has the nearblack functionality for that, but we are using the scene classification map, which is an output of the sen2cor atmospheric correction: it generates an additional map layer telling you where there is water, forest, clouds and so forth, and there is also a no-data mask, which we then simply use for that purpose. Once the mask is active in GRASS, any export respects the mask, and at that point the data are consistent. Those who have already tried our system will have seen that there is a long wait time, but that is basically because we only loaded a few scenes; Carmen especially was quite involved in the conference preparation. The tools are there, we just didn't set everything up for this conference, so not everything is processed yet; next week, I believe, things will go faster. For the GIS processing, quality is the target: we do not want to reduce anything, because if you go for classification (and I am eagerly waiting for the parallelized version of the segmentation-based classification, also to compare it to the scene classification we get out of sen2cor), we want to keep the data with the full amount of bits and so forth. Then we export it to GeoTIFF, organized, as I mentioned, by the different resolutions; the name identification uses the scene center, and then we use GeoNames.
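The two Web-processing steps described above, shrinking to 8 bit and blanking no-data pixels via the scene-classification layer, can be illustrated in plain Python; the pixel values and the class code for "no data" are made up for the example:

```python
# Illustration of the two Web-processing steps described above:
# (1) rescale 16-bit digital numbers to 8 bit, (2) blank out pixels that the
# scene-classification layer marks as no-data (the black border).
NODATA_CLASS = 0          # hypothetical class code for "no data"

def rescale_to_8bit(band, vmin, vmax):
    """Linearly map [vmin, vmax] to [0, 255], clipping out-of-range values."""
    span = vmax - vmin
    return [max(0, min(255, round(255 * (v - vmin) / span))) for v in band]

def apply_scl_mask(band, scl, fill=0):
    """Set pixels whose scene-classification code is NODATA_CLASS to `fill`."""
    return [fill if c == NODATA_CLASS else v for v, c in zip(band, scl)]

band16 = [3000, 1000, 5000, 10000]     # fake 16-bit digital numbers
scl    = [0, 4, 4, 5]                  # fake class codes; 0 = no data
band8  = rescale_to_8bit(band16, 0, 10000)
print(apply_scl_mask(band8, scl))      # -> [0, 26, 128, 255]
```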

14:00

This is so that, in the

14:03

user notifications, we can say where the scene was taken. This is the visualization, again from Carmen's talk, so I will not say much. You can see that when you choose a scene to process, it shows up after some hours, depending on the workload on the system, and then you can overlay different maps and so

14:24

forth. Additionally, it is responsive,

14:29

basically: if you receive the notification e-mail, you can also look at it on your mobile device. So,

14:36

OK, some more details about what is going on. We are developing, together with a colleague,

14:42

a REST API for GRASS. The idea is that you get a full API interface to the software, which means you can control it by way of the REST protocol: there are various commands you can send to the system, and you can also retrieve your data back. So basically you can do the processing through a protocol, and you no longer need to run everything locally yourself; the system can be controlled remotely. It is also implemented for massively parallel processing: if you have multiple nodes, there is a job manager sending the work to the different nodes, with a load balancer distributing the jobs, and in the end there is distributed storage as well, which collects the results, manages temporary data, and so forth.
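As a sketch of the idea, driving such a system over REST amounts to posting a job description and later retrieving results; the endpoint and field names below are purely hypothetical, not the actual API described in the talk:

```python
# Sketch of what controlling a processing system over REST could look like.
# The field names and the module invocation here are invented for illustration;
# the real API of the system in the talk may differ.
import json

def build_job_request(scene_id, commands):
    """Compose a JSON job description to POST to a hypothetical /jobs endpoint."""
    return json.dumps({
        "scene": scene_id,
        "process_chain": commands,     # e.g. a list of GRASS commands to run
    })

payload = build_job_request(
    "S2A_20160824_EXAMPLE",            # made-up scene identifier
    [{"module": "i.vi", "output": "ndvi"}],
)
print(json.loads(payload)["scene"])    # -> S2A_20160824_EXAMPLE
```

The benefit described in the talk is exactly this: the job manager and load balancer sit behind the protocol, so a client never needs shell access to the cluster.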

15:47

So, some scalability tests. This was done on the

15:50

Web infrastructure: the number of CPUs versus how much time we needed to process a vegetation index on the full extent of a scene. A scene can be some billion pixels, multispectral of course, so there is a fair amount of data to be processed. You see some interesting improvements in speed (lower is faster), partly because of caching effects, and this is quite interesting; it depends of course on how the hardware is organized and on the number of CPUs, and at some point it saturates. Here I put a dollar sign: if you have more money you can of course get more nodes, distribute your workload over them, and then it will continue to scale more or less linearly, but you have to bring more money, whatever you do.
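The saturation seen in this benchmark matches the classic behavior described by Amdahl's law (my framing, not the speaker's): with a fixed serial fraction of the workload, adding CPUs gives diminishing returns.

```python
# Amdahl's law: speedup with n CPUs when a fraction s of the work is serial.
def speedup(n, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With 10% serial work (an assumed figure), the curve flattens well below n:
for n in (1, 4, 16, 64):
    print(n, round(speedup(n, 0.10), 1))
# the limit for n -> infinity is 1 / 0.10 = 10x, however many nodes you buy
```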

16:43

We also made different test setups on individual servers; it looks essentially the same: you have the different servers, from there you do the processing, and everything is again controlled through the REST interface. Some

17:09

use cases. Why is Sentinel interesting? A single scene is interesting by itself, but the real power comes from time series, and that is the nature of polar-orbiting satellites: if they run in operational mode, they continuously generate data. There are also on-demand systems, but Sentinel-2 is one which is continuously scanning. For these optical data we get, basically every three days, a new coverage; with the next Sentinels coming up, like Sentinel-2B, which will be launched next year, the time series gets even denser. Additionally, we are able to integrate other data, for example Landsat: it is operated at a lower spatial and also lower spectral resolution, but if you want to create a longer time series with more observations, you can use those data as well; it depends on your application, but for NDVI, for example, this would be quite interesting. NDVI is a vegetation index telling you something about the state of the vegetation. So what we are looking at is the time series itself. If you have a longer, ideally multi-annual time series, you are able to derive not only trends but also to identify anomalies: is there something different than usual? The "usual" comes naturally from the multi-annual time series, and the anomaly can show up in time or in space. Here is an example: these are pivot-irrigation fields, and this one is completely different compared to the others. Maybe the crop was different; I didn't investigate further in this case, but I find the example interesting, and this is the kind of thing you want to look at with time series. Here we have the advantage of 10-meter-resolution pixels in the infrared, and this gives you quite some potential for this kind of analysis.
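A toy version of this analysis: NDVI is (NIR - Red) / (NIR + Red), and a simple anomaly test flags a field whose current value deviates strongly from its multi-annual history; all reflectance values below are invented:

```python
# NDVI plus a simple anomaly test against a multi-annual reference,
# as a toy version of the time-series analysis described above.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def is_anomaly(value, history, threshold=2.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mean = sum(history) / len(history)
    var = sum((h - mean) ** 2 for h in history) / len(history)
    return abs(value - mean) > threshold * var ** 0.5

# Invented reflectances for a vigorously vegetated pixel:
print(round(ndvi(0.45, 0.05), 2))   # -> 0.8
history = [0.70, 0.72, 0.68, 0.71, 0.69]   # past years' NDVI for one field
print(is_anomaly(0.20, history))    # this year's field looks different -> True
```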

19:24

From this I want to draw some conclusions.

19:26

The conclusions, for us. You still hear that image processing and remote sensing are completely different from GIS; for us there is no such thing, and it was never like that, especially in the GRASS software: you have the full range from raster to vector to image processing. Image processing sometimes sits in its own class, but in the end it is data like any raster data, and you can go back and forth, which is unique. We are heavy users of GDAL, of course, and this is quite essential, because the data are delivered in JPEG 2000 format, which you need to decompress if you want to do anything with them. GRASS comes with its own data format, for good reason, because it is one of the few formats which is really, I would say, complete. What does that mean? You can store integer data, you can store floating-point data at different precisions, but you can also have a color table; GeoTIFF, for example, is not easily able to store a color table with floating-point data. So that is the reason we want to have the data in GRASS itself. Besides importing, which I mentioned before, you can also kind of avoid the import, in the sense that if the projection is the same, so you don't need to reproject on the fly, you can just register the dataset and it will be processable as if it were imported: that is the r.external command. It takes the GeoTIFF, or JPEG 2000, or whatever it is, you say "register it within GRASS", GRASS says "OK, found a new dataset" and just goes on; the subsequent calculations are then in the GRASS format itself. There is also the r.external.out command, where you can say: instead of saving in GRASS format, write it out right away in another GDAL format; and we have all the formats available through GDAL, I don't know how many, 150 maybe today, a lot at least, too many probably, if I may say so. Anyway, you can use it to produce the datasets for your stack and write them right away in your preferred format.

What is also relevant here: memory management becomes extremely relevant with such amounts of data, and the processing must be optimized. Some people are always joking that GRASS is already 33 years old. What is the advantage? In the old days resources were really limited, so GRASS has always been implemented in an extremely efficient way to deal with data, and nowadays, in times of big data, this really pays off: if you wasted a lot of memory for nothing, just because of a lousy implementation, you would face a problem today. So in the end the long history pays back, in the positive sense. And finally the REST API: this is something which will be published in the near future; we believe it is working fine. It is of course offering a huge potential for this kind of analysis, because you do not want to process time series manually, definitely not. That's why these job-management systems are quite relevant and effective, and if you can then connect to them through a protocol, that's even nicer, because then software interchange and data-analysis exchange become much

23:17

easier. So with that I would like to conclude; thank you for your

23:21

attention. We have some time for questions.

Question: The first one is not a question, more a general comment about sen2cor. I know that sen2cor is still a work in progress; I think there will be a new version, and, I am not sure, but there are plans to deliver the atmospherically corrected surface reflectance for Sentinel-2 as an official product on the data hub. I think that will make it easier for users to get the data already corrected.

Answer: Thank you for this comment; of course I am quite interested, because we want to do analysis, we are not so keen on maintaining this atmospheric correction step ourselves. Anything official would definitely be great.

Question: A question on the REST API. You mentioned that users send processing instructions through it, for example launching GRASS commands such as the atmospheric correction you showed. Have you ever considered using the OGC Web Processing Service, WPS, for this? What was your reasoning?

Answer: Good point. We wanted our own REST API; WPS is already there, there is PyWPS and there are other implementations and so forth, so there was no need to implement it again, but there was interest to also provide REST support, and that's why it exists. It is a different architecture, and other projects may bring other interfaces; with open-source software you have a choice, also a choice of protocols, so we thought to complete the picture and have a REST API available as well. We still need to finish things, but the temporal framework is also included, so there is quite some time-series support available there; coming soon.

Question: Thank you. What is the business model? You run an external service; will you charge for using it?

Answer: Good point. Obviously we cannot provide free computation to everybody in an unlimited way. There would be something like a freemium model: you can process some scenes for free, and then you would be charged for further ones, especially when it comes to time-series processing. If you ever saw the bill of an HPC center, you know this has to be refunded through the users; it's unavoidable.

Moderator: Last question.

Question: Is this restricted to Europe, is the classification made only for Europe?

Answer: No, of course not. I mean, the data we are processing here, you see, this one is in Argentina:

27:07

nice to see. But we also want to do more than nice-to-see. If you go for analysis like land-use-change tracking, you need a classification in order to know what the land use actually is, and then track it over time. This is why image classification, for example using that algorithm, is of

27:34

great interest to us, and since these are really huge datasets, it will also tell us something about the performance and how the classes come out.

Question: To give you an answer, how fast is it at classifying images?

Answer: That is a question for the other speaker, because he is the author; we have not used it ourselves so far, we are doing the processing.

Moderator: OK, he will tell you something about it afterwards. If there are more questions, you can catch the speakers during the conference; we have to stop here.