This talk is about point clouds, as advertised: I'm going to talk about indexing very large

00:22

point cloud datasets for web infrastructure. The focus is mostly on the reorganizing portion and the indexing, and I'll touch on the infrastructure and future work toward the end. The initial problem that drove everything I'm talking about today was: can we put Iowa's LiDAR in a web browser? A little bit about the dataset: it's 37 thousand files

00:50

so this is a statewide LiDAR collection, about 170 billion points. Uncompressed it's about four terabytes, and compressed it's about four hundred gigabytes, so that's very good compression. The data is organized as full-resolution tiles. This is the actual data here (yes, Iowa is very flat), and these are the point clouds colored by which file is which.

Some problems with this layout: it's difficult to access, especially in the ways that clients want to access such a large dataset. With tiles there's really no way to get something like a low-resolution overview; there's no hierarchical structure, no level of detail, everything is full resolution all the time. It's also difficult to manage because it's thousands and thousands of files, so it's difficult to treat the set logically as a single unit, which is how we would like to think about it. We would like some sort of abstraction where we can deal with this dataset as a contiguous unit, rather than constantly being reminded that we're dealing with the tiles that make up the set. And as far as visualization goes, unless you reorganize the data somehow, you can really only do it in pieces: you can download the tiles, maybe sparsify them along the way, but you still have to fetch whole tiles from somewhere to do anything.

So what we really want is a point cloud service that does all the things I just said. We want to provide hierarchical access to the data, so you can sit at different zoom levels and get a dynamic resolution; we want to be able to go anywhere very quickly; and we want to be flexible: we aimed for visualization first, but it's not visualization only. The visualization-first mindset forces a quick response; we have to be able to respond to queries on the order of milliseconds, and that drove our solution quite a bit, which I'll talk about a little later. The service aspect means we want to deal with the data as a single unit, so we want to create some sort of service to abstract whatever organization we come up with.

So we need to reorganize the data. Since there are obviously about four terabytes of uncompressed LiDAR data to chew through, we'd rather

03:15

not touch it if we can build some sort of meta-index on top of it instead. There are all sorts of ways to do that: the tiles fit together nicely, so you can keep the bounds of each tile, go to the bounds you're interested in, and select the relevant files. But you can't do level-of-detail queries, and again, you always need to consider the fact that you're dealing with tiles when you want to get at the data. There are also some constraints on memory: maybe you can keep
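A minimal sketch of the meta-index idea described above: keep only the bounding box of each tile file, and answer a spatial query by selecting the files whose bounds intersect it. The file names and bounds here are made up for illustration; this is not Entwine code.

```python
# Sketch of a tile meta-index: map each file to its 2D bounds and select
# intersecting files for a query box. Note the limitation the talk points
# out: every query still returns whole tiles at full resolution.

def intersects(a, b):
    """Axis-aligned boxes as (xmin, ymin, xmax, ymax)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Hypothetical filename -> tile bounds
tile_index = {
    "tile_0_0.laz": (0, 0, 1000, 1000),
    "tile_1_0.laz": (1000, 0, 2000, 1000),
    "tile_0_1.laz": (0, 1000, 1000, 2000),
}

def select_tiles(query):
    """Return the sorted names of all tiles whose bounds touch the query."""
    return sorted(f for f, b in tile_index.items() if intersects(b, query))

# A small query straddling two tiles still fetches both full tiles:
hits = select_tiles((900, 100, 1100, 200))
```

This answers "which files cover this area" quickly, but as the talk notes, it gives no way to ask for a coarse overview of that area.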

03:45

four terabytes in RAM, but you probably can't keep four terabytes in RAM. I'm sure you could provision lots of EBS volumes, but we didn't really want to go there.

One important constraint we put on the solution is that we wanted it to be lossless, and that means a couple of things. We don't want to throw away any points, and a lot of indexes, especially visualization-oriented indexes, will do that, because you don't necessarily need very fine resolution points when it's good enough to stop at some fixed resolution. We don't want to do that, because we don't want to require you to keep the original data around and refer back to it for full detail; we wanted to theoretically be able to reconstruct the original set from whatever we produce in this transformation. We also want to be able to add data, so we don't want a finalization step, because these big sets might come in pieces over time and we will add to them. And of course, as I said, since we're visualization-focused we want a fast response time, on the order of milliseconds.

Some assumptions: we do things in the cloud, so we assume we have scalable cloud computing, EC2, or at Google there are all sorts of things you can use; we assume we can use that. The third assumption is a distributed file system, and the main point there is that we're assuming the I/O can scale with the processing. We want to scale across multiple machines, but if there's some bottleneck in the I/O, that's not going to help; it's not going to be feasible. And we assume the problem is parallelizable. Obviously most problems are parallelizable in theory, but practically, for example, if those files weren't tiles, if they were just files of points randomly spread throughout the entire state, that would make it much more difficult to parallelize widely. Tiles are very common, and they're quite natural for parallelization.

So what we came up with is that we want to act like a massive octree, and there are a few reasons why this is what we want. It's not necessarily that we need the traditional octree you'd see in a CS course; we want something that acts like an octree, and the "acting" part is what I'll talk about a bit when I get to implementation. One reason we want something that acts like an octree is that there's a natural mechanism for visualization: these are tiles of increasing depths of an octree layered on top of each other, so if you go down a depth, you increase the resolution. So there's a natural generalization to existing tiling schemes.
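The depth pyramid just described follows directly from the bisection rule: each additional depth halves the cell size and multiplies the cell count by 8 for an octree (4 for a quadtree). A small sketch of those numbers (illustrative helper names, not from the talk):

```python
# How octree depth maps to resolution: depth d has (2**dims)**d cells,
# each with edge length extent / 2**d.

def cells_at_depth(d: int, dims: int = 3) -> int:
    """Number of cells at depth d: 8**d for an octree, 4**d for a quadtree."""
    return (2 ** dims) ** d

def cell_size(extent: float, d: int) -> float:
    """Edge length of one cell after d bisections of the full extent."""
    return extent / (2 ** d)

# For a 1000 m cube: depth 0 is one cell, depth 10 is 8**10 cells of
# roughly a meter each, so each step down doubles the linear resolution.
```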

06:24

Another thing about octrees, as opposed to other types of trees, is that they are spatially deterministic. This is an example of how a quadtree splits, which is just a two-dimensional grid, and just looking at it, it suggests parallelization, especially when you consider the tiled datasets that will go into it: you can split things geospatially, work with confined sets, and maybe merge them later, or, as we do, just merge some metadata rather than actual point data. By contrast, the last picture is a k-d tree, where the order of insertions drives how the tree ends up being shaped. The reasons we don't want that are that we want to avoid expensive operations like rebalancing the tree, whereas an octree is very stable: the structure naturally maps to a level of detail, and insertion order doesn't matter. In general, most other trees don't necessarily do the things we want, although they have many other benefits.

So here is an octree as you'd see it in an introductory course: there is a node at the top that represents the whole bounds, and each node has up to eight child nodes, the bounds bisected in each dimension, into octants for an octree or quadrants for a quadtree. To build this structure naively would take about ten terabytes, really more like twelve or thirteen, just to hold the pointers for the amount of data we're dealing with. And one thing worth mentioning is that any waste we introduce in the algorithm has really large effects at this scale; we're specifically targeting these hundreds-of-billions point sets, and that figure doesn't even include the data itself. So obviously we want to keep things slim wherever we can.

So the first thing we do is linearize the tree. Many of you are probably familiar with a Z-order curve; you can see a space-filling curve down at the bottom, and for a fixed depth of a quadtree it lets you map the nodes into a one-dimensional structure that is much easier to work with: there are no pointers, so it basically removes the edges from the graph. Typically that's done per depth: you go to a fixed resolution depth and turn it into that nice grid. We wanted to do something different, because we wanted a global linearization. Typically the way you see it done is with multiple levels: you number 0 1 2 3, and then at the next depth you start again at 0, and you end up with some nice encodings. You can treat these as strings and figure out things like the depth, but the problem is that 0 is not equal to 00 in that structure, so you can't do math across levels; those encodings are not numbers. What we wanted is numbers, because you can do math on numbers, and that math is cheap; I'll talk about the query side later.

So what we want to do is attach all these levels together and treat the whole thing as a single, theoretically infinite array. First we assign the numbers, and this is just one convention for the ordering; we haven't really done anything yet. And this is an example of how you would traverse this theoretically infinite array, mapping the single dimension onto the two dimensions of the quadtree (we're working in two dimensions here). You can see the arrows down at the bottom, and the key thing is that those arrows are essentially the edges of the graph, which is exactly what we want to remove from the data structure. The obvious question now is: OK, how do I figure out what the arrow is, where to go next? It turns out to be quite simple: with a bit shift and an integer addition you can figure out the next spot in the tree. So if you're checking a position in that array and there's already a point there, you can figure out what to check next very quickly.

Some properties of this method: here you can see the layered version, and what we're doing is expanding it out into a single one-dimensional structure. That's great because you get a lot of concurrency benefits; atomic operations at single positions in one long data structure are very easy to throw more threads at, whereas layered structures like this are not quite as easy. It also simplifies the data structures, because all you ever have to do is look at a single position in that one structure. On the next slide I'll show you something about point spacing, and there are a lot of things you can calculate from the structure; it goes back to the millisecond response time, and I'll show you some of the calculations that make that happen. So here's an example of point spacing; hopefully you can kind of see it. On the left we just use the first point that goes into a node; on the right, when
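The "bit shift and integer addition" navigation can be sketched as follows, for the two-dimensional quadtree case discussed above. This assumes one particular breadth-first global numbering (root at 0, the four children of node n at 4n+1 through 4n+4); it is an illustration of the idea, not Entwine's actual layout.

```python
# Navigating a globally linearized quadtree with pure integer arithmetic:
# no pointers, no per-level string encodings, just numbers you can do
# math on.

def child(n: int, quadrant: int) -> int:
    """Index of one of node n's four children (quadrant in 0..3)."""
    return (n << 2) + 1 + quadrant

def parent(n: int) -> int:
    """Index of node n's parent (n > 0)."""
    return (n - 1) >> 2

def depth(n: int) -> int:
    """Tree depth of node n (root is depth 0)."""
    d = 0
    while n:
        n = (n - 1) >> 2
        d += 1
    return d
```

With this numbering, the root is 0, its children are 1 through 4, and the sixteen grandchildren are 5 through 20; "where do I go next" if a slot is occupied is one shift and one add, which is what makes lock-free, multi-threaded insertion into a single flat structure practical.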

11:46

a new point comes in, instead of just keeping whatever point is already there, we swap them if the new one is closer to the center of the node. The reason I bring this up is that a lot of other structures trying to mimic an octree will use buckets of points, which makes it very difficult to do anything that needs single-point resolution, whereas with our linearization we can do that trivially. So now I'll run through this really

12:09

quickly; it's just an example of what you can do with this

12:12

method I've been talking about. For example, how do we query this tile that's outlined here? From the global bounds you can bisect twice, so at depth two you can figure out the bounds of the query; the depth of the nominal data is just how many times you have to bisect to get to the query, and in the example I showed you earlier you end up with something like 17. The number doesn't really matter; the point is that you can trivially calculate all of this at each step of the hierarchy. We saw that pyramid earlier, and here we're querying a sixteenth of the square in the pyramid; by continuing to apply that same simple formula to the numbers, you can get the span of numbers at each depth that can match the query. You can also get the maximum number of points that will come back. And the cool thing is that, since we obviously don't have infinite arrays, we have to split the points up somewhere, and if you do that splitting intelligently and use these same ideas in your key-value storage system, you can respond to a query like "give me that tile we saw on the last slide at depth 10" with a range query: select all IDs that match this range, and you magically get back only the points that match the query. So that's one example of the things you can do.

As I mentioned, we don't have infinite arrays, so we have to split stuff up, and here are a couple of things we learned about that. It's trivial to chunk spatially by using special points along the Z-order curve. But one thing we learned is that when you do the chunking, if you split traditionally, bisecting the quadrants or octants at every level, at some point the nodes start to get really sparse, and what you end up with on these giant datasets is hundreds of millions of one- and two-point chunks if you keep splitting. If instead you can guess the number of points to within a factor of 4 or 16 and start splitting there, that's a huge optimization, because you get an order of magnitude fewer files and much better compression. There were plenty of other tricks; one of the important ones is that we ran a lot of data through this, so it's very heuristically tuned, and we wrote a couple of custom libraries for memory pooling. But the important thing is the results.

So what do you get from this? Since it worked fine on the first set, I went and found a bigger set for the results. Here is the Netherlands AHN2 data: 41 thousand files, 640 billion points, about four times larger than the Iowa set, and about one and a half terabytes compressed. And here is a shot of the indexing rate: we got up to 74 billion points per hour. Some stats: we used EC2 and S3, we read and reprojected the data using 28 of Amazon's big instances, and we were done in about nine and a half hours. Each of those instances does about 2.6 billion points per hour, and the scaling is very linear, so on a desktop machine with a decent core count you could expect about half of that per-instance rate, which would put a set like this at roughly a month on a single machine. That's just how quickly you can index it. So what can you do with what you get? I'm going to play
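The per-depth range arithmetic described above can be sketched with the same breadth-first numbering used earlier (hypothetical helper names, not Entwine's API). Level d starts at offset (4^d - 1)/3 in a quadtree, nodes within a level are in Z-order, so the descendants at depth d of a tile-aligned query cell at depth q occupy one contiguous span of IDs; walking that formula down the levels gives, per depth, the ID range to fetch and the maximum number of points that can come back.

```python
# Per-depth ID spans for a tile-aligned quadtree query, suitable for
# answering "give me this tile at depth N" as a key-range query.

def level_offset(d: int) -> int:
    """Global index of the first node at depth d: (4**d - 1) / 3."""
    return (4 ** d - 1) // 3

def query_span(q: int, z: int, d: int) -> range:
    """ID span at depth d covered by the query cell at depth q,
    Z-order position z within its level (requires d >= q)."""
    width = 1 << (2 * (d - q))          # 4**(d-q) descendants at depth d
    start = level_offset(d) + z * width
    return range(start, start + width)

# Querying one quadrant of the tree (q=1, z=2) down to depth 3:
spans = [query_span(1, 2, d) for d in range(1, 4)]
```

Summing the span lengths bounds the result size in advance, and a storage backend that supports range scans can return exactly the matching points with one scan per depth.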

16:08

this video. So this is the

16:11

dataset I was just talking about. Obviously we're starting zoomed in very far, so you can see the scan lines of the

16:18

laser scanner; this is being streamed live. So

16:25

those points don't have colors, they're XYZ only, so the colors you see right now are being updated

16:34

in real time by the renderer, because we reprojected to Web Mercator, and this is

16:39

why we reprojected: the renderer is fetching the point tiles and map tiles and then overlaying the map tiles on the points. Now I'll try to zoom out; there was a famous museum where we started out, and you can't really see it any more at this point, which gives you an idea of the magnitude of the amount of data we have here. And the different coloring there is because
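Overlaying map tiles on the points requires both to live in the same projection, which is why the data was reprojected. A minimal sketch of the standard spherical Web Mercator (EPSG:3857) forward transform, for reference:

```python
# Project geographic coordinates (degrees) to Web Mercator meters,
# the projection used by most web map tile schemes.

import math

R = 6378137.0  # WGS84 semi-major axis used by spherical Web Mercator

def to_web_mercator(lon: float, lat: float) -> tuple:
    """Return (x, y) in meters for a (lon, lat) pair in degrees."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```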

17:03

Mapbox coloring is inconsistent across resolutions. And now we're fully zoomed out, so that's 640 billion points in your browser. So that's an example of what you can do with this.

What you end up with is a bunch of files in this structure, and we built an abstraction on top of that: the point cloud service we mentioned, called Greyhound, which is a simple HTTP server. These are the kinds of queries you can make. You can ask for the metadata for a point cloud, things like the number of points, what attributes the points have, does it have RGB, what coordinate system it's in. And you can make read queries, which is what the renderer you were just looking at is doing: it asks for a bounds and a depth, and that's pretty much it. It's a simple client-controlled system.

The project that does the index building is called Entwine. Here's just an example of how you run it: you give it an input, which could be a directory or a single file, you give it an output, you can reproject, you can set the number of threads, and there are a few other options I won't show here.

Some related software. Entwine is up there. The renderer I just showed in the video is the dynamic version of Plasio; some of you may be familiar with plas.io, where you can view LAS and LAZ files in the browser, and the dynamic version of that is basically what I was just showing you. Entwine uses PDAL to read the point data, so it doesn't itself know about all the point cloud formats, but for free we can read them all and write them all, because that's abstracted away, which is just what we want. And you may be familiar with Potree; there's a fork of Potree that communicates with Greyhound, so when we build one of these indexes on these datasets, you can view it in Potree through Greyhound. And that's all I have, so thank you very

19:43

much. [Applause] We have time for a question or two. Question: do you actually need cloud computing to build an index? Answer: no, you can just run this on your desktop if you want; that was the point of the example CLI invocation I showed you. Any other questions? [Remainder of Q&A inaudible]
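For reference, a sketch of the kind of Entwine invocation the talk describes, with option names taken from Entwine's command-line documentation; the paths are placeholders and exact flags may vary by version:

```shell
# Build an index from a directory of LAS/LAZ tiles, reprojecting the
# output and using 9 worker threads (paths are made up for illustration).
entwine build \
    -i ~/data/statewide-tiles/ \
    -o ~/entwine/statewide \
    -r EPSG:3857 \
    -t 9
```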

Content metadata

Massive point cloud data collections present unique processing challenges. The first roadblock in providing large-scale point cloud web services is data organization. We will describe the techniques used in the open source Entwine library for indexing and organizing province-scale LiDAR collections for delivery as streaming web services. We will describe how the software organizes point cloud data, demonstrate its use in providing web service infrastructure, and discuss future integration possibilities of Entwine with other point cloud software.