Raleigh

FOSS4G NA is a collaborative event by LocationTech and OSGeo

May 2-5, 2016

Big Data Day

TileReduce is an open source JavaScript library for writing highly scalable data analysis scripts using Mapbox Vector Tile data. Using parallel computing across tiles, TileReduce allows for extremely high performance geospatial processing with minimal complexity. We will go over how TileReduce works under the hood, and discuss how we use TileReduce at Mapbox to crunch massive datasets to improve OpenStreetMap.
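The per-tile pattern described above can be illustrated with a short sketch. It is written in Python for brevity and is not TileReduce's JavaScript API: the function names (`lonlat_to_tile`, `group_by_tile`, `map_tile`, `tile_reduce`) are hypothetical, the input is assumed to be plain longitude/latitude pairs rather than Mapbox Vector Tiles, and a thread pool stands in for real worker processes. Only the standard Web Mercator tile arithmetic is taken as given.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) Web Mercator tile containing a point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern edge or near the poles stay in range.
    return max(0, min(n - 1, x)), max(0, min(n - 1, y))


def group_by_tile(points, zoom):
    """Bucket (lon, lat) points by the tile they fall into."""
    tiles = defaultdict(list)
    for lon, lat in points:
        tiles[lonlat_to_tile(lon, lat, zoom)].append((lon, lat))
    return tiles


def map_tile(features):
    """Hypothetical per-tile job: count the features in one tile."""
    return len(features)


def tile_reduce(points, zoom, workers=4):
    """Run map_tile on every tile independently, then sum the results."""
    tiles = group_by_tile(points, zoom)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_tile = list(pool.map(map_tile, tiles.values()))
    return sum(per_tile)
```

Because each tile is processed without reference to any other, the map step can be spread over as many workers as are available, and only the small per-tile results need to be combined at the end.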

LocationTech is a working group within the Eclipse Foundation that is home to four open source projects dealing with large geospatial datasets: GeoTrellis, GeoWave, GeoMesa, and GeoJinni (sense a pattern?).

These projects were created to solve a type of problem that we are seeing more and more often: how do we answer questions about location across very large geospatial datasets? In this talk, I will give an introduction to those four projects and talk about how each one approaches processing geospatial data at scale.

We present a panoply of examples where empirical mining and statistical analysis of large data sets have already proven useful in handling vexing problems within the realm of large-scale forest ecology. Some prejudices may exist against empirical approaches, in favor of more process-oriented analytical methods. Because a full understanding and appreciation of particular ecological phenomena are possible only after process-driven, hypothesis-directed research, some forest ecologists may feel that purely empirical data harvesting represents a less-than-satisfactory approach.

GeoMesa is a LocationTech project that builds on open-source distributed databases like Accumulo, HBase, and Cassandra to scale up the indexing, querying, and analysis of billions of spatio-temporal data points. This presentation will discuss the state of the GeoMesa project with a focus on two other LocationTech efforts: SFCurve and GeoBench. Distributed geo-databases built on key-value stores leverage space-filling curves for indexing, and SFCurve is a library aimed at providing implementations of the basic curves and a playground for further research.
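To make the idea of space-filling-curve indexing concrete, here is a minimal sketch of a Z-order (Morton) curve, one of the basic curves used for this purpose. It is not SFCurve's or GeoMesa's API: the function names and the 16-bit default resolution are assumptions chosen for illustration. Longitude and latitude are each quantized to a grid cell, and the bits of the two cell numbers are interleaved into a single integer that can serve as a sortable key in a key-value store.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y (x in even positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_order_index(lon, lat, bits=16):
    """Map a (lon, lat) pair to a Z-order key on a 2^bits by 2^bits grid."""
    cells = 1 << bits
    # Quantize each coordinate to a grid cell, clamping the upper edge.
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave_bits(col, row, bits)
```

Points that are close in space tend to receive nearby keys, so a bounding-box query can be answered with a small number of key-range scans, although cells that straddle a major split of the curve can still be far apart in key order.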

Planet Labs provides an Open Data sandbox of satellite data from multiple providers, featuring multi-band imagery. OpenLayers 3 introduced the concept of raster sources, which provide a fast way to access and process raster imagery in a browser.

In this talk we'll start from the basics of displaying tiled, multi-band satellite data in a web map, and then walk through examples and live coding demos ranging from change detection algorithms to NDVI calculations, image classification, and more!
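As a point of reference for the NDVI calculation mentioned above, the index is computed per pixel as (NIR - Red) / (NIR + Red). The sketch below is written in Python rather than against the OpenLayers raster source API, and it assumes the red and near-infrared bands are already available as equally sized nested lists of reflectance values; the function names are hypothetical.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        # Avoid division by zero for empty or masked pixels.
        return 0.0
    return (nir - red) / total


def ndvi_band(red_band, nir_band):
    """Apply ndvi() pixel by pixel to two bands given as lists of rows."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]
```

For non-negative reflectance, values fall between -1 and 1, with dense green vegetation toward the upper end because it reflects strongly in the near-infrared and absorbs red light.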

GeoWave is an open source project that bridges the gap between geospatial software and distributed compute systems. This presentation will primarily focus on the theory that enables the core functionality of GeoWave.

Generating accurate, high-resolution Digital Elevation Models (DEMs) from satellite imagery is a challenging problem. In this work, a multiview stereo reconstruction framework is proposed that is applicable to non-stereoscopic satellite images, which may have been captured by different satellites. Given a cross-platform satellite image archive, the images are first geolocation-corrected with respect to each other using a fully automated processing pipeline that applies sparse feature matching and bias correction of Rational Polynomial Coefficient (RPC) camera models.

During the past year we undertook an effort to transition a geospatial data processing system developed under a government research project from a small private cluster running Windows to an AWS-based environment running Linux. This talk will cover a variety of subjects including the steps taken to migrate the system, hurdles encountered with Linux and AWS (particularly differences in performance), and how we ended up working around those issues.

GeoWave is an open source project that bridges the gap between geospatial software and distributed computing systems. This presentation will primarily focus on applications of GeoWave. We will illustrate how GeoWave can visualize massive datasets and enable distributed processing to perform analysis at scale.

We will guide the audience through several examples of utilizing GeoWave to efficiently visualize and analyze very large datasets. Examples include:

One of the growing number of public data sets available on Amazon Web Services is the Next Generation Weather Radar (NEXRAD) archive. This is data collected from a network of 160 high-resolution Doppler radar sites that detect precipitation and atmospheric movement at approximately 5-minute intervals. NEXRAD enables severe storm prediction and is used by researchers and commercial enterprises to study and address the impact of weather across multiple sectors.