Geospatial Elucubrations: A blog on spatial analysis, raster management in PostGIS and other related things, by Pierre Racine.<br /><br /><h2>A guide to the rasterization of vector coverages in PostGIS</h2>A frequent question since the beginning of the PostGIS Raster adventure has been "<a href="http://gis.stackexchange.com/questions/30810/rasterize-a-vector-layer-in-postgis" target="_blank">How can I convert my geometry table to a raster coverage</a>?" (<a href="http://lists.osgeo.org/pipermail/postgis-users/2014-April/039021.html" target="_blank">here also</a>). People often <a href="http://www.packtpub.com/postgis-to-store-organize-manipulate-analyze-spatial-data-cookbook/book#chapter_5" target="_blank">describe the way to rasterize a single geometry using ST_AsRaster()</a>, but I don't know of any text or tutorial explaining how to convert a complete vector coverage to a single raster or to a tiled raster coverage.<br /><br />In this article I will describe two methods to rasterize a vector coverage: one very fast but not very flexible, and one very flexible but unfortunately a bit slow! I will rasterize a vector forest cover so that it aligns perfectly with an already loaded raster elevation layer. I will use the "height" column of the forest cover to assign values to pixels. I will assume that both layers are in the same SRID and that the forest only partially covers the elevation layer. The elevation layer is divided into 625 tiles of 100x100 pixels for a total of 6 250 000 pixels.<br /><br /><b>"Intelligent" vector VS "stupid" raster!</b><br /><br />Why would one go from a human parsed, precise, object oriented, "intelligent" vector table to an unparsed, jagged, single variable, "stupid" raster layer? Over the years I could identify only two good reasons to do so. The first is to get all your data layers into <a href="http://inspire-forum.jrc.ec.europa.eu/pg/pages/view/1814/geographical-grid-systems" target="_blank">a single analysis paradigm</a> (in this case the raster one) to simplify the geoprocessing chain. The second is that this "stupid" raster paradigm is still also the fastest when it comes to analysing large volumes of data. Processing a raster stack relies on very simple matrix operations, generally available through what geo people call map algebra operations. This is still generally faster than the linear algebra necessary to analyse vector coverages. Anyway, our task here is not to decide whether doing it in vector mode is better than in raster mode. Our task is just to get the conversion done properly…<br /><br /><b>"Rasterizing" vs "rendering"…</b><br /><br />Another interesting matter is the difference between rasterizing and rendering. Rendering is the conversion of a vector model to an image for display purposes. Rasterizing is the conversion of a vector model to a raster for data analysis purposes. Renderers, like MapServer, GeoServer and Mapnik, besides blending many layers together, symbolizing geometries and labeling them, will generally produce anti-aliasing along the edges of polygons and translate values into symbolic colors.
Any original data value is lost and there is no way to use the product of a renderer for GIS analysis. On the other hand, a rasterizer function, like the ones provided by GDAL, QGIS, ArcGIS or PostGIS, will produce raw pixels trying to represent the original vector surface as crisply and accurately as possible, assigning to each pixel one of the values (and only those values) associated with the original geometry. The raster resulting from a rasterization process is a valuable input into a geoprocessing analysis chain. The result of a rendering process is not. We must also say that although there are some methods to rasterize a vector coverage in PostGIS, there are still no ways to render one (unless you consider <a href="http://postgis.net/docs/RT_ST_AsJPEG.html" target="_blank">ST_AsJPEG()</a>, <a href="http://postgis.net/docs/RT_ST_AsTIFF.html" target="_blank">ST_AsTIFF()</a> and <a href="http://postgis.net/docs/RT_ST_AsPNG.html" target="_blank">ST_AsPNG()</a> as very basic renderers).<br /><br /><b>Method 1.1 – Basic ST_Union() and ST_Tile()</b><br /><br />The first rasterization method I present is the one that was planned from the beginning of the PostGIS Raster project: simply convert all the geometries to small rasters and union them into a single raster, like we union many geometries together using ST_Union(geometry). A complete query, aligning the resulting raster on an existing elevation raster coverage, should look like this:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE forestheight_rast AS<br />SELECT ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)) rast<br />FROM forestcover, (SELECT rast FROM elevation LIMIT 1) rast;</pre><br /><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody><tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-kzQqGIcPgD4/U3ET8DgC2mI/AAAAAAAADIo/b1XE5AY9PkY/s1600/rastforestcover-closeup.gif" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://1.bp.blogspot.com/-kzQqGIcPgD4/U3ET8DgC2mI/AAAAAAAADIo/b1XE5AY9PkY/s1600/rastforestcover-closeup.gif" height="143" width="200" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Close-up on the unioned rasterized<br />version of the forest cover resulting<br />in "jagged" areas of same height.</td></tr></tbody></table>This results in a unique "big" raster, aggregating all the "small" rasterizations of the geometries.<br /><br />Note the arguments to ST_AsRaster(). The first one is the geometry to rasterize. The second one is the reference raster from which the alignment parameters are borrowed (a pixel corner x and y, the scale (or pixel size) and the skew (or rotation)). The third argument is the pixel type of the expected raster (32 bit float). The fourth parameter is the value to assign to the pixels. Here it is the "height" column from the forest cover table. The last parameter (-9999) is the nodata value to assign to pixels padding the area surrounding each rasterized geometry. This is the very <a href="http://postgis.net/docs/RT_ST_AsRaster.html" target="_blank">first variant of the ST_AsRaster() function in the PostGIS Raster reference</a>.
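<br /><br />If you have no reference raster to align on, other variants of ST_AsRaster() let you specify the alignment parameters explicitly. A minimal sketch, assuming an arbitrary 10 unit pixel size and a hypothetical "gid" primary key (note that each call then aligns the grid on the geometry itself, so the resulting small rasters are generally not mutually aligned and cannot be unioned directly):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-- Rasterize a single geometry at a 10x10 unit pixel size (scaley is<br />-- negative because raster rows go down). "gid" is a hypothetical key.<br />SELECT ST_AsRaster(geom, 10.0, -10.0, '32BF', height, -9999) rast<br />FROM forestcover<br />WHERE gid = 1;</pre>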
<br /><br />Like <a href="http://postgis.net/docs/manual-2.1/ST_Union.html" target="_blank">its geometric counterpart</a>, <a href="http://postgis.net/docs/RT_ST_Union.html" target="_blank">ST_Union()</a> simply aggregates (merges) all those small rasters together into a unique raster. Without ST_Union(), the query would have resulted in 2755 small rasters, one for each geometry.<br /><br />Note also that we picked only one raster from the elevation layer as the reference raster for ST_AsRaster(). This is because all the rasters in this table are well aligned together and we need only one set of pixel corner coordinates to align everything. We could also have put this query selecting a single raster directly into the SELECT part:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE forestheight_rast AS<br />SELECT ST_Union(ST_AsRaster(geom, (SELECT rast FROM elevation LIMIT 1), '32BF', height, -9999)) rast<br />FROM forestcover;</pre><br />We normally don't want to keep big rasters in PostGIS, so we can split the merged raster into smaller tiles having the same dimensions as the elevation coverage with the <a href="http://postgis.net/docs/RT_ST_Tile.html" target="_blank">ST_Tile() function</a>:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE tiled_forestheight_rast AS<br />SELECT ST_Tile(rast, 100, 100) rast<br />FROM forestheight_rast;</pre><br /><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-xMCBVFX9XZQ/U3EVOi-PXWI/AAAAAAAADIw/cvyq292B9ss/s1600/different-tiling.gif" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://4.bp.blogspot.com/-xMCBVFX9XZQ/U3EVOi-PXWI/AAAAAAAADIw/cvyq292B9ss/s1600/different-tiling.gif" height="320" width="292" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">The tiling of the rasterized forest cover, in red, is not well <br />aligned with the tiling of the elevation raster, in grey.<br />Note that the squares are 100x100 pixels tiles, not pixels...</td></tr></tbody></table>This results in 48 tiles, most being 100x100 pixels. Some are only 18x100 pixels and some are only 100x46 pixels. The lower right one is only 18x46 pixels...
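<br /><br />You can verify the dimensions of the resulting tiles with a quick query (a minimal sketch, assuming the tiled_forestheight_rast table created above):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-- Tile dimensions in pixels and the number of tiles sharing them<br />SELECT ST_Width(rast) width, ST_Height(rast) height, count(*)<br />FROM tiled_forestheight_rast<br />GROUP BY 1, 2<br />ORDER BY 3 DESC;</pre>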
<br /><br />There are 625 tiles of 100x100 pixels in the elevation coverage and 2755 geometries in the forest cover. These geometries cover only 31 of those tiles and the resulting big raster covers 56 of them. The query creating the "big" raster takes 5 seconds (thanks to Bborie Park who sped up ST_Union() by a huge factor in PostGIS 2.1) and the query to resample/retile it takes only 2 seconds.<br /><br />One problem with this result is that even though the pixels are well aligned with the elevation layer and most of the tiles have the same dimensions, the tiles themselves are not aligned with the elevation coverage and they do not cover the same area. Method 1.2 will fix that.<br /><br /><b>Method 1.2 – ST_Union() and ST_MapAlgebra()</b><br /><br />In order to produce a raster coverage having a footprint identical to the elevation coverage, we must use this coverage not only for alignment but also as the base for our final tiles. The trick is to "burn" the big raster into empty tiles based on each of the elevation tiles. For that we will use a quite simple (!) two-raster ST_MapAlgebra() call:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE welltiled_forestheight_rast AS<br />SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast<br />FROM forestheight_rast fc, elevation e;</pre><br /><a href="http://postgis.net/docs/RT_ST_MapAlgebra_expr.html" target="_blank">The two-raster ST_MapAlgebra()</a> function overlays the two rasters and computes an expression for each aligned pair of pixels. The result of this expression becomes the value assigned to each pixel of a new raster.<br /><br />As the two rasters are not necessarily well aligned, the new raster extent can be equal to 1) the extent of the union of both rasters, 2) the extent of the intersecting area, 3) the extent of the first raster or 4) the extent of the second raster. Here, no two rasters involved in the map algebra have the same extent. Since we want to "burn" the big raster into tiles having the same extent as the elevation tiles, which are second in the list of arguments, we have to specify that the new raster will have the same extent as the "SECOND" raster.<br /><br />The first argument is the "big" forest height raster. It is the only row of the forestheight_rast table.<br /><br />The second argument is a new tile generated with ST_MakeEmptyRaster() and ST_AddBand() from each of the 625 elevation tiles. We could have used the original tiles directly, but since we specify the resulting extent to be "SECOND", and since the nodata value of the result from ST_MapAlgebra() is picked from the "FIRST" or the "SECOND" raster when those extents are specified, the resulting raster would have had 32767 as its nodata value. This is the nodata value of the elevation raster coverage. We want the nodata value of the result to be the same as that of the forest height coverage, which is -9999. So the trick is to "build" a new tile from scratch with ST_MakeEmptyRaster(), with the original elevation tiles as the alignment reference, and add a band to it with ST_AddBand(), specifying a pixel type ('32BF'), an initial value (-9999) and a nodata value (-9999) identical to those of the forest height raster.<br /><br />The third argument is the expression computing the value to assign to each pixel. Pixel values from the first raster are referred to as "[rast1]" and pixel values from the second raster are referred to as "[rast2]". You can also reference the x coordinate of the pixel with "[rast.x]" and the y coordinate with "[rast.y]". The expression is <a href="http://www.postgresql.org/docs/9.2/interactive/functions.html" target="_blank">any normal PostgreSQL expression</a>, for example "([rast1] + [rast2]) / 2".<br /><br />In our case the expression is simply the value of the first raster, the forest height one, that we want to "burn" into the tile. It does not involve values from the elevation raster. In other words, the elevation tiles are merely used as well aligned blank pieces of paper on which we "print" the values of the forest height raster...<br /><br />The fourth argument is the pixel type ('32BF') expected for the resulting raster.
It is the same as that of the forest height raster.<br /><br />The last argument ('SECOND') tells ST_MapAlgebra() to use the extent of the second raster (the elevation tiles) to build the resulting raster.<br /><br />ST_MapAlgebra() will hence copy each value from the forest height raster to new empty tiles whose dimensions are based on the elevation ones. The query results in 625 tiles. When a tile does not intersect the area covered by the forest cover, all its pixels are simply set to nodata.<br /><br />If your big raster does not cover much of the reference raster coverage, you might speed up the process by adding a spatial index to the reference coverage (it should already be there) and restricting the ST_MapAlgebra() computation to tiles intersecting the vector coverage:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE INDEX elevation_rast_gist ON elevation USING gist (st_convexhull(rast));<br /><br />CREATE TABLE welltiled_forestcover_rast AS<br />SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast<br />FROM forestheight_rast fc, elevation e<br />WHERE ST_Intersects(fc.rast, e.rast);</pre><br />Keep in mind however that this query is faster because it does not process, and hence does not return, tiles not touching the forest cover area. If you want to produce a complete set of tiles, identical in number and extent to the elevation coverage, keep the previous query.<br /><br /><b>Some drawbacks to the methods using ST_AsRaster() and ST_Union()</b><br /><br />All those queries involving ST_AsRaster() and ST_Union() are quite fast but they present two drawbacks:<br /><ul><li><b>ST_Union limited by RAM - </b>Any query involving ST_Union() is limited by the RAM available to PostgreSQL. If the big raster resulting from the union of all the rasterized geometries is bigger than the available RAM, the query will fail. The trick to avoid this is to aggregate the small rasters not into a huge unique raster, but into smaller groups of rasters intersecting with tiles from the reference raster. Method 1.3 will show how to do that.</li></ul><ul><li><b>Only one extractable metric - </b>A second limitation is that ST_AsRaster() has only one method to determine whether the geometry value must be assigned to the intersecting pixel or not. When the geometry is a polygon, for example, ST_AsRaster() assigns the value of the geometry to the pixel if the center of the pixel is inside this polygon, no matter what proportion of the pixel area this polygon covers. In the worst case, the value of a very small polygon encircling the center of the pixel would be assigned even when all the rest of the pixel area is covered by a polygon with a different value. You can think of many cases where the value at the center of the pixel is not necessarily representative of the area covered by the pixel.<br /><br />This is because ST_AsRaster() rasterizes only one geometry at a time, without consideration for other geometries intersecting with the pixel.
If you consider all the geometries of a layer when assigning a value, you might want to assign other metrics to the pixels, like the count of geometries intersecting with the pixel, the total length of all the polylines intersecting with the pixel, the value associated with the smallest polygon intersecting with the pixel, etc...<br /><br />Method 2 will show how to use the PostGIS Add-ons ST_ExtractToRaster() function to extract some of those metrics from a whole vector coverage and how to easily implement new ones.</li></ul><br /><b>Method 1.3 – Memory efficient ST_Union() and ST_MapAlgebra()</b><br /><br />This method is RAM safe in that it never produces a "big" raster bigger than the area covered by one tile from the reference raster. It is a little bit slower than the first query of method 1.2 because it unions some rasters many times when they touch many tiles. It also produces 625 tiles like the elevation coverage.<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE ramsafe_welltiled_forestcover_rast AS<br />WITH forestrast AS (<br /> SELECT rid, ST_MapAlgebra(<br /> ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)),<br /> ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999), <br /> '[rast1]', '32BF', 'SECOND') rast<br /> FROM forestcover, elevation<br /> WHERE ST_Intersects(geom, rast)<br />GROUP BY rid, rast<br />)<br />SELECT a.rid,<br /> CASE<br /> WHEN b.rid IS NULL THEN ST_AddBand(ST_MakeEmptyRaster(a.rast), '32BF'::text, -9999, -9999)<br /> ELSE b.rast<br /> END rast<br />FROM elevation a LEFT OUTER JOIN forestrast b <br />ON a.rid = b.rid;</pre><br />The first query, in the WITH statement (forestrast), rasterizes the geometries, unions them by groups intersecting the reference raster tiles and burns each group into a new tile as seen before. This is done by the addition of a WHERE clause limiting the operation to geometries intersecting with tiles and a GROUP BY clause ensuring that ST_Union() only aggregates small rasters "belonging" to the same tile.<br /><br />The WHERE clause however causes the extent to be limited to the area covered by the forest cover. This first part alone would return only 32 tiles. The second part of the query makes sure we get the 593 remaining tiles. It uses a LEFT OUTER JOIN to make sure a row is returned for each of the 625 tiles of the reference raster coverage. When the rid of the reference raster coverage matches one of the 32 tiles, that tile is returned. Otherwise a new empty tile is returned. You might skip this part if your source vector layer covers the full area of the reference raster or if you wish to limit the rasterized area to that of the vector layer.
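<br /><br />Before moving on, per-tile summary statistics offer a quick sanity check of the result (a minimal sketch using ST_SummaryStats() on the table created above):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-- Pixel count and mean height for the first few tiles<br />SELECT rid, (stats).count, (stats).mean<br />FROM (SELECT rid, ST_SummaryStats(rast) stats<br />      FROM ramsafe_welltiled_forestcover_rast) foo<br />ORDER BY rid<br />LIMIT 5;</pre>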
<br /><br /><b>Method 2 – PostGIS Add-ons ST_ExtractToRaster()</b><br /><br />As said before, ST_AsRaster() is quite limited in terms of the kind of value it can assign to the pixels. Not so many values can be extracted from a single polygon: the value at the centroid, the proportion of the pixel area covered, the value weighted by this proportion, the length of the intersecting part of a polyline being rasterized... ST_AsRaster() however, because it is based on GDAL, implements only the first one.<br /><br />Furthermore, many more metrics can be imagined if the value assigned to each pixel comes not only from one geometry but from the aggregation of all the geometries (or the values associated with them) of a vector coverage intersecting the centroid or the surface of the pixel. We might, for example, want to compute the number of geometries intersecting with the pixel. We might be interested in the value of the polygon covering the biggest part of the pixel or of the polyline having the longest length inside the pixel area. We might want to merge together geometries having the same value before computing the metric. <a href="http://trac.osgeo.org/postgis/wiki/WKTRaster/SpecificationWorking03#ObjectiveFV.27-IntegratedifferentcustomfunctionstobeusedwithST_MapAlgebraFcttoextractvaluesfromacoverage" target="_blank">I could identify nearly 76 different possible metrics extractable from point, polyline and polygon coverages</a>.<br /><br />We hence need a function that is able to extract metrics from a vector coverage as a whole and which allows us to implement new extraction methods easily. The PostGIS Add-ons ST_ExtractToRaster() function was written with exactly this goal in mind.<br /><br /><a href="http://geospatialelucubrations.blogspot.ca/2013/11/launching-postgis-add-ons-new-postgis.html" target="_blank">The PostGIS Add-ons</a> is a set of about 15 pure PL/pgSQL, community written functions helping to write advanced PostGIS queries. One of the most useful of these functions is ST_ExtractToRaster().<br /><br />A query using ST_ExtractToRaster(), returning the same set of tiles as method 1.3 and also not suffering from RAM limitations, would look like this:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE INDEX forestcover_geom_gist ON forestcover USING gist (geom);<br /><br />CREATE TABLE extracttoraster_forestcover AS<br />SELECT ST_ExtractToRaster(<br /> ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999), <br /> 'public', <br /> 'forestcover', <br /> 'geom', <br /> 'height', <br /> 'MEAN_OF_VALUES_AT_PIXEL_CENTROID') rast<br />FROM elevation;</pre><br />First and very important remark: this query takes around 200 seconds! This is about 16 times more than the equivalent RAM safe query of method 1.3! This is because ST_ExtractToRaster() does much more than ST_AsRaster(). It not only considers the whole coverage, it can compute very different things from this coverage. It is definitely not worth using ST_ExtractToRaster() if what you want is the value of the geometry intersecting the pixel centroid. The interest of this function lies in the other methods available...<br /><br />Note the last argument to ST_ExtractToRaster(): 'MEAN_OF_VALUES_AT_PIXEL_CENTROID'. This is the method used to extract a value for the pixel. Fifteen (15) of these methods have been implemented for ST_ExtractToRaster() up to now. They are listed below with a short description.<br /><br />The 'MEAN_OF_VALUES_AT_PIXEL_CENTROID' method returns a raster coverage whose values are identical to the raster coverage produced with the previous methods. Most other methods, however, will return quite different rasters…<br /><br />The other arguments to ST_ExtractToRaster() are quite trivial: the first is the raster into which the values are extracted. It's a good practice to pass a new empty raster band having the right pixel type, initial value and nodata value, and whose alignment is based on tiles from the reference raster. This is identical to what we did in methods 1.2 and 1.3. The second argument is the schema of the vector table to rasterize ('public'). The third and the fourth are the name of the table ('forestcover') and the name of the column ('geom') containing the geometry to rasterize.
The fifth argument is the column containing the value to be used in the computation of a new value when necessary ('height'). When the computation involves only geometries (e.g. COUNT_OF_VALUES_AT_PIXEL_CENTROID or SUM_OF_LENGTHS), this argument is not necessary.<br /><br />The query will return a new raster for each raster in the elevation table, in this case 625.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-UaGp9vmKCOY/U3EWrcOOANI/AAAAAAAADI8/2PPXIq8dWuI/s1600/diff3.GIF" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://4.bp.blogspot.com/-UaGp9vmKCOY/U3EWrcOOANI/AAAAAAAADI8/2PPXIq8dWuI/s1600/diff3.GIF" height="377" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Close-up on four pixels taking different values depending on the method used to assign them a value. The green and blue pixels were extracted using the MEAN_OF_VALUES_AT_PIXEL_CENTROID method. The pixels delimited by the red lines were extracted using the VALUE_OF_MERGED_BIGGEST method. The values displayed are the values associated with the polygons, which are delineated in black. Note how the pixels delimited by the red lines take the values of the polygons occupying the biggest proportion of the pixel.</td></tr></tbody></table>
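Extracting a different metric is only a matter of changing the method name. For example, the VALUE_OF_MERGED_BIGGEST raster compared in the figure above could be produced with essentially the same query (a sketch, assuming the same forestcover and elevation tables; the output table name is hypothetical):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE merged_biggest_forestcover AS<br />SELECT ST_ExtractToRaster(<br /> ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999), <br /> 'public', <br /> 'forestcover', <br /> 'geom', <br /> 'height', <br /> 'VALUE_OF_MERGED_BIGGEST') rast<br />FROM elevation;</pre>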
<br />Here is a list of the methods already implemented for ST_ExtractToRaster(). All methods using the pixel centroid are postfixed with "_AT_PIXEL_CENTROID". All other methods involve the pixel as a rectangular surface.<br /><ul><li><b>COUNT_OF_VALUES_AT_PIXEL_CENTROID: </b>Number of geometries intersecting with the pixel centroid.</li><li><b>MEAN_OF_VALUES_AT_PIXEL_CENTROID:</b> Average of all values intersecting with the pixel centroid.</li><li><b>COUNT_OF_POLYGONS:</b> Number of polygons or multipolygons intersecting with the pixel surface.</li><li><b>COUNT_OF_LINESTRINGS:</b> Number of linestrings or multilinestrings intersecting with the pixel surface.</li><li><b>COUNT_OF_POINTS: </b>Number of points or multipoints intersecting with the pixel surface.</li><li><b>COUNT_OF_GEOMETRIES: </b>Number of geometries (whatever they are) intersecting with the pixel surface.</li><li><b>VALUE_OF_BIGGEST: </b>Value associated with the polygon covering the biggest area intersecting with the pixel surface.</li><li><b>VALUE_OF_MERGED_BIGGEST: </b>Value associated with the polygon covering the biggest area intersecting with the pixel surface. Polygons with identical values are merged before computing the area of the surface.</li><li><b>VALUE_OF_MERGED_SMALLEST: </b>Value associated with the polygon covering the smallest area in the pixel. Polygons with identical values are merged before computing the area of the surface.</li><li><b>MIN_AREA: </b>Area of the geometry occupying the smallest portion of the pixel surface.</li><li><b>SUM_OF_AREAS:</b> Sum of the areas of all polygons intersecting with the pixel surface.</li><li><b>SUM_OF_LENGTHS: </b>Sum of the lengths of all linestrings intersecting with the pixel surface.</li><li><b>PROPORTION_OF_COVERED_AREA: </b>Proportion, between 0.0 and 1.0, of the pixel surface covered by the union of all the polygons intersecting with the pixel surface.</li><li><b>AREA_WEIGHTED_MEAN_OF_VALUES:</b> Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the maximum between the intersecting area of the geometry and the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the value is multiplied by the proportion of the covering area. For example, if a polygon covers only 25% of the pixel surface and there are no other polygons covering the surface, only 25% of the value associated with the covering polygon is assigned to the pixel. This is generally the preferred behavior.</li><li><b>AREA_WEIGHTED_MEAN_OF_VALUES_2:</b> Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the full weighted value is assigned to the pixel. For example, even if a polygon covers only 25% of the pixel surface, 100% of its value is assigned to the pixel.</li></ul><br /><b>Adding methods to ST_ExtractToRaster()</b><br /><br />Want to compute some other esoteric metric with ST_ExtractToRaster()? <a href="https://github.com/pedrogit/postgisaddons/blob/master/postgis_addons.sql" target="_blank">Internally the function uses the callback version of ST_MapAlgebra()</a>, which calls one of two predefined callbacks depending on whether the value is computed for the pixel centroid or using the full surface of the pixel. Those functions are named ST_ExtractPixelCentroidValue4ma() and ST_ExtractPixelValue4ma(). They fulfill <a href="http://postgis.net/docs/RT_ST_MapAlgebra.html" target="_blank">the criteria for ST_MapAlgebra() callback functions</a>. They take three arguments defining the value of the input pixel (there can be many), the position of the pixel in the raster and some extra parameters when necessary. In order to build a query for each pixel, both functions expect extra parameters: the alignment parameters of the raster for which values are being computed and the parameters of the geometry column from which values are being extracted. The callbacks then define a series of extraction methods.<br /><br />To add a new method to these callbacks, you can simply copy the query associated with an existing method that extracts a value similar to what you want to extract, give it a proper name, modify it to compute the expected value and make sure it uses the callback extra arguments properly. It is a good idea to test the query directly on the vector coverage prior to testing it in one of the two callbacks. Most of the methods do a ST_Intersects() between the shape of the pixel and the vector coverage (this is why it is important that the vector coverage be spatially indexed) and extract some metrics from the list of intersecting geometries.
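<br /><br />As an illustration, the kind of query a method like VALUE_OF_BIGGEST encapsulates can be tested standalone against the vector coverage for a single pixel rectangle (a hypothetical sketch: the envelope coordinates and the SRID are placeholders to adapt to your data):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-- One hypothetical 10x10 map unit pixel built with ST_MakeEnvelope()<br />WITH pixel AS (<br /> SELECT ST_MakeEnvelope(500000, 5200000, 500010, 5200010, 32198) geom<br />)<br />-- Value of the polygon covering the biggest part of the pixel<br />SELECT fc.height<br />FROM forestcover fc, pixel p<br />WHERE ST_Intersects(fc.geom, p.geom)<br />ORDER BY ST_Area(ST_Intersection(fc.geom, p.geom)) DESC<br />LIMIT 1;</pre>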
<br /><br />If you add a new method which might be useful to others, let me know. I will add it to the PostGIS Add-ons code…<br /><br /><b>Conclusion</b><br /><br />Rasterizing a complete vector coverage can be done very quickly with ST_AsRaster() and ST_Union(), but some projects require more sophisticated methods to assign values to pixels, considering the coverage as a whole. The PostGIS Add-ons ST_ExtractToRaster() is slow but provides much more control over the metric that is extracted from the vector coverage.<br /><br />In both cases it is possible to extract values following a preloaded reference tiled coverage. That should be the case for most applications. It is also easy to add more extraction methods to ST_ExtractToRaster().<br /><br /><h2>Launching the PostGIS Add-ons! A new PostGIS extension for pure PL/pgSQL contributions.</h2>I'm glad today to launch the<b> PostGIS Add-ons</b>. This new PostGIS extension aims at providing a quick, more Agile way to release pure PL/pgSQL functions developed by the PostGIS user community. It includes 15 new functions. Among them, two functions to aggregate raster/vector overlay statistics, one to extract values from a vector coverage to a raster, one to generate random points inside a polygon and two new aggregate functions helping to remove overlaps in a table of geometries.<br /><br />You can download the PostGIS Add-ons <a href="https://github.com/pedrogit/postgisaddons/releases" target="_blank">from my GitHub account</a>. I'm already at version 1.19. The second version number should increment for each significant addition. The major version number should increase when there is a major change in some function parameters or when we stop supporting the current minimal required versions of PostgreSQL and PostGIS, which are now PostgreSQL 9.x and PostGIS 2.1.x.<br /><br />You can contribute to the PostGIS Add-ons with your own PL/pgSQL functions by following the <a href="https://help.github.com/articles/using-pull-requests" target="_blank">Fork &amp; Pull GitHub model</a>. Just make sure that:<br /><br /><ul><li>your function is written in pure PL/pgSQL (no C!),</li><li>your function is generic enough to be useful to other PostGIS users,</li><li>you follow the function and variable naming and indentation conventions already in use in the files,</li><li>you document your function like the ones already provided,</li><li>you provide some tests in the postgis_addons_test.sql file,</li><li>you add the necessary DROP statements in the postgis_addons_uninstall.sql file.</li></ul><div><br />So why not contribute your PL/pgSQL work directly to the core PostGIS project? Good question! We can say that it's easier to contribute pure PL/pgSQL functions to the PostGIS Add-ons for a number of reasons:</div><div><ul><li><b>Because you can do it yourself </b>without relying on one of the core PostGIS developers.
No need to compile PostGIS, no need to write to the mailing list and explain the usefulness of your function, no need to write a ticket and wait for one of the main PostGIS developers to add it for you. Just make sure your code is ok and send a pull request to the GitHub project. It should be added pretty fast and trigger a new PostGIS Add-ons release. Your new function will be released in a matter of hours or days, not weeks or months.</li></ul><ul><li><b>Because your function does not fit well in the PostGIS SQL API. </b>Sometimes PL/pgSQL functions are not as generic as the ones provided in the PostGIS SQL API or do not perform as well as most core PostGIS functions, which are written in C. The PostGIS Add-ons will try to be very open to functions performing very high level tasks or which, because they are implemented in PL/pgSQL, would not perform as well as if they were implemented in C. Requesting a function to be implemented in C generally means you will not see your function added to the PostGIS core for weeks or months. The PostGIS Add-ons allow you to release them NOW, making them easily accessible, documented, testable and changeable before they get a more definitive signature and find their way into the PostGIS core in C. Some PostGIS Add-ons functions will find their way into the core after being converted to C; some will not and will stay as they are.</li></ul></div><div>So the main goal is to provide casual PL/pgSQL developers, not necessarily skilled in C or not wanting to build PostGIS (on Windows, for example, which can take days), a channel to contribute to PostGIS. I'm sure there are dozens of PostGIS users out there who developed quite useful stuff but never took the time to share it because they found it "too complicated". The PostGIS Add-ons are the first effort to give them an easy and consistent way to contribute to PostGIS.</div><div><br /></div><div>The PostGIS Add-ons try to make things as simple as they can be. They consist of a single, easy to install postgis_addons.sql file accompanied by an uninstall and a test file. For the sake of simplicity, documentation about each function is embedded in the main file, before each function declaration. A list of all the functions available, with a short description, is provided at the beginning of the file.</div><br />So here is a list of the functions available in this first release (in order of what I think are the most interesting contributions):<br /><br /><b>ST_ExtractToRaster() - </b>Computes a raster band by extracting values for the centroid or the footprint of each pixel from a global geometry coverage using different methods like count, min, max, mean, value of biggest geometry or area weighted mean of values. This function uses ST_MapAlgebra() to compute stats about a geometry or a raster coverage for each pixel of a raster. Stats are computed using custom SQL queries exploiting the spatial index existing on the coverage being extracted. New stats queries can easily be added on demand. A list of possible stats is provided in objective FV.27 in the <a href="http://trac.osgeo.org/postgis/wiki/WKTRaster/SpecificationWorking03" target="_blank">PostGIS raster working specifications</a>. A number of them are already implemented.<br /><br /><b>ST_SplitAgg() - </b>Returns the first geometry as a set of geometries after being split by all the second geometries being part of the aggregate.
This function is a more robust and powerful alternative to <a href="http://trac.osgeo.org/postgis/wiki/UsersWikiExamplesOverlayTables" target="_blank">the solution provided in the PostGIS wiki to overlay two vector coverages</a>. It is normally used to remove overlaps in a vector coverage by splitting overlapping polygons. It does not imply the union of all the geometries, so it can work on very large geometry tables. A tolerance parameter allows removing slivers resulting from the splitting.<br /><br /><b>ST_DifferenceAgg() - </b>Returns the first geometry after having removed all the subsequent geometries in the aggregate. It is used to remove overlaps in a geometry table by erasing parts from all but one of many overlapping polygons.<br /><br /><b>ST_RandomPoints() - </b>Generates points located randomly inside a geometry.<br /><br /><b>ST_GlobalRasterUnion() - </b>Builds a new raster by extracting all the pixel values from a global raster coverage using different methods like count, min, max, mean, stddev and range. Similar to, but slower and more flexible than, ST_Union().<br /><br /><b>ST_AreaWeightedSummaryStats() - </b>Aggregate function computing statistics on a series of intersected values weighted by the area of the corresponding geometry. The most notable statistic is the weighted mean, very useful when intersecting buffers with a raster coverage.<br /><br /><b>ST_SummaryStatsAgg() - </b>Aggregate function computing statistics on a series of rasters generally clipped by a geometry.<br /><br /><b>ST_CreateIndexRaster() </b>- Creates a new raster as an index grid. Many parameters allow creating grids indexed following any order.<br /><br /><b>ST_BufferedSmooth() - </b>Returns a smoothed version of the geometry. The smoothing is done by making a buffer around the geometry and removing it afterward. This technique, sometimes called dilatation/erosion or inward/outward polygon offsetting, was <a href="http://blog.cleverelephant.ca/2010/11/removing-complexities.html" target="_blank">described years ago by Paul Ramsey</a> and <a href="http://trac.osgeo.org/postgis/wiki/UsersWikiBufferCoast" target="_blank">more recently by "spluque" in the PostGIS wiki</a> (with a nice animated GIF).
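<br /><br />The idea behind this dilation/erosion trick can be sketched with core PostGIS functions only (a simplified illustration of the principle, not the Add-ons implementation; the 100 map unit buffer distance is arbitrary):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-- Dilation (positive buffer) followed by erosion (negative buffer)<br />SELECT ST_Buffer(ST_Buffer(geom, 100), -100) smoothed_geom<br />FROM forestcover;</pre>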
<br /><br /><b>ST_BufferedUnion() - </b>Alternative to ST_Union(geometry) making a buffer around each geometry before unioning and removing it afterward. Used when ST_Union() leaves undesirable internal vertexes after a complex union or when you want holes removed from the resulting union. Used with some very complex and detailed geometry coverages, this function speeds up and simplifies the unioning of all the geometries into one.<br /><br /><b>ST_DeleteBand() - </b>Removes a band from a raster.<br /><br /><b>ST_AddUniqueID() - </b>Adds a column to a table and fills it with a unique integer starting at 1.<br /><br /><b>ST_NBiggestExteriorRings() - </b>Returns the n biggest exterior rings of the provided geometry based on their area or their number of vertices. Mostly useful to the ST_BufferedUnion() function.<br /><br /><b>ST_TrimMulti() - </b>Returns a multigeometry from which simple geometries having an area smaller than the tolerance parameter have been removed. Mostly useful to the ST_SplitAgg() function.<br /><br /><b>ST_ColumnExists() - </b>Returns true if a column exists in a table. Mostly useful to the ST_AddUniqueID() function.<br /><br />I hope the PostGIS Add-ons will be useful to some people and trigger a new wave of contributions to PostGIS.<br /><br /><h2>Semantic MediaWiki: A promising platform for the development of web geospatial crowdsourcing applications</h2><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-I65vwX-AS88/UegI2ynYMyI/AAAAAAAADHI/GG0jUTMzsn4/s1600/220px-SemanticMediaWiki_Logo.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://4.bp.blogspot.com/-I65vwX-AS88/UegI2ynYMyI/AAAAAAAADHI/GG0jUTMzsn4/s1600/220px-SemanticMediaWiki_Logo.png" /></a></div>Are you looking for a simple web platform to develop a crowdsourcing data application? Are you planning to develop your own application with some web development language like PHP, Java, Python or Ruby? Do you want to do it quickly and at a minimal cost? Do you want your users to be able to capture geographical entities as part of the data? Read on...<br /><br /><a href="http://semantic-mediawiki.org/" target="_blank">Semantic MediaWiki</a> is a semantic extension to <a href="http://www.mediawiki.org/wiki/MediaWiki" target="_blank">MediaWiki</a>, the wiki platform on which Wikipedia is built. MediaWiki itself is serious business: it is used by thousands of organizations as the base for their websites and Wikipedia is the <a href="http://www.alexa.com/topsites" target="_blank">sixth most active website in the world</a>. MediaWiki can be scaled to accept thousands of edits and millions of visits per hour. There is no doubt MediaWiki is an incredibly robust website development platform.<br /><br />Semantic MediaWiki (SMW) adds the possibility to add structured data to MediaWiki pages. In other terms: "to create objects (pages) of a certain class (category) having certain properties". For those with a relational database background the transcription looks like: "to create records in tables having certain attributes". With the SMW extension (and a bunch of other extensions), you can build a complete relational database on the web, present the data in pages formatted with wikitext templates (instead of complex CSS or HTML) and list and edit them with fully customizable forms. You can query the data to create views and format them in a multitude of ways. All that online, without having to write one line of low level Python, Java, JavaScript or PHP code! Only <a href="http://en.wikipedia.org/wiki/Wikitext" target="_blank">wiki markup</a>! The same simple markup language that has allowed hundreds of thousands of people with no development skills to contribute to Wikipedia!<br /><br />I challenge anybody to find a CMS offering the same features and flexibility SMW does. Seriously!
All the big players in data management, Microsoft, Google, Oracle (certainly not Oracle), still don't offer any system that competes with Semantic MediaWiki in terms of simplicity, from an administrator and a user point of view, to develop a database on the web, styled with templates and editable with forms.<br /><br />In this article, I want to emphasize the characteristics of a good crowdsourcing web development platform and try to demonstrate that MediaWiki, when it is installed with the Semantic MediaWiki extension, is about the best platform for developing crowdsourcing websites and that it could also become, with little effort, a very good platform for the crowdsourcing of geographic information. I also try to show how fundamental geographic information system concepts could be transposed to the world of a web semantic database like SMW. <a href="http://www.cef-cfr.ca/uploads/Membres/PierreRacine-SMWCon2013-BeyondPoints.pdf" target="_blank">I presented</a> most of these ideas during the last SMW conference in New York City. I'm addressing two kinds of readers: geospatial web application developers familiar with geospatial and web technology but unfamiliar with wikis and mostly with MediaWiki, and Semantic MediaWiki gurus who might be interested in adding complex geospatial information to their wiki.<br /><br /><h3><b>MediaWiki for Crowdsourcing Content!</b></h3><br />Besides being open source, MediaWiki is the perfect platform for content developed by users for users (you could say bottom-bottom content creation). In opposition to top-down approaches, in which reputed organizations (like the Britannica encyclopedia or governmental agencies) create and publish content for the public, bottom-bottom content is created by the crowd for the crowd. Wikipedia, OpenStreetMap and YouTube are among the most well-known examples of databases exploiting this paradigm with great success.<br /><br />With governmental organizations' budgets constantly decreasing, the growing demand for open data of any kind and the enormous task involved in assembling huge datasets, organizations are turning more and more towards internet citizens to collect and maintain data about everything. This tendency should increase, providing a nice future for crowdsourcing technologies. Geospatial data is no exception. With the advent of Google Maps, hundreds of websites have popped up allowing <a href="http://en.wikipedia.org/wiki/Neogeography" target="_blank">neogeographers</a> to tag geographic features on a map, to inventory any kind of observations still invisible in satellite images (like potholes) and to <a href="http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)" target="_blank">mashup</a> various sources of data to create unexpected original datasets.<br /><br />SMW is a very good player at that. The <a href="http://semantic-mediawiki.org/wiki/Wiki_of_the_Month" target="_blank">Wiki of the Month page</a> on the Semantic MediaWiki website lists featured sites which are using the platform as their web content management application. Notable sites include <a href="http://en.openei.org/" target="_blank">OpenEI</a>, a site enabling the sharing of information on energy and other digital resources, and <a href="http://wikiapiary.com/" target="_blank">WikiApiary</a>, a site collecting information about the numerous sites based on MediaWiki.
Both sites are based on crowdsourced data.<br /><br />Not every website development platform is good for building successful crowdsourcing data websites. We can say that there are really only three fundamental functionalities that make wikis very good at this task and make them different from other Content Management Systems (CMS):<br /><br /><b>Change history -</b> Every change made in a wiki is recorded and can be reverted. This is very important to identify deviant (or misinformed) editors and undo their changes. This is the key to content stability and confidence. This is why Wikipedia was based on a wiki platform, not a CMS. We can easily say that a web development system is not a wiki system if it doesn't provide change history.<br /><br /><b>Page-based content and edition -</b> In most CMS you get a published mode (what normal visitors see) and an administrative mode (where a privileged user named administrator can change the website structure and content). The pages in the administrative mode generally involve management concepts which are very different from CMS to CMS. That makes the learning curve, when passing from one system to another, steeper than it should be. The content you want to edit (e.g. the text of a menu item or the header of the website) may often be several clicks away from the administrative home page and refer to an abstract concept specific to this CMS. In a wiki platform, by contrast, everything can be changed directly in a content page. Just navigate to the page containing the content you want to change, edit and save. You don't have to understand a complex administrative interface and other management concepts. All contents, even the menus, are in wiki pages, all editable the same way. This is a unique, simple and easy to learn concept that is shared by all wiki platforms and makes them quite different from most CMS. This is why Wikipedia was based on a wiki platform, not a CMS.<br /><br /><b>Easy content edition -</b> Content contributors do not want to know about Content Management System concepts and do not want to learn how to edit complex data structures. Not to mention, they certainly do not want to mess with HTML and CSS. In a wiki, content is formatted using wikitext: a limited number of tags ensuring minimal consistency of content appearance over all the pages created by different editors. No HTML, no CSS, no choice between 2000 colors, no hesitation on how to format a page header: just use the minimal set of tags and the contributed content will look consistent with the rest of the site. Easy content edition is the key to attracting and retaining a good number of collaborators, and wikitext ensures minimal formatting consistency. This is why Wikipedia was based on a wiki platform, not a CMS!<br /><br />Besides those three fundamental wiki must-haves and the fact that it is open source, MediaWiki also shares, with most popular CMS, other interesting features for developing open data portals. All features you certainly don't want to develop from scratch...<br /><br /><b>Support for many users - </b>CMS and wikis already allow many users to contribute to content. They are ready to register users, provide them with a profile and associate their changes with their profiles. All that can generally be done in a quite secure way. Most will let you create user groups and manage permissions to read, write, upload files and change those permissions for a page or a group of pages.
You don't have to reinvent the wheel by coding your own support for multiple users in PHP or Java and venture into the development of bullet-proof security.<br /><br /><b>Easy global change of look and feel - </b>CMS and wikis generally offer a variety of templates (or skins) to change the global look and feel of a site. A new skin can generally be applied with a minor intervention at the file system level. The fact that many websites based on MediaWiki share the same look as Wikipedia discourages many web developers from selecting MediaWiki. But counter examples of nice looking websites based on MediaWiki abound: <a href="http://www.richardcarterconsultancy.com/2011/12/stunning-mediawiki-skin-design/" target="_blank">here</a>, <a href="http://asianwiki.com/" target="_blank">here</a>, <a href="http://www.ameriquefrancaise.org/en/" target="_blank">here</a>, <a href="http://www.cmkalliance.co.uk/" target="_blank">here</a> and <a href="http://mediawiki2u.com/" target="_blank">here</a>.<br /><br /><b>A myriad of extensions (notably the integration of Google Maps or OpenLayers) - </b>A mature website development platform provides a great number of extensions adding functionalities to the base system. Extensions for adding a calendar, a picture gallery, a discussion forum, a RSS feed, etc. are common in most platforms. The fact that an extension exists for a platform or that an extension is customizable in some specific ways is often critical in the choice of the overall platform. This is particularly true for small organizations which can't afford to pay a programmer to develop specific functionalities.<br /><br />Of particular interest for the geographic community are the extensions allowing the integration of maps in the website. Most mature website development platforms have an extension allowing the embedding of Google Maps or OpenLayers in a page. MediaWiki has its own, called <a href="http://www.mediawiki.org/wiki/Extension:Maps" target="_blank">MediaWiki Maps</a>. Again, no need to write code: these extensions support a set of parameters matching the ones available in the Google Maps or the OpenLayers API. They let you configure the navigation and zooming controls and which base maps should be available. They provide means to display markers with info bubbles or to overlay KML files. Some KML files might be composed of markers only and some might contain lines and polygons. We could say that this is actually the main difference between so-called neogeography sites and professional geographic sites: the former tend to be happy with markers and points while the latter require more complex representations of geographic features like lines and polygons. Not providing easy means to add support for the creation, storage and edition of complex geographical entities like lines and polygons is certainly a limit encountered in most CMS and wiki mapping extensions. We'll speak more about this below.<br /><br /><b>Online modification of all content - </b>You don't need to know PHP, Java, Ruby or Python to modify the two-level menu, add a new section to the website, edit a page or change permissions. This might sound strange for people working in big organizations where the website is handled by a team of designers and programmers, but the fact that 95% of a site is modifiable through a web interface, without having to play with the underlying code, is a fundamental advantage for small organizations who can't afford the luxury of even one skilled web developer.
The fact that a web designer with no programming skills can create and modify a website is a key feature that made the success of web development platforms like CMS and wikis. Even big organizations benefit from this easiness: the content of the website can be updated faster by a more diverse, not necessarily very skilled, group of employees not requiring access to the server filesystem and the knowledge to deal with it.<br /><br /><h3><b>What's Missing? Easy Development of Web Data Management Applications...</b></h3><br />Even though CMS and wikis are very flexible and let the most basic user modify almost every part of a website online, there is still one thing they hardly let you do simply by dealing with a couple of forms or wikitext style tags in a browser: developing a custom data management application, i.e. an application that lets you display and edit structured data on the web. Examples of this kind of application abound. Any organization of reasonable size is at some point confronted with the challenge of developing a web application to manage data specific to its business model: employees, members, products, locations, surveys, publications, all kinds of structured data whose management has to be distributed over many persons on the web and for which CMS do not offer readily implemented solutions. Small and medium size organizations will generally rely on a consulting company to develop such applications, and specific solutions will be implemented by skilful developers in low level languages such as PHP, Ruby, Python or Java.<br /><br />To let a web developer with no programming skills develop such an application, CMS and wikis should provide a web interface allowing them to:<br /><ol><li><b>Define a custom data model (a simple table or a relational model). </b>The result of this definition should be a set of empty tables in a server side storage backend (a set of flat files in the file system or a set of tables in a DBMS). No direct interaction with the database and no knowledge of SQL or any other exotic query language should be necessary.</li><li><b>Construct forms to feed this data model. </b>The structure and look of these forms should be customizable with a minimal set of dialog boxes or tags, without having to know about HTML and CSS.</li><li><b>Embed different representations of the collected data in a web page</b> as lists or individual pages, using some kind of templates based on a simple metalanguage. Users should be able to sort and filter the content of those lists.</li></ol>Besides design, collection and representation of data, you may also want to import precollected data into the system and let your users export it in their favorite file format (CSV, Excel, XML or shapefile if it includes some geographic information).<br /><br />All those things should be doable through a web interface by a web designer without having to write code in a low level language like Python, Java, Ruby or PHP.<br /><br />Even though most data management applications are not very complicated (the data structure often boils down to a simple table with a few columns) most CMS and wikis<b> will NOT</b> let you define such a simple data model online! Most CMS will let you construct a form, but the data collected will be sent to an email address. You normally prefer the data to be stored in your favorite database for further treatment.
Some systems, like Google Spreadsheet or <a href="http://www.surveymonkey.com/" target="_blank">SurveyMonkey</a>, will let you define a simple data structure and build a form, but they won’t let you customize the look of this form with HTML and CSS. Other systems (like Google Fusion Table or <a href="http://cartodb.com/" target="_blank">CartoDB</a>) will also let you create a nice representation of the data, but they won’t let you change the spreadsheet-like input grid they use in place of a form. Generally speaking, you have to use a high-level web application framework like <a href="https://www.djangoproject.com/" target="_blank">Django</a> (Python) or <a href="https://drupal.org/" target="_blank">Drupal</a> (PHP) to implement even the simplest application, and this always implies some sort of programming in Python, Java, Ruby, JavaScript or whatever presumably "simple language" the framework is based on. This is without speaking of implementing everything from scratch with one of those low-level languages; a solution still too often chosen by developers who like to reinvent the wheel...<br /><br />To summarize, we can say that some systems let you implement some of the features necessary for a web data management application, but none of them lets you implement all of them without having to write low-level code. As web developers and experts in <a href="http://en.wikipedia.org/wiki/Web_usability" target="_blank">web usability</a>, we want to be able to define data models, to construct the required forms in a usable way and to display the information collected as recommended by our best practices, without always relying on a developer. Few systems let you do this easily now. Semantic MediaWiki is one of those.<br /><br /><h3><b>Semantic MediaWiki for&nbsp;</b><b>Structured&nbsp;</b><b>Crowdsourced Data!</b></h3><br />So here comes the Semantic MediaWiki extension. Along with a couple of other extensions, SMW extends MediaWiki to support all the features defined above and lets you develop a complete data management web application. All this online, only with wikitext, without having to write one line of PHP! You can define a set of generic properties, build a form (as a page written with normal wikitext in the Form namespace) so users can enter values for the properties, and display the collected values formatted with wikitext-based templates. You can create SQL-like views (or queries) on the data and display the results using the same type of templates. Among a number of simple and very useful features, forms created with wikitext support autocompletion based on already entered values, which helps avoid duplicate or misspelled values.<br /><br />
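To give a concrete flavour of this wikitext-only workflow, here is a minimal sketch of the three pieces involved. The property, template, form and category names ("Has email", "Member") are hypothetical; the tags are those documented by SMW and the Semantic Forms extension:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">&lt;!-- 1) Declare a property by creating the page "Property:Has email" --&gt;<br />[[Has type::Text]]<br /><br />&lt;!-- 2) In the page "Form:Member", let users fill a field that the<br />     "Member" template then maps to that property --&gt;<br />{{{for template|Member}}}<br />Email: {{{field|Has email}}}<br />{{{end template}}}<br /><br />&lt;!-- 3) Anywhere in the wiki, query the collected values --&gt;<br />{{#ask: [[Category:Member]]<br /> |?Has email<br /> |format=table<br />}}</pre><br />Saving the form creates or updates a normal wiki page, and the {{#ask}} query then behaves like a SQL SELECT over all the pages of the category.<br /><br />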
Another extension called <a href="http://www.semantic-mediawiki.org/wiki/Semantic_Result_Formats" target="_blank">Semantic Result Formats</a> lets any editor display query&nbsp;results in a very impressive number of sophisticated formats like calendars, graphs, timelines, trees, slideshows, image galleries, tagclouds, BibTeX, and many more! Another extension, <a href="http://www.mediawiki.org/wiki/Extension:Semantic_Drilldown" target="_blank">Semantic Drilldown</a>, proposes an interface for the progressive filtering and listing of semantic data. An extension named <a href="http://www.mediawiki.org/wiki/Extension:Data_Transfer" target="_blank">Data Transfer</a> lets contributors import CSV or XML files as page properties and lets users export page&nbsp;properties&nbsp;to XML.<br /><br />In brief, Semantic MediaWiki lets you do with structured data what MediaWiki lets you do with textual content: quickly put online a quite complex and generic data management system.<br /><br /><b>What about geospatial data?</b><br /><br />The MediaWiki Maps extension, which is used to&nbsp;<a href="http://semantic-mediawiki.org/wiki/Maps_examples/Google_Maps_map_with_multiple_points" target="_blank">display markers over basemaps</a>, already has its semantic little sister called <a href="http://www.mediawiki.org/wiki/Extension:Semantic_Maps" target="_blank">Semantic Maps</a>. This extension allows <a href="http://mapping.referata.com/wiki/Help:Forms" target="_blank">capturing the location of a geographic feature in a form</a> and <a href="http://mapping.referata.com/wiki/Help:Queries" target="_blank">aggregating all the features resulting from a SQL-like query into a single map</a>. Everything you have ever dreamed of to build a neogeography application! Again, and I like to highlight the point, without having to write a single line of low-level code! A non-programmer can develop forms to collect information about point features, query them and display the results on a map in a couple of hours, with full control over the <a href="http://www.mediawiki.org/wiki/Extension:Semantic_Forms/Defining_forms" target="_blank">appearance of the form</a>, the <a href="http://www.mediawiki.org/wiki/Help:Templates" target="_blank">display format of the data collected</a>, the <a href="http://www.semantic-mediawiki.org/wiki/Help:Selecting_pages" target="_blank">parameters of the query</a> and the <a href="http://semantic-mediawiki.org/wiki/Maps_examples" target="_blank">characteristics of the map</a>. This is sufficient to develop a whole lot of different geoweb applications, but still Semantic MediaWiki has its niche...<br /><br />
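As an illustration of what such a non-programmer setup amounts to, here is a minimal sketch. The property, form and category names are hypothetical; the "map" input type and result format are the ones provided by the Semantic Maps extension:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">&lt;!-- In the page "Form:Point of interest": capture a location by<br />     clicking on a map --&gt;<br />{{{field|Has coordinates|input type=map}}}<br /><br />&lt;!-- In any page: aggregate all the collected features on a single map --&gt;<br />{{#ask: [[Category:Point of interest]]<br /> |?Has coordinates<br /> |format=map<br />}}</pre><br />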
<b>The Semantic MediaWiki geospatial niche</b><br /><br />If we classify geoweb applications on a continuum going from applications showing no information about each geographical feature to applications showing a lot of information about each feature, we can group them into three fundamental categories considering the relative importance given to the content vs the map:<br /><ol><li><b>Map-only applications -</b> in which only the geographical representation of the features is presented to the users. These applications are mainly simple dynamic maps. Google Maps, Bing Maps and <a href="http://www.openstreetmap.org/" target="_blank">OpenStreetMap</a>, when no query is performed in their search boxes, easily fall into this category.</li><li><b>Map with info window applications -</b> in which the map is prominent but information about each geographic feature is presented in a popup info balloon when the user clicks on the feature. They look more like traditional geographic information systems. Emphasis is put on the functionalities associated with the map, not on the content associated with the feature. Some examples: <a href="http://wikimapia.org/" target="_blank">WikiMapia</a>, <a href="http://geoindex-plus.bibl.ulaval.ca/" target="_blank">GeoIndex+</a>, the global <a href="http://www.drowningtracker.com/" target="_blank">Drowning Tracker</a>, the Greek <a href="http://www.rae.gr/geo/?lang=EN" target="_blank">PAE-RAE Geospatial Map for Energy Units and Requests</a>. There are literally <a href="http://googlemapsmania.blogspot.com/" target="_blank">thousands of them</a>...</li><li><b>Content-based applications - </b>in which textual and structured information about each geographical feature is important and is presented page by page (one geographic feature = one page). Maps serve only as a side representation of the data. Some examples: <a href="https://en.wikipedia.org/wiki/Canada" target="_blank">Wikipedia</a>, the <a href="http://www2.ulaval.ca/plan-du-campus/pavillons.html?tx_tacticunitepavillon_pi1%5Bmode%5D=pavillon&amp;tx_tacticunitepavillon_pi1%5Buid%5D=20&amp;cHash=898ccf3aac44ee6075ddb1a76ce2a7d3" target="_blank">list of buildings on the Université Laval campus</a>.</li></ol>Some commentators of the geoweb <a href="http://mapbrief.com/2013/02/05/why-map-portals-dont-work-part-i/" target="_blank">claim</a> that there are too many applications of the first and second types and that many applications, mostly search and discovery ones, would be more efficient if they were constructed like the third type. It’s often hard to balance the importance given to content vs mapping in a geoweb application. SMW is an excellent platform to develop the third type of application, with rich content for each geographic feature.<br /><br />Here are some examples of applications with an emphasis on rich textual, structured and geospatial content:<br /><ul><li><b>A gazetteer web service</b> in which the collection of information related to an entry (name, synonyms, description, classification, parent and children entries, statistics) is important and in which not only the latitude and longitude of the feature are stored and delivered (e.g. as a WFS service) but also detailed polygonal and linear geometries when appropriate.</li></ul><ul><li><b>An online atlas </b>proposing a lot of content and data about each geographical feature (like Wikipedia).</li></ul><ul><li><b>A cadastral survey</b> containing all the historical information related to each property of a specific area.</li></ul><ul><li><b>A catalog of geospatial data</b> in which the description of each dataset is as important as, or more important than, its geographical coverage.</li></ul>Funnily, the wiki of the <a href="http://wiki.osgeo.org/" target="_blank">OSGeo foundation</a>, which "supports the collaborative development of open source geospatial software", most notably the development of PostGIS, the most used open source geospatial database, is made with MediaWiki. Even though SMW and the Semantic Forms extension <a href="http://wiki.osgeo.org/wiki/Special:Version" target="_blank">are installed on the base platform</a>, nobody in this organization apparently thought about its potential as a spatial database. Hey guys! This is a web database! Wouldn't it be nice if it were a web geospatial database? Let's see if we get some reactions here...<br /><br /><b>Exploring Microsoft SharePoint, Drupal and others</b><br /><br />Clearly, SMW is a web database coupled with a very flexible web development platform. If you know any other system bringing this data collection power and this flexibility to a non-programmer, let me know! 
I have been searching for such a system for a while now and I always come back to SMW. <a href="http://wikiworks.com/semantic-mediawiki-vs-sharepoint.html" target="_blank">Some have compared Microsoft SharePoint to SMW, but the comparison does not stand for very long</a>.<br /><br />Another challenger is the Drupal CMS, which allows defining content based on a custom data structure and collecting the corresponding data in the form of "nodes". You can then display views over those nodes in a very flexible way to build blocks of content to be embedded in some pages of the website. We are very near the goal!<br /><br />Still, Drupal proposes very different interfaces: a hard (or impossible) to customize CMS-like one for administrators and a very customizable one for end users. All Drupal configurations are done using forms which can get surprisingly, overwhelmingly complex. It becomes quite hard for end users to understand the new configuration forms available to them when they gain editing privileges, and to assimilate the concepts associated with those forms. It’s funny to see experienced Drupal users presenting new Drupal features on YouTube&nbsp;<a href="http://www.youtube.com/watch?v=8qIydHdxNx0" target="_blank">getting themselves lost in the maze of Drupal forms</a> (at 22:00). That's in contrast with wikis, which in general propose a "do it in a page using wikitext" approach to everything, whether the editor is an experienced administrator or a simple end user.<br /><br />Furthermore, Drupal makes a distinction between end-user (or survey) forms and administrative forms for creating website content. The first ones generate data that can only be viewed as lists or sent to you as emails, and the second ones generate content nodes. Data generated with end-user forms cannot be reused as website content. End-user forms are quite customizable online, but content creation forms can only be modified by a general change of skin or by altering some .php file.<br /><br />These observations stand for <a href="http://cartaro.org/" target="_blank">Cartaro</a>, a Drupal-based geospatial CMS which represents one of the best attempts at offering a simple web development platform for geospatial data management. All content addition is done through administrative forms and Drupal-specific concepts which can get quite complex for a newcomer. <a href="http://www.easysdi.org/en" target="_blank">Easy SDI</a>, a similar product based on the Joomla CMS, suffers from the same drawbacks.<br /><br />At the lower end of the continuum, Django and its geospatial extension <a href="http://geodjango.org/" target="_blank">GeoDjango</a> are targeted at Python developers. Developing a web data management application with this framework is impossible if you don't have programming skills. <a href="http://geonode.org/" target="_blank">GeoNode</a>, an open source product maintained by the ubiquitous <a href="http://opengeo.org/" target="_blank">OpenGeo</a>&nbsp;company and built on top of Django, is an out-of-the-box Spatial Data Infrastructure. Good knowledge of Python and CSS is necessary to make any significant change to the GeoNode interface or to add a metadata field to the catalog. "Open source" might mean open to skilled developers, but we are looking for frameworks "open" to a larger user community whose skills are in other domains like UI design, data modeling, content creation, community development or merely web site construction. 
Not programming...<br /><br />So Drupal, like most CMS, and Django, like most web development frameworks, are not designed as end-user content creation platforms but as administrator-centric ones and, at worst, as developer-centric ones. In contrast, SMW forms always create new reusable content and are easily customizable by non-developers. It is clearly an end-user content creation platform, making it a better choice for evolving, crowdsourced, open data creation and modification.<br /><br />Here is a summary of wikis, SMW and CMS key features for crowdsourcing applications:<br /><br /><b>Wikis (and MediaWiki) features shared with most CMS</b><br /><ul><li>Online customization of content and formatting</li><li>User and group management</li><li>Easy change of the global look and feel (even though MediaWiki is not necessarily very good at this right now)</li><li>Many extensions for various extra functionalities (like Google Maps or OpenLayers embedding)</li><li>Form creation and formatting</li></ul><b>Wikis (and MediaWiki) specific features</b><br /><ul><li>Change history (for everything)</li><li>Page-based content and editing of everything</li><li>Easy editing and formatting of page content</li></ul><b>Semantic MediaWiki specific features</b><br /><ul><li>Client-side data structure definitions for storing form data on the server</li><li>Customizable rendition of data collected with forms (with templates)</li><li>Customizable rendition of queries on data collected with forms (with templates)</li></ul><b>Features provided by Semantic MediaWiki related extensions</b><br /><ul><li>Customizable rendition of geographic queries as maps (Semantic Maps extension)</li><li>Customizable rendition of queries as graphs (Semantic Result Formats extension)</li><li>Bulk import of data (Data Transfer extension)</li><li>Export of collected data (Data Transfer extension)</li><li>Transparent use of external data (External Data extension)</li><li>Dynamic filtering of data (Semantic Drilldown extension)</li></ul><h3><b><br />The World is Never Perfect: MediaWiki Weaknesses</b></h3><br />No content management system is perfect. MediaWiki (by itself, without the Semantic extension) also suffers from some weaknesses. <a href="http://yaronkoren.com/" target="_blank">Yaron Koren</a>, a major developer of SMW extensions and author of the book "<a href="http://workingwithmediawiki.com/" target="_blank">Working with MediaWiki</a>", already <a href="http://yaronkoren.com/blog/?p=152" target="_blank">identified some of them in his former blog</a>. Here is my own list:<br /><ul><li><b>WYSIWYG -</b> Even if this is presently identified as the number one weakness of MediaWiki, I share Yaron's view that wikitext is simple enough for most applications. I run a wiki (based on PmWiki) open to about 200 researchers, and all of them have been able to create their own page without any guidance other than a couple of very simple documentation items I wrote. Anyway, this weakness is now a past issue with the release of the <a href="http://www.mediawiki.org/wiki/Extension:VisualEditor" target="_blank">Visual Editor</a>, which seems to have overcome <a href="http://www.mediawiki.org/wiki/WYSIWYG_editor" target="_blank">all the difficulties involved in developing a robust WYSIWYG editor for wikitext</a>.&nbsp;</li></ul><ul><li><b>Flexible access permission control for pages, sections and documents - </b>This would be my first real weakness for MediaWiki. 
Opening everything and securing page editing only with <a href="http://www.mediawiki.org/wiki/Extension:ConfirmEdit" target="_blank">some sort of CAPTCHA</a> might be sufficient for organizations' intranets, in which users are trustworthy, but if your wiki is also the website of the organization, you want to have different degrees of security on some sections (e.g. on the home page or on key administrative pages). MediaWiki does not have good support for page security since it was conceived as an open-to-all wiki. Most security configurations are done by modifying the main .php configuration file, which means only administrators with low-level file access can change which actions groups are allowed to perform and on which pages they are allowed to perform them.<br /><br />A typical Linux-like security scheme would normally let an administrator: <br /><br /><ul><li>Define users and groups of users.</li><li>Assign users or groups of users read permissions to pages or groups of pages (and their associated attached files).</li><li>Assign users or groups of users write permissions to pages or groups of pages (and their associated attached files).</li><li>Assign users or groups of users permissions to change permissions.</li></ul><div><br /></div>All that through a web interface, without having to directly modify files on the server. Except for attached file permissions, this security scheme is the one adopted by <a href="http://www.pmwiki.org/wiki/PmWiki/AuthUser" target="_blank">PmWiki</a>. <br /><br />As for MediaWiki, it proposes eight user groups by default: unauthenticated users (anonymous), authenticated users, autoconfirmed authenticated users, emailconfirmed authenticated users, bots, sysops, bureaucrats and filesystem access administrators. Only a filesystem access administrator can create new groups (by editing the main .php MediaWiki configuration file). Users can be created and assigned to groups online. Permissions are generally applied globally in the .php file. Some permissions, like ‘edit’ and ‘move’ (<a href="http://www.mediawiki.org/wiki/Manual:Preventing_access#Restrict_viewing_of_certain_specific_pages" target="_blank">curiously not the ‘read’ one</a>), can be applied on a page-by-page basis, but hardly to a group of pages.<br /><br /><a href="http://www.mediawiki.org/wiki/Category:User_rights_extensions" target="_blank">Many MediaWiki extensions try to address this lack of flexibility</a>, but there are so many of them (107) that it gets quite tricky to choose the right one for your particular needs. Furthermore, none of them guarantees 100% security for the additional features they provide. Clearly, setting page-by-page or group-of-pages levels of security in MediaWiki is an adventurous endeavour, and this is probably why most MediaWiki gurus tend to promote full access to almost everything in the wiki (maybe not for the right reason).<br /></li></ul><ul><li><b>Too many configurations still done through&nbsp;<b>configuration files&nbsp;</b>- </b>Even though all the content of a MediaWiki installation is modifiable online, it is quite different for many administrative tasks. Many of them can be done via <a href="http://en.wikipedia.org/wiki/Special:SpecialPages" target="_blank">Special pages</a>, but still too many are doable only by editing the main .php configuration file.</li></ul><ul><li><b>No extension manager - </b>One administrative task that is often possible to do online in other platforms is the installation and configuration of extensions. 
However, despite a long history of requests by users and many attempts from skilled developers, <a href="http://www.mediawiki.org/wiki/Requests_for_comment/Extension_manager" target="_blank">no extension manager worthy of the name exists yet</a>&nbsp;for MediaWiki.</li></ul><ul><li><b>Difficult skin development - </b>Developing a MediaWiki skin is not reputed to be an easy task. Daniel J. Barrett, in his <a href="http://shop.oreilly.com/product/9780596519681.do" target="_blank">O’Reilly book about MediaWiki</a>, explains why: "Of all the ways to configure MediaWiki, skinning is one of the most complicated, mainly because the code is not well-factored. From the supplied skin files, it’s nontrivial to understand which parts are required to write a skin and what are the best practices." This might be the reason why there are <a href="http://www.quora.com/MediaWiki/Why-is-there-not-a-vibrant-skin-marketplace-for-MediaWiki" target="_blank">not many beautiful skins for MediaWiki</a>, most of them being close variations of the Wikipedia one.</li></ul><ul><li><b>Too much geared towards the needs of the Wikimedia Foundation - </b><a href="http://www.mediawiki.org/wiki/Manual:What_is_MediaWiki%3F" target="_blank">From the MediaWiki manual</a>: "The software is primarily developed to run on a large server farm for Wikipedia and its sister projects. Features, performance, configurability, ease-of-use, etc. are designed in this light; if your needs are radically different the software might not be appropriate for you." This might explain why there is still no good extension manager, why advanced document management and access right features are neglected and why skin development is still hard... Maybe MediaWiki would gain features allowing the development of a broader spectrum of applications if it distanced itself a bit from Wikipedia? Maybe a fork could reach a more diversified group of potential users?</li></ul><br /><b>Semantic MediaWiki Weaknesses</b><br /><br />MediaWiki along with its semantic set of extensions is an amazing website development platform; don’t misinterpret me on that. But still, as I tried to express in <a href="http://www.cef-cfr.ca/uploads/Membres/PierreRacine-SMWCon2013-RethinkingSMWforDummies.pdf" target="_blank">a short 5-minute presentation</a> during the last North American SMW conference, it suffers from some weaknesses:<br /><br /><ul><li><b>Semantic! - </b>To me, the first thing that keeps people away from Semantic MediaWiki is its name. How many people in your website development team know the meaning of the word "semantic"? Most people in the website development business know about "relational databases", about "tables", about "attributes", but nothing about "semantic". Speak to them in a language they understand.&nbsp;Speak about "meaning", "data", "database", "properties", but not about "semantic".&nbsp;I think SMW needs a serious rebranding to better connect with the community of web developers which constitutes its main user target.</li></ul><ul><li><b>Triples VS relational schema - </b>Who knows about triplestore databases? Not many people. Most people know about relational databases. Having to deal with a new website platform is one thing; dealing with an unknown data storage paradigm is another. Does this unknown triplestore paradigm scare web developers away when they consider SMW as an option for the data management application they have to put on the web? I tend to think "yes". 
I think SMW would not only gain from getting rid of the "semantic" terminology but also from adopting a storage paradigm more similar to the relational database one. That would significantly ease selling this platform to the huge relational-database-savvy community which is in quest of simple web solutions.<br /><br />"Impossible", will say the Semantic Web 2.0 gurus, "as semantic is tied to RDF and the way to store RDF relationships is with triplestores." Funnily, SMW's main database backend is MySQL, a relational database, not a triplestore. Another oddity is that most people use SMW with the <a href="http://www.mediawiki.org/wiki/Extension:Semantic_Forms" target="_blank">Semantic Forms extension</a>, which forces users to always define the same set of properties on entities of the same category. This is a direct mapping to a relational schema, not to a triplestore schema, which allows properties to be set on any entity. When using the Semantic Forms extension, categories act like tables and properties act like tables’ columns. A SMW installation based on Semantic Forms acts very much like a relational database, not like a triplestore. Why continue to refer to RDF and triplestores? No matter how subject-predicate-object triples are stored in the database (always storing them as triples is handy as it prevents continuously modifying the database schema), the important thing is how they are structured from the SMW user's point of view. When using Semantic Forms, they look like a relational schema: there is no need to speak about "semantic", "triplestore" or "RDF". When every property can be set in any page with annotations, they ARE triples and cannot appear like relations in a relational schema: you can now speak of them with "semantic", "triplestore" and "RDF".<br /><br />This incongruity could easily be fixed by constraining the definition of properties to the context of categories (directly in the category pages) instead of in an independent page. With this slight modification, the SMW data schema would map directly to a relational schema and a better-known relational database terminology could be adopted. This could get relational database dummies to stop by and consider SMW for their web applications. This represents a lot of people and potential contributors to the SMW code base and related extensions. Each property would be tied to a category by definition (like columns are tied to tables), describing the equivalent of a relational database table schema for this category. Right now, properties can be defined using annotations, in a triplestore way, outside of any context and without using Semantic Forms, on any object of any category. It is indeed very strange for someone with a relational database background to be able to define properties outside any context and then be able to assign values for those properties to any entity.<br /><br />SMW could retain the possibility of defining properties outside the context of a category only when it is explicitly installed over a triplestore backend and values for properties are not set using Semantic Forms. This would result in two kinds of properties: "category properties" describing a relational database-like schema and "global properties" describing a triplestore, "semantic" or "RDF"-like schema. Each kind of property would be a different way to append semantics to entities. "Category properties" would be queried using SQL-like "ask" queries and "global properties" would be queried using SPARQL-like queries. 
The default would be to accept only "category properties" when SMW is installed over a relational database. It would be possible to define "global properties" in addition to "category properties" if a triplestore is installed on the back end.<br /><br />Forcing the definition of properties in the context of a category would not have much impact on the rest of the SMW functionalities. {{#ask}} and {{#show}} queries would work transparently. {{{for template}}} Semantic Forms tags would have to refer to a specific category and subsequent {{{field}}} tags would have to refer only to properties defined for this category.<br /><br />Semantics is not the prerogative of RDF triples. As long as we don’t get into OWL and we stay in the "Semantic MediaWiki with the Semantic Forms extension" domain, a relational schema is sufficient to define semantics on objects. There is a <a href="http://www.w3.org/TR/rdb-direct-mapping" target="_blank">direct mapping from relational schemas to RDF schemas</a>. <a href="http://www.w3.org/DesignIssues/RDB-RDF.html" target="_blank">RDF can also be seen as a generalization of the relational paradigm</a> in which properties do not have the constraint of having to be defined in the context of a table. "Category properties" are just another way, closer to the relational database way, to define semantics. Allowing a simpler relational perspective would certainly make SMW more accessible to most data model developers. It could be marketed as a true "web relational database solution" and certainly attract more users, developers and funds. A sketch of the idea follows.</li></ul>
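To make the proposal concrete, here is a purely hypothetical sketch contrasting the current way of defining a property with the "category property" way suggested above. The first part is real SMW syntax; the second part uses syntax that does NOT exist in SMW today:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">&lt;!-- Today: a property lives in its own page, e.g. "Property:Population",<br />     and can then be annotated on ANY page, whatever its category --&gt;<br />[[Has type::Number]]<br /><br />&lt;!-- Proposed (hypothetical syntax): declare the property inside the page<br />     "Category:City", like a column declaration in a CREATE TABLE statement --&gt;<br />{{#category property:Population|type=Number}}</pre><br />With such a constraint, a category page would read like a table schema, and "ask" queries over the category would behave like SELECTs over that table.<br /><br />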
<ul><li><b>Limited number of subobject levels</b>&nbsp;- In MediaWiki everything is stored as a page. It makes sense, since everything is in the form of an article. One article per page, fair enough. When using SMW, things get a little bit more complicated. It was designed to add properties to articles. All the properties about an article are stored within the page for this article using special tags. When using the Semantic Forms extension, property values are stored in a template call, still within the article. When the article is saved, the template "tags" the values so they are properly defined as properties.<br /><br />Up to now the structures are always very basic: one article per page, many properties per article. Now let's say you want to "group" some properties together and have many of those groups for one article. Database users would call this a one-to-many relationship. A classic example is the many actors playing in a movie. This one is simple, since it is natural to create articles (or pages) about actors and hence to store properties about actors in those pages.<br /><br />But what if groups of properties represent something that doesn't make much sense to store as a page? For example, consider the list of references in an article. Each reference has an author, a title, a year of publication, etc... We don't want to create one page for each reference. We would end up with thousands or millions of almost empty pages. SMW's answer to this is "<a href="http://semantic-mediawiki.org/wiki/Help:Adding_subobjects" target="_blank">subobjects</a>". Subobjects are like articles (objects) with properties, but they don't have to "live" (or be stored) in their own pages. They can "live", many at the same time, in the page of an existing article. Very clever, very useful. This is a two-level, one-to-many relationship and, thank god, not everything has to become a page.<br /><br />Now let's complicate the problem a little bit by wanting to store all the reference authors&nbsp;themselves&nbsp;as objects with properties, not just as a mere, unparsed list of authors for each reference. Each author could have a name, a surname, an email address and an affiliated institution (to make things even more complicated, they could have many addresses and many affiliated institutions, but we don't need this level of complexity to make our point). Remember that we don't want to store references, nor authors, as pages. We want to store them as subobjects of articles, in the articles' pages. So we need to be able to define a third-level, one-to-many relationship, as subobjects (authors) of existing subobjects (references), all to be stored in the same page.<br /><br />Unfortunately, this is not possible now in SMW because it is not possible to embed subobjects in other subobjects (see the sketch below)... You then have to make compromises and always define some levels as pages (articles). This is most of the time possible but generally not desirable as, very often, these objects can be defined with a very limited set of properties and we don't want to create pages for objects having only two or three properties.&nbsp;That chills our willingness to implement a true database schema in SMW and use it as a data gathering application. Don't read me wrong: you CAN implement any relational schema, but at some point you have to turn some tables into pages, even when the objects represented in those tables are not well represented as pages.</li></ul>
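Here is a minimal sketch of the reference example above. The {{#subobject}} parser function is the real SMW syntax; the property names and values are hypothetical:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">&lt;!-- In the page of an article: one subobject per reference --&gt;<br />{{#subobject:ref1<br /> |Has title=A guide to the rasterization of vector coverages<br /> |Has year=2014<br /> |Has author=Pierre Racine<br />}}<br /><br />&lt;!-- NOT possible: a subobject (an author with its own properties) nested<br />     inside another subobject (the reference); authors can only be flat<br />     values, separate pages or sibling subobjects of the same page --&gt;</pre><br />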
<ul><li><b>Weak support for PostgreSQL - </b>Despite the fact that MediaWiki has supported PostgreSQL as a database backend since version 1.7 (the current version is 1.21), <a href="http://semantic-mediawiki.org/wiki/PostgreSQL" target="_blank">effective support for this database in SMW has been broken since the end of 2012</a>. PostgreSQL support is essential to provide serious support for geospatial data (with PostGIS), as <a href="http://gis.stackexchange.com/questions/57995/which-spatial-dbms-should-i-select/57996#57996" target="_blank">support for spatial functions in MySQL is kind of weak</a>.</li></ul><ul><li><b>Messy extension support - </b>A very interesting fork of SMW called <a href="http://semantic-mediawiki.org/wiki/Semantic_MediaWiki_Plus" target="_blank">SMW+</a> has been developed over the years by startup companies like <a href="http://en.wikipedia.org/wiki/Ontoprise_GmbH" target="_blank">Ontoprise</a> (now <a href="http://www.semafora-systems.com/en/" target="_blank">Semafora</a>) and <a href="http://diqa-pm.com/en/DataWiki" target="_blank">DIQA</a>. For some unclear reasons, the development of those extensions has been discontinued and they became incompatible with the latest versions of MediaWiki. Some functionalities provided by new, well supported extensions of SMW seem to reimplement some features that were provided by the old SMW+. It is also unclear which ones are provided by new products from those companies. SMW+ provided some very interesting and unique features and it is hard to tell what is still available and what is lost. There is now a <a href="http://semantic-mediawiki.org/wiki/OntoLTS" target="_blank">project wanting to revive these features</a>, but it doesn’t look very active. A similar pattern seems to have happened to another major extension called <a href="http://www.mediawiki.org/wiki/Extension:Halo_Extension" target="_blank">Halo</a>. Everything should have been done by the MediaWiki and SMW+ teams to avoid such a loss of precious and unique functionalities. Better backward compatibility and inter-team communication would probably have avoided the loss.</li></ul><br /><h3><b>What is Missing for Better Geospatial Support in Semantic MediaWiki?</b></h3><br />From the point of view of a developer of geospatial applications, the main weakness of SMW is its limited support for geospatial data. The Semantic Maps extension already works very well with points. What is missing to make SMW go from a web database and web application development platform to a full-featured web spatial database and a web geographic information system? Most notably, support for complex geometries. Here is a rough picture of the critical functionalities that are missing:<br /><ul><li><b>Tools for importing, creating and editing complex geometries - </b>As in any GIS application, the creation and editing of new features are key building blocks. The Semantic Maps extension allows creating new geographic coordinates (or points) by entering them in a form or by clicking on a map. Direct creation of complex geometries is however harder. We generally want to import them from existing KML files or shapefiles. We might also want to import a single geometry by specifying its unique ID in the shapefile, or to import many features at the same time using the SMW Data Transfer extension. In a typical SMW project, each row from a shapefile would create, or would add a property to, a page describing an entity having a geographical footprint: a town, a country, a road, a lake, etc... The other shapefile attributes could also be imported and become additional semantic properties. Another way of creating a geometry property might be to simply drop a WKT string in the field of a form.<br /><br />Direct creation and editing of complex geometries is already possible with the <a href="http://semantic-mediawiki.org/wiki/Special:MapEditor" target="_blank">Semantic Maps MapEditor</a>. This editor should be enhanced in a number of ways: 1) It should be integrated as a special input type widget in the Semantic Forms extension so that editing complex geographic entities could be done at the same time as creating or editing other types of information. This implies that the resulting geometries should automatically be stored as properties when editing is finished. Right now you have to copy the geometry string generated by the tool into the target object page. 2) The creation of multipolygons and holes should be allowed. This is essential to be able to create realistic geographical entities like lakes with islands or countries sprinkled with lakes. 3) The display of previously entered features should be allowed in the editor, and snapping of vertexes to these entities while creating or modifying the current entity should be enabled. This is essential to be able to create and edit topological, non-overlapping coverages of entities.</li></ul><ul><li><b>Storage for complex geometries in the database - </b>Once geometries have been captured or imported, SMW has to store them in the database as it does for any other page property. All the databases supported by SMW have support for storing complex geometries following the OGC (Open Geospatial Consortium) Simple Features Specification standard, as the sketch below illustrates. 
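Here is a minimal PostGIS-flavoured sketch of what such storage could look like; the table and column names are hypothetical and do not correspond to actual SMW internals:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-- Hypothetical storage table for SMW geometry properties<br />CREATE TABLE smw_geometries (<br />  page_id integer PRIMARY KEY,<br />  geom geometry(Geometry, 4326) -- OGC SFS geometry column, in WGS 84<br />);<br /><br />-- The spatial index is what makes queries on complex geometries fast<br />CREATE INDEX smw_geometries_geom_idx<br />  ON smw_geometries USING GIST (geom);<br /><br />INSERT INTO smw_geometries<br />VALUES (1, ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326));</pre><br />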
Each backend has such an extension: <a href="http://dev.mysql.com/doc/refman/5.6/en/spatial-extensions.html" target="_blank">MySQL supports the geometry class</a>, PostgreSQL has a very popular extension called <a href="http://postgis.net/" target="_blank">PostGIS</a>, which is the most used and advanced spatial database in the world, and SQLite has an extension called <a href="https://www.gaia-gis.it/fossil/libspatialite/index" target="_blank">SpatiaLite</a>. All of them store geometries in a complex GEOMETRY column type.<br /><br />Simple points (or geographic coordinates), as supported by the Semantic Maps extension, can easily be manipulated by modifying their latitude or longitude components. It is hence useful to store them as two decimal numbers in the wiki page of the geographical entity. Complex geometries however, sometimes composed of thousands of vertexes, cannot really be modified this way (think about the complex multilinestring composing the limits of the United States). One needs a graphical interface to move, add or delete the numerous vertexes. It might be misleading and useless to actually store and show the values of all those vertexes in a wiki page (in the WKT format, for example). It would be better to simply show the unique identifier of each geometry and allow their modification only through a proper graphical interface.<br /><br />It would also be misleading to try to reinvent a way to store complex geometries specifically for SMW. It is true that one might be tempted to store complex geometries as WKT strings in a column of type STRING or TEXT, but this would not benefit from a very important feature of the GEOMETRY type: spatial indexing, which dramatically speeds up searches for spatial entities. Complex geometries should be stored using the database's proper spatial extension. Period. This implies that the support for complex geometries would be available over a PostgreSQL or a SQLite backend ONLY if the corresponding spatial extension is installed with the database. MySQL comes with the geometry type installed by default, so there would be no need for an extra installation step for this backend.<br /><br /><b>Geospatial triplestores</b><br /><br />There are many reasons why some users prefer to store SMW data in a <a href="http://en.wikipedia.org/wiki/Triple_store" target="_blank">triplestore database</a>. The first is to be able to use the SPARQL language, which is better suited to querying RDF stores and offers better performance than equivalent relational SQL-based implementations. A second reason is to <a href="http://semantic-mediawiki.org/wiki/Help:Using_SPARQL_and_RDF_stores" target="_blank">benefit from the ontology capabilities</a> of some triplestore implementations. A third reason is to be able to interact with data stored in SMW through other SPARQL-capable interfaces. These users will eventually also want equivalent support for geographical objects. Support for geometry storage and querying is however much less common in triplestores than it is in relational databases. There is an OGC standard called <a href="https://en.wikipedia.org/wiki/GeoSPARQL" target="_blank">GeoSPARQL</a> specifying how to represent and query geographical information in a triplestore, but few systems support it up to now.<br /><br />Of the two triplestores best supported by SMW (4Store and Virtuoso), only the commercial version of Virtuoso has support for geometries, and this is for points only. Support for complex geometries is planned for future versions. 
A number of other triplestores that support GeoSPARQL also provide support for complex geometries. The following table lists popular triplestores and their support for&nbsp;complex geometries and for&nbsp;GeoSPARQL.</li></ul><br /><table align="center" border="1"><tbody><tr><td><b>TRIPLESTORE</b></td><td><div style="text-align: center;"><b>LICENSE</b></div></td><td><div style="text-align: center;"><b>GEOMETRY SUPPORT</b></div></td><td><div style="text-align: center;"><b>GEOSPARQL SUPPORT</b></div></td></tr><tr><td><b>SQLStore3</b></td><td><div style="text-align: center;">Open Source</div></td><td><div style="text-align: center;">No</div></td><td><div style="text-align: center;">No</div></td></tr><tr><td><a href="http://4store.org/" target="_blank"><b>4Store</b></a></td><td><div style="text-align: center;">Open Source (GPL)</div></td><td><div style="text-align: center;"><a href="https://groups.google.com/forum/#!topic/4store-support/iTGhy9WPxUM" target="_blank">No</a></div></td><td><div style="text-align: center;">No</div></td></tr><tr><td><a href="http://virtuoso.openlinksw.com/" target="_blank"><b>Virtuoso</b></a></td><td><div style="text-align: center;">Open Source (GPL) <br />and commercial</div></td><td><div style="text-align: center;"><a href="http://www.mail-archive.com/virtuoso-users@lists.sourceforge.net/msg05241.html" target="_blank">Points only<br />&nbsp;in the commercial <br />version only</a></div></td><td><div style="text-align: center;">No</div></td></tr><tr><td><b><a href="http://parliament.semwebcentral.org/" target="_blank">Parliament</a></b></td><td><div style="text-align: center;">Open Source (BSD)</div></td><td><div style="text-align: center;"><a href="http://ontolog.cim3.net/file/work/SOCoP/Educational/GeoSPARQL%20User%20Guide.docx" target="_blank">Yes</a></div></td><td><div style="text-align: center;"><a href="http://parliament.semwebcentral.org/" target="_blank">Yes</a></div></td></tr><tr><td><a href="http://www.strabon.di.uoa.gr/" target="_blank"><b>Strabon</b></a></td><td><div style="text-align: center;">Open Source (MPL)</div></td><td><div style="text-align: center;">Yes</div></td><td><div style="text-align: center;">Yes</div></td></tr><tr><td><b><a href="http://www.openrdf.org/" target="_blank">Sesame</a></b></td><td><div style="text-align: center;">Open Source (BSD)</div></td><td><div style="text-align: center;">Not without uSeekM</div></td><td><div style="text-align: center;">Yes (<a href="https://dev.opensahara.com/projects/useekm" target="_blank">with uSeekM</a>)</div></td></tr><tr><td><b><a href="http://www.systap.com/bigdata.htm" target="_blank">Bigdata</a></b></td><td><div style="text-align: center;">Open Source (GPL)</div></td><td><div style="text-align: center;">Not without uSeekM</div></td><td><div style="text-align: center;">Yes (<a href="https://dev.opensahara.com/projects/useekm" target="_blank">with uSeekM</a>)</div></td></tr><tr><td><a href="http://www.franz.com/agraph/allegrograph" target="_blank"><b>AllegroGraph</b></a></td><td><div style="text-align: center;">Commercial (free and <br />paying editions)</div></td><td><div style="text-align: center;"><a href="http://www.franz.com/agraph/support/documentation/current/geospatial-tutorial.html" target="_blank">Yes</a></div></td><td><div style="text-align: center;">No</div></td></tr><tr><td><a href="http://www.ontotext.com/owlim" target="_blank"><b>OWLIM-SE</b></a> <br />(formerly BigOWLIM)</td><td><div style="text-align: center;">Commercial</div></td><td><div style="text-align: center;"><a 
href="http://owlim.ontotext.com/display/OWLIMv53/OWLIM-SE+Geo-spatial+Extensions" target="_blank">Points only</a></div></td><td><div style="text-align: center;">No</div></td></tr><tr><td><a href="http://jena.apache.org/" target="_blank"><b>Jena</b></a></td><td><div style="text-align: center;">Open Source (Apache)</div></td><td><div style="text-align: center;">No</div></td><td><div style="text-align: center;"><a href="http://mail-archives.apache.org/mod_mbox/jena-users/201205.mbox/%3C4FA38215.7020700@googlemail.com%3E" target="_blank">No</a></div></td></tr></tbody></table><br /><ul><li><b>Display and symbology - </b>Displaying individual geometries does not really pose a problem. The technology for displaying georeferenced information like points, lines and polygons is mature and already very well used by the Semantic Maps extension. Complex geometries can be displayed the same way points are displayed over a base map like Google Maps or OpenStreetMap, using the mapping widget API. A challenge arises when geometries have to be displayed as the result of a spatial query (like "what are the geographical entities within 5 kilometers of another entity?"). Spatially querying for points in a database is one thing (it can amount to a search based on two indexed columns like longitude and latitude), but querying for complex geometries is another. Queries have to be efficient enough to select only the geometries that must be displayed at a given zoom level in the map displaying the query results. And this is where it becomes important to store the geometries using the spatial facilities provided by each database, indexes included.<br /><br />Another consideration is map styling (or <a href="http://en.wikipedia.org/wiki/Map_symbolization" target="_blank">symbolization</a>). The same way semantic data is formatted using templates in SMW, geographic information has to be drawn using some reusable styles. As in other information systems, it is a very good practice to separate the data (here the geometries themselves and their attributes) from their representation (some styling protocol).<br /><br />The KML format proposes <a href="https://developers.google.com/kml/documentation/kmlreference#style" target="_blank">a number of styling primitives</a>. They are normally embedded with the data in the KML file, but the same primitives could be reused to define styles which maps would refer to using some identifiers. SLD (<a href="http://www.opengeospatial.org/standards/sld" target="_blank">Styled Layer Descriptor</a>) is another standard for styling geographic entities which is increasing in popularity and is clearly inspired by CSS.<br /><br />The same way templates are stored in pages in the "Template" namespace, geographic styles could be stored as pages in a "Geographic Style" namespace. The page names could be the way to identify the styles used in a map. These styles would be created and modified using a Semantic Forms form. A number of predefined styles could be provided for often-used point, line and polygon geographical entities as part of a spatially enabled Semantic MediaWiki installation. Sounds like a Geographic Information System, no?</li></ul><ul><li><b>Functions and Queries - </b>If SMW is to provide the main functionalities of a web spatial database, it must provide, as far as possible, the main functions and operators that are expected to be found in a spatial database. 
In MediaWiki and SMW terminology, we speak about <a href="http://www.mediawiki.org/wiki/Help:Magic_words#Parser_functions" target="_blank">Parser functions</a>&nbsp;and&nbsp;<a href="http://semantic-mediawiki.org/wiki/Help:Selecting_pages#Comparators" target="_blank">Comparators</a>. Parser functions, like in any other language, take input arguments and return a value computed from those arguments. They are used in the wikitext of any page to display some values derived from a property. Comparators are special functions that return boolean values and are used in <a href="http://semantic-mediawiki.org/wiki/Help:Semantic_search" target="_blank">semantic queries</a> to discriminate entities based on their properties.<br /><br />There is a <a href="http://postgis.net/docs/manual-2.0/reference.html" target="_blank">huge number of functions possible on complex georeferenced geometries</a>. A spatially enabled SMW should first provide the basic functions that could be useful in articles about geographic entities: the area of the geometry, the length of the geometry, the type of the geometry, a geometry representing the envelope (or extent) of the geometry, a geometry representing a buffer around the geometry, or the centroid of the geometry.<br /><br />The <a href="http://postgis.net/docs/manual-2.0/reference.html#Spatial_Relationships_Measurements" target="_blank">most used operators</a>&nbsp;are&nbsp;"intersects", which returns TRUE if a geometry touches another one, and "within", which returns TRUE if one geometry is completely inside another one. A hypothetical spatially enabled query is sketched below.</li></ul>
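No such comparators exist yet; purely as an illustration of where this could lead, a spatially enabled "ask" query might one day look like this (hypothetical syntax, category and property names):<br /><br /><pre style="background: #EEEEEE; padding: 5px;">&lt;!-- Purely hypothetical comparator: which villages does the river cross? --&gt;<br />{{#ask: [[Category:Village]]<br />        [[Has geometry::intersects(Saint Lawrence River)]]<br /> |?Has geometry<br /> |format=map<br />}}</pre><br />Under the hood, such a comparator would simply be delegated to the spatial extension of the database (ST_Intersects() in PostGIS), as the roadmap at the end of this article also suggests.<br /><br />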
<ul><li><b>Support for various coordinate systems - </b>Old geographic information systems did not have any support for spatial data georeferenced in different coordinate systems. Everything had to be projected into the same coordinate system BEFORE being integrated into the system. Then came more sophisticated systems reprojecting geographic features on-the-fly into different coordinate systems. Those systems require a way to describe coordinate systems and to associate them with a unique identifier so that geometries in one system can be projected to another one. This unique identifier is stored in the geometry to refer to the coordinate system in which it is stored.<br /><br />There are <a href="http://spatialreference.org/ref/epsg/4326/" target="_blank">many ways to describe coordinate systems</a> (OGC WKT, Proj4, GML, ESRI) and many coordinate systems have a unique identifier (SRID) given by organizations like ESRI or the <a href="http://www.epsg.org/" target="_blank">EPSG</a>.<br /><br />On the same principle described above for map styles, coordinate systems could be stored in SMW, each in a separate page of a special namespace, as a set of properties. One of those properties would be the SRID, and this number could then be referenced in each geometry.<br /><br />SMW does not have to support multiple coordinate systems now. As in former GIS, it could at first only support the representation of entities using the same coordinate system. The KML format, for instance, and hence Google Maps and Google Earth, only supports geometries defined in the WGS 84 coordinate system. Sticking to one coordinate system avoids the problem of storing many of them in a new namespace. On-the-fly reprojection could be added later and a namespace with many predefined coordinate systems could be provided. An easy way to add coordinate systems should be described so that only the ones used by a specific application could be loaded into the namespace.</li></ul><ul><li><b>Support for raster data - </b>A geographic information system or geodatabase is not complete until it provides support for raster datasets. Geometries and rasters are the two most fundamental ways to represent geographic phenomena: geometries (or vectors) for discontinuous phenomena like administrative limits, roads, hydrographic networks or populated places, and rasters for continuous phenomena like elevation or temperature. Most spatial databases provide support for rasters in addition to vectors, and a plethora of functions to manipulate rasters and extract information from them based on geometries (e.g. what is the mean elevation for a set of villages?).<br /><br />It is however still hard to figure out how rasters could be used in a SMW installation. A typical SMW installation acts like a database storing information about entities having properties. A geometry is just one property of an entity among others&nbsp;(its geographical footprint). We can think of two cases where a raster acts as the property of an entity: 1) the raster is the entity itself (e.g. in a catalog of satellite or aerial images); 2) the entity is represented better by a raster than by a geometry (e.g. fuzzy objects like animal territories or soil deposits). In any case, a raster would be stored very much like a geometry, in a special format, held as a property. We are still far from needing raster support in a geospatially enabled SMW. After all, some geodatabase systems, like PostGIS, were used for almost a decade before providing support for rasters, and that did not prevent users from using them in thousands of applications.</li></ul><br /><h3><b>A Development Roadmap</b></h3><div><b><br /></b></div>So far we have defined what SMW needs in order to be turned into a full web geospatial database. Such a project has to be well planned and implemented gradually, by small useful steps. What needs to be done? What are the priorities? Here is a sketch of what could become a possible development roadmap:<br /><ol><li>Start from the Semantic Maps extension. Either extend it or fork it. The first task would be to make the "Geographic Coordinate" type be stored as a WKT string.</li><li>Make the backend code store "Geographic Coordinates" as a "geometry" type with the spatial extension when such an extension is available in the database.</li><li>Change the name "Geographic Coordinates" to "geometry" to be conformant with other spatial databases. Geographic coordinates would become geometries of type "point".</li><li>Add the capacity to create lines and polygons by adding a "Special:ImportSHP" page to the Data Transfer extension, to be able to import shapefiles as new SMW pages. Add the possibility to import only some rows and assign them to existing pages. Add the option to import the other attributes as well. Store those geometries with the database spatial extension "geometry" type and make sure to be able to display them in the wiki pages. You can display all the vertex coordinates in the wikitext for now.</li><li>For lines and polygons, stop showing the vertex coordinates in the wikitext. Show only an identifier representing the geometry stored in the DB. You could also hide the coordinates only when there are more than 10 or 20 of them. 
Show them for "point" geometries, as they are easy to edit.</li><li>Add an "intersects" operator (or comparator) so we can query for entities inside some geographical limits or not farther than some kilometers from another one. Such an operation should be delegated to the database spatial extension. Don’t reinvent the wheel...</li><li>Reuse the Special:ImportSHP code to create a "Load Shape" form input in Semantic Forms so geometries can be imported, one at a time, when creating new pages with the Semantic Forms extension.</li><li>Do a "Special:ImportKML" page and a form input. Importing from shapefiles might be sufficient for a while, but people might want to import from KML files very soon.</li><li>Make sure to be able to edit points, lines and polygons with the Semantic Maps MapEditor directly from a Semantic Forms form. Make sure it supports multipolylines, multipolygons and the creation and editing of holes. Allow it to display other surrounding geometries so one can "snap" to their vertexes or edges while editing.</li><li>Define and create predefined MapStyle pages to be used as symbology aliases when mapping features.</li><li>Provide standardized coordinate systems and a way to import some of them into an active wiki. Implement on-the-fly reprojection of entities defined in different coordinate systems when mapping them.</li></ol><br />This is it! This should turn Semantic MediaWiki into one of the most flexible and accessible web geospatial application development platforms. That would take the development of rich geoweb applications out of the hands of web developers and put it into the hands of web and content designers. I hope some organizations will find this project attractive and consider it worth investing some developer time or money in it. There are a number of <a href="http://www.mediawiki.org/wiki/Professional_development_and_consulting" target="_blank">companies specialized in MediaWiki consulting</a> out there just ready to hear from you!<br /><br />And, please, your comments are welcome!<br /><br /><br />Pierre Racine<br /><br /><b>Duncan Golicher's series of articles on PostGIS Raster</b> (2012-12-13)<br /><br /><a href="http://bdi.ecosur.mx/personal/InformacionGeneral.aspx?ID=GolicherDuncan" target="_blank">Duncan Golicher</a> is a researcher in forest ecology at the <a href="http://www.ecosur.mx/" target="_blank">Colegio de la Frontera Sur</a> in Mexico. "The Colegio de la Frontera Sur is a scientific research center that seeks to contribute to the sustainable development of the southern border of Mexico, Central America and the Caribbean through the generation of knowledge, the formation of human resources and the linkage between social and natural sciences."<br /><br />In the past few months, Duncan has been experimenting a lot with the new raster/vector analysis capacities of PostGIS 2.0 and the integration of PostGIS data in his R workflow. The nicest thing about it is that he has posted a quite impressive series of articles about his experience on his <a href="http://duncanjg.wordpress.com/" target="_blank">blog</a>,&nbsp;and&nbsp;I thought it was worth sharing them here (so I gain a bit of time to write my next long article :). 
Here are the posts in chronological order:<br /><br /><ul><li><a href="http://duncanjg.wordpress.com/2011/09/24/installing-postgis-in-ubuntu-natty/" target="_blank">Installing PostGIS in Ubuntu Natty</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2011/09/25/nearest-neighbour-in-postgis/" target="_blank">Nearest neighbour in POSTGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2011/10/04/connecting-to-a-postgresql-data-base-from-r-ubuntu-natty/" target="_blank">Connecting to a Postgresql data base from R (Ubuntu Natty)</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/09/23/edge-to-edge-distance-using-postgis/" target="_blank">Edge to edge distance using PostGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/10/11/installing-postgis2-0-on-windows-7/" target="_blank">Installing PostGIS2.0 on Windows 7</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/10/28/postgis-raster/" target="_blank">Using PostGIS raster to hold WorldClim data</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/10/29/working-with-raster-data-in-postgis/" target="_blank">Working with raster data in PostGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/10/30/tile-size-for-raster-vector-overlays-in-postgis/" target="_blank">Tile size for raster vector overlays in PostGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/10/31/vector-raster-overlays-in-postgis-2/" target="_blank">Vector raster overlays in PostGIS 2</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/11/01/speeding-up-buffer-on-raster-overlays-in-postgis/" target="_blank">Speeding up buffered point on raster overlays in PostGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/11/10/building-utility-functions-in-postgis/" target="_blank">Building utility functions in PostGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/11/10/comparing-worldclim-with-local-climate-data/" target="_blank">Comparing WorldClim with local climate data</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/11/20/the-basics-of-postgis-raster/" target="_blank">Loading raster data in PostGIS and visualising the results in QGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/11/29/importing-modis-fire-pixels-into-postgis/" target="_blank">Importing Modis fire pixels into PostGIS</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/12/05/ten-fold-speed-up-using-postgis2-1-svn/" target="_blank">Ten fold speed up using PostGIS2.1.SVN</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/12/11/loading-modis-ndvi-time-series-into-postgis-raster/" target="_blank">Loading Modis NDVI time series into PostGIS raster</a></li></ul><ul><li><a href="http://duncanjg.wordpress.com/2012/12/11/postgis-raster-statistics-with-plr/" target="_blank">PostGIS raster statistics with PLR</a></li></ul><br />Nice work Duncan!Pierre Racinehttp://www.blogger.com/profile/17199173181807664396noreply@blogger.com0tag:blogger.com,1999:blog-3548694469101129309.post-91635363659477016362012-10-22T12:13:00.001-07:002012-10-24T01:28:34.999-07:00Read and write PostGIS rasters with FME Beta now!It's done! <a href="http://www.safe.com/support/support-resources/fme-downloads/beta/" target="_blank">FME 2013 Beta</a> reads and writes rasters from and to PostGIS! 
Safe Software posted <a href="http://www.youtube.com/watch?v=BoWjZLmV7Qc" target="_blank">a short tutorial on YouTube</a> at the beginning of the month explaining how to write rasters into PostGIS and how to read them back using FME Workbench. There is no option to tile the raster before loading it because FME provides another tool, the "RasterTiler" transformer, to tile a raster source before feeding the PostGIS Writer.<br /><br /><a name='more'></a><br /><br /><br />FME Workbench provides plenty of other tools to manipulate rasters before writing them to PostGIS or after reading them. You can, for example, create a mosaic from many rasters, georeference a raster or create a new raster using a map algebra expression.<br /><br />A number of other issues related to PostGIS have also been fixed in the current beta:<br /><br /><ul><li>Added support for new 3D geometry types in reader and writer (PR#40339)</li><li>Added support for Compound Curves in Multicurves (PR#30317, PR#32936)</li><li>Fixed some problems in PostGIS 2.0 support (C57751, C58005, C58619, C60664, C61360, C61794, PR#32936, PR#39490)</li><li>Fixed escaping of special characters in string literals to avoid an unwanted warning (C44557, C57725, PR#29298, PR#32905)</li><li>Improved client encoding handling (C55816, PR#37045)</li></ul><div>The Beta is available for <a href="http://www.safe.com/support/support-resources/fme-downloads/beta/" target="_blank">download</a>.</div><br />Pierre Racinehttp://www.blogger.com/profile/17199173181807664396noreply@blogger.com0tag:blogger.com,1999:blog-3548694469101129309.post-30925495631713337032012-09-05T08:53:00.002-07:002012-10-24T01:29:34.723-07:00Loading many rasters into separate tables with the PostGIS raster loaderThere is no way yet, with the PostGIS raster loader, to load a series of rasters into separate tables. You can certainly load a series of rasters by invoking the loader like this:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">raster2pgsql -s 4326 -t 100x100 -I -C -F dem*.* public.dem | psql -d mydb</pre><br />But all the rasters will be loaded into the same table, split into 100x100 tiles. If you load ten 1000x1000 rasters, you end up with a table of 1000 rows.<br /><br /><a name='more'></a><br /><br /><b>A DOS batch</b><br /><br />You can write a DOS or a bash script to load everything into separate tables. Here is a DOS one:<br /><br /><pre style="background-color: #eeeeee; padding: 5px;">@ECHO OFF<br />SETLOCAL ENABLEDELAYEDEXPANSION<br />FOR %%F IN (dem*.tif) DO (<br /> SET _rid=%%F<br /> raster2pgsql -s 4326 -t 100x100 -I -C -F %%F public.dem_x_!_rid:~3,2! | psql -d mydb<br />)</pre><br />Just save this code in a .bat file, adjust the filename pattern (dem*.tif) to match the raster files you want to load, the table name generator (public.dem_x_!_rid:~3,2!) and the raster2pgsql options you want, and run the script. It should load all the rasters matching the pattern into their own tables. In this example, the table name generator takes the fourth and the fifth characters from the filename (presumably two numbers) to create a unique identifier for each table. Very quick, very handy! (Thanks for sending me your Linux or Unix bash version so I can add it here; in the meantime, an untested sketch is given below.)<br /><br /><b>How to do that in SQL</b><br /><br />Sometimes I prefer to do things in SQL. I wrote, some time ago, a PL/pgSQL function to split a table into many tables based on one of its attributes.
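<br /><br />But first, a quick aside: for Linux and Unix users, here is a minimal bash counterpart of the DOS batch above. It is only an untested sketch, assuming GNU bash and the same dem*.tif naming; ${f:3:2} extracts the same fourth and fifth characters of the filename.<br /><br /><pre style="background: #EEEEEE; padding: 5px;">#!/bin/bash<br /># Untested sketch: load each raster matching the pattern into its own table.<br />for f in dem*.tif; do<br />    # Take 2 characters starting at 0-based offset 3 (the 4th and 5th characters).<br />    raster2pgsql -s 4326 -t 100x100 -I -C -F "$f" public.dem_x_"${f:3:2}" | psql -d mydb<br />done</pre><br />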
Back to SQL, the SplitTable() function goes like this:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">-----------------------------------------------------------------------<br />-- SplitTable<br />-- Split a table into a series of tables whose names are composed of the<br />-- concatenation of a prefix and the value of a column.<br />--<br />-- sourcetablename   - Name of the table to split into multiple tables<br />--                     (must be in targettableschema).<br />-- targettableschema - Name of the schema in which to create the new set<br />--                     of tables.<br />-- targettableprefix - Prefix of the set of table names to create.<br />-- suffixcolumnname  - Name of the column providing the suffix to each name.<br />-----------------------------------------------------------------------<br />CREATE OR REPLACE FUNCTION SplitTable(sourcetablename text, targettableschema text, targettableprefix text, suffixcolumnname text)<br />RETURNS int AS<br />$BODY$<br />DECLARE<br /> newtablename text;<br /> uniqueid RECORD;<br />BEGIN<br /> FOR uniqueid IN EXECUTE 'SELECT DISTINCT ' || quote_ident(suffixcolumnname) || '::text AS xyz123 FROM ' || quote_ident(targettableschema) || '.' || quote_ident(sourcetablename) LOOP<br /> newtablename := targettableprefix || uniqueid.xyz123;<br /> EXECUTE 'CREATE TABLE ' || quote_ident(targettableschema) || '.' || quote_ident(newtablename) || ' AS SELECT * FROM ' || quote_ident(targettableschema) || '.' || quote_ident(sourcetablename) || ' WHERE ' || quote_ident(suffixcolumnname) || '::text = ' || quote_literal(uniqueid.xyz123);<br /> END LOOP;<br /> RETURN 1;<br />END;<br />$BODY$<br />LANGUAGE plpgsql VOLATILE STRICT;</pre><br /><div>Assuming you loaded the rasters with the first command above and included the -F option to create a column containing the filename, you can then update this column so that it keeps only the two numbers identifying each file, which we will use to identify the tables:<br /><br /><div><pre style="background-color: #eeeeee; padding: 5px;">UPDATE dem SET filename = substring(filename from 4 for 2);</pre></div><br />You can then use SplitTable() like this:<br /><br /><pre style="background-color: #eeeeee; padding: 5px;">SELECT SplitTable('dem', 'public', 'dem_', 'filename');</pre></div><br />to split the original table into as many tables as there are unique ids in the filename column, each tile in the right table. For the same example as above, you would end up with 10 tables of 100 rows each.<br /><br />This function is very useful in the PostGIS raster context but can also be used to split non-PostGIS tables. This is why it takes care neither of the spatial index nor of the constraints created by the original command with the -I and -C options. For that, I preferred to write another very generic function executing a simple SQL statement on a series of tables:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">---------------------------------------------<br />-- QueryTables<br />-- Execute a query on a series of tables based on a prefix.<br />-- The 'tablename' string will be replaced by the name of the table.<br />--<br />-- schemaname - Name of the schema in which to execute the queries.<br />-- prefix     - Prefix restricting the query to tables whose names start<br />--              with this string.<br />-- inquery    - Query to execute. Can contain the 'tablename' string,
--              which will be replaced by the name of the current table.<br />---------------------------------------------<br />CREATE OR REPLACE FUNCTION QueryTables(schemaname text, prefix text, inquery text)<br />RETURNS int AS<br />$BODY$<br />DECLARE<br />    tabletoquery RECORD;<br />BEGIN<br />    FOR tabletoquery IN EXECUTE 'SELECT tablename FROM pg_tables WHERE schemaname = ' || quote_literal(schemaname) || ' AND tablename LIKE ' || quote_literal(prefix || '%') LOOP<br />        RAISE NOTICE 'Querying %', schemaname || '.' || tabletoquery.tablename;<br />        EXECUTE replace(inquery, 'tablename', tabletoquery.tablename);<br />    END LOOP;<br />    RETURN 1;<br />END;<br />$BODY$<br />LANGUAGE plpgsql VOLATILE STRICT;</pre><br />Using this function, you can create a spatial index on all the split tables like this:<br /><br /><pre style="background-color: #eeeeee; padding: 5px;">SELECT QueryTables('public', 'dem_', 'CREATE INDEX tablename_gist<br />ON public.tablename USING gist (st_convexhull(rast));');</pre><br /><div>If you want all your tables to be listed properly in the 'raster_columns' view, you can add all the raster constraints:<br /><br /><pre style="background-color: #eeeeee; padding: 5px;">SELECT QueryTables('public', 'dem_', 'SELECT AddRasterConstraints(''public''::name, ''tablename''::name, ''rast''::name);');</pre><br /><div>Take care with the doubled single quotes in the 'inquery' parameter.<br /><br />You can finally remove the now useless 'filename' column from all the tables the same way:<br /><br /><pre style="background-color: #eeeeee; padding: 5px;">SELECT QueryTables('public', 'dem_', 'ALTER TABLE public.tablename DROP column filename;');</pre><br />QueryTables() is also a very generic function that can be used, with caution, for many other purposes. You could, for example, use it to DROP all the created tables in one single statement.<br /><br />So, to split many rasters into separate tables using the two SQL functions above:</div><div><ol><li>Load your rasters using the wildcard in raster2pgsql,</li><li>Update the 'filename' column to create a short unique table identifier,</li><li>Split the rasters into their respective tables using the SplitTable() function,</li><li>Create an index on each table using the QueryTables() function,</li><li>Add the raster constraints to each table using the QueryTables() function.</li></ol><div>Using the SQL functions certainly looks cumbersome compared with the DOS batch and should be slower, since you are writing the rasters twice in the database. However, with the SQL solution, you get two functions that are very useful for manipulating PostgreSQL tables.<br /><br />Needless to say, we definitely need an extra option in raster2pgsql to load a series of rasters into their respective tables...</div></div></div>Pierre Racinehttp://www.blogger.com/profile/17199173181807664396noreply@blogger.com4tag:blogger.com,1999:blog-3548694469101129309.post-56591270337786549572012-08-27T08:17:00.001-07:002012-10-24T01:26:22.437-07:00FME 2013 will soon read and write rasters stored in PostGIS<a href="http://www.safe.com/" target="_blank">Safe Software</a> will write a reader, a writer and some other functionalities to interact with rasters stored in PostGIS! 
These tools should be released with FME 2013 and should already be available in <a href="http://www.safe.com/support/support-resources/fme-downloads/beta/" target="_blank">beta versions of FME this fall</a>. Great news for the PostGIS community!<br /><br />On the GDAL side, thanks to Jorge Arevalo, <a href="http://libregis.org/2012/08/25/updating-gdal-postgis-raster-driver/" target="_blank">lots of improvements</a> have recently been made to the <a href="http://trac.osgeo.org/gdal/wiki/frmts_wtkraster.html" target="_blank">PostGIS Raster driver</a>. You can now read irregular arrangements of rasters and translate them into any format supported by GDAL. We are still looking for sponsors to implement the writer part of the driver.Pierre Racinehttp://www.blogger.com/profile/17199173181807664396noreply@blogger.com0tag:blogger.com,1999:blog-3548694469101129309.post-1839717581271131592012-07-23T14:19:00.001-07:002012-10-24T01:30:51.451-07:00A Slow, Yet 1000x Faster, Alternative to ST_Union(raster) in PostGIS!<div dir="ltr" style="text-align: left;" trbidi="on">As a first post in this new blog, I'd like to share an alternative method to merge a tiled raster coverage into a single-row raster. Those who followed PostGIS Raster development have seen how we now use the very generic ST_Union() aggregate function to merge rasters in much the same way we merge geometries:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">SELECT ST_Union(rast) rast<br />FROM tiledrastertable</pre><br />ST_Union(raster) is a PL/pgSQL aggregate using ST_MapAlgebraExpr(raster, raster, expression) as its state function. It is very flexible in that you can add a parameter to specify how to aggregate the pixel values when two or more pixels overlap. You can use 'FIRST', 'LAST', 'MAX', 'MIN', 'MEAN' or 'SUM'; 'LAST' is the default.<br /><br /><a name='more'></a><br /><br />ST_Union() works very well and fast enough when the rasters to aggregate have the same footprint, in other words when they mostly overlap. You can, for example, compute a raster of the mean temperature from a temporal series of temperature rasters stored in the same table like this:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">SELECT ST_Union(rast, 'MEAN') meantemprast<br />FROM temp_series<br />WHERE year &gt; 1999 AND year &lt; 2010<br /></pre><br />Because the extent of a raster produced with ST_Union() is the aggregated union of the extents of all the rasters involved, ST_Union() can also be used to merge non-overlapping rasters (like all the non-overlapping tiles of a tiled coverage).<br /><br />Unfortunately, ST_Union() proved to be very slow in this case. 
Merging together the 187200 50x50 pixel tiles of an SRTM raster would, for instance, take almost forever. The reason is that every time a new tile is passed to the ST_MapAlgebraExpr(raster, raster, expression) state function, ST_MapAlgebraExpr() has to iterate over all the pixels of the temporary unioned area, even the ones that were already unioned!<br /><br />You can see the blue line in the graph below representing the time ST_Union() takes to merge 12x12 tiles, depending on the number of pixels expected in the resulting unioned raster.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-dcPBnhychOk/UAmkFnQsadI/AAAAAAAADDc/FCpmsIIozKk/s1600/New+Picture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="207" src="http://3.bp.blogspot.com/-dcPBnhychOk/UAmkFnQsadI/AAAAAAAADDc/FCpmsIIozKk/s800/New+Picture.png" width="400" /></a></div><br />We can compute that the total number of pixels visited grows, for n tiles of fixed size K, like the following summation:<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-sDNPH7j-G74/UA1bitrxQ1I/AAAAAAAADD4/O3LeyKNLfyc/s1600/equation.GIF" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="93" src="http://3.bp.blogspot.com/-sDNPH7j-G74/UA1bitrxQ1I/AAAAAAAADD4/O3LeyKNLfyc/s1600/equation.GIF" width="320" /></a></div>Since merging the i-th tile revisits the roughly i·K pixels already unioned, the total sums to about K·n(n + 1)/2. That means the number of pixels visited grows by an order of O(n<sup>2</sup>). Very inefficient! This equation holds only when the rasters involved in the union are totally disjoint. It does not hold when the rasters totally overlap, like when computing the mean of a time series; in that case, ST_Union(raster) remains the best option.<br /><br />I will discuss further how we can improve ST_Union() performance in another post. For now, I would just like to propose a temporary, yet much faster, alternative.<br /><br /><b>A faster, less flexible alternative: ST_FasterUnion()</b><br /><br />This alternative is still slow if you compare it with GDAL or any other package able to merge rasters, but as the red line in the graph above shows, the computing time grows linearly with the number of aggregated pixels. This is why I can confidently say that, past a certain size, the new function is 1000x faster! Much, much better than the current ST_Union() function. I could even have said 1 000 000x faster, since this also becomes true once the number of tiles reaches a certain point. But let's stay humble :-)<br /><br />The new function uses ST_MapAlgebraFct(), which iterates over all the pixels of a raster, assigning new values computed by a custom PL/pgSQL function. Our custom function will simply search the whole original raster coverage to get these values. This operation is very fast when the coverage is tiled and indexed.<br /><br />Here is the custom PL/pgSQL function ST_MapAlgebraFct() will call. 
It has to be defined before our new ST_FasterUnion() function:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE OR REPLACE FUNCTION ST_FirstRasterValue4ma(pixel FLOAT,<br /> pos INTEGER[], <br /> VARIADIC args TEXT[])<br />RETURNS FLOAT<br />AS $$ <br /> DECLARE<br /> pixelgeom text;<br /> result float4;<br /> query text;<br /> BEGIN<br /> -- Reconstruct the current pixel centroid<br /> pixelgeom = ST_AsText(<br /> ST_Centroid(<br /> ST_PixelAsPolygon(<br /> ST_MakeEmptyRaster(args[1]::integer,<br /> args[2]::integer, <br /> args[3]::float,<br /> args[4]::float, <br /> args[5]::float, <br /> args[6]::float,<br /> args[7]::float,<br /> args[8]::float, <br /> args[9]::integer), <br /> pos[1]::integer, <br /> pos[2]::integer)));<br /> <br /> -- Intersects it with the raster coverage to find the right value<br /> query = 'SELECT ST_Value(' || quote_ident(args[12]) || <br /> ', ST_GeomFromText(' || quote_literal(pixelgeom) || <br /> ', ' || args[9] || <br /> ')) FROM ' || quote_ident(args[10]) || <br /> '.' || quote_ident(args[11]) || <br /> ' WHERE ST_Intersects(ST_GeomFromText(' ||<br /> quote_literal(pixelgeom) || ', '|| args[9] || '), ' ||<br /> quote_ident(args[12]) || ') LIMIT 1';<br /> EXECUTE query INTO result;<br /> RETURN result;<br /> END; $$<br /> LANGUAGE 'plpgsql' STABLE;<br /></pre><br />It basically converts the current pixel into its centroid, converts this geometry to its text representation (so it can be inserted into the query string) and then intersects this point back with the raster coverage whose schema name, table name and raster column name are provided (in args[10], args[11] and args[12]). Note that the function is declared STABLE rather than IMMUTABLE since it reads from a table.<br /><br />Next is the wrapper function aggregating all the raster extents into the final raster and passing this raster to ST_MapAlgebraFct():<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE OR REPLACE FUNCTION ST_FasterUnion(schemaname text, tablename text, rastercolumnname text)<br />RETURNS raster<br />AS $$ <br /> DECLARE<br /> query text;<br /> newrast raster;<br /> BEGIN<br /> query = '<br />SELECT ST_MapAlgebraFct(rast,<br /> ''ST_FirstRasterValue4ma(float,<br /> integer[],<br /> text[])''::regprocedure, <br /> ST_Width(rast)::text,<br /> ST_Height(rast)::text,<br /> ST_UpperLeftX(rast)::text,<br /> ST_UpperLeftY(rast)::text,<br /> ST_ScaleX(rast)::text,<br /> ST_ScaleY(rast)::text,<br /> ST_SkewX(rast)::text,<br /> ST_SkewY(rast)::text,<br /> ST_SRID(rast)::text,' || <br /> quote_literal(schemaname) || ', ' ||<br /> quote_literal(tablename) || ', ' ||<br /> quote_literal(rastercolumnname) || '<br /> ) rast<br />FROM (SELECT ST_AsRaster(ST_Union(rast::geometry), <br /> min(scalex),<br /> min(scaley),<br /> min(gridx),<br /> min(gridy),<br /> min(pixeltype),<br /> 0,<br /> min(nodataval)<br /> ) rast<br /> FROM (SELECT ' || quote_ident(rastercolumnname) || <br /> ' rast,<br /> ST_ScaleX(' || quote_ident(rastercolumnname) || ') scalex, <br /> ST_ScaleY(' || quote_ident(rastercolumnname) || ') scaley, <br /> ST_UpperLeftX(' || quote_ident(rastercolumnname) || ') gridx, <br /> ST_UpperLeftY(' || quote_ident(rastercolumnname) || ') gridy, <br /> ST_BandPixelType(' || quote_ident(rastercolumnname) || ') pixeltype, <br /> ST_BandNodataValue(' || quote_ident(rastercolumnname) || ') nodataval<br /> FROM ' || quote_ident(schemaname) || '.' 
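 -- (Comment added for clarity: the inner subquery, aliased foo, gathers each tile with its alignment<br /> -- parameters; foo2 then rasterizes the unioned footprint of all tiles into one blank raster<br /> -- that ST_MapAlgebraFct() fills pixel by pixel via ST_FirstRasterValue4ma().)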
|| quote_ident(tablename) || ' <br /> ) foo<br /> ) foo2';<br /> EXECUTE query INTO newrast;<br /> RETURN newrast;<br /> END; $$<br /> LANGUAGE 'plpgsql' STABLE;<br /><br /></pre><br />And here is how to call the function, passing the names of the schema, table and raster column we want to union:<br /><br /><pre style="background: #EEEEEE; padding: 5px;">CREATE TABLE schema.rastertable_unioned AS<br />SELECT ST_FasterUnion('schema', 'rastertable', 'rast') rast;<br /></pre><div><br /></div>The function is less flexible than the ST_Union() aggregate, which works on any result of a SELECT query (only year 2000 from a table containing many years, for example), but you can always build a view on the table and pass the schema, name and raster column of this view to the function if you want to union a subset of a table.<br /><br />You can modify the custom function if some of your rasters overlap and you would like to get the mean, max, min or sum of all the pixels found for each pixel of the aggregate instead of simply the value of the first pixel.<br /><br />You could also, and this is very interesting, modify the custom function to get values from a vector coverage. To do that, you simply reconstitute the shape of the pixel (just remove the part that creates the centroid) and then intersect this polygon with a point, line or polygon coverage to get the metric you wish. You can, for instance, get the value of the polygon with the biggest intersecting area from a polygon coverage, or count the intersecting points from a point coverage to create a density raster. You could also create a buffer around the pixel to compute the statistic and hence create a moving window over a vector layer. The possibilities are endless...<br /><br />I hope this temporary solution for merging the tiles of a table will satisfy the real need for unioning rasters...<br /><br />If a more flexible, even faster ST_Union() is crucial for your workflow, you can also sponsor the work on a clever ST_MapAlgebra(raster, raster, expression) (I will explain how in a future post) and a C implementation of ST_Union(raster). All PostGIS Raster users would greatly benefit from this work ;-) </div>Pierre Racinehttp://www.blogger.com/profile/17199173181807664396noreply@blogger.com3