A fascination with internet maps


Fig 1 – Population skyline of New York - Census SF1 P0010001 and Bing Road Map

Demographic Terrain

My last blog post, Demographic Landscapes, described leveraging new SFCGAL PostGIS 3D functions to display 3D extruded polygons with X3Dom. However, there are other ways to display census demographics with WebGL. WebGL meshes are a good way to handle terrain DEM. In essence the US Census is providing demographic value terrains, so 3D terrain meshes are an intriguing approach to visualization.

Babylon.js is a powerful 3D engine written in javascript for rendering WebGL scenes. In order to use Babylon.js, US Census SF1 data needs to be added as WebGL 3D mesh objects. Babylon.js offers a low effort tool for generating these meshes from grayscale height map images: BABYLON.Mesh.CreateGroundFromHeightMap.

Modifying the Census WMS service to produce grayscale images at the highest SF1 polygon resolution, i.e. tabblock, is the easiest approach to generating these meshes. I added an additional custom request to my WMS service, "request=GetHeightMap," which returns a PixelFormat.Format32bppPArgb bitmap with demographic values coded as grayscale. This is equivalent to a 255 range classifier for population values.
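As a sketch of that encoding step (the function and the simple linear mapping here are illustrative, not the actual WMS code), a population value maps to an 8-bit gray level like this:

```javascript
// Illustrative sketch: linearly map a demographic value into the
// 0..255 gray range used by a height map bitmap.
function toGray(value, min, max) {
  if (max === min) return 0;            // avoid divide-by-zero on flat data
  var t = (value - min) / (max - min);  // normalize to 0..1
  return Math.round(t * 255);           // quantize to 255 gray classes
}
```

Babylon.js can then read such an image back into a mesh with BABYLON.Mesh.CreateGroundFromHeightMap, scaling gray levels into mesh elevations.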

Microsoft HoloLens will up the coolness, but not the hipster factor, while improving usability immensely (when released). I'm inclined to the minimalist movement myself, but I'd be willing to write a Windows 10 app with the HoloLens SDK to see how well it performs.

PostGIS 2.2 is due for release sometime in August of 2015. Among other things, PostGIS 2.2 adds some interesting 3D functions via SFCGAL. ST_Extrude in tandem with ST_AsX3D offers a simple way to view a census polygon map in 3D. With these functions built into PostGIS, queries returning x3d text are possible.
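A minimal sketch of assembling such a query from the client side; the table name (tabblock), columns (geom, geoid), and the extrusion height are assumptions, not the actual schema:

```javascript
// Hypothetical query builder: extrude a census polygon by a height
// derived from its population value, then serialize the solid as x3d.
function x3dQuery(geoid, height) {
  return "SELECT ST_AsX3D(ST_Extrude(geom, 0, 0, " + height + ")) AS x3d " +
         "FROM tabblock WHERE geoid = '" + geoid + "'";
}
```

(In production the geoid would need to be passed as a bound parameter rather than concatenated, to avoid SQL injection.)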

The x3d format is not directly viewable in a browser, but it can be packaged with x3dom for use in any WebGL enabled browser. Packaging x3d in an x3dom container allows returning the .x3d mime type, model/x3d+xml, for use as x3dom inline content in html.
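A sketch of that packaging; the x3d/scene/inline markup is the standard x3dom inline pattern, while the GetX3D url passed in is hypothetical:

```javascript
// Wrap an x3d resource url in a minimal x3dom host page.
function x3domPage(x3dUrl) {
  return '<html><head>' +
         '<script src="http://www.x3dom.org/download/x3dom.js"><\/script>' +
         '<link rel="stylesheet" href="http://www.x3dom.org/download/x3dom.css">' +
         '</head><body>' +
         '<x3d width="600px" height="400px"><scene>' +
         '<inline url="' + x3dUrl + '"></inline>' +
         '</scene></x3d></body></html>';
}
```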

In this case I added a non-standard WMS service adapted to add a new kind of request, GetX3D.

x3d is an open xml standard for leveraging immediate mode WebGL graphics in the browser. x3dom is an open source javascript library for translating x3d xml into WebGL in the browser.

“X3DOM (pronounced X-Freedom) is an open-source framework and runtime for 3D graphics on the Web. It can be freely used for non-commercial and commercial purposes, and is dual-licensed under MIT and GPL license.”


Why X3D?

I’ll admit it’s fun, but novelty may not always be helpful. Adding elevation does show demographic values in finer detail than the coarse classification used by a thematic color range. This experiment did not delve into the bivariate world, but multiple value modelling is possible using color and elevation with perhaps less interpretive misgivings than a bivariate thematic color scheme.

However, pondering the visionary, why should analysts be left out of the upcoming immersive world? If Oculus Rift, HoloLens, or Google Cardboard are part of our future, analysts will want to wander through population landscapes exploring avenues of millennials and valleys of the aged. My primitive experiments show only a bit of demographic landscape, but eventually demographic terrain streams will be layer choices available to web users for exploration.

Demographic landscapes like the census are still grounded, tethered to real objects. The towering polygon on the left recapitulates a geophysical apartment highrise; a looming block of 18-22 year olds reflects a military base. But models can potentially float away from geophysical grounding. Facebook networks are less about physical location than network relation. Abstracted models of relationship are also subject to helpful visualization in 3D. Unfortunately we have only a paltry few dimensions to work with, ruling out value landscapes of higher dimensions.

Fig 3 – P0010001 Jenks Population

Some problems

For some reason IE11 always reverts to software rendering instead of using the system’s GPU. Chrome provides hardware rendering with consequent smoother user experience. Obviously the level of GPU support available on the client directly correlates to maximum x3dom complexity and user experience.

In some cases the ST_Extrude result is rendered to odd surfaces with multiple artifacts. Here is an example with low population in eastern Colorado. Perhaps the extrusion surface breaks down due to tessellation issues on complex polygons with zero or near zero height. This warrants further experimentation.

Fig 2 – rendering artifacts at near zero elevations

The performance/complexity threshold on a typical client is fairly low, so it's tricky to keep model sizes small enough for acceptable performance. IE11 is especially vulnerable to collapse due to software rendering. In this experiment the x3d view is limited to intersections with the extent of the selected territory, computed using turf.js.

var extent = turf.extent(app.territory);

In addition making use of PostGIS ST_SimplifyPreserveTopology helps reduce polygon complexity.
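For illustration, a dependency-free sketch of the bounding box computation turf.extent performs over [lon, lat] coordinates:

```javascript
// Compute [minX, minY, maxX, maxY] over a coordinate list,
// mirroring what turf.extent returns for a feature.
function extent(coords) {
  var bbox = [Infinity, Infinity, -Infinity, -Infinity];
  coords.forEach(function (c) {
    if (c[0] < bbox[0]) bbox[0] = c[0];  // expand west edge
    if (c[1] < bbox[1]) bbox[1] = c[1];  // expand south edge
    if (c[0] > bbox[2]) bbox[2] = c[0];  // expand east edge
    if (c[1] > bbox[3]) bbox[3] = c[1];  // expand north edge
  });
  return bbox;
}
```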

Xml formats like x3d tend to be verbose, and newer lightweight frameworks prefer a JSON interchange. JSON for WebGL is not officially standardized, but there are a few resources available.

An interesting GIS development over the years has been the evolution from monolithic applications to multiple open source plug and play tools. Considering GIS as a three tier system with back end storage, a controller in the middle, and a UI display out front, more and more of the middle tier is migrating to either end.

SQL DBs, such as SQL Server, Oracle Spatial, and especially PostGIS, now implement a multitude of GIS functions originally considered middle tier domain. On the other end, the good folks at turf.js and proj4js continue to push atomic GIS functions out to javascript, where they can fit into the client side UI. The middle tier is getting thinner and thinner as time goes on. Generally the middle is what costs money, so rolling your own middle with less work is a good thing. As a desirable side effect, instead of kitchen sink GIS, very specific user tools can be cobbled together as needed.

Looking for an excuse to experiment with turf.js, I decided to create a Territory Builder utilizing turf.js on the client and some of the US Census Bureau Services on the backend.

US Census Bureau does expose some “useful” services at TigerWeb. I tend to agree with Brian Timoney that .gov should stick to generating data exposed in useful ways. Apps and presentation are fast evolving targets, and historically .gov can’t really hope to keep up. Although you can use the census custom TigerWeb applications for some needs, there are many other occasions when you would like to build something less generic, for example a Territory Builder over Bing Maps.

Fig 2 – Territory polygon with a demographic overlay on top of Bing Maps

Note: Because there doesn’t appear to be an efficient way to join demographic data to geography with current TigerWeb service APIs, the demographic tab of this app uses a custom WMS PostGIS backend, hooking SF1 to TIGER polygons.

WMS GetMap requests simply return an image. In order to overlay the image on a Bing Map, this Territory app uses an html5 canvas and its context: app.context.drawImage(imageObj, 0, 0). The trick is to stack the canvas in between the Bing Map and the Bing Map navigation, scale, and selector tools. TigerWeb WMS conveniently exposes epsg:3857, which correctly aligns with Bing Map tiles.
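A sketch of assembling the GetMap request for that overlay; the endpoint and layer names are placeholders, not the actual TigerWeb values:

```javascript
// Build a WMS GetMap URL aligned to the current Bing Maps viewport.
// bbox3857 is [minX, minY, maxX, maxY] in EPSG:3857 meters.
function getMapUrl(endpoint, layer, bbox3857, width, height) {
  return endpoint + "?service=WMS&version=1.1.1&request=GetMap" +
         "&layers=" + layer +
         "&styles=" +
         "&srs=EPSG:3857" +
         "&bbox=" + bbox3857.join(",") +
         "&width=" + width + "&height=" + height +
         "&format=image/png&transparent=true";
}
```

The returned image is then painted onto the stacked canvas with drawImage as described above.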

Unfortunately the WMS GetMap request has no IDs available for polygons, even if the requested format is image/svg+xml. SVG is an xml format and could easily contain associated GeoID values for joining with other data resources, but this is contrary to the spirit of OGC WMS service specs. Naturally we must obey OGC Law, which is too bad. Adding a GeoID attribute would allow options such as choropleth fills directly on existing svg paths. For example, adding id="80132" would allow fill colors by zipcode with a bit of javascript.

The GEOID retrieved from our proxied GetFeatureInfo request allows us to grab vertices with another TigerWeb service, Census REST.

This spec is a little more proprietary and requires some detective work to unravel.

FeatureUrl.url = "http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D" + geoid + "&geometryPrecision=6&outSR=4326&f=pjson";

There are different endpoint urls for the various polygon types. In this case Zip Codes are found in PUMA_TAD_TAZ_UGA_ZCTA. We don’t need more than 1 meter resolution so precision 6 is good enough and we would like the results in epsg:4326 to avoid a proj4 transform on the client.
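Unpacking the response is straightforward; the features/geometry/rings shape below is the standard Esri REST polygon JSON, with only minimal guards:

```javascript
// Extract the outer ring (an array of [x, y] vertices) from an
// Esri REST pjson polygon query result.
function outerRing(esriJson) {
  var feat = esriJson.features && esriJson.features[0];
  if (!feat || !feat.geometry || !feat.geometry.rings) return [];
  return feat.geometry.rings[0];
}
```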

Census REST doesn’t appear to offer a simplify parameter so the coordinates returned are at the highest resolution. Highly detailed polygons can easily return several thousand vertices, which is a problem for performance, but the trade-off is eliminating hosting data ourselves.
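One workaround is thinning on the client. Here is a sketch of Douglas-Peucker simplification (the same algorithm behind SQL Server's Reduce()), over [x, y] vertex arrays:

```javascript
// Douglas-Peucker line simplification over [x, y] vertex arrays,
// a client-side stand-in when the service can't thin polygons.
function simplify(points, tolerance) {
  if (points.length < 3) return points.slice();
  var first = points[0], last = points[points.length - 1];
  var index = -1, maxDist = 0;
  for (var i = 1; i < points.length - 1; i++) {
    var d = perpDist(points[i], first, last);
    if (d > maxDist) { maxDist = d; index = i; }
  }
  if (maxDist > tolerance) {
    // keep the farthest point and recurse on both halves
    var left = simplify(points.slice(0, index + 1), tolerance);
    var right = simplify(points.slice(index), tolerance);
    return left.slice(0, -1).concat(right);
  }
  return [first, last]; // everything in between is within tolerance
}

// Perpendicular distance from point p to the line through a and b.
function perpDist(p, a, b) {
  var dx = b[0] - a[0], dy = b[1] - a[1];
  var len = Math.sqrt(dx * dx + dy * dy);
  if (len === 0) {
    return Math.sqrt(Math.pow(p[0] - a[0], 2) + Math.pow(p[1] - a[1], 2));
  }
  return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / len;
}
```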

TigerWeb offers some useful data access. With TigerWeb WMS and REST api, developers can customize apps without hosting and maintaining a backend SQL store. However, there are some drawbacks.

Some potential improvements:
1. Adding an svg id=GeoID would really improve the usefulness of TigerWeb WMS image/svg+xml, possibly eliminating steps 2 and 3 of the workflow.

2. Technically it’s possible to use the TigerWeb REST api to query geojson by area, but practically speaking the results are too detailed for useful performance. A helpful option for TigerWeb REST would be a parameter to request simplified polygons and avoid lengthy vertex transfers.

3. turf.js is a great tool box; however, the merge function occasionally had trouble with complex polygons from TigerWeb.

Fig 1 - SF1QP Quantile Population County P0010001 P1.TOTAL POPULATION Universe: Total population

Preparation for US 2020 Census is underway at this mid-decennial point and we’ll see activity ramping up over the next few years. Will 2020 be the last meaningful decennial demographic data dump? US Census has been a data resource since 1790. It took a couple centuries for Census data to migrate into the digital age, but by Census 2000, data started trickling into the internet community. At first this was simply a primitive ftp data dump, ftp2.census.gov, still very useful for developers, and finally after 2011 exposed as OGC WMS, TigerWeb UI, and ESRI REST.

However, static data in general, and decennial static data in particular, is fast becoming anachronistic in the modern era. Surely the NSA data tree looks something like phone number JOIN Facebook account JOIN Twitter account JOIN social security id JOIN bank records JOIN IRS records JOIN medical records JOIN DNA sequence….. Why should this data access be limited to a few black budget bureaus? Once the data tree is altered a bit to include mobile devices, static demographics are a thing of the past. Queries in 2030 may well ask “how many 34 year old male Hispanic heads of households with greater than 3 dependents with a genetic predisposition to diabetes are in downtown Denver Wed at 10:38AM, at 10:00PM?” For that matter let’s run the location animation at 10 minute intervals for Tuesday and then compare with Sat.

“Privacy? We don’t need no stinking privacy!”

I suppose Men in Black may find location aware DNA queries useful for weeding out hostile alien grays, but shouldn’t local cancer support groups also be able to ping potential members as they wander by Starbucks? Why not allow soda vending machines to check for your diabetic potential and credit before offering appropriate selections? BTW How’s that veggie smoothie?

By late 2011 census OGC services began to appear along with some front end data web UIs, and ESRI REST interfaces. [The ESRI connection is a tightly coupled symbiotic relationship as the Census Bureau, like many government bureaucracies, relies on ESRI products for both publishing and consuming data. From the outside ESRI could pass as an agency of the federal government. For better or worse “Arc this and that” are deeply rooted in the .gov GIS community.]

For mapping purposes there are two pillars of Census data, spatial and demographic. The spatial data largely resides as TIGER data while the demographic data is scattered across a large range of products and data formats. In the basement, a primary demographic resource is the SF1, Summary File 1, population data.

“Summary File 1 (SF 1) contains the data compiled from the questions asked of all people and about every housing unit. Population items include sex, age, race, Hispanic or Latino origin, household relationship, household type, household size, family type, family size, and group quarters. Housing items include occupancy status, vacancy status, and tenure (whether a housing unit is owner-occupied or renter-occupied).”

The intersection of SF1 and TIGER is the base level concern of census demographic mapping. There are a variety of rendering options, but the venerable color themed choropleth map is still the most widely recognized. This consists of assigning a value class to a color range and rendering polygons with their associated value color. This then is the root visualization of Census demographics, TIGER polygons colored by SF1 population classification ranges.

Unfortunately, access to this basic visualization is not part of the 2010 TigerWeb UI.

There are likely a few reasons for this, even aside from the glacially slow adoption of technology at the Bureau of the Census. A couple of obvious reasons are the sheer size of this data resource and the range of the statistics gathered. A PostGIS database with a 5-level primary spatial hierarchy, all 48 SF1 population value files, appropriate indices, plus a few helpful functions consumes a reasonable 302.445 GB of generic Amazon EC2 SSD elastic block storage. But contained in those 48 SF1 tables are 8912 demographic values, which you are welcome to peruse here. A problem for any UI is how to make 8912 values plus 5 spatial levels usable.

Fig 3 – 47 SF1 tables plus sf1geo geography join file

Filling a gap

Since the Census Bureau budget did not include public visualization of TIGER/Demographics what does it take to fill in the gap? Census 2010 contains a large number of geographic polygons. The core hierarchy for useful demographic visualization is state, county, tract, block group, and block.

Fig 4 – Census polygon hierarchy

Loading the data into PostGIS affords low cost access to data for SF1 Polygon value queries such as this:

A. These polygon counts rule out visualizations of the entire USA, or even moderate regions, at tract+ levels of the hierarchy. Vector mapping is not optimal here.

B. The number of possible image tile pyramids for 8912 values over 5 polygon levels is 5 * 8912 = 44,560. This rules out tile pyramids of any substantial depth without some deep Google like pockets for storage. Tile pyramids are not optimal either.

C. Even though vector grid pyramids would help with these 44,560 demographic variations, they suffer from the same restrictions as A. above.

One possible performance/visualization compromise is to use an old fashioned OGC WMS GetMap request scheme that treats polygon types as layer parameters and demographic types as style parameters. With appropriate use of the WMS <MinScaleDenominator> and <MaxScaleDenominator> elements, layers are only rendered at sufficient zoom to reasonably limit the number of polygons. This scheme puts rendering computation right next to the DB on the same EC2 instance, while network latency is reduced to a simple jpeg/png image download. Scaling access to public consumption is still problematic, but for in-house use it does work.
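The gating itself is trivial; a sketch with hypothetical scale denominator ranges for the polygon hierarchy (the thresholds here are illustrative, not the tuned values):

```javascript
// Hypothetical scale denominator ranges per polygon layer.
var scaleRanges = {
  county:     { min: 1000000, max: 25000000 },
  tract:      { min: 100000,  max: 1000000 },
  blockgroup: { min: 25000,   max: 100000 },
  tabblock:   { min: 0,       max: 25000 }
};

// Return the layer to render at a given scale denominator, or null
// when the view is zoomed out past every range.
function visibleLayer(scaleDenominator) {
  for (var name in scaleRanges) {
    var r = scaleRanges[name];
    if (scaleDenominator >= r.min && scaleDenominator < r.max) return name;
  }
  return null;
}
```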

There are still issues with a scale rendering approach. Since population is not very homogeneous over the US coverage extent, scale dependent rendering asks to be variable as well. This is easily visible over population centers. Without some type of pre-calculated density grid, the query is already completed prior to knowledge of the ideal scale dependency. Consequently, static rendering scales have to be tuned to high population urban regions. Since “fly over” US is generally less interesting to analysts, we can likely live with this compromise.

Dividing a value curve to display adequately over a viewport range can be accomplished in a few different ways: equal intervals, equal quantiles, Jenks natural breaks optimization, K-means clustering, or “other.” Leaning toward the simpler, I chose a default quantile (guarantees some color) with a ten class single hue progression, which of course is not recommended by ColorBrewer. However, 10 seems an appropriate number for decennial data. I also included a Jenks classifier option, which is considered a better representation. The classifier is based only on visible polygons rather than the entire polygon population. This means region to region comparisons are deceptive, but after all this is visualization of statistics.
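A sketch of the default quantile scheme over the visible polygon values; the break placement below is one common convention, not the exact classifier used:

```javascript
// Equal-quantile class breaks over the visible values.
function quantileBreaks(values, classes) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var breaks = [];
  for (var i = 1; i < classes; i++) {
    breaks.push(sorted[Math.floor(i * sorted.length / classes)]);
  }
  return breaks;
}

// Class index for a value: the count of breaks at or below it,
// giving one of `classes` color bins.
function classify(v, breaks) {
  var k = 0;
  while (k < breaks.length && v >= breaks[k]) k++;
  return k;
}
```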

“There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain

Fig 9 – SF1JP SF1 Jenks Census Tract P0010001 (not density)

In order to manage Census data on a personal budget these compromises are involved:

This is a workable map service for a small number of users, and exposing it as an OGC WMS service offers some advantages. First, there are already a ton of WMS clients available to actually see the results. Second, the query, geometry parsing, and image computation (including any required re-projection) are all server side on the same instance, reducing network traffic. Unfortunately, the downside is that the computation cost is significant and discouraging for a public facing service.

Because this data happens to be read only for ten years, scaling is not too hard, as long as there is a budget. It would also be interesting to try some reconfiguration of data into NoSQL type key/value documents with perhaps each polygon document containing the 8912 values embedded along with the geometry. This would cost a bit in storage size but could decrease query times. NoSQL also offers some advantages for horizontal scaling.
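For illustration, such a document might look like this; the key, field names, and values are all hypothetical:

```javascript
// Hypothetical NoSQL document: one polygon per document, with all
// SF1 values embedded alongside the geometry.
var blockDoc = {
  _id: "080310041001234",   // GeoID as the document key (example value)
  level: "tabblock",
  geometry: {
    type: "Polygon",
    coordinates: [[[-105.0, 39.7], [-104.9, 39.7], [-104.9, 39.8], [-105.0, 39.7]]]
  },
  sf1: {
    P0010001: 42            // ...plus the rest of the 8912 values
  }
};
```

A single key lookup then returns geometry and every demographic value in one round trip, trading storage for query time.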

Summary

The Census Bureau and its census are obviously not going away. The census is a bureaucracy with a curious inertial life stretching back to the founding of our country (United States Constitution Article 1, section 2). Although static aggregate data is not going to disappear, dynamic real time data has already arrived on stage in numerous and sundry ways from big data portals like Google, to marketing juggernauts like Coca Cola and the Democratic Party, to even more sinister black budget control regimes like the NSA.

Census data won’t disappear. It will simply be superseded.

The real issue for 2020 and beyond is, how to actually intelligently use the data. Already data overwhelms analytic capabilities. By 2030, will emerging AI manage floods of real time data replacing human analysts? If Wall Street still exists, will HFT algos lock in dynamic data pipelines at unheard of scale with no human intervention? Even with the help of tools like R Project perhaps the human end of data analysis will pass into anachronism along with the decennial Census.

I was privileged to spend a few days at FOSS4G in Denver. It’s quite the opportunity to have an international convergence of GIS developers handily just a bus ride up the highway. I could walk among the current luminaries of OSGeo, Arnulf Christl, Paul Ramsey, Frank Warmerdam, Simone Giannecchini, and so on and on, and trade a handshake with the hard working front range folks, Peter Batty, Brian Timoney (way to go Brian!), and Matt Kruzemark too. The ideas are percolating in every corner, but it strikes me in an odd way that this world has changed irrevocably. The very technology that makes Geo FOSS possible is also making it, of all things, mundane.

Open Standards

It wasn’t that long ago that OGC standards were brand new. Each release anticipated a new implementation for discovery. Now implementations proliferate helter skelter. OGC services? Take your pick. Or, slice, dice, and merge with MapProxy. Map Tiling engines line up like yogurt brands at Whole Foods. GeoFOSS has progressed a long way from naive amazement over OGC WMS connections. WFS and WCS have already passed their respective moments of glorious novelty.

This is the year of WPS, Web Processing Service, and the hope of constructing webs of analysis chains from disparate nodes. Each node adds a piece toward a final solution. Anyone with some experience using FME Transformers has some idea of what this looks like on the desktop. WPS moves analytic workflows into open standards and onto the web. Ref: 52north.org and GeoServer Ext

Fig2 – example of FME Workbench processing chain

On the floor at FOSS4G 2011

In a different vein, the next release of PostGIS will push raster analysis right into SQL. This is new territory for PostGIS and elevates a raster datatype to the same level as geometry. This PostGIS Raster Functions page gives some flavor for the raster functions soon to be available when 2.0 releases. Ever needed a raster reprojection?

“For the web to move forward and for consumers to get the most out of touch-first browsing, the Metro style browser in Windows 8 is as HTML5-only as possible, and plug-in free. The experience that plug-ins provide today is not a good match with Metro style browsing and the modern HTML5 web.” – Steven Sinofsky

CouchDB/GeoCouch was present although “eventually consistent” NoSQL seems less pressing.

Although Oracle is trying hard to break Java, it is still a popular platform among FOSS projects. The Javascript platform is a growth industry with the help of backend tools like Node.js, html5 WebSockets, and Asynchronous WebWorkers. C/C++ takes away wins in the popular performance shootout.

With over 900 attendees, the small world of FOSS seems all of a sudden crowded, jostling through accelerating growth. The Ordnance Survey was represented at a plenary session: “We have to learn to embrace open source. We want to use OS. We want to encourage OS.” Who’d have ever thought? FCC.gov opens their kimono in a big way with data transparency and OS APIs. Workshop topics include such things as: “Open Source Geospatial Software Powering Department of Defense Installation and Environment Business Systems,” or catch this, “The National Geospatial-Intelligence Agency OSS Challenge.” Is this the new Face of FOSS?

Open Bureaucracy?

Such bureaucracies know little of agile. The memorable phrase “embrace, extend, extinguish” comes to mind. But who embraces whom? Is the idea of FOSS and its larger www parent, a trend that eventually extinguishes bureaucracy or does the ancient ground of bureaucratic organization trump connection? Byzantine bureaucracy has been around since … well the Byzantine Empire, and given human nature is likely enduring. But, what does a “flat” Byzantine Bureaucracy look like? How about crowd sourced Byzantia? Would an aging Modernity mourn the loss of “Kafkaesque” as an adjective?

Assuming growth continues, will success reduce the camaraderie of community as a motivation? Just this year we hear of OSM’s Steve Coast sidling into Microsoft, followed a little later by GDAL’s Frank Warmerdam beaming up into the Google mother ship. The corporate penalty, of course, is the loss of personal intellectual property. In other words, will Steve Coast’s imaginative ideas now fall under the rubric, “all your base are belong to us,” and Frank’s enduring legacy recede into the patent portfolio of “Do no evil?” FOSS4G halls are still filled with independent consultants and academics, but a significant corporate representation is evident by this apparent oxymoron, a presentation by ESRI, “Open Source GIS Solutions.”

Some Perspective:

The GIS nervous system grows connections around the world faster than a three year old’s brain. OK, maybe I exaggerate; after all, “During the first three years, your child’s brain establishes about 1,000 trillion nerve connections!” Really, how many connections are there in the World Wide Web? For all its impact on life, our beloved internet encompasses the equivalent of what, a few cubic centimeters of a single infant’s anatomy.

Open Geospatial and FOSS are just a small part of this minor universe, but it’s easy to forget that even ten years ago this all hardly existed. “The first Interoperability Program testbed (Web Mapping Testbed) appeared in 1999.” About the same time Frank Warmerdam started GDAL and the GFOSS engines started.

I still recall the wonder of downloading Sol Katz utilities from BLM’s ftp. The novelty of all this data free to use from the USGS was still fresh and amazing. Sol sadly died before seeing his legacy, but what a legacy. The Sol Katz award this year went to Java Topology Suite’s well deserving Martin Davis.

Bing Maps Silverlight Control has a lot of power, power that is a lot of fun to play with even when not especially practical. This weekend I was presented with a challenge to find a way to show web maps, but restricted to a subset of states, a sub-region. I think the person making the request had more in mind the ability to cut out an arbitrary region and print it for reporting. However, I began to think why be restricted to just the one level of the pyramid. With all of this map data available we should be able to provide something as primitive as coupon clipping, but with a dynamic twist.

Silverlight affords a Clip masking mechanism and it should be simple to use.

1. Region boundary:

The first need is to compile the arbitrary regional boundary. This would be a challenge to code from scratch, but SQL Server spatial already has a function called “STUnion.” PostGIS has had an even more powerful Union function for some time, and Paul Ramsey has pointed out the power of fast cascaded unions. Since I’m interested in seeing how I can use SQL Server, though, I reverted to the first pass SQL Server approach. But, as I was looking at STUnion it was quickly apparent that this is a simple geo1.STUnion(geo2) function, and what is needed is an aggregate union. The goal is to union more than just two geography elements at a time, preferably the result of an arbitrary query.

Fortunately there is a codeplex project, SQL Server Spatial Tools, which includes the very thing needed, along with some other interesting functions. GeographyAggregateUnion is the function I need; Project/Unproject and AffineTransform:: will have to await another day. This spatial tool kit consists of a dll and a register.sql script that is used to import the functions into an existing DB. Once in place the functions can be used like this:

SELECT dbo.GeographyUnionAggregate(wkb_geometry) as Boundary
FROM [Census2009].[dbo].[states]
WHERE NAME = 'Colorado' OR NAME = 'Utah' OR NAME = 'Wyoming'

Noting that CO, WY, and UT are fairly simple polygons but the result is 1092 nodes, I tacked on a .Reduce() function:

dbo.GeographyUnionAggregate(wkb_geometry).Reduce(10) provides 538 points
dbo.GeographyUnionAggregate(wkb_geometry).Reduce(100) provides 94 points
dbo.GeographyUnionAggregate(wkb_geometry).Reduce(1000) provides 19 points

Since I don’t need much resolution I went with the 19 points resulting from applying the Douglas-Peucker thinning with a tolerance factor of 1000.

2. Adding the boundary

The next step is adding this union boundary outlining my three states to my Silverlight Control. In Silverlight there are many ways to accomplish this, but by far the easiest is to leverage the builtin MapPolygon control and add it to a MapLayer inside the Map hierarchy:

<m:MapLayer>
  <m:MapPolygon x:Name="region"
    Stroke="Blue" StrokeThickness="5"
    Locations="37.0003960382868,-114.05060006067 37.000669,-112.540368
    36.997997,-110.47019 36.998906,-108.954404
    .
    .
    .
    41.996568,-112.173352 41.99372,-114.041723
    37.0003960382868,-114.05060006067 "/>
</m:MapLayer>

Now I have a map with a regional boundary for the combined states, CO, WY, and UT.

3. The third step is to do some clipping with the boundary:

UIElement.Clip is available for every UIElement; however, it is restricted to Geometry clipping elements. Since MapPolygon is not a Geometry, it must be converted to one to be used as a clip element. Furthermore, PathGeometry is very different from something as straightforward as MapPolygon, whose shape is defined by a simple LocationCollection of points.

This Clip PathGeometry can be applied to the m:Map named MainMap to mask the underlying Map. This is easily done with a Button Click event. But when navigating with pan and zoom, the clip PathGeometry is not automatically updated. It can be redrawn with each ViewChangeEnd:

private void MainMap_ViewChangeEnd(object sender, MapEventArgs e)
{
    if (MainMap != null)
    {
        if ((bool)ShowBoundary.IsChecked) DrawBoundary();
        if ((bool)ClipBoundary.IsChecked) DrawClipFigure();
    }
}

This will change the clip to match a new position, but only after the fact. The better way is to add the redraw clip to the ViewChangeOnFrame event:

MainMap.ViewChangeOnFrame += new EventHandler<MapEventArgs>(MainMap_ViewChangeOnFrame);

In spite of the constant clip redraw with each frame of the navigation animation, navigation is smooth and not appreciably degraded.

Summary:

Clipping a map is not terrifically useful, but it is supported by the Silverlight Control and provides another tool in the webapp mapping arsenal. What is very useful are the additional functions found in SQL Server Spatial Tools. Since SQL Server spatial is in the very first stage of its life, several useful functions are not found natively in this release. It is nice to have a superset of tools like GeographyAggregateUnion, Project/Unproject, and AffineTransform::.

The more generalized approach would be to allow a user to click on the states he wishes to include in a region, and then have a SQL Server query produce the boundary for the clip action from the resulting state set. This wouldn’t be a difficult extension. If anyone thinks it would be useful, pass me an email and I’ll try a click select option.

This is not really apples to apples since Amazon is a virtual instance, while Azure is per deployed application. A virtual instance can have multiple service/web apps deployed.

2. Amazon announces a Relational Database Service RDS
Based on MySQL 5.1, this doesn’t appear to add a whole lot since you always could start an instance with any database you wanted. MySQL isn’t exactly known for geospatial even though it has some spatial capabilities. You can see a small comparison of PostGIS vs MySQL by Paul Ramsey. I don’t know if this comparison is still valid, but I haven’t seen much use of MySQL for spatial backends.

This is similar to Azure SQL Server which is also a convenience deployment that lets you run SQL Server as an Azure service, without all the headaches of administration and maintenance tasks. Neither of these options are cloud scaled, meaning that they are still single instance versions, not cross partition capable. SQL Azure Server CTP has an upper limit of 10Gb, as in hard drive not RAM.

These are new virtual instance AMIs that scale up as opposed to scale out. Scaled out options use clusters of instances in the Grid Computing/Hadoop type of architectures. There is nothing to prohibit using clusters of scaled up instances in a hybridized architecture, other than cost. However, the premise of Hadoop arrays is “divide and conquer,” so it makes less sense to have massive nodes in the array. Since scaling out involves moving the problem to a whole new parallel programming paradigm with all of its consequent complexity, it also means owning the code. In contrast scaling up is generally very simple. You don’t have to own the code or even recompile just install on more capable hardware.

Returning to Amazon RDS: Amazon has presumably taken an optimized compiled route and offers prepackaged MySQL 5.1 instances ready to use:

db.m1.small (1.7 GB of RAM, $0.11 per hour)

db.m1.large (7.5 GB of RAM, $0.44 per hour)

db.m1.xlarge (15 GB of RAM, $0.88 per hour)

db.m2.2xlarge (34 GB of RAM, $1.55 per hour)

db.m2.4xlarge (68 GB of RAM, $3.10 per hour)

Of course the higher spatial functionality of PostgreSQL/PostGIS can be installed on any of these high memory instances as well; it is just not done by Amazon. The important thing to note is that memory approaches 100GB per instance! What does one do with all that memory?

Here is one use:

“Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.”

Google Fellow Jeff Dean keynote speech at WSDM 2009.

Having a very large memory footprint makes sense for increasing performance on a DB application. Even fairly large data tables can reside entirely in memory for optimum performance. Whether a database makes use of the best optimized compiler for Amazon’s 64bit instances would need to be explored. Open source options like PostgreSQL/PostGIS let you experiment with compiling under your choice of compilers, though not necessarily with success.

Here is Todd Hoff’s point about having your DB run inside of RAM – remember that 68GB Quadruple Extra Large instance:

“Why are Memory Based Architectures so attractive? Compared to disk, RAM is a high bandwidth and low latency storage medium. Depending on who you ask the bandwidth of RAM is 5 GB/s. The bandwidth of disk is about 100 MB/s. RAM bandwidth is many hundreds of times faster. RAM wins. Modern hard drives have latencies under 13 milliseconds. When many applications are queued for disk reads latencies can easily be in the many second range. Memory latency is in the 5 nanosecond range. Memory latency is 2,000 times faster. RAM wins again.”

The real question, of course, is what will actual benchmarks reveal? Presumably optimized memory caching narrows the gap between disk storage and RAM. Which brings up the problem of configuring a database to use large RAM pools. PostgreSQL has a variety of configuration settings, but to date RDBMS software doesn’t really have a configuration switch that simply caches the whole enchilada.

Here is some discussion of MySQL front-ending the database with In-Memory-Data-Grid (IMDG).

Tuning for large memory is not exactly straightforward. There is no “one size fits all.” You can quickly get into Managing Kernel Resources. The two most important parameters are:

shared_buffers

sort_mem

“As a start for tuning, use 25% of RAM for cache size, and 2-4% for sort size. Increase if no swapping, and decrease to prevent swapping. Of course, if the frequently accessed tables already fit in the cache, continuing to increase the cache size no longer dramatically improves performance.”

OK, given this rough guideline, on a 68GB Quadruple Extra Large Instance:

shared_buffers = 17GB (25%)

sort_mem = 2.72GB (4%)

This still leaves plenty of room, 48.28GB, to avoid the dreaded swap page-in by the OS. Let’s assume a more typical 8GB of memory for the OS. We still have 40GB to play with. Looking at sort types in detail may make some more sort_mem helpful, so maybe bump it to 5GB. That still leaves an additional 38GB to drop into shared_buffers, for a grand total of 55GB. Of course you have to have a pretty hefty set of spatial tables to use up this kind of space.
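The arithmetic above can be sketched as a small helper script. This is just a starting-point calculator using the 25%/4% rule of thumb quoted earlier; the 68GB figure is the Quadruple Extra Large instance size, and integer GB arithmetic rounds the sort figure down:

```shell
#!/bin/sh
# Rough starting-point calculator for postgresql.conf memory settings,
# using the 25% cache / 2-4% sort guideline quoted above.
# Integer GB arithmetic only -- round and tune by hand from here.
total_gb=68                                 # Quadruple Extra Large instance RAM
shared_buffers_gb=$((total_gb * 25 / 100))  # ~25% of RAM for shared_buffers
sort_mem_gb=$((total_gb * 4 / 100))         # ~4% of RAM for sort_mem (rounds 2.72 down to 2)
remainder_gb=$((total_gb - shared_buffers_gb - sort_mem_gb))
echo "shared_buffers = ${shared_buffers_gb}GB"
echo "sort_mem = ${sort_mem_gb}GB"
echo "left for OS and page cache: ${remainder_gb}GB"
```

Run it once for a baseline, then adjust: if the box never swaps, push shared_buffers up; if it does swap, back off.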

Here is a list of PostgreSQL limitations. As you can see, it is technically possible to run out of even 68GB.

To get some perspective on size, an Open Street Map dump of the whole world fits into a 90GB EBS Amazon Public Data Set configured for PostGIS with pg_createcluster. It looks like this just happened a couple of weeks ago. Although 90GB is just a little out of reach for even a Quadruple Extra Large, I gather the current size of planet osm is still in the 60GB range, and you might just fit it into 55GB of RAM. It would be a tad tight. Well, maybe an Octuple Extra Large 136GB instance is not too far off. Of course who knows how big Planet OSM will ultimately end up being.
See planet.openstreetmap.org

Another point to notice is the 8 virtual cores in a Quadruple Extra Large Instance. Unfortunately:

“PostgreSQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, all multi-cpu operating systems can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU. PostgreSQL does not use multi-threading to allow a single process to use multiple CPUs.”

Running a single-connection query apparently won’t benefit from a multi-CPU virtual system, even though multiple concurrent connections will definitely spread across the available CPUs.

I look forward to someone actually running benchmarks since that would be the genuine reality check.

Summary

Scaling up is the least complex way to boost performance on a lagging application. The Cloud offers lots of choices suitable to a range of budgets and problems. If you want to optimize personnel and adopt a decoupled SOA architecture, you’ll want to look at Azure + SQL Azure. If you want the adventure of large scale research problems, you’ll want to look at instance arrays and Hadoop clusters available in Amazon AWS.

However, if you just want a quick fix, maybe not 2000x but at least some x, better take a look at Big RAM. If you do, please let us know the benchmarks!

I gather that there is some twittering about neo versus paleo geography. See Peter Batty’s blog entry or James Fee’s blog. I myself don’t Twitter, but in general I’m happy for Peter’s paleo accommodation of the non-twitterers, repeating the conversation in a blog entry. Peter has also updated comments with a new post questioning, “Are we now in a post neogeography era?” The dreaded paradigm shifts are coming fast and furiously.

I am not really able to comment on neo vs paleo as I myself fall even further back into “Jurassic Geography.” Looking at connotations we have this accounting:

Obviously the “paleo” label is not carried with quite the honor of “neo.” It’s reminiscent of the Galen / Myers-Briggs personality typology characterized as Lion, Otter, Beaver, and Golden Retriever. What do you want to be? Obviously not the beaver, but there has to be a significant part of the world in that category, like it or not. After all, what would lions eat for dinner without a few of us beavers? Likewise there is a significant branch of paleo in the GIS kingdom.

However, in the pre-paleolithic era there are still a few of us left, falling into the “long tail” of the Jurassic. So carrying on down the connotation stream here is the Jurassic geography equivalent:

Wait, “Open Source” – Jurassic Geography? How did that get in there? The notoriously frugal days of Hoover never made it into the paleolithic era’s “Supply Side” economy. It’s Keynesian economics all over the neo world, so Jurassic geography is the frugal end of the spectrum, and how can you get more frugal than free? Obviously Open Source is as Jurassic as they come in Geography circles.

As I’ve recently been in a gig hunting mode, I’ve been having quite a few in depth conversations about GIS stacks. As a small businessman outside the corporate halls of paleo geography, I’ve had few occasions to get an in depth education on the corporate pricing world. So I spent the past couple of days looking into it.

Let’s start at the bottom of the stack. Here is some retail pricing on a few popular GIS databases:

Oracle Standard Spatial $17,500 + $3850 annual

Oracle Enterprise Locator $47,500 + $10,450 annual

SQL Server 2008 Web edition ~ $3500

PostgreSQL/PostGIS $0.00

If you’re a Jurassic geographer, which do you choose? Probably not Oracle Enterprise Locator. If you’re Paleo, you look at that and think, “Man, I am the negotiator! We don’t pay any of that retail stuff for the masses.” Neo? Well, how would I know how a neo thinks?

OWS GeoServer or MapServer minimum $0 + annual $0
But, this is GIS isn’t it? We want some real analytic tools, not just a few hundred spatial functions in the JTS Topology Suite. OK, better throw in a few QGIS or GRASS installs and add a few $0s to the desktop production. Oh, and cores, we need some: “make that a 16-core 64-bit please” – plus $0.

I think you catch the Jurassic drift here. How about the client side?

ESRI Silverlight is free, well sort of, if you’re a developer, NGO, educational, or non-profit; otherwise take a look at that ArcGIS license back a few lines.

Database: PostgreSQL/PostGIS – a standards-compliant SQL server with GiST spatial indexing on OGC “Simple Features for SQL” specification compliant geometry, with extended support for 3DZ, 3DM and 4D coordinates. A full set of roughly 175 geometry, management, and spatial functions. It supports almost all projections. All this and performance? Maybe a little vague, but not shabby:

“PostGIS users have compared performance with proprietary databases on massive spatial data sets and PostGIS comes out on top.”

“GeoServer is the reference implementation of the Open Geospatial Consortium (OGC) Web Feature Service (WFS) and Web Coverage Service (WCS) standards, as well as a high performance certified compliant Web Map Service (WMS). “

Summary:
Well, in these economic times Jurassic may in fact meet Neo. The GIS world isn’t flat, and Jurassic going right eventually meets Neo going left – sorry, Paleos. Will Obama economics turn out to be Hooverian in the end? Who knows, but here’s a proposition for the Paleos:

Let me do a GIS distribution audit. If I can make a Jurassic GIS Stack do what the Paleo stack is currently providing, you get to keep those annual Paleo fees from here to recovery. How about it?

Using an mxd in ArcMap 9.3, I first made sure all of my shape layers were connected. Then I ran the ArcMap2SLD tool which requires a session of ArcMap to be running.

This is a simple ArcMap project with a set of shp layers. It takes about 5-10 minutes to run through all the layers. The analysis seems to read through each symbol in a layer and then to loop through every record. After a few minutes it asks again for a file to store the sld and now the test.sld is ready.

The resulting SLD is a single file with all of the Feature types and rules listed together:

Copying a subset for the roads layer into GeoServer style did not produce any features the first time around.

First, the Filter PropertyName is case sensitive. The loading batch I used produces lower case field names in the PostGIS tables. As a result the RDCLS property name had to be changed to rdcls throughout the SLD.

If you notice in the above SLD output, the Geometry reference is missing:
<Geometry><ogc:PropertyName>the_geom</ogc:PropertyName></Geometry>

Also missing are scale elements:
<MinScaleDenominator>32000</MinScaleDenominator>
<MaxScaleDenominator>20000000</MaxScaleDenominator>

This still requires some manual intervention with a text editor and a bit of cut & paste, but the tool does handle setting Filters, stroke color, stroke-width, and stroke-opacity parameters. It handled the PolygonSymbolizer in the mxd as well.

This is a time saver; however, the appearance didn’t seem to match in the resulting SLD tile sources. It appears as though some of the SLD stroke widths end up masking underlying strokes, so that, for example, the purple highway symbolization does not appear in the SLD tile source version.

Fig 2 – Silverlight tile source with new SLD

After some examination I realized that multiple LineSymbolizers were in reverse order. The thickest line width needs to come first, with thinner stroke widths later in the SLD file, because rendering is in file order. Here is an example of the final resulting SLD LineSymbolizer with corrected ordering:
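A minimal sketch of the widest-first ordering: the wide casing stroke comes first and the thinner stroke last, so the thin line renders on top of the casing rather than being masked by it. The property name, widths, and colors here are illustrative, not taken from the original project:

```xml
<Rule>
  <ogc:Filter>
    <ogc:PropertyIsEqualTo>
      <ogc:PropertyName>rdcls</ogc:PropertyName>
      <ogc:Literal>1</ogc:Literal>
    </ogc:PropertyIsEqualTo>
  </ogc:Filter>
  <!-- widest stroke first: the dark casing -->
  <LineSymbolizer>
    <Stroke>
      <CssParameter name="stroke">#333333</CssParameter>
      <CssParameter name="stroke-width">5</CssParameter>
    </Stroke>
  </LineSymbolizer>
  <!-- thinner stroke last so it draws on top of the casing -->
  <LineSymbolizer>
    <Stroke>
      <CssParameter name="stroke">#A020F0</CssParameter>
      <CssParameter name="stroke-width">3</CssParameter>
    </Stroke>
  </LineSymbolizer>
</Rule>
```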

I put my name down for a possible unconference talk at Wherecamp5280 coming up in Aug. In case it interests folks at the unconference, I decided to put together some notes here on my latest little project.

Open source GIS stacks do a great job of exposing ordinary GIS functionality. For web applications they do as well as anything out there from Oracle, Microsoft, or even ESRI. I use PostGIS and GeoServer all the time. Now that Microsoft’s Silverlight MapControl is available, the client side is also easy to create. Generally porting a project involves just a few steps:
a. load all the shape files to PostGIS tables – ogr2ogr, shp2pgsql
b. create the WMS service layers in GeoServer
c. create the client using, my preference at the moment, the Silverlight MapControl

Step a. is fairly straightforward. There are a lot of good tools, including shp2pgsql and ogr2ogr. Here is an example loading command:
shp2pgsql -s 4326 -d -I CityLimits.shp citylimits | psql -h localhost -d City -U username

Loading a set of shape files is normally a matter of writing a batch script. If there are some complicating factors then it is also easy enough to create a simple custom Java or C# loader specific to a particular job. In a more involved ArcSDE environment, you could theoretically skip the intermediary .shp and use table dumps from the backing DB. The complication there is doing the due diligence to map the DB idiosyncrasies from a commercial license DB to PostgreSQL, which is generally more standards compliant.
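The batch script mentioned above might look something like this dry-run sketch. The shapefile names, database, and user are placeholders; it prints the pipelines rather than running them so you can sanity-check before loading:

```shell
#!/bin/sh
# Dry-run batch loader: print one shp2pgsql | psql pipeline per shapefile.
# Swap echo for eval (or just remove it) to actually run the loads.
SRID=4326
DB=City
PGUSER=username
for shp in CityLimits.shp Roads.shp Parcels.shp; do
  # lower-case table names avoid the SLD PropertyName case headaches noted earlier
  table=$(basename "$shp" .shp | tr '[:upper:]' '[:lower:]')
  cmd="shp2pgsql -s $SRID -d -I $shp $table | psql -h localhost -d $DB -U $PGUSER"
  echo "$cmd"
done
```

The -d flag drops and recreates each table, and -I builds the GiST spatial index as part of the load.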

Step b. above is the hardest part at present. The GeoServer RESTful configuration service extension promises to make life a bit more developer friendly. At least you can build POST and PUT tools to add layers and styles in your own program. But the really sticky part is translating all of the numerous styling, legend, range, etc. parameters from an ESRI mxd project to SLD (or, in the case of TatukGIS, ttkgp files). SLD is a powerful style description language which can do just about any kind of styling you can think of, but it seems to be lacking interactive tools. It will be nice to see layer SLD editors with popup colors, selectable view ranges, filter query generators, captions, auto legend generators, etc. Perhaps it already exists and I’ve just missed it?

At any rate, to do this manually with a text editor is not impossible, but it is very tedious. It requires creating sets of FeatureTypeStyle Rules with filters, symbolizers, min and max scale denominators, labels, etc. This can get quite involved. For example, layers containing roads with numerous classifications to arrange in the SLD require some fairly involved xml, plus a lot of cut & paste. Putting together icon symbols can get tedious too. ESRI mxd, like most things ESRI, is not exactly an open format, but some diligent folks have done a bit of Arc scripting. On my input stack is trying out ArcMap2SLD to see how effectively it can be used to create the SLD side of things.
mxd to SLD – ArcMap2SLD
Well, I tried out ArcMap2SLD on an Arc 9.3 project but received an error in German, so it’s manual mode for the present.

Once GeoServer FeatureTypes are configured for a WMS, step c. is just wrapping the WMS calls inside a Silverlight MapControl. There are two ways to approach this. The first, the one I like, is using MapTileLayer.TileSources for the various layers. The tiles are sourced from the GeoWebCache, gwc, included with newer versions of GeoServer. A second approach is to create a GetMap WMS request with an explicit BBox. They both work fine; I just kind of like the dynamic zoom-in effect of MultiScale tile sources as they spiral down the quadkey.
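For the second approach, the GetMap request is just a URL with an explicit BBox. Here is a sketch against a default local GeoServer; the workspace, layer, and bounding box values are invented for illustration:

```shell
#!/bin/sh
# Assemble a WMS 1.1.1 GetMap URL for a local GeoServer instance.
# Layer name "city:roads" and the Denver-ish bbox are hypothetical.
BASE="http://localhost:8080/geoserver/wms"
REQ="service=WMS&version=1.1.1&request=GetMap"
LAYERS="layers=city:roads&styles="            # empty styles= uses the layer default SLD
BBOX="bbox=-105.1,39.5,-104.6,40.0"           # lon/lat extent in EPSG:4326
IMG="width=512&height=512&srs=EPSG:4326&format=image/png"
URL="$BASE?$REQ&$LAYERS&$BBOX&$IMG"
echo "$URL"                                    # fetch with: curl -o map.png "$URL"
```

The Silverlight client ends up building exactly this kind of string per viewport, swapping the bbox as the user pans and zooms.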

Silverlight 3 is out now, so I installed the latest and then ran the new Silverlight client through the update process in Visual Studio. It worked without a problem. I also downloaded the latest version of the Silverlight Toolkit. This toolkit has a lot of nice controls to put to use. Using a control is pretty simple: just reference in the library dlls and start adding controls:

Want to change the entire page style? Just add one of the dozen theme styles under root to wrap the whole page: <expressionDark:ExpressionDarkTheme>

It’s convenient to include a base map tile source using OSM, but if this is an offline situation you can use a road layer of your own. Those in need of a high level background map including aerial imagery can use the Bing Maps tile service, license required. The Silverlight MapControl cannot quite be offline yet. Looking at the request/response traffic with Fiddler shows a couple of requests to the Microsoft server. Since I turned off Bing Maps in this project, it does nothing more than confirm that their server is online and has a crossdomain.xml. If you are offline, it throws up a white box message indicating “service is unavailable. try again later.” The message unfortunately is sticky and stays with the viewport regardless of zooming or panning around your local tile sources.

The custom code for Silverlight is easy to create using C#, which I’m learning to really like. One great feature compared to Javascript client code is performance. I was amazed the first time I brought in a list of a thousand ellipse shapes: no change in zoom and pan performance at all.

I then added a few specific ‘Find’ and ‘Query’ functions using some of the spatial functions in PostGIS. Double click FeatureInfo requests drill through the visible layer stack creating an expandable tree view for table attribute fields.
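One of those ‘Find’ functions might boil down to a PostGIS proximity query like the following. The table, columns, point, and search radius are invented for illustration; the sketch just prints the SQL rather than connecting to a database:

```shell
#!/bin/sh
# A 'find features near a clicked point' query, roughly what the client
# would issue against PostGIS. Names and values here are hypothetical.
LON=-104.99
LAT=39.74
QUERY="SELECT gid, ST_AsText(the_geom)
FROM citylimits
WHERE ST_DWithin(the_geom,
                 ST_SetSRID(ST_MakePoint($LON, $LAT), 4326),
                 0.005);"   # ~0.005 degrees, roughly 500 m at this latitude
echo "$QUERY"    # in a live setup: psql -h localhost -d City -c "$QUERY"
```

ST_DWithin uses the GiST index built at load time, so this stays fast even on large tables.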

The end result, for those for whom this matters, is a complete online map viewer for your ArcMap project minus any license cost or hassles.

One caveat I should mention: the Silverlight MapControl is still a CTP. The final release ETA is unknown at present, so work is currently subject to change and revision once the final release is out.

8. Add a line draw and profile view window to the LiDAR Silverlight client
a. try a Silverlight 3 Perspective 3d Transform to follow a profile around an arbitrary LiDAR profile polyline. This is a non-standard extension to the WMS request spec, which allows a client to grab a profile view out of the point cloud. Pretty interesting stuff, since you can see ghostly building point cloud shapes rising out of the terrain.

10. Find a gig – resume
Unfortunately, my last line item needs to be moved up to the top of the input stack. My primary client was forced to drop their software division due to the economic climate, and it looks like I will need to focus on a new gig … but there are so many other interesting things to try out . . . If anyone out there needs a GIS web developer, please shoot me an email, thanks!