EDIT (outlining new bounty added):
I have added the largest bounty possible to this question in the hope of getting a very detailed answer. Your solution should outline a step-by-step guide using open source tools (I believe giscloud.com is using some custom code combined with tools like TileStache and Leaflet, possibly PostGIS 2.0?) plus any code/customization needed to get the same rendering performance giscloud.com is getting, without using Flash and while supporting the same browser versions giscloud achieved. Please note: the existing answers below provide some good info, but none points to an answer which one can actually implement. Bonus points for anyone who can outline how we can get QGIS to render layers using the same/similar method :)

I have been looking for a solid solution that would let me create a web map and overlay vector polygons without taking forever to load, with the goal of making each polygon display a different color on a hover event.

As far as I am aware, there are three options to achieve this: Canvas, SVG, or Flash.

Flash seems like it would be the best solution if it worked on Apple iPhones/iPads, as it seems to provide the fastest rendering and the cleanest display. Canvas seems to be the second-best choice, but it takes VERY long if you have hundreds of polygons displayed on a map, while SVG takes even longer to render.

I almost lost hope of finding a solution to this problem, but today I came across a company called GISCloud http://www.giscloud.com (currently in beta with free signup).

This company has SOMEHOW managed to figure out an amazing way to render hundreds of vectors on a map in near real time. I was amazed by their approach, and my question to the community is how we can replicate it with existing technologies such as Leaflet, OpenLayers, Wax...

I am assuming the URL structure follows standard tiling-service logic (for example, the third-to-last folder being the zoom level...).

In any case, I have analyzed the actual data in these JSON files, and it seems they build their vectors from just these values:

width/height: the width and height of the data served in each JSON request.

pixels: pixel values which I assume relate to generalized x/y pixel coordinates for point-level features. I am guessing they have a way of automatically simplifying the region depending on the zoom level, and that by using pixel coordinates they dramatically reduce the size of the data to be loaded compared to lat/long data.

geom: where I am guessing they specifically define each polygon within the tile being loaded, with coordinates relative to the map container window. What's also interesting is that each entry has an "s" value, which I assume is an optional attribute or feature-link value; at the end of each entry there is an area that seems to define a per-vector ID along with the layer ID, which I am guessing is used to join the data from each JSON tile request being called.

EDIT: I am also assuming they have somehow figured out a way to automatically determine and split up the data that needs to be loaded for each tile, depending on how much data the requested tile would contain.

OK, how many bytes to transfer this? Assuming UTF-8 (1 byte per character for this content), we have around 176 chars (without counting tabs or spaces), which makes this 176 bytes (and this is optimistic, for various reasons that I will omit for the sake of simplicity). Mind you, this is for 2 points!

Still, some smart-ass who doesn't understand what he is talking about will claim, somewhere, that "JSON gives you higher compression".

How many bytes here? Say ~115 characters. I even cheated a bit and made it smaller.

Say that my area covers 256x256 pixels and that I am at a zoom level so high that each feature renders as one pixel and I have so many features, that it is full. How much data do I need to show that 65,536 features?

54 characters (or UTF-8 bytes - and I am even ignoring some other things) per "feature" entry, multiplied by 65,536 = 3,538,944 bytes, or about 3.4MB.

I think you get the picture.

But this is how we transport data in a service oriented architecture. Readable bloated crap.

What if I wanted to transport everything in a binary scheme that I invented myself? Say that, instead, I encoded that information in a single-band image (i.e. grayscale), and decided that 0 means sold, 1 means available, and 2 means I do not know. Heck, in 1 byte I have 256 options I can use - and I am only using two or three of them in this example.

What is the storage cost of that? 256 x 256 x 1 (one band only) = 65,536 bytes, or 0.06MB. And that doesn't even take into consideration the compression techniques I get for free from several decades of research in image compression.
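The arithmetic above is easy to reproduce. Here is a sketch (the grid size and status codes follow the example; the JSON field names are invented for illustration):

```python
import json
import zlib

# Hypothetical 256x256 tile where every pixel is one feature with a
# status code: 0 = sold, 1 = available, 2 = unknown.
SIZE = 256
statuses = [(x * y) % 3 for y in range(SIZE) for x in range(SIZE)]

# JSON-style transport: one small readable object per feature.
as_json = json.dumps(
    [{"x": i % SIZE, "y": i // SIZE, "s": s} for i, s in enumerate(statuses)],
    separators=(",", ":"),
)

# Binary transport: one byte per pixel, like a single-band image.
as_binary = bytes(statuses)

print(len(as_json))                   # megabytes of readable "bloat"
print(len(as_binary))                 # 65536 bytes, exactly as computed above
print(len(zlib.compress(as_binary)))  # generic compression shrinks it further
```

The DEFLATE step stands in for the "free" image-compression win; a real PNG encoder would apply the same family of techniques.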

At this point, you should be asking yourself why people do not simply send data encoded in a binary format instead of serializing to JSON. Well, first, it turns out JavaScript has historically sucked big time at transporting binary data, so that is why people have not done this.

An awesome workaround appeared when the new features of HTML5 came out, particularly Canvas. So what is this awesome workaround? It turns out you can send data over the wire encoded as what appears to be an image, then shove that image into an HTML5 Canvas, which lets you manipulate the pixels directly! Now you have a way to grab that data, decode it on the client side, and generate the JSON objects in the client.
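In Python terms, the decode step looks like this (the real client would do it in JavaScript against canvas.getImageData(); the status codes and field names here are invented for illustration):

```python
# Simulate decoding a "fake tile": the server encoded one status byte per
# pixel, and the client reads the pixel buffer back into feature objects,
# the way JavaScript would walk canvas.getImageData().data.
SIZE = 4  # tiny tile for the example; a real tile would be 256x256

tile_bytes = bytes([0, 1, 2, 1, 0, 0, 1, 2, 2, 2, 1, 0, 0, 1, 1, 2])

def decode_tile(buf, size):
    """Turn the raw band back into client-side feature dicts."""
    features = []
    for i, status in enumerate(buf):
        if status == 0:          # say 0 means "no feature here"
            continue
        features.append({"x": i % size, "y": i // size, "status": status})
    return features

features = decode_tile(tile_bytes, SIZE)
print(len(features))  # 11 features recovered from a 16-byte payload
```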

Stop a moment and think about this.

You have a way of encoding a huge amount of meaningful geo-referenced data in a highly compressed format, orders of magnitude smaller than anything else done traditionally in web applications, and manipulate them in javascript.

The HTML canvas doesn't even need to be used to draw, it is only used as a binary decoding mechanism!

That is what all those images that you see in Firebug are about. One image, with the data encoded for every single tile that gets downloaded. They are super small, but they have meaningful data.

So how do you encode these on the server side? Well, you do need to generalize the data server-side and create, for every zoom level, a meaningful tile with the data encoded in it. Currently you have to roll your own - an out-of-the-box open source solution doesn't exist - but all the tools you need are available. PostGIS will do the generalization through GEOS, and TileCache can be used to cache the tiles and help you trigger their generation. On the client side, you will need to use HTML5 Canvas to receive the special "fake tiles", and then you can use OpenLayers to create real client-side JavaScript objects that represent the vectors, with mouse-over effects.
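A rough sketch of the per-tile query such a roll-your-own server might issue (the table and column names are placeholders; ST_MakeEnvelope, ST_SimplifyPreserveTopology, and ST_AsGeoJSON are standard PostGIS functions, but the tile math and tolerance choice are my assumptions):

```python
# Compose the per-tile PostGIS query a hypothetical tile generator could run:
# select features intersecting the tile's envelope and generalize them with a
# tolerance matching the size of one on-screen pixel at this zoom level.
WORLD = 20037508.342789244            # Web Mercator half-width in meters

def tile_query(z, x, y, table="parcels", geom_col="geom"):
    tile_span = (2 * WORLD) / 2 ** z  # meters covered by one tile
    pixel_size = tile_span / 256      # meters covered by one pixel
    xmin = -WORLD + x * tile_span
    ymax = WORLD - y * tile_span
    return f"""
        SELECT id,
               ST_AsGeoJSON(
                 ST_SimplifyPreserveTopology({geom_col}, {pixel_size})
               ) AS geom
        FROM {table}
        WHERE {geom_col} && ST_MakeEnvelope(
            {xmin}, {ymax - tile_span}, {xmin + tile_span}, {ymax}, 3857);
    """

sql = tile_query(z=3, x=2, y=1)
print(sql)
```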

If you need to encode more data, remember that you can always generate RGBA images, which gives you 4 bytes per pixel, or 4,294,967,296 values you can represent per pixel. I can think of several ways to use that :)
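For example, a 32-bit feature ID can be round-tripped through the four channels of a single pixel (a sketch, not GISCloud's actual encoding):

```python
import struct

def pack_rgba(feature_id):
    """Split a 32-bit feature ID across the R, G, B, A channels of one pixel."""
    return struct.unpack("4B", struct.pack(">I", feature_id))

def unpack_rgba(r, g, b, a):
    """Recover the feature ID from one pixel's four channel values."""
    return struct.unpack(">I", struct.pack("4B", r, g, b, a))[0]

pixel = pack_rgba(305419896)   # 0x12345678
print(pixel)                   # (18, 52, 86, 120)
print(unpack_rgba(*pixel))     # 305419896
```

One known gotcha if you actually do this through Canvas: some browsers premultiply alpha, so channel values can be mangled whenever alpha is below 255; keeping the alpha channel at 255 sidesteps it at the cost of one byte.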

Update: Answering the QGIS question below.

QGIS, like most other desktop GIS packages, does not have a fixed set of zoom levels. It has the flexibility of zooming to any scale and just rendering. Can it show data from WMS or tile-based sources? Sure it can, but most of the time it is really dumb about it: zoom to a different extent, calculate the bounding box, calculate the required tiles, grab them, show them. Most of the time other things are ignored, like HTTP cache headers that would avoid refetching. Sometimes a simple cache mechanism is implemented (store the tile; when a tile is requested, check the cache and only fetch it if missing). But this is not enough.

With this technique, the tiles and the vectors need to be refetched at every zoom level. Why? Because the vectors have been generalized to accommodate each zoom level.

As for the whole trick of putting the tiles into an HTML5 canvas so you can access the buffers: that is not necessary here. QGIS allows you to write code in Python and C++, and both languages have excellent support for handling binary buffers, so this workaround is irrelevant for that platform.

UPDATE 2:

There was a question about how to create the generalized vector tiles in the first place (baby step 1, before being able to serialize the results into images). Perhaps I did not clarify enough. TileStache will allow you to create effective "vector tiles" of your data at every zoom level (it even has an option to either clip or not clip the data when it crosses a tile boundary). This takes care of separating the vectors into tiles at the various zoom levels. I would choose the "not clip" option (it will pick the arbitrary tile where the feature covers more area). Then you can feed every vector through the GEOS generalize operation with a big tolerance - in fact, you want it big enough that polylines and polygons collapse onto themselves, because when they do, you can remove them from that zoom level, since at that stage they are irrelevant. TileStache even allows you to write easy pythonic data providers where you can put this logic. At that stage, you can choose to serve the tiles as JSON files (like they do with some of the African map samples) or as geometries serialized into PNGs, as in the other samples (or the Trulia one) I gave above.

So far, every single person I have seen using this technique has not posted the code. IMHO that is because the important part is really happening on the server, there is no "standard" for it, and the choice of what each pixel means (1=sold, 2=avail, etc.) is so specific to your particular map that the code is most likely not "generic".
– Ragi Yaser Burhum, Oct 20 '11 at 15:21


As far as QGIS goes, the answer is a bit more involved; I will update my answer on the way to work. Don't freak out - I take a train, so no driving while replying to GIS.SE for me :)
– Ragi Yaser Burhum, Oct 20 '11 at 15:34


+1 Thank you for not compressing this very readable response :)
– Kirk Kuykendall, Oct 20 '11 at 16:49


You could do this with Silverlight or Flash, for sure. Nevertheless, remember that the important part is happening on the server, so Flash or Silverlight would not be of much help.
– Ragi Yaser Burhum, Oct 20 '11 at 20:44


Interesting technique, but the data on the GISCloud Africa example does all seem to come via JSON.
– geographika, Oct 21 '11 at 6:07

It's not a big secret how we did it, so I would be happy to share it with you. The key is in two things:

removing from a tile all vectors which are too small to be visible, i.e. their area calculated in pixels is less than 1px; we drop such a vector and place a pixel instead of it, hence the "pixels" property in our JSON tile

vectors which will actually be visible are generalized and then written into the tile with their coordinates in pixels

On the client side we render those static pixels and visible vectors on canvas. On top of the vectors we implemented mouse event handling to achieve hovering, i.e. interactivity. And that's it.

Our backend map engine does all the heavy lifting, because we don't use any precaching and all tiles are generated on the fly. It's very important to us to have a map that can be quickly refreshed.
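The two steps they describe could be sketched roughly like this (the threshold, field names, and feature tuple layout are my guesses, not GIS Cloud's actual code):

```python
def build_vector_tile(features, meters_per_pixel):
    """Split features into visible (generalized) vectors and single pixels.

    Each feature is (id, area_m2, centroid_xy_px, geom_px). A feature whose
    on-screen area comes out under one pixel is emitted as a pixel; the rest
    keep their (already generalized) pixel-coordinate geometry.
    """
    pixels, vectors = [], []
    px_area = meters_per_pixel ** 2
    for fid, area_m2, centroid, geom in features:
        if area_m2 / px_area < 1.0:      # too small to see: draw one pixel
            pixels.append(centroid)
        else:                            # big enough: ship the real outline
            vectors.append({"id": fid, "geom": geom})
    return {"pixels": pixels, "geom": vectors}

tile = build_vector_tile(
    [(1, 10.0, (5, 5), [(0, 0), (9, 0), (9, 9)]),
     (2, 50000.0, (128, 128), [(100, 100), (150, 100), (150, 150)])],
    meters_per_pixel=38.2,               # roughly zoom 12 at the equator
)
print(len(tile["pixels"]), len(tile["geom"]))  # 1 1
```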

So it sounds like the client side is the easy part. It's impressive that the data is rendered without any caching.

He also mentions a hosting service which may be of interest to you. You may want to weigh the cost of trying to recreate this with the cost of using a ready made service.

The part which confuses me here is that it seems requests are being sent to PostGIS, and instead of getting standard GeoJSON with lat/long values back, they seem to be (in real time) converting the lat/long values to x/y/z coordinates and spitting them out based on the zoom level and the map tiles needed. What do you guys think is being used to get these speeds?
– NetConstructor.com, Oct 9 '11 at 14:47

@netconstructor Maybe the geometry is already stored as x/y/z geometry, so there is no need to convert?
– geographika, Oct 12 '11 at 18:58
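For what it's worth, the lon/lat-to-tile-pixel conversion itself is cheap, standard slippy-map math (a sketch; whether GIS Cloud stores projected coordinates or converts on the fly, only they can say):

```python
import math

def lonlat_to_tile_pixel(lon, lat, zoom, tile_size=256):
    """Project lon/lat to (tile_x, tile_y, pixel_x, pixel_y) at a zoom level."""
    n = 2 ** zoom                                # tiles per axis at this zoom
    x = (lon + 180.0) / 360.0 * n                # fractional tile coordinates
    lat_rad = math.radians(lat)
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    tile_x, tile_y = int(x), int(y)
    px = int((x - tile_x) * tile_size)           # pixel within the tile
    py = int((y - tile_y) * tile_size)
    return tile_x, tile_y, px, py

print(lonlat_to_tile_pixel(0.0, 0.0, zoom=1))  # (1, 1, 0, 0)
```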

As I described on the OSGeo list, the key is in delivering data as vector JSON tiles that contain pixels for subpixel geometry and generalized geometry for those features that will actually be visible at a certain level. Performance is great because this technique eliminates all unnecessary vector information and leaves only the vectors that will actually have a visual impact on the map. The pixels are there to fill the gaps, placed instead of the subpixel vectors. That is it regarding the tile format.

On the backend side is the true heavy lifting. We are not using TileStache or any other map engine, since we wrote our own that can, with a number of optimizations, produce such vector graphics in real time.

First we started with delivering map tiles as SWFs, and lately we just enabled JSON output so we could use HTML5 Canvas to render the graphics. Below you can find a benchmark comparing this kind of vector technology with raster technology (Mapnik). For a fair comparison, only look at results in CGI mode.

We are planning to provide this technology as a map tile hosting service. The idea is to host your geo data on the cloud and through HTML5 deliver it into any map client at high speed, without any need to precache the tiles. If you are interested to join this beta feel free to contact us here: http://www.giscloud.com/contact/

The idea of using tiles for vector data is very interesting (it sounds like another wording for "spatial indexing"). How do you deal with features crossing several tiles? Are they cut?
– julien, Oct 21 '11 at 10:10

Looks like a very similar question was recently asked on the OSGeo OpenLayers forum, with the GIS Cloud developers describing their approach, which is an interesting mix of GeoJSON geometries and static pixels. They actually generate all vector tiles on the fly instead of using a pre-built cache of GeoJSON files.

Esri has implemented a similar approach, using ArcGIS Server and Feature Layers, which can generalize the geometries on the fly and send them over the wire as JSON.

For a straightforward method that you can actually implement now, you can build vector tiles with TileStache (which has PostGIS support) and consume them in Polymaps. Polymaps uses SVG, but the performance is quite good, and it uses CSS rules to style map elements, so the feature rendering is totally up to you. Here is a blog post working through something similar to what you are asking.

@wwnick - Thanks for your answer, but it seems that GisCloud.com is utilizing some additional methods which give them such amazing processing power without having to cache elements, meaning everything is realtime. I added a bounty to the question and was hoping you might be willing to provide an in-depth solution. Thanks for your response thus far!
– NetConstructor.com, Oct 19 '11 at 4:13

As mentioned in the other answers: to deliver and show vectors on the fly, they need to be generalised for each zoom level and each dataset. Also, you can use Google polyline encoding to cut the size down considerably.
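Polyline encoding itself is straightforward; here is a minimal Python version of Google's published algorithm:

```python
def encode_polyline(points):
    """Google polyline encoding: delta-encode coordinates at 1e-5 precision,
    then pack each value into printable ASCII, 5 bits per character."""
    result = []
    prev_lat = prev_lng = 0
    for lat, lng in points:
        lat_i, lng_i = round(lat * 1e5), round(lng * 1e5)
        for value in (lat_i - prev_lat, lng_i - prev_lng):
            # left-shift, inverting negatives so the sign rides in bit 0
            value = ~(value << 1) if value < 0 else value << 1
            while value >= 0x20:
                result.append(chr((0x20 | (value & 0x1F)) + 63))
                value >>= 5
            result.append(chr(value + 63))
        prev_lat, prev_lng = lat_i, lng_i
    return "".join(result)

# The worked example from Google's polyline-format documentation:
path = [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
print(encode_polyline(path))  # _p~iF~ps|U_ulLnnqC_mqNvxq`@
```

Delta encoding is what makes this shrink so well: consecutive vertices are close together, so most deltas fit in one or two characters.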

I used a simple delivery mechanism: each geometry was a JavaScript function within a JavaScript HTTP response. Not as advanced as tile-based vector delivery, but simple and open source!

I didn't get to try Google Maps v3 with Canvas, but have seen a couple of New York Times demos which impressed.

The problem with this approach is that it's definitely not as fast as their solution when dealing with 500,000 polygons, and IE performance is really bad.
– NetConstructor.com, Oct 9 '11 at 14:49

Please notice the added bounty, and if you can, please provide a detailed solution. BTW, the New York Times demo, while very cool, utilizes Flash, unlike the solution giscloud.com is using.
– NetConstructor.com, Oct 19 '11 at 4:30

Yep, sorry about that - my "hobby" has now come to an end after 4 years of tinkering with polygons! GISCloud shows you how far the technology has come since my census demo went live a few years ago... I've removed references to it in the above comment.
– minus34, Apr 3 '12 at 1:36


Well better late than never! I've updated things to be as "out of the box" as possible and posted the client side code on GitHub. The setup for the new code has been blogged. It now reads polygons directly from PostGIS as is and applies the thinning on the fly via the PostGIS RESTful Web Service Framework (PRWSF) to a Leaflet Javascript API client. There's almost no backend coding required!
– minus34, Jun 4 '12 at 10:12

I do not know exactly which solution is used by this company (you could ask them directly), but I have an idea.

The key to improving the network transfer and rendering speed of vector data is to generalise it according to the zoom level: transferring and rendering, at a high zoom level, thousands of objects designed for a much lower zoom level is often very time consuming (and also useless, because the final display is usually not legible - see for example this image). To implement this, your PostGIS server database has to be multi-scale: for each zoom level, there should be one representation of each object suitable for that zoom level. These different representations can be computed automatically using generalisation techniques. Furthermore, the vector data sent by the server to the client should depend not only on the spatial extent but also on the zoom level: the server sends data suitable for the current zoom level. This is the approach defended in this excellent paper :-)
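The zoom-dependent part can be reduced to picking a simplification tolerance per zoom level (the base resolution is the standard Web Mercator 256px-tile value; tying the tolerance to one on-screen pixel is my assumption):

```python
# Pick a generalisation tolerance per zoom level: one on-screen pixel at
# zoom z spans half as many meters as at zoom z - 1, so each zoom level's
# stored representation can be simplified with a matching tolerance.
BASE_METERS_PER_PIXEL = 156543.03392804097  # 256px world tile at zoom 0

def tolerance_for_zoom(zoom, pixels=1.0):
    """Simplification tolerance (meters) equal to `pixels` on-screen pixels."""
    return pixels * BASE_METERS_PER_PIXEL / 2 ** zoom

for z in (0, 6, 12, 18):
    print(z, round(tolerance_for_zoom(z), 2))
```

Each precomputed representation would be simplified with its level's tolerance (e.g. via ST_SimplifyPreserveTopology), so the server can answer any request with geometry already matched to the client's scale.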

There is an interesting paper, a demo, and source code from a project by the Stanford Visualization Group that uses a data cube per tile in order to visualize and explore large geographical datasets. It can be used only for point datasets, but it may be an interesting approach.