Friday, 16 December 2016

postgis - Creating Vector Polygons with rendering performance like GISCloud?


I have been looking for a solid way to build a web map that overlays vector polygons without taking forever to load, with the goal of making each polygon display a different color on a hover event.


As far as I am aware, there are three ways to achieve this: Canvas, SVG, or Flash.


Flash seems like it would be the best solution if it worked on Apple iPhones/iPads, as it seems to provide the fastest rendering and cleanest display. Canvas seems to be the second best choice, but it takes VERY long if you have hundreds of polygons displayed on a map, and SVG takes even longer to render.


I had almost lost hope of finding a solution to this problem, but today I came across a company called GISCloud http://www.giscloud.com (currently in beta with free signup).


This company has SOMEHOW managed to figure out an amazing way to render hundreds of vectors on a map in near real-time. I was amazed by their approach, and my question to the community is how we can replicate it with existing technologies such as Leaflet, OpenLayers, or Wax.


Take a look for yourself by viewing this amazing demo: http://www.giscloud.com/map/284/africa


Make sure you hover over any of the polygons on the page and test the zoom controls to see that these polygons are indeed vectors.



What I have noticed by looking at the requests with Firebug is that the map requests specific json files. Depending on the zoom level/area, multiple json files are requested.




I should also mention that once GISCloud has loaded the data on the page, hovering over a vector immediately changes its color without creating a new request.


EXAMPLES:



I am assuming the URL structure follows standard tiling-service logic (for example, the third-to-last folder being the zoom level...).
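If that assumption holds, the tile indices follow the usual slippy-map convention. A minimal sketch of that math (the `/z/x/y.json` URL shape and the Nairobi coordinates are just illustrative assumptions, not GISCloud's actual scheme):

```python
import math

def deg2num(lat_deg, lon_deg, zoom):
    """Convert a WGS84 lat/lon to slippy-map tile indices (z/x/y scheme)."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom  # number of tiles along one axis at this zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile

# Which tile would cover Nairobi (approx. 36.82 E, 1.29 S) at zoom 6?
x, y = deg2num(-1.29, 36.82, 6)
print(f"/6/{x}/{y}.json")
```

Checking a few requests against a formula like this would confirm whether the folders really are zoom/x/y.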


In any case, I have analysed the actual data in these json files, and it seems they create their vectors based on just these data values:



  • width/height: the width and height of the data being served in each json request

  • pixels: x/y pixel values that I am guessing relate to generalized point coordinates. I assume they have a way of automatically simplifying the geometry depending on the zoom level, and that by using pixel coordinates instead of lat/long data they dramatically reduce the size of the data that needs to be loaded.


  • styles: two RGB css values, "f" for the polygon fill color and "s" for the polygon border (stroke) color.

  • geom: this seems to define each polygon within the tile being loaded, with coordinates relative to the map container window. What is also interesting is that each entry has an "s" value, which I am guessing is a style index or optional feature-link value, and at the end of each entry there is a "c" value that looks like a per-vector ID joined with the layer ID, presumably used to tie together the data from each json tile request.




I am also assuming they have figured out a way to automatically split up the data that needs to be loaded for each tile, depending on how much data the requested tile contains.


Here is an extracted breakdown of one of these requests:


{"width":256,"height":256,"tile":
{"pixels":
[0,6461,-1,0,5,148,0,509,-1,10715,-1,1,-1,251,-1,1,-1,1,-1,251,-2,3,-1,255,-1,249,-2,5,-2,247,-1,509,-3,251,-1,2,-2,253,-2,252,-2,254,-1,255,-1,254,-1,255,-1,1276,-2,13,-1,233,-1,2,-1,253,-1,1,-1,255,-1,247,-1,1306,-1,1533,-1,1269,-1,1276,-1,2303,-1]},


"styles":
[{"f":"rgb(99,230,101)","s":"rgb(5,148,0)","lw":"0"}],

"geom":
[
{"s":0,"p":[4,143,5,144,3,146,1,146,2,143,4,143],"c":"layer1156_5098"},
{"s":0,"p":[-2,143,0,140,2,141,2,144,1,146,-2,144,-2,143],"c":"layer1156_5067"},
{"s":0,"p":[7,143,5,144,4,143,2,143,2,141,5,138,6,139,5,141,7,143],"c":"layer1156_5051"},
{"s":0,"p":[10,141,11,137,12,137,14,137,12,142,9,143,9,142,10,141],"c":"layer1156_5041"},
{"s":0,"p":[1,136,0,140,-2,143,-2,136,1,136],"c":"layer1156_5038"},

{"s":0,"p":[8,143,5,141,5,137,8,136,10,137,10,141,8,143],"c":"layer1156_5033"},
{"s":0,"p":[5,137,2,141,0,140,1,136,1,136,2,135,3,136,5,137],"c":"layer1156_5028"},
{"s":0,"p":[10,134,12,136,11,138,8,135,10,134],"c":"layer1156_5020"},
{"s":0,"p":[-2,133,0,136,-2,136,-2,133],"c":"layer1156_5005"},
{...}
...
]
}
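Working from the breakdown above, a hypothetical decoder for one "geom" entry might look like this. It assumes "p" is a flat [x0, y0, x1, y1, ...] list of tile-local pixel coordinates, "s" an index into the "styles" array, and "c" a "<layer>_<feature>" id; all of that is guesswork from the sample, not a documented format:

```python
def decode_geom(entry, styles):
    """Turn one geom entry into a styled polygon ring (all assumptions)."""
    coords = entry["p"]
    ring = list(zip(coords[0::2], coords[1::2]))  # pair up flat x/y values
    layer_id, feature_id = entry["c"].rsplit("_", 1)
    return {
        "ring": ring,
        "style": styles[entry["s"]],
        "layer": layer_id,
        "feature": feature_id,
    }

styles = [{"f": "rgb(99,230,101)", "s": "rgb(5,148,0)", "lw": "0"}]
entry = {"s": 0, "p": [4, 143, 5, 144, 3, 146, 1, 146, 2, 143, 4, 143],
         "c": "layer1156_5098"}
poly = decode_geom(entry, styles)
print(poly["ring"])  # [(4, 143), (5, 144), (3, 146), (1, 146), (2, 143), (4, 143)]
```

Note how small the integers are: pixel offsets within a 256x256 tile fit in a byte or two, versus 8+ characters for a lat/long value.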

How can we replicate the same (or similar) speed using PostGIS (which is what they seem to be using as well)?




Answer



I have seen this technique used in the past. It was explained to me by Zain Memon (from Trulia), who helped give some input when Michal Migurski was creating TileStache. Zain went over it while explaining his Trulia demo, which uses this technique, at one of our older SF GeoMeetup meetings some time back. In fact, if you are in SF next week, he will touch on this, so feel free to show up :) (this is my lame attempt at a plug)


OK, now to the explanation.


First, you are looking in slightly the wrong place by focusing on the json files above.


Let me explain, as briefly as I can, why.


The tiles are being passed just as regular rendered tiles, no big deal there, we know how to do that and so I don't need to explain that.


If you inspect it in Firebug, you will see that you also get a whole bunch of images that seem to be blank.


Why is it blank? It is not. The pixels contain data - just not traditional visible image data. They are using a very clever technique to pass data encoded in the pixels themselves.


What has been going on in the past decade is that people have been trading storage efficiency for the readability and portability of data formats.


Take this sample of XML data:


<data>
  <feature>
    <point>
      <x>-32.1231</x>
      <y>10.31243</y>
    </point>
    <type>sold</type>
  </feature>
  <feature>
    <point>
      <x>-33.1231</x>
      <y>11.31243</y>
    </point>
    <type>available</type>
  </feature>
</data>

OK, how many bytes does it take to transfer this? Assuming UTF-8 (1 byte per character for this content), we have around 176 characters (without counting tabs or spaces), which makes this 176 bytes (and this is optimistic, for various reasons that I will omit for the sake of simplicity). Mind you, this is for 2 points!


Still, some smart ass somewhere who doesn't understand what he is talking about will claim that "json gives you higher compression".


Fine, let's put the same xml nonsense as json:


{ "data": [
  { "feature": { "x": -32.1231, "y": 10.31243, "type": "sold" } },
  { "feature": { "x": -33.1231, "y": 11.31243, "type": "avail" } }
] }

How many bytes here? Say ~115 characters. I even cheated a bit and made it smaller.


Say that my area covers 256x256 pixels and that I am at a zoom level so high that each feature renders as one pixel and I have so many features, that it is full. How much data do I need to show that 65,536 features?


54 characters (or UTF-8 bytes - and I am even ignoring some other things) per "feature" entry, multiplied by 65,536 entries = 3,538,944 bytes, or about 3.4MB.


I think you get the picture.


But this is how we transport data in a service oriented architecture. Readable bloated crap.


What if I wanted to transport everything in a binary scheme that I invented myself? Say that instead, I encoded that information in a single-band image (i.e. grayscale), and decided that 0 means sold, 1 means available, and 2 means I do not know. Heck, in 1 byte I have 256 values I can use - and I am only using two or three of them for this example.


What is the storage cost of that? 256 x 256 x 1 (one band only) = 65,536 bytes, or about 0.06MB. And this doesn't even take into consideration other compression techniques that I get for free from several decades of research in image compression.
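The arithmetic above is easy to check for yourself. A small sketch comparing the two representations for a full 256x256 tile (the field names are illustrative, not GISCloud's actual schema):

```python
import json

# One status value (0=sold, 1=available, 2=unknown) per pixel of a tile.
SIZE = 256 * 256
statuses = [i % 3 for i in range(SIZE)]

# JSON: one object per feature, coordinates spelled out as text.
as_json = json.dumps(
    [{"x": i % 256, "y": i // 256, "type": statuses[i]} for i in range(SIZE)]
)

# Binary: one byte per pixel, with the position implied by pixel order.
as_binary = bytes(statuses)

print(len(as_json))    # megabytes of text
print(len(as_binary))  # 65536 bytes, before image compression even kicks in
```

The byte string is exactly 65,536 bytes; the JSON is tens of times larger, and that is before PNG compression shrinks the binary version further.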



At this point, you should be asking yourself why people do not simply send data encoded in binary format instead of serializing to json. Well, it turns out javascript sucks big time at transporting binary data, so that is why people have not done this historically.


An awesome workaround became possible when the new features of HTML5 came out, particularly canvas. So what is this awesome workaround? It turns out you can send data over the wire encoded in what appears to be an image, then shove that image into an HTML5 Canvas, which allows you to manipulate the pixels directly! Now you have a way to grab that data, decode it on the client side, and generate the json objects in the client.


Stop a moment and think about this.


You have a way of encoding a huge amount of meaningful geo-referenced data in a highly compressed format, orders of magnitude smaller than anything else done traditionally in web applications, and manipulate them in javascript.


The HTML canvas doesn't even need to be used to draw, it is only used as a binary decoding mechanism!


That is what all those images that you see in Firebug are about. One image, with the data encoded for every single tile that gets downloaded. They are super small, but they have meaningful data.


So how do you encode these on the server side? Well, you do need to generalize the data on the server side and create a meaningful tile, with the data encoded, for every zoom level. Currently, to do this, you have to roll your own - an out-of-the-box open source solution doesn't exist, but you have all the tools you need available. PostGIS will do the generalization through GEOS, and TileCache can be used to cache the tiles and trigger their generation. On the client side, you will need to use HTML5 Canvas to receive the special "fake tiles", and then you can use OpenLayers to create real client-side javascript objects that represent the vectors, with mouse-over effects.
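One piece of that server-side pipeline is picking a generalization tolerance per zoom level. A common rule of thumb (my assumption here, not anything GISCloud has documented) is "anything smaller than one pixel is invisible", which gives a tolerance you could feed to GEOS simplify or PostGIS's ST_SimplifyPreserveTopology:

```python
import math

EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137  # Web Mercator, at the equator

def meters_per_pixel(zoom, tile_size=256):
    """Ground resolution of one pixel at the equator for a given zoom."""
    return EARTH_CIRCUMFERENCE_M / (tile_size * 2 ** zoom)

def simplify_tolerance(zoom):
    # One-pixel tolerance: detail below this size cannot be seen at this
    # zoom, so it is safe to generalize it away when pre-building tiles.
    return meters_per_pixel(zoom)

for z in (0, 6, 12):
    print(z, round(simplify_tolerance(z), 2))
```

The tolerance halves with every zoom level, which is why each zoom level needs its own pre-generalized set of tiles.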


If you need to encode more data, remember that you can always generate RGBA images, which give you 4 bytes per pixel, or 4,294,967,296 distinct values you can represent per pixel. I can think of several ways to use that :)
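For example, a 32-bit feature id fits exactly into one RGBA pixel. A sketch of the packing and unpacking, independent of any image library:

```python
def id_to_rgba(feature_id):
    """Split a 32-bit id across the four channels of one RGBA pixel."""
    assert 0 <= feature_id < 2 ** 32
    return (
        (feature_id >> 24) & 0xFF,  # R
        (feature_id >> 16) & 0xFF,  # G
        (feature_id >> 8) & 0xFF,   # B
        feature_id & 0xFF,          # A
    )

def rgba_to_id(r, g, b, a):
    """Reassemble the 32-bit id from the four channel values."""
    return (r << 24) | (g << 16) | (b << 8) | a

pixel = id_to_rgba(5098)
print(pixel)               # (0, 0, 19, 234)
print(rgba_to_id(*pixel))  # 5098
```

One practical caveat: the image has to travel losslessly (PNG, not JPEG), and browsers may premultiply the alpha channel when drawing to a canvas, so real implementations often keep the payload in the RGB channels and leave alpha fully opaque.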


Update: Answering the QGIS question below.


QGIS, like most other desktop GISes, does not have a fixed set of zoom levels. It has the flexibility of zooming to any scale and just rendering. Can it show data from WMS or tile-based sources? Sure, but most of the time it is really dumb about it: zoom to a different extent, calculate the bounding box, calculate the required tiles, grab them, show them. Most of the time it ignores other things, like HTTP cache headers that would let it avoid refetching. Sometimes such clients implement a simple cache mechanism (store the tile; when it is asked for again, check the cache before requesting it). But this is not enough.



With this technique, the tiles and the vectors need to be refetched at every zoom level. Why? Because the vectors have been generalized to accommodate each zoom level.


As for the whole trick of putting the tiles into an HTML5 canvas so you can access the buffers: none of that is necessary here. QGIS lets you write code in Python and C++, and both languages have excellent support for handling binary buffers, so this workaround is simply irrelevant for that platform.


UPDATE 2:


There was a question about how to create the generalized vector tiles in the first place (baby step 1, before being able to serialize the results into images). Perhaps I did not clarify enough. TileStache will allow you to create effective "vector tiles" of your data at every zoom level (it even has an option to either clip or not clip the data where it crosses the tile boundary). This takes care of separating the vectors into tiles at the various zoom levels. I would choose the "not clip" option (it will assign each feature to an arbitrary tile, the one where it covers more area). Then you can feed every vector through the GEOS generalize operation with a big tolerance - in fact, you want it big enough that polylines and polygons collapse onto themselves, because when they do, you can remove them from that zoom level entirely, since at that scale they are irrelevant. TileStache even allows you to write simple pythonic data providers where you can put this logic. At that stage, you can choose to serve the tiles as json files (like they do with some of the African map samples) or as geometries serialized into PNGs, like they do in the other samples (e.g. the Trulia one) mentioned above.
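The "collapse" test at the end can be sketched in a few lines. This assumes rings have already been projected to tile pixel coordinates (the threshold of one square pixel is my assumption, not TileStache's behavior): after snapping to whole pixels, a polygon whose area falls below that threshold contributes nothing at this zoom level and can be dropped.

```python
def ring_area(ring):
    """Shoelace formula; ring is a closed list of (x, y) pairs."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def keep_for_zoom(ring, min_area_px=1.0):
    """Drop polygons that collapse below ~1 square pixel at this zoom."""
    snapped = [(round(x), round(y)) for x, y in ring]
    return ring_area(snapped) >= min_area_px

big = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
tiny = [(0, 0), (0.3, 0), (0.3, 0.3), (0, 0.3), (0, 0)]
print(keep_for_zoom(big), keep_for_zoom(tiny))  # True False
```

Run inside a pythonic TileStache provider, a filter like this is what keeps the low-zoom tiles small enough to stay fast.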

