Wednesday, 1 November 2017

qgis - Divide a complex shapefile into a grid


I have a decently detailed shapefile with polygon/multipolygon features (the file is about 500mb). It's actually a shapefile of the entire world, with the features representing coastlines. I need to divide this data using a grid. To be clear, I don't want to 'sort' the data, but actually cut the polygons up into tiles. I realize this question has been asked before but the solutions I found didn't work for me.


I've tried:




  • Using QGIS and intersecting my shapefile contents with a vector grid -- the results are terrible. Most of the major landmass magically disappears, though it seems like smaller chunks of land sometimes make it. I should note that this method works really well with much simpler data (ie. less points)





  • Using OGR's Intersection tools. I tried it both through ogr2ogr and even by rolling my own C++ tool. Both of them have the same problem as QGIS. They also don't exhibit this problem for simple files, but fail the more complex ones. For reference, I'm using a shapefile of Australia and New Zealand, under 20mb in size, and both QGIS and OGR fail to 'gridify' it.




Someone suggested using PostGIS at one point, since it has an intersection function -- but PostGIS's ST_Intersect uses the same GEOS back end as OGR does. In fact they both call the same function as far as I can tell, so I don't think that PostGIS will yield different results.


I was looking for suggestions as to what else I could try. I need a robust application or toolkit that can divide highly detailed shapefiles into tiles.


EDIT: Adding some more information


In response to Simbamangu:




  • The shapefile is basically coastline data from OpenStreetMap. It's a merged version of the 'processed_p' file (so its not split up into tiles) that I got by emailing their dev list. Note that their splitting up of tiles (into 100km x 100km chunks with overlap) isn't necessarily what I want -- I don't want overlap, and I want the freedom to choose the grid size, or I'd just use the default processed_p.





  • By default the coastline data has geometry errors reported by QGIS. I fix these errors with a small tool I put together using some code I found designed to specifically address this problem (repairing geometry errors in coastline data: https://github.com/tudelft-gist/prepair). Running over the files with this tool fixes virtually all the errors QGIS picks up. I only attempt to do the intersection after cleaning the files.




  • Exactly what I did using QGIS: Open the data to make sure it looks fine in QGIS. Try dividing it into tiles by creating a layer of tiles using Vector Grid with a specified spacing, and then intersecting the two layers -- no go. Try using a smaller data set -- select features in Oceania (Aus, NZ) to try a smaller data set -- this shape file is < 20mb in size. Again try dividing it, doesn't work.




  • What I did with OGR: ogr2ogr directly using the '-spat' and '-clipsrc' options with spat_extent. Also wrote a small C++ tool that works on WKT, so I convert the shapefile to WKT using ogr2ogr, then feed the text file to my application. It runs through the file and calls the Intersection() method documented here: http://www.gdal.org/ogr/classOGRGeometry.html. I think it ends up doing the exact same thing as using ogr2ogr directly.





In response to Brent:



  1. It does. Everything is in WGS84 Lat/Lon

  2. I would have thought that the opposite is true -- that for a given set of grid tiles, it would take way longer to intersect one giant multipolygon rather than a bunch of fragmented features which could be more spatially localized to each tile, but this is an interesting suggestion -- I'll try it and report back.

  3. No attribute fields are kept during the process, I'm only interested in geometry.

  4. I'm not sure, but I think you're saying I should select the polygons that overlap a given grid tile and then perform the intersection. This is too cumbersome manually with QGIS. My tool already does this to a certain extent with a bounding box check. There's a bit of a speed up, but the end result is still poor and not noticeably different.

  5. This isn't an option. Right now I'm trying to divide the data up so that its 1 deg lat x 1 deg lon, and I'm looking for a general/robust methodology that works with all cases. I've tried increasing the grid size (ie 10x10) to see if I'd get better results and I don't see any correlation between grid size and the quality of the output.


Edit #2:



I've tried playing around with this more and in general it just seems that the results are unreliable both using GEOS and with QGIS (which uses fTools, I don't know if that in turn uses GEOS again). I was wrong in stating the size of the grid has nothing to do with the results -- the bigger the grid is, the better the results (that's good to know but still not a solution). Here's a screenshot of a really spaced out grid that mostly worked, but failed partially in one tile:


enter image description here


The geometry is clean -- QGIS shows 0 errors with the "Check Validity" tool. I'm not looking to approach this problem in a step by step fashion; verifying whether or not certain features failed the intersection on a dataset this big when its not visually apparent (and it won't be with smaller tiles) isn't practical.



Answer



I just ended up creating my own tools to do this.


I used the Clipper library (http://www.angusj.com/delphi/clipper.php) along with OGR to divide my data set up. Something to note is performing intersections naively with this lib takes very long, so I instead used a quadtree approach... ie, divide into four grid cells, divide each of those into four more, etc, until you get your desired resolution. The lib works great though, I've attached a screenshot showing the results on the eastern hemisphere:


enter image description here


The above result took about 4.5 hours on a 1.33GHz processor.


Here are the tools in case someone runs into a similar issue in the future. Please note that they're hacked together proof-of-concepts and you probably shouldn't use them directly (might serve as a good starting point for something though):


https://github.com/preet/scratch/tree/master/gis/polytoolkit



https://github.com/preet/scratch/tree/master/gis/shapefiles/shptk


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...