Monday 25 December 2017

python - Understanding use of spatial indexes with RTree?


I'm having trouble understanding the use of spatial indexes with RTree.


Example: I have 300 buffered points, and I need to know each buffer's intersection area with a polygon shapefile. The polygon shapefile has >20,000 polygons. It was suggested I use spatial indices to speed up the process.


SO... If I create a spatial index for my polygon shapefile, will it be "attached" to the file in some way, or will the index stand alone? That is, after creating it can I just run my intersection function on the polygon file and get faster results? Will intersection "see" that there are spatial indices and know what to do? Or, do I need to run it on the index, then relate those results back to my original polygon file via FIDs or some such?



The RTree documentation is not helping me very much (probably because I'm just learning programming). They show how to create an index by reading in manually created points, and then querying it against other manually created points, which returns ids that are contained within the window. Makes sense. But, they don't explain how that would relate back to some original file that the index would have come from.


I'm thinking it must go something like this:



  1. Pull bboxes for each polygon feature from my polygon shapefile and place them in a spatial index, giving them an id that is the same as their id in the shapefile.

  2. Query that index to get the ids that intersect.

  3. Then re-run my intersection on only the features in my original shapefile that were identified by querying my index (not sure how I'd do this last part).


Do I have the right idea? Am I missing anything?




Right now I'm trying to get this code to work on one point shapefile that contains only one point feature, and one polygon shapefile that contains >20,000 polygon features.



I'm importing the shapefiles using Fiona, adding the spatial index using RTree, and trying to do the intersection using Shapely.


My test code looks like this:


#point shapefile representing location of desired focal statistic
traps = fiona.open('single_pt_speed_test.shp', 'r')

#polygon shapefile representing land cover of interest
gl = MultiPolygon([shape(pol['geometry']) for pol in fiona.open('class3_aa.shp', 'r')])

#search area
areaKM2 = 20


#create empty spatial index
idx = index.Index()

#set initial search radius for buffer
areaM2 = areaKM2 * 1000000
r = (math.sqrt(areaM2/math.pi))

#create spatial index from gl
for i, shape in enumerate(gl):

idx.insert(i, shape.bounds)

#query index for ids that intersect with buffer (will eventually have multiple points)
for point in traps:
pt_buffer = shape(point['geometry']).buffer(r)
intersect_ids = pt_buffer.intersection(idx)

But I keep getting TypeError: 'Polygon' object is not callable



Answer



That's the gist of it. The R-tree allows you to make a very fast first pass and gives you a set of results that will have "false positives" (bounding boxes may intersect when the geometries precisely do not). Then you go over the set of candidates (fetching them from the shapefile by their index) and do a mathematically precise intersection test using, e.g., Shapely. This is the very same strategy that's employed in spatial databases like PostGIS.



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...