Friday, 24 July 2015

python 2.7 - RTree spatial index does not result in faster intersection computation


I have some code that I am using to determine which Shapely Polygon/MultiPolygons intersect with a number of Shapely LineStrings. Through the answers to this question the code has gone from this:


import fiona
from shapely.geometry import LineString, Polygon, MultiPolygon, shape

# Open each layer
poly_layer = fiona.open('polygon_layer.shp')
line_layer = fiona.open('line_layer.shp')


# Convert to lists of shapely geometries
the_lines = [shape(line['geometry']) for line in line_layer]
the_polygons = [(poly['properties']['GEOID'], shape(poly['geometry'])) for poly in poly_layer]

# Check for Polygons/MultiPolygons that the LineString intersects with
covered_polygons = {}
for poly_id, poly in the_polygons:
for line in the_lines:
if poly.intersects(line):
covered_polygons[poly_id] = covered_polygons.get(poly_id, 0) + 1


where every possible intersection is checked, to this:


import fiona
from shapely.geometry import LineString, Polygon, MultiPolygon, shape
import rtree

# Open each layer
poly_layer = fiona.open('polygon_layer.shp')
line_layer = fiona.open('line_layer.shp')


# Convert to lists of shapely geometries
the_lines = [shape(line['geometry']) for line in line_layer]
the_polygons = [(poly['properties']['GEOID'], shape(poly['geometry'])) for poly in poly_layer]

# Create spatial index
spatial_index = rtree.index.Index()
for idx, poly_tuple in enumerate(the_polygons):
_, poly = poly_tuple
spatial_index.insert(idx, poly.bounds)


# Check for Polygons/MultiPolygons that the LineString intersects with
covered_polygons = {}
for line in the_lines:
for idx in list(spatial_index.intersection(line.bounds)):
if the_polygons[idx][1].intersects(line):
covered_polygons[idx] = covered_polygons.get(idx, 0) + 1

where the spatial index is used to reduce the number of intersection checks.


With the shapefiles I have (approximately 4000 polygons, and 4 lines), the original code performs 12936 .intersection() checks and takes about 114 seconds to run. The second piece of code that uses the spatial index performs only 1816 .intersection() checks but it also takes approximately 114 seconds to run.


The code to build the spatial index only takes 1-2 seconds to run, so the 1816 checks in the second piece of code are taking just about the same amount of time to perform as the 12936 checks in the original code (since the loading of shapefiles and converting to Shapely geometries is the same in both pieces of code).



I cannot see any reason that the spatial index would make the .intersects() check take longer, so I am at a loss for why this is happening.


I can only think that I am using the RTree spatial index incorrectly. Thoughts?



Answer



Edit: To clarify this answer, I incorrectly believed that all the intersection tests took approximately the same amount of time. This is not the case. The reason I did not get the speed up I expected from using a spatial index is that the selection of intersection tests are the ones that took the longest to do in the first place.


As gene and mgri have already said, a spatial index is not a magic wand. Although the spatial index pared down the number of intersection tests that needed to be performed from 12936 to only 1816, the 1816 tests are the test that took the vast majority of time to compute in the first place.


The spatial index is being used correctly, but the assumption that each intersection test takes roughly the same amount of time is what is incorrect. The time required by the intersection test can vary greatly (0.05 seconds versus 0.000007 seconds).


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...