I've written this code in order to calculate the percent cover of grassland within a national park, within 20km^2 of points of recorded occurrences for a species. It is designed to only consider the areas within the park, and not those outside. It works, except for one huge issue... it's incredibly slow. I think I've tracked it down to the avail_lc_type = pt_buffer.intersection(gl).area
. The gl polygon has > 24,000 polygons in it. It is a raster to polygon transformation (it has been dissolved), so it has a lot of little polygons within. Right now it's not a huge problem since I'm running it on ~300 points (still takes > 1 hr), but I plan to run it on a few million later, so I need to improve it.
Any ideas?
import numpy as np
import pprint
import shapely
from shapely.geometry import*
import fiona
from fiona import collection
import math
traps = fiona.open('some_points.shp', 'r') #point file: focal points around which statistics are being derived
study_area = fiona.open('available_areas.shp', 'r') #polygon file: represents area available for analysis
for i in study_area: #for every record in 'study_area'
sa = shape(i['geometry']) #make a variable called 'sa' that is a polygon
grassland = fiona.open('land_cover_type_of_interest.shp', 'r') #polygon file: want to calculate percent cover of this lc type within study_area, and within areaKM2 (next variable) of each focal point
pol = grassland.next()
gl = MultiPolygon([shape(pol['geometry']) for pol in grassland])
areaKM2 = 20 #hyp home range size of specie of interest
with traps as input:
#calculate initial area in meters, set radius
areaM2 = areaKM2 * 1000000
r = (math.sqrt(areaM2/math.pi))
#begin buffering and calculating available area (i.e. within study area) for each point
for point in input:
pt_buffer = shape(point['geometry']).buffer(r)
avail_area = pt_buffer.intersection(sa).area
#check and adjust radius of buffer until it covers desired available area within study area
while avail_area < areaM2:
r += 300
pt_buffer = shape(point['geometry']).buffer(r)
avail_area = pt_buffer.intersection(sa).area
#then, calulate percent cover of land cover type of interest within adjusted buffer area
#print to check
avail_lc_type = pt_buffer.intersection(gl).area
perc_cov = (avail_lc_type/areaM2) * 100
print perc_cov
Using the answer from @MWrenn I was able to profile my code and came up with this:
55.3555590078
415 function calls (365 primitive calls) in 48.633 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 48.632 48.632 :15(neighb_func)
1 0.000 0.000 48.633 48.633 :1()
2 0.000 0.000 0.001 0.000 :531(write)
1 0.000 0.000 0.000 0.000 __init__.py:1118(debug)
1 0.000 0.000 0.000 0.000 __init__.py:1318(getEffectiveLevel)
1 0.000 0.000 0.000 0.000 __init__.py:1332(isEnabledFor)
1 0.000 0.000 0.000 0.000 _abcoll.py:483(update)
3 0.000 0.000 0.000 0.000 _weakrefset.py:68(__contains__)
1 0.000 0.000 0.000 0.000 abc.py:128(__instancecheck__)
1 0.000 0.000 0.000 0.000 abc.py:148(__subclasscheck__)
11 0.000 0.000 0.000 0.000 base.py:195(_is_empty)
11 0.003 0.000 0.003 0.000 base.py:202(empty)
7 0.000 0.000 0.003 0.000 base.py:212(__del__)
25 0.000 0.000 0.000 0.000 base.py:231(_geom)
2 0.000 0.000 0.000 0.000 base.py:235(_geom)
3 0.000 0.000 0.000 0.000 base.py:313(geometryType)
3 0.000 0.000 0.000 0.000 base.py:316(type)
3 0.000 0.000 0.001 0.000 base.py:383(area)
2 0.000 0.000 0.001 0.000 base.py:443(buffer)
8 0.000 0.000 0.000 0.000 base.py:46(geometry_type_name)
5 0.000 0.000 0.000 0.000 base.py:52(geom_factory)
3 0.000 0.000 48.626 16.209 base.py:529(intersection)
10 0.000 0.000 0.000 0.000 brine.py:106(_dump_int)
2 0.000 0.000 0.000 0.000 brine.py:150(_dump_str)
12/2 0.000 0.000 0.000 0.000 brine.py:179(_dump_tuple)
24/2 0.000 0.000 0.000 0.000 brine.py:202(_dump)
2 0.000 0.000 0.000 0.000 brine.py:332(dump)
10/2 0.000 0.000 0.000 0.000 brine.py:360(dumpable)
14/8 0.000 0.000 0.000 0.000 brine.py:369()
2 0.000 0.000 0.000 0.000 channel.py:56(send)
1 0.000 0.000 0.000 0.000 collection.py:186(filter)
1 0.000 0.000 0.000 0.000 collection.py:274(__iter__)
1 0.000 0.000 0.000 0.000 collection.py:364(closed)
1 0.000 0.000 0.000 0.000 collections.py:37(__init__)
25 0.000 0.000 0.000 0.000 collections.py:53(__setitem__)
4 0.000 0.000 0.000 0.000 compat.py:17(BYTES_LITERAL)
2 0.000 0.000 0.000 0.000 coords.py:20(required)
2 0.000 0.000 0.000 0.000 geo.py:20(shape)
5 0.000 0.000 0.000 0.000 geos.py:484(errcheck_predicate)
8 0.000 0.000 0.000 0.000 impl.py:43(__getitem__)
2 0.000 0.000 0.000 0.000 point.py:124(_set_coords)
2 0.000 0.000 0.000 0.000 point.py:188(geos_point_from_py)
2 0.000 0.000 0.000 0.000 point.py:37(__init__)
2 0.000 0.000 0.001 0.000 protocol.py:220(_send)
2 0.000 0.000 0.001 0.000 protocol.py:227(_send_request)
2 0.000 0.000 0.000 0.000 protocol.py:241(_box)
2 0.000 0.000 0.001 0.000 protocol.py:438(_async_request)
2 0.000 0.000 0.000 0.000 stream.py:173(write)
11 0.000 0.000 0.000 0.000 topology.py:14(_validate)
3 0.000 0.000 0.000 0.000 topology.py:33(__call__)
3 48.626 16.209 48.626 16.209 topology.py:40(__call__)
2 0.001 0.000 0.001 0.000 topology.py:57(__call__)
5 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
5 0.000 0.000 0.000 0.000 {__import__}
5 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode}
3 0.000 0.000 0.000 0.000 {_ctypes.byref}
6/2 0.000 0.000 0.000 0.000 {all}
6 0.000 0.000 0.000 0.000 {getattr}
5 0.000 0.000 0.000 0.000 {globals}
10 0.000 0.000 0.000 0.000 {hasattr}
5 0.000 0.000 0.000 0.000 {isinstance}
31 0.000 0.000 0.000 0.000 {len}
5 0.000 0.000 0.000 0.000 {locals}
1 0.000 0.000 0.000 0.000 {math.sqrt}
2 0.000 0.000 0.000 0.000 {method 'acquire' of 'thread.lock' objects}
24 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
28 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects}
5 0.000 0.000 0.000 0.000 {method 'pack' of 'Struct' objects}
2 0.000 0.000 0.000 0.000 {method 'release' of 'thread.lock' objects}
2 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects}
2 0.000 0.000 0.000 0.000 {next}
I'm still trying to figure out exactly what it means, but I think it's telling me that the intersection
function is taking ~16 seconds each time it's called. So, per @gene 's suggestion, I'm working on using a spatial index. I just have to figure out how to get libspatialindex going so that I can use Rtrees.
Answer
You need to use a spatial index. Without an spatial index, you must iterate through all the geometries. With a bounding spatial index, you iterate only through the geometries which have a chance to intersect the other geometries.
Popular bounding spatial indexes in Python:
Examples:
In your script:
- Why
from fiona import collection
if youimport fiona
and use onlyfiona
?import numpy as np
if you are not using Numpy butmath
?import shapely
if you specifyfrom shapely.geometry import *
?
- If
sa
is a list, then usesa = [shape(i['geometry']) for i in for i in study_area]
. If not, you need a condition; otherwisesa
will be the last feature ofstudy_area
. - Why
pol = grassland.next()
if you read the shapefile withMultiPolygon([shape(pol['geometry']) for pol in grassland])
?
Some suggestions:
You use the grassland and trap variables once, so you don't need them
gl = MultiPolygon([shape(pol['geometry']) for pol in fiona.open('land_cover_type_of_interest.shp')])
and
for point in fiona.open('some_points.shp'):
No comments:
Post a Comment