Monday, 1 October 2018

shapely - Intersecting two shapefiles from Python or command line?


I'm new to working with geo data and trying to address a similar question to the one posted as How to calculate the size of a particular area below a buffer in QGIS


Using command line and/or shapely/fiona, in the most efficient and elegant way.



I created two shapefiles with their respective numerical attributes:



  1. small areas with data on population

  2. police districts with crime stats(single number)


They are independent.


I'd like to recalculate those such that both estimates are present in each of the shapefiles, eg. estimate of crime within small area (take data from 2 and look for all areas in 1 that are contained in it, cut those that intersect and do maths).


I have the maths figured out and stuck looking for elegant ways and efficient data formats to do intersections and cut out the overlaps to appropriate boundaries.



The adapted code does not finding any intersections in the data, which is structured as follows (and surprising since it's from official sources):




  1. polSHP_upd: police precincts across SA

  2. sal_subSHP: subset of small regions with population data in Cape Town


The files live here:


https://github.com/AnnaMag/GIS-analyses


What am I missing?


polSHP_upd = 'crime/GIS-analyses/polPres_updated4crimes/polPres_updated4crimes.shp'
sal_subSHP = 'crime/GIS-analyses/sal_sub/sal_sub.shp'


import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file(polSHP_upd)
g2 = gpd.GeoDataFrame.from_file(sal_subSHP)
data = []
# A: region of the police data with criminal record
# C: small area with population data
# we look for all small areas intersecting a given C_i, calculate the fraction of inclusion, scale the
# population accordingly: area(A_j, where A_j crosses C_i)/area(A_j)* popul(A_j)
for index, crim in g1.iterrows():
for index2, popu in g2.iterrows():

if popu['geometry'].crosses(crim['geometry']): # objects overlaps to partial extent, not contained
area_int = popu['geometry'].intersection(crim['geometry']).area
area_crim = crim['geometry'].area
area_popu = popu['geometry'].area # there is a Shape_Area field in properties: to check precision
popu_count = popu['properties']['PPL_CNT']
popu_frac = (area_int / area_popu) * popu_count# fraction of the pop area contained inside the crim
data.append({'id': (index1, index2) ,'area_crim': area_crim,'area_pop': area_popu, 'area_inter': area_int, 'popu_frac': popu_frac} )
# data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'crime_stat':crim['crime_stat'], 'Population': popu['Population'], 'area':crim['geometry'].intersection(popu['geometry']).area})

elif popu['geometry'].intersects(crim['geometry']):

pass
#print("intersect")

elif popu['geometry'].disjoint(crim['geometry']):
pass`

Answer



The question is about Shapely and Fiona in pure Python without QGIS ("using command line and/or shapely/fiona").


A solution is


from shapely import shape, mapping
import fiona

# schema of the new shapefile
schema = {'geometry': 'Polygon','properties': {'area': 'float:13.3','id_populat': 'int','id_crime': 'int'}}
# creation of the new shapefile with the intersection
with fiona.open('intersection.shp', 'w',driver='ESRI Shapefile', schema=schema) as output:
for crim in fiona.open('crime_stat.shp'):
for popu in fiona.open('population.shp'):
if shape(crim['geometry']).intersects(shape(popu['geometry'])):
area = shape(crim['geometry']).intersection(shape(popu['geometry'])).area
prop = {'area': area, 'id_populat' : popu['id'],'id_crime': crim['id']}
output.write({'geometry':mapping(shape(crim['geometry']).intersection(shape(popu['geometry']))),'properties': prop})


The original two layers and the resulting layer


enter image description hereenter image description here


Part of the resulting layer table


enter image description here


You can use a spatial index (rtree here, look at GSE: Fastest way to join many points to many polygons in python and Using Rtree spatial indexing with OGR)


Another solution is to use GeoPandas (= Pandas + Fiona + Shapely)


import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file("crime_stat.shp")
g2 = gpd.GeoDataFrame.from_file("population.shp")

data = []
for index, crim in g1.iterrows():
for index2, popu in g2.iterrows():
if crim['geometry'].intersects(popu['geometry']):
data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'crime_stat':crim['crime_stat'], 'Population': popu['Population'], 'area':crim['geometry'].intersection(popu['geometry']).area})

df = gpd.GeoDataFrame(data,columns=['geometry', 'crime_stat', 'Population','area'])
df.to_file('intersection.shp')
# control of the results in mi case, first values
df.head() # image from a Jupiter/IPython notebook


enter image description here


Update


You need to understand the definition of the spatial predicates. I use here the JTS Topology suite


enter image description here


As you can see there are only intersections and no crosses nor disjoint here. Some definitions from the Shapely manual



object.crosses(other): Returns True if the interior of the object intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.
object.disjoint(other): Returns True if the boundary and interior of the object do not intersect at all with those of the other.
object.intersects(other): Returns True if the boundary and interior of the object intersect in any way with those of the other.




You can control it by a simple script (there are other solution but this one is the simplest)


i = 0
for index, crim in g1.iterrows():
for index2, popu in g2.iterrows():
if popu['geometry'].crosses(crim['geometry']):
i= i+1
print i

and the result is 0



Therefore, you only need intersects here.


Your script becomes


data = []
for index1, crim in g1.iterrows():
for index2, popu in g2.iterrows():
if popu['geometry'].intersects(crim['geometry']): # objects overlaps to partial extent, not contained
area_int = popu['geometry'].intersection(crim['geometry']).area
area_crim = crim['geometry'].area
area_popu = popu['geometry'].area #
# popu['properties'] is for Fiona, not for Pandas

popu_count = popu['PPL_CNT']
popu_frac = (area_int / area_popu) * popu_count#
# you must include the geometry, if not, it is a simple Pandas DataFrame and not a GeoDataframe
# Fiona does not accept a tuple as value of a field 'id': (index1, index2)
data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'id1': index1, 'id2':index2 ,'area_crim': area_crim,'area_pop': area_popu, 'area_inter': area_int, 'popu_frac': popu_frac} )

df = gpd.GeoDataFrame(data,columns=['geometry', 'id1','id2','area_crim', 'area_pop','area_inter'])
df.to_file('intersection.shp')
df.head()


enter image description here


Result:


enter image description here


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...