Thursday, 27 September 2018

python - Find csv lat and long points in a shapefile polygon with geopandas spatial index


I have a csv with two sets of latitude and longitude points, and a shapefile with espg 4326. I want to find which points are inside the shapefile. I've been following the guide here which uses a spatial index to speed up the search however I'm running into an error.



The csv looks like:


e6ed541a288f13745d2755a79372c0c4,fc707fee61baa13b40278a70c679b9a7,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T02:03:18.000Z,2,2016-11-01T03:01:13.000Z,2,51.456,0.146,51.506,-0.125,false,false,EE,2,3,2,,,,
36e6a9d201be3957fb78be4dfaacb932,d3cbd057ca3bf69f6805d054b4a228c5,52720e003547c70561bf5e03b95aa99f,1,2016-11-01T16:21:45.467Z,2,2016-11-01T16:43:01.627Z,2,51.499,-0.008,51.506,-0.101,false,false,EE,1,1,1,,,,
f15a4077cc79217e620270a24c127ed4,581c63e4232ae5208f3ae23a09fff3ea,13f9896df61279c928f19721878fac41,1,2016-11-01T20:31:36.000Z,2,2016-11-01T21:21:10.000Z,2,51.529,-0.124,51.512,-0.092,false,false,EI,2,2,1,,,,
bcd71b06e56e2e023fa055a40ee03f20,0ed657e42b4d820eed2affd146067623,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T22:58:42.000Z,2,2016-11-01T23:27:24.000Z,2,51.503,-0.113,51.505,-0.086,false,false,EE,2,3,2,,,,

My code so far is:


trip = pd.read_csv(r'test\TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]

trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
starts = GeoDataFrame(trip, crs=crs, geometry=geometry) #turn dataframe into geodataframe
ends = GeoDataFrame(trip, crs=crs, geometry=geometry2)
FRC1 = geopandas.GeoDataFrame.from_file('FRC1Shapefile.shp') # import FRC1 polygon

spatial_index = starts.sindex
possible_matches_index = list(spatial_index.intersection(FRC1.bounds))
possible_matches = starts.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(FRC1)]


the error I'm getting is from RTree:


RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes

EDIT: Here's an image of the points sjoin is selecting. enter image description here



Answer



There are easier ways, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc or Check if a point falls within a multipolygon with Python.


1) Use directly a GeoPandas spatial-join with the command sjoin which incorporates a spatial index (rtree, line 29 of sjoin.py)


import pandas as pd
trip = pd.read_csv('TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
crs = {'init' :'epsg:4326'}
import geopandas as gp
starts = gp.GeoDataFrame(trip, crs=crs, geometry=geometry)
ends = gp.GeoDataFrame(trip, crs=crs, geometry=geometry2)
starts.head()

enter image description here



 FRC1  = gp.read_file('polygons.shp')

enter image description here


2) Now use sjoin


from geopandas.tools import sjoin
pointInPolys = sjoin(starts, FRC1, how='left',op="within")

enter image description here


The point 0 is within polygon 2, the points 2, 3 are within polygon 1, the point 1 is not in a polygon


Eliminate the Nan values: point(s) is/are not in a polygon



pointInPolys = pointInPolys.dropna()
pointInPolys

enter image description here


3) Conclusion


These 3 points are inside the shapefile


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...