I have a csv with two sets of latitude and longitude points, and a shapefile with espg 4326. I want to find which points are inside the shapefile. I've been following the guide here which uses a spatial index to speed up the search however I'm running into an error.
The csv looks like:
e6ed541a288f13745d2755a79372c0c4,fc707fee61baa13b40278a70c679b9a7,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T02:03:18.000Z,2,2016-11-01T03:01:13.000Z,2,51.456,0.146,51.506,-0.125,false,false,EE,2,3,2,,,,
36e6a9d201be3957fb78be4dfaacb932,d3cbd057ca3bf69f6805d054b4a228c5,52720e003547c70561bf5e03b95aa99f,1,2016-11-01T16:21:45.467Z,2,2016-11-01T16:43:01.627Z,2,51.499,-0.008,51.506,-0.101,false,false,EE,1,1,1,,,,
f15a4077cc79217e620270a24c127ed4,581c63e4232ae5208f3ae23a09fff3ea,13f9896df61279c928f19721878fac41,1,2016-11-01T20:31:36.000Z,2,2016-11-01T21:21:10.000Z,2,51.529,-0.124,51.512,-0.092,false,false,EI,2,2,1,,,,
bcd71b06e56e2e023fa055a40ee03f20,0ed657e42b4d820eed2affd146067623,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T22:58:42.000Z,2,2016-11-01T23:27:24.000Z,2,51.503,-0.113,51.505,-0.086,false,false,EE,2,3,2,,,,
My code so far is:
trip = pd.read_csv(r'test\TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
starts = GeoDataFrame(trip, crs=crs, geometry=geometry) #turn dataframe into geodataframe
ends = GeoDataFrame(trip, crs=crs, geometry=geometry2)
FRC1 = geopandas.GeoDataFrame.from_file('FRC1Shapefile.shp') # import FRC1 polygon
spatial_index = starts.sindex
possible_matches_index = list(spatial_index.intersection(FRC1.bounds))
possible_matches = starts.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(FRC1)]
the error I'm getting is from RTree:
RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes
EDIT: Here's an image of the points sjoin is selecting.
Answer
There are easier ways, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc or Check if a point falls within a multipolygon with Python.
1) Use directly a GeoPandas spatial-join with the command sjoin
which incorporates a spatial index (rtree, line 29 of sjoin.py)
import pandas as pd
trip = pd.read_csv('TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
crs = {'init' :'epsg:4326'}
import geopandas as gp
starts = gp.GeoDataFrame(trip, crs=crs, geometry=geometry)
ends = gp.GeoDataFrame(trip, crs=crs, geometry=geometry2)
starts.head()
FRC1 = gp.read_file('polygons.shp')
2) Now use sjoin
from geopandas.tools import sjoin
pointInPolys = sjoin(starts, FRC1, how='left',op="within")
The point 0 is within polygon 2, the points 2, 3 are within polygon 1, the point 1 is not in a polygon
Eliminate the Nan values: point(s) is/are not in a polygon
pointInPolys = pointInPolys.dropna()
pointInPolys
3) Conclusion
These 3 points are inside the shapefile
No comments:
Post a Comment