I have a dataframe of X,Y coordinates that represent points along the paths taken by several different entities. Pseudo-data here, but it is roughly of the form:
entity_id lat lon time
1001 34.5 14.2 4:55 pm
1001 34.7 14.5 4:58 pm
1001 35.0 14.6 5:03 pm
1002 27.1 19.2 2:01 pm
1002 27.4 19.3 2:08 pm
1002 27.4 19.9 2:09 pm
What I would like to do is group these points by entity_id, and then arrange the points sequentially in time to create a LineString object for each entity_id. The output will be several lines/paths, with each corresponding to an entity_id.
I can do this by looping through each entity_id and each point in entity_id and using the instructions provided here, but is there a faster/more efficient way to do this leveraging GeoPandas or Shapely, perhaps with groupby?
Answer
I think I found an interim solution, which I'm posting in case it's useful for anyone:
import pandas as pd
import numpy as np
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString
# Zip the coordinates into a point object and convert to a GeoDataFrame
geometry = [Point(xy) for xy in zip(df.lon, df.lat)]
df = GeoDataFrame(df, geometry=geometry)
# Aggregate these points with the GroupBy
df = df.groupby(['entity_id'])['geometry'].apply(lambda x: LineString(x.tolist()))
df = GeoDataFrame(df, geometry='geometry')
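The snippet above builds each line in whatever row order df already has, so if the rows are not already chronological, the points need sorting before the groupby. A minimal sketch of that step, using the pseudo-data from the question with the rows deliberately shuffled; the time format string and the assumption that all times fall on the same day are mine, not part of the original data:

```python
import pandas as pd
from shapely.geometry import LineString

# Pseudo-data from the question, with rows shuffled to show that sorting matters
df = pd.DataFrame({
    "entity_id": [1001, 1002, 1001, 1002, 1001, 1002],
    "lat": [35.0, 27.1, 34.5, 27.4, 34.7, 27.4],
    "lon": [14.6, 19.2, 14.2, 19.9, 14.5, 19.3],
    "time": ["5:03 pm", "2:01 pm", "4:55 pm", "2:09 pm", "4:58 pm", "2:08 pm"],
})

# Parse the clock strings so they sort chronologically rather than alphabetically
# (assumed format: 12-hour clock with am/pm, all on the same day)
df["time"] = pd.to_datetime(df["time"], format="%I:%M %p")

# Sort once by entity and time, then build one LineString per entity
df = df.sort_values(["entity_id", "time"])
lines = df.groupby("entity_id")[["lon", "lat"]].apply(
    lambda g: LineString(list(zip(g["lon"], g["lat"])))
)
```

Here lines is a plain pandas Series of LineString objects indexed by entity_id; it can be wrapped in a GeoDataFrame in the same way as in the answer.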
Note that if you have single-point trajectories in your data, you will have to discard these first or LineString will throw an error.
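One way to discard single-point trajectories up front is to count the rows per entity and keep only entities with at least two points. This is a rough sketch rather than a tested recipe; entity 1003 is a hypothetical single-point entity added for illustration:

```python
import pandas as pd
from shapely.geometry import LineString

df = pd.DataFrame({
    "entity_id": [1001, 1001, 1003],  # 1003 is hypothetical: it has only one point
    "lat": [34.5, 34.7, 40.0],
    "lon": [14.2, 14.5, 10.0],
})

# Attach each row's group size, then keep only entities with two or more points
counts = df.groupby("entity_id")["entity_id"].transform("size")
multi = df[counts > 1]

# Every remaining group now has enough points to form a LineString
lines = multi.groupby("entity_id")[["lon", "lat"]].apply(
    lambda g: LineString(list(zip(g["lon"], g["lat"])))
)
```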
This and this post were helpful in writing the GroupBy function.
Update: If you did not discard the single-point trajectories, you can instead use a conditional expression that keeps a lone point as a Point geometry:
df = df.groupby(['entity_id'])['geometry'].apply(lambda x: LineString(x.tolist()) if x.size > 1 else x.iloc[0])