Wednesday, 18 December 2019

shapely - Turning GeoDataFrame of x,y coordinates into Linestrings using GROUPBY?


I have a dataframe of X,Y coordinates that represent points along the paths taken by several different entities. Pseudo-data here, but it is roughly of the form:


entity_id   lat    lon    time
1001        34.5   14.2   4:55 pm
1001        34.7   14.5   4:58 pm
1001        35.0   14.6   5:03 pm
1002        27.1   19.2   2:01 pm
1002        27.4   19.3   2:08 pm
1002        27.4   19.9   2:09 pm

What I would like to do is group these points by entity_id, and then arrange the points sequentially in time to create a LineString object for each entity_id. The output will be several lines/paths, with each corresponding to an entity_id.


I can do this by looping through each entity_id and each point in entity_id and using the instructions provided here, but is there a faster/more efficient way to do this leveraging GeoPandas or Shapely, perhaps with groupby?



Answer



I think I found an interim solution, which I'm posting in case it's useful for anyone:


import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString

# Sort so that each path is built in time order.
# Assumes 'time' is a sortable dtype (e.g. datetime), not a raw string.
df = df.sort_values(['entity_id', 'time'])

# Zip the coordinates into Point objects and convert to a GeoDataFrame
geometry = [Point(xy) for xy in zip(df.lon, df.lat)]
df = GeoDataFrame(df, geometry=geometry)

# Aggregate the points of each entity into a LineString with GroupBy
df = df.groupby(['entity_id'])['geometry'].apply(lambda x: LineString(x.tolist()))
df = GeoDataFrame(df, geometry='geometry')

Note that if you have single-point trajectories in your data, you will have to discard these first or LineString will throw an error.
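
As a rough sketch of the whole workflow, the example below parses the time strings, sorts each entity's points chronologically, discards single-point entities, and builds one LineString per entity. The sample rows (including the extra entity 1003) and the time format '%I:%M %p' are assumptions made for illustration. It also passes the coordinates straight to LineString rather than building intermediate Point objects, which is a variation on the approach above, not the same code.

```python
import pandas as pd
from shapely.geometry import LineString

# Illustrative data in the question's layout, deliberately out of order.
# Entity 1003 is a made-up single-point trajectory.
df = pd.DataFrame({
    "entity_id": [1001, 1002, 1001, 1003, 1002, 1001, 1002],
    "lat": [35.0, 27.4, 34.5, 10.0, 27.1, 34.7, 27.4],
    "lon": [14.6, 19.9, 14.2, 20.0, 19.2, 14.5, 19.3],
    "time": ["5:03 pm", "2:09 pm", "4:55 pm", "1:00 pm",
             "2:01 pm", "4:58 pm", "2:08 pm"],
})

# Parse the clock strings so that sorting is chronological, not alphabetical
# (assumes every value follows the 12-hour 'h:mm am/pm' pattern).
df["time"] = pd.to_datetime(df["time"], format="%I:%M %p")
df = df.sort_values(["entity_id", "time"])

# Drop entities with fewer than two points; LineString needs at least two.
multi = df.groupby("entity_id").filter(lambda g: len(g) > 1)

# Build one LineString per entity from its (lon, lat) pairs.
lines = {
    int(entity_id): LineString(group[["lon", "lat"]].to_numpy())
    for entity_id, group in multi.groupby("entity_id")
}

for entity_id, line in lines.items():
    print(entity_id, line.wkt)
```

The resulting dictionary maps each entity_id to its path; it can be turned into a GeoDataFrame afterwards if the GeoPandas methods are needed.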



This and this post were helpful in writing the GroupBy function.




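Update: If you did not discard the single-point trajectories, you can instead use a conditional expression that keeps them as plain Point geometries (returning x.tolist() in that branch would store a one-element list rather than a geometry):


df = df.groupby(['entity_id'])['geometry'].apply(lambda x: LineString(x.tolist()) if x.size > 1 else x.iloc[0])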
