I'm working with a dataset that contains millions of polygons representing leased properties. In many cases, there are multiple polygons that are drawn on top of one another (representing multiple leases for the same property).
Currently I'm working on a process that will group these so that only one polygon geometry is used to represent multiple lease counterparts (the same property leased multiple times). I have come up with a Python script that uses an arcpy Select By Location and compares the XY centroid coordinates, perimeter, area, and extent of overlapping polygons to assign a unique ID to each group. This works just fine, but it takes forever.
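For context, the comparison logic is roughly like this simplified sketch (layer name and tolerance values are illustrative, not my production script):

import arcpy

fc = "LTPOLYGONS"  # illustrative feature class name
arcpy.management.MakeFeatureLayer(fc, "poly_lyr")

assigned = {}  # OBJECTID -> group ID
group_id = 0
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as rows:
    for oid, shape in rows:
        if oid in assigned:
            continue
        group_id += 1
        assigned[oid] = group_id
        # Select polygons overlapping the current one.
        arcpy.management.SelectLayerByLocation("poly_lyr", "INTERSECT", shape)
        with arcpy.da.SearchCursor("poly_lyr", ["OID@", "SHAPE@"]) as sel:
            for soid, sshape in sel:
                if soid in assigned:
                    continue
                # Compare centroid, perimeter, and area within tolerances.
                if (abs(sshape.centroid.X - shape.centroid.X) < 6
                        and abs(sshape.centroid.Y - shape.centroid.Y) < 6
                        and abs(sshape.length - shape.length) < 0.03 * shape.length
                        and abs(sshape.area - shape.area) < 0.03 * shape.area):
                    assigned[soid] = group_id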
I'm wondering: would it be possible to use SQL (SQL Server, via SQL Server Management Studio) to identify overlapping polygons (based on a buffer of the centroid) and group them by other spatial attributes of the input polygons?
I have made an attempt at getting this to work, but got too many groups: I have around 41,000 polygons (for the first county) and should end up with fewer than 14,000 'unique' polygons, or groups. I have access to FME and ArcGIS Advanced licenses, so can anyone provide an alternative to using an arcpy.da SearchCursor to select by location and compare the selected polygons' attributes in Python?
Here is my first attempt at the SQL query. The projected coordinate system uses meters, hence the ±11 in the query for the extents.
SELECT a.Poly_ID
FROM
(
SELECT [OBJECTID]
,[SHAPE]
,[calc_acres]
,[perimeter]
,[CentX]
,[CentY]
,[X_Max]
,[Y_Max]
,[X_Min]
,[Y_Min]
,[Poly_ID]
,[leaseId]
,geometry::STGeomFromText('POINT (' + convert(varchar,[CentX]) + ' '+ convert(varchar,[CentY]) + ')', 5070) as cent_geom
FROM [poly_topology].[gis_mfg].[LTPOLYGONS]) a
left join (
SELECT [OBJECTID]
,[SHAPE]
,[calc_acres]
,[perimeter]
,[CentX]
,[CentY]
,[X_Max]
,[Y_Max]
,[X_Min]
,[Y_Min]
,[Poly_ID]
,[leaseId]
,geometry::STGeomFromText('POINT (' + convert(varchar,[CentX]) + ' '+ convert(varchar,[CentY]) + ')', 5070) as cent_geom
FROM [poly_topology].[gis_mfg].[LTPOLYGONS]) b on
a.cent_geom.STBuffer(6).STContains(b.cent_geom) = 1 and
b.perimeter > (a.perimeter - (a.perimeter * 0.03)) and b.perimeter < (a.perimeter + (a.perimeter * 0.03)) and
b.calc_acres > (a.calc_acres - (a.calc_acres * 0.03)) and b.calc_acres < (a.calc_acres + ( a.calc_acres * 0.03)) and
b.X_Max > (a.X_Max - 11) and b.X_Max < (a.X_Max + 11) and
b.Y_Max > (a.Y_Max - 11) and b.Y_Max < (a.Y_Max + 11) and
b.X_Min > (a.X_Min - 11) and b.X_Min < (a.X_Min + 11) and
b.Y_Min > (a.Y_Min - 11) and b.Y_Min < (a.Y_Min + 11) and b.leaseId != a.leaseId
group by a.Poly_ID
order by a.Poly_ID
UPDATE: After incorporating FelixIP's idea into my process, I found the following link: http://clear.uconn.edu/tools/Shape_Metrics/method.htm
This Shape Metrics tool is old, but its methodologies seem more robust than relying on perimeter, area, extent, and centroid alone. Those values are susceptible to outliers, so not every polygon in a group is caught (mainly because of extraneous boundaries left where two polygons were dissolved but their boundaries did not completely overlap).
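For example (my own illustration, not taken from that tool), a convexity metric, the polygon's area divided by the area of its convex hull, ignores the perimeter entirely, so extraneous internal boundary vertices don't distort it:

import arcpy

fc = "LTPOLYGONS"  # illustrative feature class name
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as rows:
    for oid, shape in rows:
        # 1.0 for convex polygons; lower values indicate irregular outlines.
        convexity = shape.area / shape.convexHull().area
        print(oid, round(convexity, 4))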
Answer
I found that replacing geometries with their string "signatures" is a very efficient technique when comparing geometries, e.g. assigning point IDs to the respective start and end attributes of a polyline, or finding points that overlap. This approach, combined with dictionaries, is a game changer.
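A minimal sketch of the idea with made-up coordinates (pure Python, no arcpy needed):

def signature(x, y):
    # Two-decimal string key; nearly identical coordinates collide on purpose.
    return '{:.2f}_{:.2f}'.format(x, y)

# Map each point's signature to its ID.
point_ids = {signature(x, y): pid for pid, x, y in [
    (1, 500000.123, 1400000.457),
    (2, 500250.789, 1400300.012),
]}

# Assign start/end point IDs to a polyline by dictionary lookup.
line = ((500000.123, 1400000.456), (500250.789, 1400300.012))
start_id = point_ids.get(signature(*line[0]))
end_id = point_ids.get(signature(*line[1]))
print(start_id, end_id)  # 1 2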
For some reason that I don't fully understand, truncating produces more robust results than the rounding or decimal-to-integer conversion described in my comments. Thus this:
def truncate(f, n):
    # Truncate float f to n decimal places (no rounding) via string slicing.
    s = '{}'.format(f)
    i, p, d = s.partition('.')
    return '.'.join([i, (d + '0' * n)[:n]])
'============================================
truncate(!Shape!.centroid.X, 2) + truncate(!Shape!.centroid.Y, 2) + truncate(!Shape!.area, 2)
will potentially work better when populating a field for Dissolve. In fact, it is enough to summarise that field and find the minimum of a sequential integer value stored in another field (FID and OBJECTID won't work with the standard Summarize tool, but they are perfectly fine in a script!). This way the first unique features can be found and exported for further analysis; a sketch of the scripted version follows.
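Here is a minimal sketch of that scripted version, assuming a text field SIG already holds the concatenated signature and an integer field GROUP_ID receives the result (both field names are mine):

import arcpy

fc = "LTPOLYGONS"  # illustrative feature class with SIG and GROUP_ID fields

# First pass: remember the lowest OBJECTID seen for each signature.
first_seen = {}
with arcpy.da.SearchCursor(fc, ["OID@", "SIG"]) as rows:
    for oid, sig in rows:
        if sig not in first_seen or oid < first_seen[sig]:
            first_seen[sig] = oid

# Second pass: stamp every feature with its group's minimum OBJECTID.
with arcpy.da.UpdateCursor(fc, ["SIG", "GROUP_ID"]) as rows:
    for sig, gid in rows:
        rows.updateRow([sig, first_seen[sig]])

Features whose OBJECTID equals their GROUP_ID are those first unique features.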