Saturday, 30 June 2018

Deleting many duplicate points using ArcGIS Desktop?


I have a shapefile with more than 6 million points, most of which are identical. I ran the built-in Delete Identical tool on the Shape field, but after almost 10 hours it was only 20% complete.


The point shapefile contains only three fields: ID, x-coordinate and y-coordinate. I want to delete the duplicate records from the existing shapefile, where duplicates are points with the same x,y coordinates. Saving the unique records to another shapefile is also fine; I will use whichever method is faster.


I am using ArcGIS 10.3.1 Advanced License.



Answer



I found that this script, which copies the unique points into a separate, initially empty shapefile, runs about 5 times faster than Delete Identical. I tested it on 200,000 points, 100,000 of which were duplicates.


import arcpy

from arcpy import env
env.overwriteOutput = True  # the property name is case-sensitive
allPoints = r'C:\...\ScratchFolder\POINTs.shp'
# outFC must already exist as an empty point shapefile
outFC = r'C:\...\ScratchFolder\OUTPUT\Block_00000.shp'
curT = arcpy.da.InsertCursor(outFC, "Shape@")

result = arcpy.GetCount_management(allPoints)
nF = int(result.getOutput(0))

aDict = {}  # coordinate pairs already written to outFC

with arcpy.da.SearchCursor(allPoints, "SHAPE@XY") as cursor:
    arcpy.SetProgressor("step", "", 0, nF, 1)
    for row in cursor:
        arcpy.SetProgressorPosition()
        v = row[0]  # (x, y) tuple
        if v in aDict:
            continue  # duplicate, skip it
        aDict[v] = 1
        curT.insertRow((row[0],))

del curT  # release the insert cursor so the rows are flushed

This confirms @Vince's point.
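Stripped of the arcpy calls, the script is a single pass that remembers every coordinate pair it has already written. Here is a minimal pure-Python sketch of that idea, assuming the points are available as (x, y) tuples. `unique_points` is a hypothetical name used only for illustration, and a set stands in for the dictionary.

```python
def unique_points(points):
    """Return the first occurrence of each (x, y) tuple, in input order."""
    seen = set()   # coordinate pairs encountered so far
    kept = []      # points that would be written to the output
    for xy in points:
        if xy in seen:
            continue  # exact duplicate, skip it
        seen.add(xy)
        kept.append(xy)
    return kept


pts = [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0), (1.0, 2.0)]
print(unique_points(pts))
```

Membership tests on a set or dictionary take roughly constant time on average, so the whole pass scales about linearly with the number of points.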



Store the input and output on the fastest drive you have; network drives are a no-go. I suggest running the script from an mxd or from ArcCatalog, so that you can watch the progress and cancel at any time.


UPDATE TO HANDLE TOLERANCE:


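import arcpy
from arcpy import env
env.overwriteOutput = True  # the property name is case-sensitive

def truncate(f, n):
    # Cut the decimal text of f to exactly n places (no rounding)
    s = '{}'.format(f)
    i, p, d = s.partition('.')
    return '.'.join([i, (d + '0' * n)[:n]])


allPoints = r'C:\...Data\ScratchFolder\POINTs.shp'
# outFC must already exist as an empty point shapefile
outFC = r'C:\...Data\ScratchFolder\OUTPUT\Block_00000.shp'
curT = arcpy.da.InsertCursor(outFC, "Shape@")
result = arcpy.GetCount_management(allPoints)
nF = int(result.getOutput(0))
aDict = {}  # truncated coordinate keys already written to outFC

with arcpy.da.SearchCursor(allPoints, "SHAPE@XY") as cursor:
    arcpy.SetProgressor("step", "", 0, nF, 1)

    for row in cursor:
        arcpy.SetProgressorPosition()
        x, y = row[0]
        # key built from both coordinates truncated to 3 decimal places
        v = '%s_%s' % (truncate(x, 3), truncate(y, 3))
        if v in aDict:
            continue  # duplicate within tolerance, skip it
        aDict[v] = 1
        curT.insertRow((row[0],))

del curT  # release the insert cursor so the rows are flushed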

This treats points as identical if their coordinates match to 3 decimal places. I really hope you are at least working with projected data: in metres, 3 decimal places is 1 mm, whereas in decimal degrees it is on the order of 100 m on the ground.
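To see how the truncated key groups nearby points, here is a small sketch that runs without arcpy. `truncate` is the function from the script above; `tolerance_key` is a hypothetical helper that wraps the key-building line, and the coordinates are made-up values for illustration.

```python
def truncate(f, n):
    """Cut the decimal text of f to exactly n places (no rounding)."""
    s = '{}'.format(f)
    i, p, d = s.partition('.')
    return '.'.join([i, (d + '0' * n)[:n]])


def tolerance_key(x, y, places=3):
    """Build the dictionary key used to decide whether two points match."""
    return '%s_%s' % (truncate(x, places), truncate(y, places))


# Two points that differ only beyond the third decimal place share a key
print(tolerance_key(100.1234, 200.5678))
print(tolerance_key(100.1239, 200.5671))

# Two points 0.0002 apart that straddle a truncation boundary do not
print(tolerance_key(0.9999, 5.0))
print(tolerance_key(1.0001, 5.0))
```

Because the key truncates rather than measures distance, the tolerance behaves like a fixed grid: two points closer than 0.001 can still land in different cells. The string-based `truncate` also assumes the coordinate prints in plain decimal notation; a value that formats in scientific notation (for example `1e-05`) would produce a malformed key.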

