Friday, 9 September 2016

python - Finding duplicate records in field using ArcGIS for Desktop?


I'm looking for duplicate records in dbf files based on the attribute called 'ID'. I have various dbf files ranging from 500,000 to 1.5 million records, and I know there are a host of duplicates.


I would like to add a field 'Duplicate' that says Yes or No (1 or 0 is also fine) depending on whether the ID value is present in another record. The following Python script in the Field Calculator returns 1 for a duplicate entry and 0 for a unique entry:


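# Pre-logic script code (a set keeps membership tests fast on large tables)
seenIDs = set()

def isDuplicate(inValue):
    if inValue in seenIDs:
        return 1  # ID already seen in an earlier record
    else:
        seenIDs.add(inValue)
        return 0  # first time this ID has been seen

# Expression
isDuplicate(!FIELD_NAME!)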

However, the first record of a group of, for example, five records sharing an ID is still returned as 0; only the subsequent four are flagged as duplicates. I need all five to be marked as duplicates, because the ID exists elsewhere.
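To see why, the single-pass logic can be run outside ArcGIS on a plain Python list standing in for the ID column. This is a minimal sketch: the `flag_single_pass` name and the sample IDs are hypothetical. Because each record is compared only against the records before it, the first member of every duplicate group always gets 0, the same value as a genuinely unique ID.

```python
def flag_single_pass(ids):
    """Mimic the Field Calculator function: 1 if the ID was seen earlier, else 0."""
    seen = set()
    flags = []
    for value in ids:
        if value in seen:
            flags.append(1)   # an earlier record had this ID
        else:
            seen.add(value)
            flags.append(0)   # first occurrence, even if more copies follow
    return flags


# Hypothetical sample: ID 7 occurs three times, ID 3 only once
sample_ids = [7, 3, 7, 7]
print(flag_single_pass(sample_ids))  # [0, 0, 1, 1]
```

The first 7 and the unique 3 both come out as 0, which is exactly the problem described above.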


The following code instead gives a running count of how many times each ID has occurred so far, with 1 meaning the first occurrence, 2 the second, and so on:



# Pre-logic script code
UniqueDict = {}

def isDuplicateIndex(inValue):
    # Increment and return the running count for this ID
    UniqueDict.setdefault(inValue, 0)
    UniqueDict[inValue] += 1
    return UniqueDict[inValue]

# Expression
isDuplicateIndex(!YOUR_FIELD!)
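Run over the same kind of hypothetical sample, the running count shows the same limitation: the first occurrence of a duplicated ID gets 1, exactly like an ID that occurs only once, so a single pass cannot tell the two apart. The `running_count` helper below is a hypothetical plain-Python stand-in for `isDuplicateIndex`, sketched for illustration only.

```python
def running_count(ids):
    """Return, for each record, how many times its ID has appeared so far."""
    counts = {}
    result = []
    for value in ids:
        counts[value] = counts.get(value, 0) + 1  # bump this ID's running total
        result.append(counts[value])
    return result


sample_ids = [7, 3, 7, 7]
print(running_count(sample_ids))  # [1, 1, 2, 3]
```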

I just want a 1 (or Yes) whenever a record's ID also appears in another record. (ArcGIS version 10.1)


I have seen other answers, such as "Python script for identifying duplicate records (follow up)", but they don't quite work for this case.




Answer



An alternative solution is to use the existing Summary Statistics tool in ArcGIS (for example, a COUNT statistic with ID as the case field), then join the resulting table back to your data on your ID field. Duplicated IDs will have a COUNT greater than 1, so it is then simple to populate your Duplicate field with the Field Calculator.
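The same idea can be sketched in plain Python as two passes over the ID column: first count how often each ID occurs (the role of the Summary Statistics COUNT), then flag every record whose ID count is greater than 1 (the role of the join plus Field Calculator step). This is a sketch under assumptions: `flag_duplicates` is a hypothetical name, and in ArcGIS 10.1 the IDs would have to be read with something like `arcpy.da.SearchCursor` and the flags written back with `arcpy.da.UpdateCursor`, which is not shown here. `collections.Counter` is available in the Python 2.7 that ArcGIS 10.1 uses.

```python
from collections import Counter


def flag_duplicates(ids):
    """Return 1 for every record whose ID occurs more than once, else 0.

    `ids` must be a list (or other re-iterable sequence): it is read twice.
    """
    # Pass 1: count occurrences of each ID (like Summary Statistics COUNT)
    counts = Counter(ids)
    # Pass 2: flag each record whose ID count exceeds 1 (like join + calculate)
    return [1 if counts[value] > 1 else 0 for value in ids]


sample_ids = [7, 3, 7, 7]
print(flag_duplicates(sample_ids))  # [1, 0, 1, 1]
```

Unlike the single-pass functions, every copy of ID 7 is flagged, including the first. Both passes are linear in the number of records, and only one count per distinct ID is held in memory.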

