I'm looking for duplicate records in dbf files based on the attribute called 'ID'. I have various dbf files ranging from 500,000 to 1.5 million records, and I know there are a host of duplicates.
I would like to add a field 'Duplicate' that says Yes or No (or 1 or 0 is fine) when the ID attribute is present elsewhere. Using the following Python script in the Field Calculator returns 1 for a duplicate entry and 0 for a unique entry:
    uniqueList = []
    def isDuplicate(inValue):
        if inValue in uniqueList:
            return 1
        else:
            uniqueList.append(inValue)
            return 0
    isDuplicate(!FIELD_NAME!)
However, the 1st record of, for example, 5 duplicate IDs will also be returned as a 0 (the subsequent 4 are considered the duplicates). I would need all 5 to be marked as duplicate as the ID exists elsewhere.
The following code gives a running count of how many times each ID has occurred so far, with 1 marking the 1st occurrence, 2 the 2nd, and so forth:
    UniqueDict = {}
    def isDuplicateIndex(inValue):
        UniqueDict.setdefault(inValue, 0)
        UniqueDict[inValue] += 1
        return UniqueDict[inValue]
    isDuplicateIndex( !YOUR_FIELD! )
I just want a 1 (or Yes) if the ID of that record exists elsewhere! (ArcGIS version 10.1)
I have seen other answers, such as "Python script for identifying duplicate records (follow up)", but they don't quite work.
Answer
An alternative solution is to use the existing Summary Statistics tool in ArcGIS with ID as the case field, then join the resulting table back to the original table on your ID field. Duplicated IDs will have a "COUNT" larger than 1, so it is then simple to calculate the 'Duplicate' field with the Field Calculator.
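The reason this works where the Field Calculator scripts don't is that it takes two passes: one to count every ID, and a second to flag each record against the finished counts. A single pass can never mark the 1st occurrence, because the later ones haven't been seen yet. As a rough sketch of the same idea in plain Python (the function name `flag_duplicates` and the sample IDs are made up for illustration, and this runs on an ordinary list rather than on a dbf):

```python
from collections import Counter

def flag_duplicates(ids):
    """Return one flag per input ID: 1 if that ID occurs more than once, else 0."""
    # Pass 1: count how many times each ID occurs (the "summary statistics" step).
    counts = Counter(ids)
    # Pass 2: look each record's ID up in the finished counts (the "join" step).
    return [1 if counts[value] > 1 else 0 for value in ids]

# Hypothetical sample data: ID 101 occurs three times, the others once.
sample_ids = [101, 102, 101, 103, 101]
print(flag_duplicates(sample_ids))  # prints [1, 0, 1, 0, 1]
```

Note that all three records with ID 101 are flagged, including the 1st one. The dictionary lookup also stays fast on 1.5 million records, unlike the `in uniqueList` test on a list. In ArcGIS 10.1 the two passes could presumably be driven by `arcpy.da.SearchCursor` and `arcpy.da.UpdateCursor`, but that wiring is not shown here and is untested.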