Saturday, 7 March 2015

qgis - How to select duplicate records based on multiple fields?


I have a feature class with two fields of interest: "name" and "identifier" (Original Data; Figure 1). Note that there are several duplicate rows in the table and the table may or may not be sorted, as it is in this example.


I'm interested in selecting, and/or removing, duplicate rows if the identifier is a duplicate within each "name" group (e.g. A, B, C...). In the example, name "C" has two identifiers "abcd", so the duplicate would need to be selected and/or removed. I'm working with ArcGIS Desktop Basic or QGIS and python. I'm thinking that referencing a dictionary or some fancy SQL expression might do the trick, but hoping there may be an easier solution. How can I find duplicate records based on multiple fields?


enter image description here


Figure 1



Answer



I was faced with a similar problem today, but I ended up using FME which was really quick and easy.


What I was thinking was an update cursor which loops through the rows. Store the row in a list. The first time the unique pair is encountered it is added to the list. Go to the next row. If the name/identifier is in your list for the next row, delete the row. Otherwise, add the name/identifier to the list.


I've just tested it and it works:



uniqueList = []
with arcpy.da.UpdateCursor(fc, ["name","identifier"]) as uCur:
for row in uCur:
if row in uniqueList:
uCur.deleteRow()
else:
uniqueList.append(row)

No comments:

Post a Comment