Friday 26 June 2015

arcgis 10.0 - What is the most efficient way to search a geodatabase for NULL-like records?



The problem: I have a geodatabase with several datasets and many more feature classes within them. The fields within the feature classes have been populated through joins with shapefiles and manual edits. Often, string fields become populated with whitespace (i.e. '', ' ', '  ', etc.) or the string "Null", and numeric fields become populated with a zero (0). I would like to find these records and replace them with a true NULL value. I have the following code, which uses an UpdateCursor, but it is still very slow and doesn't catch all of the NULL-like records. Does anyone know of other ways to accomplish this task?


import os
import arcpy

GDB = arcpy.GetParameterAsText(0)  # input geodatabase
arcpy.env.workspace = GDB
datasetList = arcpy.ListDatasets()  # list datasets

for dataset in datasetList:
    arcpy.env.workspace = os.path.join(GDB, dataset)
    fcList = arcpy.ListFeatureClasses()
    for fc in fcList:
        arcpy.AddMessage("Processing %s..." % fc)

        # count features
        arcpy.MakeTableView_management(fc, "tempTableView")
        count = int(arcpy.GetCount_management("tempTableView").getOutput(0))
        if count > 0:
            fieldList = arcpy.ListFields(fc)
            for field in fieldList:
                arcpy.AddMessage("...%s" % field.name)
                rows = arcpy.UpdateCursor(fc)
                for row in rows:
                    count = 0

                    if row.isNull(field.name):
                        continue  # if already null
                    elif field.type == "String":
                        value = row.getValue(field.name)
                        if value.strip() == '':  # looks for whitespace or ''
                            row.setNull(field.name)
                            count += 1
                    elif field.type in ("SmallInteger", "Integer", "Single", "Double"):
                        value = row.getValue(field.name)
                        if value == 0:
                            row.setNull(field.name)
                            count += 1
                    if count > 0:  # update row if the record has changed
                        rows.updateRow(row)
                del rows
        else:
            arcpy.AddMessage("...NO RECORDS FOUND.")

Answer



Since I'm most familiar with 10.1+, and cursors in the da module are a lot better in general, here's a potential solution. Currently you are creating a new cursor each time you change fields, which means you take a performance hit there. Furthermore, you are checking the field type for every record instead of using the field types once up front to filter the fields of interest.


I've changed how Null values are checked, but I haven't thoroughly tested it against all possible values. It worked for the small sample dataset I had at 10.2.2.



import arcpy

#Return None if the value needs to be changed, else return the value
def nullify(value):
    x = value
    if value is not None:  #True null fields are read as None types
        if isinstance(value, basestring):  #covers both str and unicode at Python 2
            if value.strip() == '':  #whitespace-only or empty string
                x = None
        elif value == 0:
            x = None
    return x

#We're only interested in some field types
ftypes = ("String", "SmallInteger", "Integer", "Double")
fieldList = [f.name for f in arcpy.ListFields(fc) if f.type in ftypes]

with arcpy.da.UpdateCursor(fc, fieldList) as rows:
    for row in rows:
        #da cursors return tuples, so build a tuple for a meaningful comparison
        nulled = tuple(map(nullify, row))
        if row != nulled:  #Only update if the row actually needs to be changed.
            rows.updateRow(nulled)
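The core of the nullify helper can be exercised without an ArcGIS session. Below is a small standalone Python sketch of the same logic, using hypothetical sample values; as an assumption it is extended to also catch the literal string "Null" that the question mentions, which the snippet above does not handle:

```python
# Standalone sketch of the nullify logic, runnable without arcpy.
# The handling of the literal string 'Null' is an extension assumed
# from the question, not part of the original answer.
def nullify(value):
    if value is None:  # true NULLs are read back as None
        return None
    if isinstance(value, str):
        # whitespace-only, empty, or the literal string 'Null'
        if value.strip().lower() in ('', 'null'):
            return None
    elif value == 0:  # zero in a numeric field is NULL-like
        return None
    return value

# A mock row mixing clean and NULL-like values
row = ('  ', 'Null', 'Main St', 0, 42.0, None)
nulled = tuple(nullify(v) for v in row)
print(nulled)  # (None, None, 'Main St', None, 42.0, None)
```

Because nullify returns clean values unchanged, comparing the original row to the mapped result is enough to decide whether an update is needed.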
