I'm working with a table that has over 180,000 records. My code searches through the table and pulls field values (matching a SQL expression) into a list. That list is then used to build a Select By Attributes SQL statement on a feature class with an IN clause. For example: SELECT * FROM fc WHERE field IN ('value1', 'value2', ...).
Only posting the relevant code here. The problem is that the initial query might return 180k+ records, so the resulting WHERE clause is a massive list. I've determined that if the list of values (GUIDs as strings) gets over 20,000 or so, the script fails with an "Invalid Expression" error.
The only thing I can think to do is modify this code so that it runs the first query on the table no more than 10,000 rows at a time. That should keep the IN clause manageable. Once the first list has been used to select features with matching field values in the fc, each subsequent 10k would run with ADD_TO_SELECTION.
However, I'm having a hell of a time figuring out the best way to manipulate this bit of code to do it that way.
Summary: both the table and the fc have a field called "Main_ID". The code searches for those values in the table based on a user query, puts them into a list, and then searches the "Main_ID" field of the fcs in the mxd using a SQL expression with an IN clause. I think this line needs to be altered to do this maybe 10k records at a time:
set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
Any help?
linkKeyField = "Main_ID"
if len(layers) > 0:
    # Set the SearchCursor to look through the selection of the linkTable
    sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
    # Add DBMS-specific field delimiters
    fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
    # Add single quotes for string field values
    valueList = ["'%s'" % value for value in sourceIDs]
    # Format WHERE clause in the form of an IN statement
    whereClause = "%s IN (%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
    arcpy.AddMessage("SQL Clause: {}".format(whereClause))
    for lyr in layers:
        if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
            # Process: Select Layer By Attribute
            arcpy.AddMessage("Querying related records in {0}".format(lyr))
            arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)
Answer
There's a Stack Overflow question, How do you split a list into evenly sized chunks?, that describes 'chunking' up a list. Integrating that into the code that I can see:
def chunks(l, n):  # have this up the top of your code
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

if len(layers) > 0:
    # Set the SearchCursor to look through the selection of the linkTable
    sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
    # A set can't be sliced, so convert it to a list before chunking
    sourceChunks = list(chunks(list(sourceIDs), 10000))
    # Add DBMS-specific field delimiters (same for every chunk, so do it once)
    fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
    for chunkIndex, sourceIDs in enumerate(sourceChunks):  # re-use the sourceIDs variable now that it's copied to sourceChunks
        # Add single quotes for string field values
        valueList = ["'%s'" % value for value in sourceIDs]
        # Format WHERE clause in the form of an IN statement
        whereClause = "%s IN (%s)" % (fieldDelimited, ', '.join(valueList))
        arcpy.AddMessage("SQL Clause: {}".format(whereClause))
        # First chunk uses the caller's selection type; later chunks must add to it,
        # otherwise a NEW_SELECTION would overwrite the previous chunks' selection
        chunkSelectType = selectType if chunkIndex == 0 else "ADD_TO_SELECTION"
        for lyr in layers:
            if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
                # Process: Select Layer By Attribute
                arcpy.AddMessage("Querying related records in {0}".format(lyr))
                arcpy.SelectLayerByAttribute_management(lyr, chunkSelectType, whereClause)
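Since the arcpy calls can't run outside ArcGIS, here's a minimal pure-Python sketch of just the chunk-and-format step. The field name, the GUID strings, and the chunk size of 2 are all made-up stand-ins for illustration:

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

# Hypothetical stand-ins for the GUID strings pulled by the SearchCursor
source_ids = ["guid-%03d" % i for i in range(5)]

where_clauses = []
for chunk in chunks(list(source_ids), 2):
    # Add single quotes for string field values
    value_list = ["'%s'" % value for value in chunk]
    # Format one WHERE clause per chunk
    where_clauses.append("Main_ID IN (%s)" % ', '.join(value_list))

print(where_clauses[0])   # Main_ID IN ('guid-000', 'guid-001')
print(len(where_clauses)) # 3
```

Each clause here stays small; in the real script the same pattern keeps every IN list at 10,000 values or fewer, safely under the ~20k limit reported in the question.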
If the list is shorter than 10,000, the iteration occurs only once, since the conversion produces a list of lists, i.e. a list with a single element holding the IDs. Note that the last chunk could be very short, but is guaranteed to hold no more than 10k elements. There is also the list comprehension [l[i:i + n] for i in range(0, len(l), n)] if you don't want to define a function.
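As a quick sanity check that the one-line comprehension behaves the same as the generator function (using a toy list and a hypothetical chunk size of 2):

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

ids = ['a', 'b', 'c', 'd', 'e']
n = 2
gen_result = list(chunks(ids, n))
comp_result = [ids[i:i + n] for i in range(0, len(ids), n)]

print(gen_result == comp_result)  # True
print(comp_result)                # [['a', 'b'], ['c', 'd'], ['e']]
```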