Monday 30 March 2015

arcpy - Better approach for selecting related records in multiple featureclasses based on query of related table? (sde/SQL server)


I have an enterprise geodatabase (SQL) that contains a table ("linktable") which has a key field (Main_ID) that is also in up to a dozen related feature classes (via a relationship class).


I want to run a script in Arctoolbox that allows the user to specify a query statement on the linktable. This would then gather the Main_IDs of the resulting records and go through the feature classes open in an mxd one by one looking for the related Main_ID field in each of them. I want the related records in each feature class selected. The user can pick the selection method (new, add_to, remove, etc).


There are over 180,000 records in linktable. Potentially tens of thousands in each fc.


I came up with a method 1-2 years ago that involved creating a long SQL statement with IN on each fc. It works up until that list gets way too long.
It was suggested I repost this issue here and try to solve it using more db-centric ideas. I'm just not a full-time developer and my python only goes so far. I would love to find a more efficient solution. I'm posting the current, inefficient code here as well as a link to my most recent post on this problem


import arcpy, os, string
import arcpy.mapping as MAP

#all relevant fcs as well as Link table should be loaded into current mxd

mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]

#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)


#get sql expression (sql calculator, query of linktable)

sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)

#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)

#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"

#Main code to select all records in fc with matching Main_ID in Linktable

if len(layers) > 0:
# Set the SearchCursor to look through the selection of the linkTable
sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
# Add DBMS-specific field delimiters
fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
# Add single-quotes for string field values
valueList = ["'%s'" % value for value in sourceIDs]
# Format WHERE clause in the form of an IN statement
whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
arcpy.AddMessage("SQL Clause: {}".format(whereClause))

for lyr in layers:
if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
# Process: Select Layer By Attribute
arcpy.AddMessage("Querying related records in {0}".format(lyr))
arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)

else:
arcpy.AddMessage("No availble layers for selection.")
sys.exit(0)


del sqlExp, mxd, df, linkTable, linkKeyField, layers

Answer



You're not letting the RDBMS do its job. There's no reason to use any cursors to do what is a standard part of the SQL language: The subquery.


I haven't tested this in any way, just commented out the unnecessary code and added one statement (on two lines):


import arcpy, os, string
import arcpy.mapping as MAP

#all relevant fcs as well as Link table should be loaded into current mxd
mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]


#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)


#get sql expression (sql calculator, query of linktable)
sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)


#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)

#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"

#Main code to select all records in fc with matching Main_ID in Linktable
if len(layers) > 0:
## # Set the SearchCursor to look through the selection of the linkTable

## sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
## # Add DBMS-specific field delimiters
## fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
## # Add single-quotes for string field values
## valueList = ["'%s'" % value for value in sourceIDs]
## # Format WHERE clause in the form of an IN statement
## whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))

whereClause = "{:s} IN (SELECT {:s} FROM {:s} WHERE {:s})".format(
linkKeyField,linkKeyField,linkTable,sqlExp)


arcpy.AddMessage("SQL Clause: {}".format(whereClause))
for lyr in layers:
if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
# Process: Select Layer By Attribute
arcpy.AddMessage("Querying related records in {0}".format(lyr))
arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)

else:
arcpy.AddMessage("No available layers for selection.")

sys.exit(0)

del sqlExp, mxd, df, linkTable, linkKeyField, layers

You should of course make sure that each feature class has an index on the link column, and that the link table has indexes on likely query columns from the user expression.


Your code should verify that the sqlExp is not empty (SQL syntax error)


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...