Friday, 28 April 2017

arcgis 10.1 - Optimizing arcpy.da.Walk performance?


I'm trying to automate a workflow that currently costs a co-worker about 45 minutes. In short, this person copies files from a network drive onto a local one and processes them through four separate ArcGIS models. The code below doesn't include all of the geoprocessing that is run against these files, only the point where things bog down. As it stands, each model takes about 12 minutes to run; the code below works, but it takes about an hour and a half on exactly the same data. I'm using arcpy.da.Walk because I figured I could save all the clicks and just run it in the background. I'm fairly new to coding (I've been at it about 2 years). Could it be the fnmatch? My gut tells me it's the walk and building the list, so I'm wondering if there is an alternative.


I have attempted to replicate the workflow (copying the directories locally, and not over the network).


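import fnmatch
import os
import traceback

import arcpy


def main():
    try:
        arcpy.env.overwriteOutput = True
        rootDir = arcpy.GetParameterAsText(0)  # folder tree to search
        outDir = arcpy.GetParameterAsText(1)   # folder for the merged output

        # Not used in this excerpt (the rest of the workflow is not shown)
        mssn = '"12"'
        sensor = '"Geo"'
        dissolveFld = ["Date"]

        search = "*pan*.shp"
        outName = "test"

        # Collect every shapefile whose name matches the search pattern
        shpMergeLst = []
        for root, dirs, files in arcpy.da.Walk(rootDir):
            for filename in fnmatch.filter(files, search):
                shpMergeLst.append(os.path.join(root, filename))

        # Merge the matches into one shapefile, e.g. test_Merge.shp
        if shpMergeLst:
            outShp = os.path.join(outDir, outName + "_Merge.shp")
            arcpy.Merge_management(shpMergeLst, outShp)
    except Exception:
        arcpy.AddError(traceback.format_exc())


if __name__ == "__main__":
    main()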

I don't have the system specs, but I'm currently running ArcGIS 10.1 Advanced.
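One way to test the fnmatch-versus-walk hypothesis is to time the two phases separately. This is an illustrative sketch, not part of the original post: it uses the standard library's os.walk as a stand-in for arcpy.da.Walk (both yield (dirpath, dirnames, filenames) tuples), and the function name time_walk_and_filter is hypothetical.

```python
import fnmatch
import os
import time


def time_walk_and_filter(root_dir, pattern="*pan*.shp"):
    """Time the directory walk and the fnmatch filtering as separate phases.

    os.walk stands in for arcpy.da.Walk here; swap in
    arcpy.da.Walk(root_dir) to measure the original loop instead.
    """
    t0 = time.time()
    listing = list(os.walk(root_dir))  # phase 1: walk the tree, no filtering
    t1 = time.time()
    # phase 2: pattern-match the file names that were already collected
    matches = [
        os.path.join(root, name)
        for root, _dirs, files in listing
        for name in fnmatch.filter(files, pattern)
    ]
    t2 = time.time()
    return matches, t1 - t0, t2 - t1
```

If the walk time dwarfs the filter time, the list-building step is not the problem: fnmatch only does string matching on names already in memory, so it is unlikely to be the bottleneck.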



Answer



I agree with @Paul above: the best way to optimise this code is to ensure that it never runs. If you're happy to install third-party packages, I'd also take a look at the formic package, which is a



... Python implementation of Apache Ant FileSet and Globs including the directory wildcard **




(Credit to this answer on SO for pointing me at the library)


In the same vein as the answer above, you can use formic to build your file matcher like so:


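import time

import formic

# With no directory argument, FileSet searches the current working directory
t3a = time.time()
res3 = list(formic.FileSet(include="*pan*.shp", exclude=["**/*.gdb/"]))
t3b = time.time()
print("Formic walker:\t\t{}".format(t3b - t3a))

# res1 and res2 are the results of the two arcpy.da.Walk walkers in the answer above
assert res1 == res2 == res3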

On my machine, the three approaches take the following times (in seconds):


Native walker:     333.040999889
Optimized walker:  277.563000202
Formic walker:       0.770999908447

Do note that the substantial difference comes from Formic not checking whether the file is actually a valid shapefile (and it certainly won't work for geodatabases). The idea behind this is that it's Easier to Ask for Forgiveness than Permission. Still, that doesn't always hold up with ArcGIS, so you may find it easier to filter the results as the paths are generated:


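import os
import time

import arcpy
import formic


def check_formic(maindir, searchpat):
    """Yield the paths matched by formic that arcpy confirms are shapefiles."""
    for path in formic.FileSet(
        include=searchpat, exclude=["**/*.gdb/"], directory=maindir
    ):
        if arcpy.Describe(path).dataType == "ShapeFile":
            yield path


# Search the same folder as the plain formic walker (the current working directory)
gen4 = check_formic(os.getcwd(), "*pan*.shp")

t4a = time.time()
res4 = list(gen4)  # the generator is lazy, so all the work happens here
t4b = time.time()
print("Formic with type checking:\t\t{}".format(t4b - t4a))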


Which gives:


Formic with type checking:              30.4530000687

That is still roughly 9 to 11 times faster than the arcpy.da.Walk methods (about 30 seconds versus 278 to 333 seconds).
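If even the arcpy.Describe calls prove too slow, one cheaper but weaker pre-filter is to check that each match has the companion files every shapefile requires (.shx and .dbf alongside the .shp). This is a sketch under that assumption, not part of the original answer: it only checks that the files exist, not that their contents are valid, so arcpy.Describe remains the stricter test. The names has_shapefile_parts and filter_shapefiles are hypothetical.

```python
import os


def has_shapefile_parts(shp_path):
    """Return True if shp_path is a .shp file with .shx and .dbf files beside it.

    Only checks that the files exist; their contents are not validated.
    """
    base, ext = os.path.splitext(shp_path)
    if ext.lower() != ".shp" or not os.path.isfile(shp_path):
        return False
    # Accept lower- or upper-case extensions for case-sensitive file systems
    for companion_exts in ((".shx", ".SHX"), (".dbf", ".DBF")):
        if not any(os.path.isfile(base + e) for e in companion_exts):
            return False
    return True


def filter_shapefiles(paths):
    """Yield only the paths that look like complete shapefiles."""
    for path in paths:
        if has_shapefile_parts(path):
            yield path
```

It can wrap the formic results the same way check_formic does, for example `list(filter_shapefiles(formic.FileSet(include="*pan*.shp", exclude=["**/*.gdb/"])))`, without any arcpy calls.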


Note: all my tests were run on a folder structure with 43 shapefiles in various places.
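If installing a third-party package isn't an option, a similar effect can be approximated with the standard library. In its default top-down mode, os.walk lets you prune directories by editing the dirnames list in place, so the walk never descends into .gdb folders (mirroring formic's exclude=["**/*.gdb/"]), while fnmatch handles the file pattern. This is a swapped-in technique, not part of the original answer, and it has not been timed against the figures above; the function name find_files is hypothetical. It should run on both the Python 2.7 bundled with ArcGIS 10.1 and Python 3.

```python
import fnmatch
import os


def find_files(root_dir, pattern="*pan*.shp", skip_suffix=".gdb"):
    """Yield paths under root_dir whose file names match pattern.

    Directories ending in skip_suffix (file geodatabases by default) are
    pruned so os.walk never descends into them. No arcpy calls are made,
    so nothing checks that the matches are valid shapefiles.
    """
    for root, dirs, files in os.walk(root_dir):  # topdown=True by default
        # Editing dirs in place controls which subdirectories os.walk visits next
        dirs[:] = [d for d in dirs if not d.lower().endswith(skip_suffix)]
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)
```

In the question's script, `shpMergeLst = list(find_files(rootDir))` would replace the arcpy.da.Walk loop.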

