Monday 18 February 2019

Improve the speed at which arcpy creates a layer object


Is something happening on the back end that makes the process below so slow, and can it be avoided?


It is taking between 5 and 7 seconds to turn one layer file (.lyr) into a layer object using layer = arcpy.mapping.Layer(path). This is painfully slow when I have over 2,500 layers that I need to convert to layer objects so that I can access their datasetName property.


Within a directory of layers, I want to be able to search for all layers that point to a given dataset in our SDE. For example, NYC's Planimetrics feature classes contain the word "NYCMAP," and I want to search for all layer files that are currently pointing to any feature classes containing that word.


My current script actually works fine, but it's painfully slow. So I decided to break it down line by line to try to find/understand the snag. It didn't take long for me to find it...


for dirname, dirnames, filenames in os.walk(layersFolder):
    for filename in filenames:
        path = os.path.join(dirname, filename)
        filepath, extension = os.path.splitext(path)

        if extension == ".lyr":
            # slow down happening here:
            layer = arcpy.mapping.Layer(path)
            print layer

If I run these few lines without converting each file to a layer object, I get through over 2,500 layers in about 15 seconds. When I include the layer = arcpy.mapping.Layer(path) line, the run slows to about 3 1/2 hours.
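As a rough sanity check, the per-layer cost accounts for essentially all of that runtime. The sketch below only multiplies figures quoted in the question (about 2,500 layers at 5 to 7 seconds each); the function name estimate_hours is made up for illustration.

```python
def estimate_hours(layer_count, seconds_per_layer):
    """Return the total runtime in hours for opening every layer file once."""
    return layer_count * seconds_per_layer / 3600.0


# Figures quoted in the question: about 2,500 layers at 5 to 7 seconds each.
low = estimate_hours(2500, 5)
high = estimate_hours(2500, 7)
print("{:.1f} to {:.1f} hours".format(low, high))  # prints "3.5 to 4.9 hours"
```

The directory walk itself (about 15 seconds) is negligible next to this.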


I have tried using arcpy.da.Walk instead of os.walk, and this has no effect. I have also tried copying the layers directory and my script to my C:\ drive (as opposed to having both on our network), but this also has no effect.


Using Python 2.7.10 and ArcGIS 10.4





UPDATE:


I thought I would include my entire working code. It currently takes about 3 hours and 15 minutes to go through just under 3,000 layer files. Rather than search for a specific keyword each time, I simply output the entire directory of layers into an Excel doc, and then just search inside of that doc. I will eventually schedule the script to run about once a week. Someone suggested creating a database table and using SQL to query it, which is an interesting idea that I may explore.


import sys
sys.path.append(r'somePath') # Looking in our Python Library for the xlsxwriter module
import arcpy, os, xlsxwriter, string
from time import strftime
from datetime import datetime

# Set time & date variables
startTime = datetime.now() # Used at end of script to calculate the time it took the script to run

dateStr = strftime('%m/%d/%Y %H:%M') # Variable for the current date/time (printed below)
dateNameStr = strftime('%Y%m%d') # Used later in output file name
timeStr = strftime('%H%M') # Used later in output file name
print "{}\n".format(dateStr)

outputFileFolder = r'somePath'

# ----------------------------------------------------------------------
# Functions
# ----------------------------------------------------------------------

def getColumnList(someList):
    '''Creates a list of Excel columns (Note: this function is restricted to 26 columns (i.e. A->Z))'''
    columnList = []
    n = 0
    alphaList = list(string.ascii_uppercase)
    someListLen = len(someList)
    while n < someListLen:  # one column letter per item in someList
        columnList.append(alphaList[n])
        n += 1
    return columnList


# ----------------------------------------------------------------------
# Walk through layers and get layer attributes
# ----------------------------------------------------------------------
# Layers folder
layersFolder = r'somePath'
print "Collecting attributes from all layers in {}\n".format(layersFolder)

layersList = []
walk = arcpy.da.Walk(layersFolder, datatype="Layer")

for dirpath, dirnames, filenames in walk:
    for filename in filenames:
        print filename
        path = os.path.join(dirpath, filename)
        layer = arcpy.mapping.Layer(path)

        layerList = []

        if layer.isGroupLayer:
            for sublayer in layer:
                layerList.append(sublayer)
        else:
            layerList.append(layer)

        for singleLayer in layerList:
            if singleLayer.supports("datasetName"):
                layerInfo = (os.path.split(path)[0], os.path.split(path)[1], str(singleLayer), singleLayer.datasetName)
                print str([str(item) for item in layerInfo])
                layersList.append(layerInfo)


print

# ----------------------------------------------------------------------
# Create Excel doc using XLSXWriter
# ----------------------------------------------------------------------
print("Creating Excel doc...\n")

# Create Excel doc
workbook = xlsxwriter.Workbook(os.path.join(outputFileFolder, "Layers_and_Datasets_{}_{}.xlsx".format(dateNameStr, timeStr)))
worksheet = workbook.add_worksheet()


# Format the worksheet
worksheet.set_column('A:A', 50) # Set column width
worksheet.set_column('B:B', 50) # Set column width
worksheet.set_column('C:C', 50) # Set column width
worksheet.set_column('D:D', 50) # Set column width
bold = workbook.add_format({'bold': True}) # Create bold format object for column headers

# Column headers/column list
headers = ("Layer Path", "Layer Name", "Sublayer Hierarchy", "Dataset Name")

columnList = getColumnList(headers)

# Write headers in cells A1, B1, C1, & D1
i = 0
for header in headers:
    worksheet.write(columnList[i] + "1", str(headers[i]), bold)
    i += 1

# Sort the layersList list
sortedlayersList = sorted(list(layersList), key=lambda layerPath: layerPath[0])


# Write all other rows, beginning in row 2
row = 2
for layerRow in sortedlayersList:
    attr = 0 # "attr" stands for layer attribute
    for lyrAttr in layerRow:
        worksheet.write(str(columnList[attr])+str(row), lyrAttr)
        attr += 1
    row += 1


# Close workbook
workbook.close()

# ----------------------------------------------------------------------
# Finish script
# ----------------------------------------------------------------------
scriptTime = datetime.now() - startTime
print "{}\n".format(scriptTime)

print 'Script Complete'
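The database-table idea mentioned in the update could replace the Excel file with a small SQLite database, which Python supports through the standard sqlite3 module. This is only a sketch under assumptions: the table name layers, its column names, and the sample rows are made up for illustration, and each row mirrors the layerInfo tuples built in the script above.

```python
import sqlite3


def build_layer_table(conn, layer_rows):
    """Create the lookup table and load (folder, file, sublayer, dataset) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS layers ("
        "folder TEXT, file_name TEXT, sublayer TEXT, dataset_name TEXT)"
    )
    conn.executemany("INSERT INTO layers VALUES (?, ?, ?, ?)", layer_rows)
    conn.commit()


def find_layers(conn, keyword):
    """Return (folder, file_name, dataset_name) rows whose dataset contains keyword."""
    cursor = conn.execute(
        "SELECT folder, file_name, dataset_name FROM layers "
        "WHERE dataset_name LIKE ? ORDER BY folder, file_name",
        ("%" + keyword + "%",),
    )
    return cursor.fetchall()


# Made-up sample rows standing in for the layerInfo tuples (hypothetical paths and names).
sample_rows = [
    (r"L:\Layers\Base", "Buildings.lyr", "Buildings", "GISDB.DBO.NYCMAP_BUILDING"),
    (r"L:\Layers\Parks", "Parks.lyr", "Parks", "GISDB.DBO.PARK_PROPERTY"),
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the table between runs
build_layer_table(conn, sample_rows)
matches = find_layers(conn, "NYCMAP")
print(matches)
```

SQLite's LIKE is case-insensitive for ASCII characters by default, so searching for "nycmap" returns the same rows. Note that % and _ inside the keyword act as wildcards in this sketch.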


Answer



Short answer


Creating an arcpy.mapping.Layer object for a .lyr file that references an SDE feature class takes several seconds when the Python script runs on a different machine from the one hosting the SDE geodatabase. Performance is poor because accessing the SDE geodatabase over the network takes longer.


Solution: run your Python script on the same machine that hosts your SDE geodatabase. Each .lyr file then takes a fraction of a second.


Long answer


I ran a simple benchmark to see whether the creation of Layer objects can be sped up. It seems there is not much you can do about it.


The code to run:


## -*- coding: UTF-8 -*-
from __future__ import print_function


import os
import sys
print(sys.version)
import arcpy
import time
from functools import wraps

def report_time(func):
    '''Decorator reporting the execution time'''
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, round(end-start,3))
        return result
    return wrapper

#----------------------------------------------------------------------
@report_time
def get_filepath(input_folder):
    for dirname, dirnames, filenames in os.walk(input_folder):
        for filename in filenames:
            path = os.path.join(dirname, filename)
            filepath, extension = os.path.splitext(path)
            if extension == ".lyr":
                print(filepath)
                # slow down happening here:
                layer = create_lyr(path)
                get_dataset_name(layer)

#----------------------------------------------------------------------
@report_time
def create_lyr(path):
    # 'Continuum' in sys.version is used here to detect the ArcGIS Pro Python (arcpy.mp)
    if 'Continuum' in sys.version:
        layer = arcpy.mp.LayerFile(path)
    else:
        layer = arcpy.mapping.Layer(path)
    return layer

#----------------------------------------------------------------------
@report_time
def get_dataset_name(layer):
    if 'Continuum' in sys.version:
        l = layer.listLayers()[0]
        print(l.dataSource)
    else:
        print(layer.datasetName)


get_filepath(r'C:\GIS\Temp\lyrs_remote_sde_few_features')

The tests I ran are summarized below. Test environment:



  • DBMS: SQL Server 2012

  • Enterprise geodatabase (aka SDE): 10.4.1

  • ArcGIS Desktop 10.4.1


The results:




  • For both 32-bit and 64-bit Python:

      • Same performance (fractions of a second) to create a Layer object when the .lyr files refer to a local SQL Server database, or when the script is run on the remote machine that hosts the remote database.

      • Same performance (4-6 seconds) to create a Layer object when the .lyr files refer to a remote SQL Server database and the script is run on the local machine.


The same performance was observed when running code:



  • in ArcMap Python window;

  • from cmd;

  • as a script tool in ArcMap;

  • using ArcGIS Pro Python 3.5 64bit;

  • using arcpy.Describe(path).table.name.


I have even written an ArcObjects script to see if this was an arcpy overhead:


from comtypes.client import GetModule, CreateObject
from snippets102 import GetStandaloneModules, InitStandalone

GetStandaloneModules()
InitStandalone()

esriCarto = GetModule(r"C:\Program Files (x86)\ArcGIS\Desktop10.4\com\esriCarto.olb")


layerFile = CreateObject(esriCarto.LayerFile,interface=esriCarto.ILayerFile)
lyrs = [r"C:\GIS\Temp\lyrs_remote_sde_many_features\Parc.lyr",
        r"C:\GIS\Temp\lyrs_local_sde\Park_boundary.lyr",
        r"C:\GIS\Temp\lyrs_remote_sde_few_features\Work.lyr"]

for lyr_file in lyrs:
    layerFile.Open(lyr_file)
    lyr = layerFile.Layer
    desc = lyr.QueryInterface(esriCarto.IFeatureLayer)

    print desc.DataSourceType
    #u'SDE Feature Class'
    print desc.FeatureClass.AliasName
    #sqlgdb.dbo.Park

And it takes the same amount of time (4-6 secs) to create a Layer object with ArcObjects. Again, it is always slow when accessing a remote database, but super fast when the Python code is run on the same machine where the database is hosted.


So, you either:



  • accept that it takes a lot of time to run the code and schedule it to run off work hours;

  • run your Python script on the machine where the SDE database is hosted;


  • pre-generate the .lyr files metadata and use it later for lookup.
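The last option can be sketched as a small cache keyed by each .lyr file's modification time, so that only new or changed files pay the multi-second cost of being opened. This is a sketch under assumptions and has not been run against arcpy: read_dataset_names is a hypothetical callable standing in for the slow arcpy.mapping.Layer lookup, and the JSON cache file and its layout are made up for illustration.

```python
import json
import os


def load_cache(cache_path):
    """Return the cached {layer_path: {"mtime": ..., "datasets": [...]}} mapping."""
    if os.path.exists(cache_path):
        with open(cache_path) as handle:
            return json.load(handle)
    return {}


def refresh_cache(layers_folder, cache_path, read_dataset_names):
    """Re-read only .lyr files that are new or modified since the last run."""
    old_cache = load_cache(cache_path)
    new_cache = {}
    for dirpath, _dirnames, filenames in os.walk(layers_folder):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() != ".lyr":
                continue
            path = os.path.join(dirpath, filename)
            mtime = os.path.getmtime(path)
            entry = old_cache.get(path)
            if entry is None or entry["mtime"] != mtime:
                # Slow path: only taken for new or changed layer files.
                entry = {"mtime": mtime, "datasets": list(read_dataset_names(path))}
            new_cache[path] = entry
    with open(cache_path, "w") as handle:
        json.dump(new_cache, handle, indent=2)
    return new_cache


def find_layers(cache, keyword):
    """Return the layer paths whose dataset names contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(
        path for path, entry in cache.items()
        if any(keyword in name.lower() for name in entry["datasets"])
    )
```

With the question's setup, read_dataset_names would wrap the arcpy.mapping.Layer call and return the datasetName of each sublayer. The modification time only reveals changes to the .lyr file itself, so a periodic full rebuild may still be worthwhile.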

