Improve the speed at which arcpy creates a layer object

Is there something happening on the back end that makes the process below so slow, and can it be avoided to speed things up?

It is taking between 5 and 7 seconds to turn one layer file (.lyr) into a layer object using layer = arcpy.mapping.Layer(path). This is painfully slow when I have over 2,500 layers that I need to convert to layer objects so that I can access their datasetName property.

Within a directory of layers, I want to be able to search for all layers that point to a given dataset in our SDE. For example, NYC's Planimetrics feature classes contain the word "NYCMAP," and I want to search for all layer files that are currently pointing to any feature classes containing that word.

My current script actually works fine, but it's painfully slow. So I decided to break it down line by line to try to find/understand the snag. It didn't take long for me to find it...

for dirname, dirnames, filenames in os.walk(layersFolder):

    for filename in filenames:
        path = os.path.join(dirname, filename)
        filepath, extension = os.path.splitext(path)


        if extension == ".lyr":
            # slow down happening here:
            layer = arcpy.mapping.Layer(path)
            print layer

If I run these few lines before converting to a layer object, I run through over 2,500 layers in about 15 seconds. When I include the layer = arcpy.mapping.Layer(path) line, it slows to about 3 1/2 hours.

I have tried using arcpy.da.Walk instead of os.walk, and this has no effect. I have also tried copying the layers directory and my script to my C:\ drive (as opposed to having both on our network), but this also has no effect.

Using Python 2.7.10 and ArcGIS 10.4

UPDATE:

I thought I would include my entire working code. This code currently takes about 3 hours and 15 minutes to go through just under 3,000 layer files. Rather than search for a specific keyword each time, I simply output the entire directory of layers into an Excel doc, and then just search inside of that doc. I will eventually schedule the script to run about once a week or so. Someone suggested creating a database table and using SQL to query, which is an interesting idea that I may explore.

import sys
sys.path.append(r'somePath')            # Looking in our Python Library for the xlsxwriter module
import arcpy, os, xlsxwriter, string
from time import strftime
from datetime import datetime

# Set time & date variables
startTime = datetime.now()              # Used at end of script to calculate the time it took the script to run

dateStr = strftime('%m/%d/%Y %H:%M')    # Variable for the current date/time (printed below)
dateNameStr = strftime('%Y%m%d')        # Used later in output file name
timeStr = strftime('%H%M')              # Used later in output file name
print "{}\n".format(dateStr)

outputFileFolder = r'somePath'

# ----------------------------------------------------------------------
# Functions
# ----------------------------------------------------------------------

def getColumnList(someList):
    '''Creates a list of Excel columns (Note: this function is restricted to 26 columns (i.e. A->Z))'''
    columnList = []
    n = 0
    alphaList = list(string.ascii_uppercase)
    someListLen = len(someList)
    while n < someListLen:                  # one column letter per item (<= would add an extra column)
        columnList.append(alphaList[n])
        n += 1
    return columnList


# ----------------------------------------------------------------------
# Walk through layers and get layer attributes
# ----------------------------------------------------------------------
# Layers folder
layersFolder = r'somePath'
print "Collecting attributes from all layers in {}\n".format(layersFolder)

layersList = []
walk = arcpy.da.Walk(layersFolder, datatype="Layer")

for dirpath, dirnames, filenames in walk:
    for filename in filenames:
        print filename                              
        path = os.path.join(dirpath, filename)
        layer = arcpy.mapping.Layer(path)

        layerList = []

        if layer.isGroupLayer:
            for sublayer in layer:
                layerList.append(sublayer)
        else:
            layerList.append(layer)

        for singleLayer in layerList:

            if singleLayer.supports("datasetName"):
                layerInfo = (os.path.split(path)[0], os.path.split(path)[1], str(singleLayer), singleLayer.datasetName)
                print str([str(item) for item in layerInfo])
                layersList.append(layerInfo)


    print

# ----------------------------------------------------------------------
# Create Excel doc using XLSXWriter
# ----------------------------------------------------------------------
print("Creating Excel doc...\n")

# Create Excel doc
workbook = xlsxwriter.Workbook(os.path.join(outputFileFolder, "Layers_and_Datasets_{}_{}.xlsx".format(dateNameStr, timeStr)))
worksheet = workbook.add_worksheet()


# Format the worksheet
worksheet.set_column('A:A', 50)                 # Set column width
worksheet.set_column('B:B', 50)                 # Set column width
worksheet.set_column('C:C', 50)                 # Set column width
worksheet.set_column('D:D', 50)                 # Set column width
bold = workbook.add_format({'bold': True})      # Create bold format object for column headers

# Column headers/column list
headers = ("Layer Path", "Layer Name", "Sublayer Hierarchy", "Dataset Name")

columnList = getColumnList(headers)

# Write headers in cells A1, B1, C1, & D1
i = 0
for header in headers:
    worksheet.write(columnList[i] + "1", str(headers[i]), bold)
    i += 1

# Sort the layersList list
sortedlayersList = sorted(list(layersList), key=lambda layerPath: layerPath[0])


# Write all other rows, beginning in row 2
row = 2
for layerRow in sortedlayersList:
    attr = 0                                    # "attr" stands for layer attribute
    for lyrAttr in layerRow:
        worksheet.write(str(columnList[attr])+str(row), lyrAttr)
        attr += 1
    row += 1


# Close workbook
workbook.close()

# ----------------------------------------------------------------------
# Finish script
# ----------------------------------------------------------------------
scriptTime = datetime.now() - startTime
print "{}\n".format(scriptTime)

print 'Script Complete'

Answer

Short answer

Creating an arcpy.mapping.Layer object from a .lyr file that references an SDE feature class takes several seconds when the Python script runs on a different machine from the one hosting the SDE geodatabase. Performance is poor because accessing the SDE geodatabase over the network takes longer.

Solution: run your Python script on the machine that hosts the SDE geodatabase. Each .lyr file then takes a fraction of a second.

Long answer

I ran a simple benchmark to see whether the creation of Layer objects can be sped up. It seems you cannot really do anything about it.

The code to run:

# -*- coding: UTF-8 -*-
from __future__ import print_function


import os
import sys
print(sys.version)
import arcpy
import time
from functools import wraps

def report_time(func):
    '''Decorator reporting the execution time'''
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, round(end-start,3))
        return result
    return wrapper

#----------------------------------------------------------------------
@report_time
def get_filepath(input_folder):
    for dirname, dirnames, filenames in os.walk(input_folder):
        for filename in filenames:
            path = os.path.join(dirname, filename)
            filepath, extension = os.path.splitext(path)
            if extension == ".lyr":
                print(filepath)
                # slow down happening here:
                layer = create_lyr(path)
                get_dataset_name(layer)

#----------------------------------------------------------------------
@report_time
def create_lyr(path):
    if 'Continuum' in sys.version:
        layer = arcpy.mp.LayerFile(path)
    else:
        layer = arcpy.mapping.Layer(path)
    return layer


#----------------------------------------------------------------------
@report_time
def get_dataset_name(layer):
    if 'Continuum' in sys.version:
        l = layer.listLayers()[0]
        print(l.dataSource)
    else:
        print(layer.datasetName)


get_filepath(r'C:\GIS\Temp\lyrs_remote_sde_few_features')

A summary of the tests I've done is below:

DBMS: SQL Server 2012

Enterprise geodatabase (aka SDE): 10.4.1

ArcGIS Desktop 10.4.1

The results:

For Python 32bit/64bit:

Same performance (fractions of a second) to create a Layer object when the .lyr files refer to a local SQL Server database, or when the script is run on the remote machine that hosts the remote database.

Same performance (4-6 seconds) to create a Layer object when the .lyr files refer to a remote SQL Server database and the script is run on the local machine.

The same performance was observed when running the code:

in the ArcMap Python window;

from cmd;

as a script tool in ArcMap;

using ArcGIS Pro Python 3.5 64bit;

using arcpy.Describe(path).table.name instead of creating a Layer object.

I have even written an ArcObjects script to see if this was an arcpy overhead:

from comtypes.client import GetModule, CreateObject
from snippets102 import GetStandaloneModules, InitStandalone

GetStandaloneModules()
InitStandalone()

esriCarto = GetModule(r"C:\Program Files (x86)\ArcGIS\Desktop10.4\com\esriCarto.olb")


layerFile = CreateObject(esriCarto.LayerFile,interface=esriCarto.ILayerFile)
lyrs = [r"C:\GIS\Temp\lyrs_remote_sde_many_features\Parc.lyr",
        r"C:\GIS\Temp\lyrs_local_sde\Park_boundary.lyr",
        r"C:\GIS\Temp\lyrs_remote_sde_few_features\Work.lyr"]

for lyr_file in lyrs:
    layerFile.Open(lyr_file)
    lyr = layerFile.Layer
    desc = lyr.QueryInterface(esriCarto.IFeatureLayer)

    print desc.DataSourceType
    #u'SDE Feature Class'
    print desc.FeatureClass.AliasName
    #sqlgdb.dbo.Park

It takes the same amount of time (4-6 seconds) to create a Layer object with ArcObjects. Again, it is always slow when accessing a remote database, but very fast when the Python code is run on the machine where the database is hosted.

So, you can either:

accept that it takes a lot of time to run the code and schedule it to run off work hours;

run your Python script on the machine where the SDE database is hosted;

pre-generate the .lyr files' metadata and use it later for lookups.
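The third option could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested ArcGIS workflow: it keeps one SQLite table of layer paths and dataset names, and on each run opens only the .lyr files whose modification time has changed, so the slow step is paid once per file rather than on every run. The table name layer_index and the functions refresh_index, find_layers and read_dataset_names_arcpy are hypothetical names chosen for this example; only the arcpy calls inside read_dataset_names_arcpy come from the question's code.

```python
import os
import sqlite3


def read_dataset_names_arcpy(path):
    """Slow step: open one .lyr file with arcpy and collect its dataset names.

    Mirrors the loop in the question (ArcMap 10.x, arcpy.mapping).
    arcpy is imported lazily so the indexing logic can run without ArcGIS.
    """
    import arcpy
    layer = arcpy.mapping.Layer(path)
    layers = [sub for sub in layer] if layer.isGroupLayer else [layer]
    return [lyr.datasetName for lyr in layers if lyr.supports("datasetName")]


def refresh_index(conn, layers_folder, read_dataset_names):
    """Re-read only new or modified .lyr files; return how many were opened.

    read_dataset_names is any function taking a .lyr path and returning a
    list of dataset names (for example read_dataset_names_arcpy above).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS layer_index ("
        "path TEXT, mtime REAL, dataset_name TEXT)"
    )
    # Modification time recorded for each file on the previous run.
    known = dict(conn.execute("SELECT DISTINCT path, mtime FROM layer_index"))
    seen = set()
    opened = 0
    for dirpath, dirnames, filenames in os.walk(layers_folder):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() != ".lyr":
                continue
            path = os.path.join(dirpath, filename)
            seen.add(path)
            mtime = os.path.getmtime(path)
            if known.get(path) == mtime:
                continue  # unchanged since the last run: skip the slow open
            conn.execute("DELETE FROM layer_index WHERE path = ?", (path,))
            # A file without dataset names still gets one row (NULL name),
            # so it is not reopened on every run.
            for name in read_dataset_names(path) or [None]:
                conn.execute(
                    "INSERT INTO layer_index VALUES (?, ?, ?)",
                    (path, mtime, name),
                )
            opened += 1
    # Forget files that no longer exist on disk.
    for path in set(known) - seen:
        conn.execute("DELETE FROM layer_index WHERE path = ?", (path,))
    conn.commit()
    return opened


def find_layers(conn, keyword):
    """Return (path, dataset_name) rows whose dataset name contains keyword."""
    rows = conn.execute(
        "SELECT path, dataset_name FROM layer_index "
        "WHERE dataset_name LIKE ? ORDER BY path, dataset_name",
        ("%" + keyword + "%",),
    )
    return rows.fetchall()
```

A scheduled job would call something like refresh_index(sqlite3.connect(index_path), layersFolder, read_dataset_names_arcpy), where index_path is a location of your choosing, and a search for layers pointing at NYCMAP feature classes then becomes find_layers(conn, "NYCMAP"). Note that SQLite's LIKE is case-insensitive for ASCII characters by default, and that the first run still has to open every layer.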

Monday 18 February 2019