Tuesday, 25 July 2017

arcgis desktop - arcpy with multiprocessing throws "UnpickleableError: Cannot pickle objects"


I'm trying to speed up a process that currently runs serially by using the Python multiprocessing module.


I'm having trouble sending a feature layer to a function which is called by multiprocessing, as demonstrated in this simple script:


import multiprocessing
import arcpy

def doProcess(lyr):
    print(lyr.name)

if __name__ == '__main__':

    # Create a list of feature layers
    # (raw string so the backslashes in the path are taken literally)
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    featureLayers = []
    fcs = arcpy.ListFeatureClasses("*", "All", "World")
    for fc in fcs:
        lyrName = fc + "_lyr"
        arcpy.Delete_management(lyrName)
        arcpy.MakeFeatureLayer_management(fc, lyrName)
        featureLayers.append(arcpy.mapping.Layer(lyrName))

    # This works when not using multiprocessing:
    for featureLayer in featureLayers:
        doProcess(featureLayer)

    # This fails with "UnpickleableError: Cannot pickle objects"
    pool = multiprocessing.Pool()
    pool.map(doProcess, featureLayers)
    pool.close()
    pool.join()


When iterating over the list manually, rather than using multiprocessing, the function has access to the feature layer. But when using multiprocessing, this error message is shown:



UnpickleableError: Cannot pickle type 'geoprocessing Layer object' objects



What is the correct syntax/approach to handle a feature layer within the multiprocessing environment? I based the above script on the example in the Esri blog post "Multiprocessing with ArcGIS".



Answer



I finally found the time to look into this. I don't fully understand the "unpickleable" error message, but a workaround is to pass only strings (here, feature class names) to the worker function and create the layer inside it. Something like this:


import multiprocessing
import arcpy


def doProcess(fClass):
    # This function doesn't do anything useful; it just shows that
    # arcpy methods can be called from inside a worker process.
    print("in do process function for " + fClass)
    # Each worker process sets its own workspace
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    lyrName = fClass + "_lyr"
    arcpy.Delete_management(lyrName)
    arcpy.MakeFeatureLayer_management(fClass, lyrName)
    desc = arcpy.Describe(lyrName)
    print("Finished " + desc.Name)


if __name__ == '__main__':

    # Create a list of feature class names (plain strings)
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    fClasses = []
    fcs = arcpy.ListFeatureClasses("*", "All", "World")
    for fc in fcs:
        fClasses.append(fc)

    # Multiprocessing approach
    pool = multiprocessing.Pool()
    pool.map(doProcess, fClasses)
    pool.close()
    pool.join()
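The workaround above most likely works because multiprocessing pickles every argument before sending it to a worker process, and a plain string pickles without trouble while a layer object does not. Here is a minimal sketch of that difference that runs without arcpy. FakeLayer and can_pickle are hypothetical names made up for this illustration, and the lock is only a convenient example of a member that cannot be pickled; it says nothing about how arcpy layers are built internally.

```python
import pickle
import threading


class FakeLayer(object):
    """Hypothetical stand-in for a layer object (not an arcpy class)."""

    def __init__(self, name):
        self.name = name
        # A lock is just a handy example of something pickle refuses to serialize.
        self._handle = threading.Lock()


def can_pickle(obj):
    """Return True if obj can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


layer = FakeLayer("World_lyr")
layer_ok = can_pickle(layer)       # the object itself cannot be serialized
name_ok = can_pickle(layer.name)   # its name, a plain string, can
print(layer_ok, name_ok)
```

A check like can_pickle can be run on whatever you plan to hand to pool.map before starting the pool, so the failure shows up in the main process instead of in the middle of the run.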

(Interestingly, this script takes a lot longer to complete when I use the multiprocessing approach, compared to just running:


for fClass in fClasses:
    doProcess(fClass)

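Presumably there's a lot more overhead in starting each worker process and setting up its environment (multiprocessing uses separate processes rather than threads, and each one has to import arcpy). Hopefully in a more complicated scenario involving long geoprocessing tasks, the payoff would be faster overall completion of all tasks.)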
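To make that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model: a fixed startup cost for the pool, then tasks of equal length handled in rounds by the workers. The function name estimate_times and all of the numbers are invented for illustration; they are not measurements from the script above.

```python
import math


def estimate_times(n_tasks, task_seconds, n_workers, startup_seconds):
    """Return (serial, parallel) time estimates under a simple assumed model."""
    serial = n_tasks * task_seconds
    # Workers handle tasks in rounds; the last round may be only partly full.
    rounds = math.ceil(n_tasks / float(n_workers))
    parallel = startup_seconds + rounds * task_seconds
    return serial, parallel


# Invented figures: 8 tasks, 4 workers, 10 s to start the pool and import arcpy.
short_tasks = estimate_times(8, 0.5, 4, 10.0)   # half-second tasks
long_tasks = estimate_times(8, 60.0, 4, 10.0)   # one-minute tasks

print("short tasks (serial, parallel):", short_tasks)
print("long tasks  (serial, parallel):", long_tasks)
```

Under these assumed figures the short tasks come out slower with the pool (4 s serial against 11 s parallel), while the one-minute tasks come out faster (480 s against 130 s), which matches the pattern described above.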


