Saturday 31 August 2019

arcpy - Multiprocessing Errors - ArcGIS implementation


I was wondering if anyone else in the community here has attempted to use multiprocessing for spatial analyses. Namely, I am trying to iterate through a series of rasters, create a multiprocessing job for each, and run each through a number of geoprocessing steps within a single worker function. Something along the lines of:


import os
import multiprocessing

import arcpy
from arcpy.sa import *


def net(RasterImage, OutFolderDir):
    arcpy.env.overwriteOutput = True
    arcpy.env.workspace = OutFolderDir
    DEM_Prj = 'DEM_Prj.tif'

    try:
        arcpy.ProjectRaster_management(RasterImage, DEM_Prj....

        FocalStatistics(DEM_Prj....)
        ...

if __name__ == '__main__':
    InputFolder = r'C:\test\somepath'
    Output = r'C:\test\somepath2'
    arcpy.env.workspace = InputFolder
    arcpy.env.scratchWorkspace = r'C:\test.gdb'

    fcs = arcpy.ListRasters('*')

    pool = multiprocessing.Pool(4)
    jobs = []
    for fc in fcs:
        rIn = os.path.join(InputFolder, fc)
        rOut = os.path.join(Output, fc[:-4])
        jobs.append(pool.apply_async(net, (rIn, rOut)))
    pool.close()
    pool.join()

Now the multiprocessing does run, usually for the first batch! However, I keep running into several different errors when attempting more datasets (more than 4 files, i.e. more files than the 4-core pool), including:


ERROR 010302: Unable to create the output raster: C:\somepath\sr6f8~1\FocalSt_srtm1
ERROR 010067: Error in executing grid expression.

Failed to execute (FocalStatistics).

and


ERROR 999999: Error executing function.
Failed to copy raster dataset
Failed to execute (ProjectRaster)

Notice in the first error the strange folder that is created (in the OutFolderDir location) for the focal statistics output, which is nearly an exact replica of the final output.


My question, based off your experience, is: is it impossible to chain several geoprocessing steps within one multiprocessing function? Or do I need to split these steps out into their individual geoprocessing steps?


UPDATE



Still encountering similar errors. Moving the import statements into the worker function, i.e.


import arcpy 
from arcpy.sa import *

still cannot create an output, and additionally raises a syntax warning that import * is only allowed at the module level.
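For reference, one pattern that sidesteps that warning is to keep the imports at module level and call the Spatial Analyst tools through qualified names. This is only a minimal sketch: the spatial reference, neighborhood, and file names below are illustrative placeholders, not the original workflow.

import os

import arcpy


def net(RasterImage, OutFolderDir):
    arcpy.env.overwriteOutput = True
    arcpy.env.workspace = OutFolderDir
    arcpy.CheckOutExtension('Spatial')  # FocalStatistics needs a Spatial Analyst licence
    DEM_Prj = os.path.join(OutFolderDir, 'DEM_Prj.tif')
    # Placeholder spatial reference; substitute the real target projection.
    arcpy.ProjectRaster_management(RasterImage, DEM_Prj, arcpy.SpatialReference(4326))
    # Qualified call instead of "from arcpy.sa import *" inside the function.
    focal = arcpy.sa.FocalStatistics(DEM_Prj, arcpy.sa.NbrRectangle(3, 3, 'CELL'), 'MEAN')
    focal.save(os.path.join(OutFolderDir, 'FocalSt.tif'))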


UPDATE #2


I know this is a late reply, but I thought my workaround for getting multiprocessing to work with arcpy might benefit someone else for future reference. The main problem I found after returning to this issue is not competition between the arcpy modules, but rather competition over the scratch workspace that the ArcObjects use to save temporary files. Therefore, consider passing a counter as an extra multiprocessing argument to make a unique scratch workspace for each process, i.e.


Counter = 0
for fc in fcs:
    rIn = os.path.join(InputFolder, fc)
    rOut = os.path.join(Output, fc[:-4])
    jobs.append(pool.apply_async(net, (rIn, rOut, Counter)))
    Counter += 1

Then in the worker function, make a dedicated temporary directory and assign a unique scratch workspace to each multiprocessing task.


def net(RasterImage, OutFolderDir, Counter):
    TempFolder = os.path.join(os.path.dirname(OutFolderDir), 'Temp_%s' % Counter)
    os.mkdir(TempFolder)
    arcpy.env.scratchWorkspace = TempFolder
    ...
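Putting the two fragments together, a worker along these lines should isolate every process. The geoprocessing body is elided, and the cleanup step is my own addition, not part of the original workaround:

import os
import shutil

import arcpy


def net(RasterImage, OutFolderDir, Counter):
    # Give each process its own scratch workspace so parallel jobs
    # never compete over the same temporary files.
    TempFolder = os.path.join(os.path.dirname(OutFolderDir), 'Temp_%s' % Counter)
    os.mkdir(TempFolder)
    arcpy.env.scratchWorkspace = TempFolder
    try:
        arcpy.env.overwriteOutput = True
        arcpy.env.workspace = OutFolderDir
        # ... geoprocessing steps as before ...
    finally:
        shutil.rmtree(TempFolder, ignore_errors=True)  # remove the per-process scratch folder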


Hope that helps, and thanks to Ragi for the initial suggestion to use separate temp workspaces. I am still baffled as to why it originally did not work.


Additional Resources


ESRI Multiprocessing Blog


Python, GIS and Stuff Blog



Answer



Each IWorkspace connection (i.e. each database connection) has thread affinity. Two threads cannot share the same workspace. You can have one thread own the resource and then synchronize access to it, but if you are going to be using straight geoprocessing functions, even that is not an option.
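To illustrate the "one owner plus synchronized access" idea in plain Python (no arcpy here; process_raster and write_result are placeholder stubs of my own): workers pull tasks from a queue and do the analysis, while a single process does all the writing.

import multiprocessing


def process_raster(task):
    # Placeholder for the CPU-heavy geoprocessing on one raster.
    return 'result for %s' % task


def write_result(result):
    # Placeholder for the write into the single-user workspace.
    print(result)


def worker(task_queue, result_queue):
    # Workers analyse rasters but never touch the shared output workspace.
    for task in iter(task_queue.get, None):
        result_queue.put(process_raster(task))


if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(4)]
    for w in workers:
        w.start()
    names = ['r1.tif', 'r2.tif', 'r3.tif']
    for name in names:
        tasks.put(name)
    for _ in workers:
        tasks.put(None)  # sentinel: one per worker
    # The main process is the single writer that owns the workspace.
    for _ in names:
        write_result(results.get())
    for w in workers:
        w.join()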


The easiest (lame) way is to create separate processes and then do multi-process synchronization (as opposed to multithread synchronization). Even then, you should be aware of the underlying workspace type: if you are not using ArcSDE (a multi-user data source), you will probably be using a single-user data source (like a personal or file geodatabase). Remember that this means only one process can write at a time! The typical (lame) synchronization for these scenarios is that each parallel process writes to a different temp workspace, and then you merge it all into your destination workspace in a single process.
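A sketch of that pattern, assuming file geodatabases and a trivial CopyFeatures call as a stand-in for the real analysis (all paths and names are illustrative):

import os
import multiprocessing

import arcpy


def worker(i, features, temp_dir):
    # Each process writes to its own file geodatabase, so single-user
    # write locks never collide.
    gdb = arcpy.CreateFileGDB_management(temp_dir, 'part_%s' % i).getOutput(0)
    out_fc = os.path.join(gdb, 'result')
    arcpy.CopyFeatures_management(features, out_fc)  # stand-in for the real analysis
    return out_fc


if __name__ == '__main__':
    temp_dir = r'C:\test\temp'
    os.makedirs(temp_dir, exist_ok=True)
    inputs = [r'C:\test\a.shp', r'C:\test\b.shp']  # illustrative inputs
    pool = multiprocessing.Pool(2)
    parts = [pool.apply_async(worker, (i, f, temp_dir)) for i, f in enumerate(inputs)]
    pool.close()
    pool.join()
    # Merge everything into the destination workspace from a single process.
    arcpy.Merge_management([p.get() for p in parts],
                           r'C:\test\final.gdb\merged')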

