I have seen some complicated approaches to parallel processing, but I wonder whether it is possible to simply run multiple processes of the same ArcPy script at the same time.
My script makes changes to the default geodatabase, so I thought of making a copy of the geodatabase for each process.
I have updated the script to copy the resources that the processes share: it now copies the geodatabase and the MXDs related to it.
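A minimal sketch of the per-process copy, using only the standard library and assuming a file geodatabase (which is a folder on disk; a personal .mdb geodatabase is a single file and would need shutil.copy2 instead). make_worker_copy, the worker_N folder names and the temp-folder location are hypothetical choices, not part of the original script. Each worker would then point ArcPy at its copy, e.g. arcpy.env.workspace = make_worker_copy(...).

```python
import os
import shutil
import tempfile


def make_worker_copy(template_gdb, worker_id, mxd_paths=(), root=None):
    """Copy a file geodatabase (a folder) and any related MXDs into a
    private folder for one worker; return the path of the copied .gdb.
    Hypothetical helper: folder layout and naming are assumptions."""
    root = root or tempfile.gettempdir()
    worker_dir = os.path.join(root, "worker_%d" % worker_id)
    if os.path.isdir(worker_dir):
        shutil.rmtree(worker_dir)  # start every run from a clean copy
    os.makedirs(worker_dir)

    gdb_name = os.path.basename(os.path.normpath(template_gdb))
    gdb_copy = os.path.join(worker_dir, gdb_name)
    # Skip *.lock files: they belong to whichever process has the source open
    shutil.copytree(template_gdb, gdb_copy,
                    ignore=shutil.ignore_patterns("*.lock"))

    for mxd in mxd_paths:
        shutil.copy2(mxd, worker_dir)  # MXDs go next to the .gdb copy
    return gdb_copy
```

Note that a copied MXD still points at the original geodatabase unless it stores relative paths or its data sources are repointed (for example with arcpy.mapping's findAndReplaceWorkspacePaths), so the copies are only independent if the MXDs reference the copied .gdb.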
I tested the parallelization with this script:
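import multiprocessing
if __name__ == "__main__":  # required on Windows, where each worker re-imports this module
    pool = multiprocessing.Pool(2)
    pool.map(test_func, [1, 2], 1)  # test_func is defined at module level elsewhere in the script
    pool.close()
    pool.join()  # wait for the worker processes to exit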
While monitoring RAM and CPU usage, I noticed that each process consumes about 200 MB. With 6 GB of RAM, I think I can devote about 5 GB to the workers by enlarging the pool size to:
5000 MB / 200 MB per process = 25 processes
So, to exploit the full power of the machine, I think I should use a pool size of 25.
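The 25-process figure is only a memory ceiling. Geoprocessing tools like the ones in this script are typically CPU- and disk-bound, so running more processes than there are CPU cores usually adds contention rather than speed. A rough sketch of a bound that takes both limits into account; suggest_pool_size is a hypothetical helper, and the 200 MB per-process figure is the one measured above, which may grow on larger inputs:

```python
import multiprocessing


def suggest_pool_size(mem_budget_mb, per_process_mb, cpu_count=None):
    """Hypothetical helper: cap the pool by both the memory budget and
    the number of CPU cores, and never go below one process."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    by_memory = int(mem_budget_mb // per_process_mb)  # 5000 // 200 -> 25
    return max(1, min(cpu_count, by_memory))
```

With the numbers above, suggest_pool_size(5000, 200, cpu_count=4) returns 4 rather than 25; only on a machine with 25 or more cores would the memory limit of 25 be the deciding factor.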
Is this the best approach, and how can I measure how efficient this parallelization is?
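One way to measure efficiency is to time the same batch of jobs with several pool sizes and compare each run against a single process: speedup S(n) = T(1) / T(n) and efficiency E(n) = S(n) / n. A sketch, assuming a module-level test_func and a list of job inputs like the ones in the test script above; time_batch and speedup_report are hypothetical helpers:

```python
import multiprocessing
import time


def time_batch(func, items, workers):
    """Wall-clock seconds to map func over items with `workers` processes."""
    start = time.time()
    pool = multiprocessing.Pool(workers)
    try:
        pool.map(func, items, 1)  # chunksize 1, as in the test script
    finally:
        pool.close()
        pool.join()
    return time.time() - start


def speedup_report(timings):
    """timings maps pool size -> seconds and must include 1 as the baseline.
    Returns {pool size: (speedup, efficiency)}."""
    baseline = float(timings[1])
    report = {}
    for workers, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        report[workers] = (speedup, speedup / workers)
    return report
```

Call it from a script under if __name__ == "__main__": (required on Windows), e.g. timings = dict((n, time_batch(test_func, jobs, n)) for n in (1, 2, 4, 8)) followed by speedup_report(timings), where jobs is your list of inputs. An efficiency close to 1.0 means the extra processes are paying off; when it drops sharply as n grows, the processes are likely competing for CPU cores, disk or geodatabase locks.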
This is an example of the code I'm trying to parallelize. The full script contains about 1,500 lines of code similar to this:
import arcpy

def dora_layer_goned():
    # Copy every feature of layer_goned (Select with no where clause) and make a layer for selections
    arcpy.Select_analysis("layer_goned", "layer_goned22")
    arcpy.MakeFeatureLayer_management("layer_goned", "layer_goned_lyr")
    # Select the features that are NOT within current_parcel
    arcpy.SelectLayerByLocation_management("layer_goned_lyr", "WITHIN", "current_parcel", "", "NEW_SELECTION")
    arcpy.SelectLayerByAttribute_management("layer_goned_lyr", "SWITCH_SELECTION")
    arcpy.Select_analysis("layer_goned_lyr", "layer_goned_2_dora2")
    arcpy.Clip_analysis("layer_goned_lyr", "current_parcel_5m_2", "layer_goned_2_dora")
    arcpy.SelectLayerByAttribute_management("layer_goned_lyr", "CLEAR_SELECTION")
    # Delete the layer_goned_2_dora2 features that intersect the centroids of the clipped features
    arcpy.FeatureToPoint_management("layer_goned_2_dora", "layer_goned_2_dora_point", "CENTROID")
    arcpy.MakeFeatureLayer_management("layer_goned_2_dora2", "layer_goned_2_dora_lyr")
    arcpy.SelectLayerByLocation_management("layer_goned_2_dora_lyr", "INTERSECT", "layer_goned_2_dora_point", "", "NEW_SELECTION")
    arcpy.DeleteFeatures_management("layer_goned_2_dora_lyr")
    # Convert the vertices of current_parcel and carre_line to points
    arcpy.FeatureVerticesToPoints_management("current_parcel", "current_parcel__point", "ALL")
    arcpy.FeatureVerticesToPoints_management("carre_line", "carre_line__point", "ALL")
    arcpy.CalculateField_management("current_parcel__point", "id", "!objectid!", "PYTHON_9.3")
    # Give each carre_line vertex the id of its closest parcel vertex, then build one line per id
    arcpy.SpatialJoin_analysis("carre_line__point", "current_parcel__point", "carre_line__point_sj", "JOIN_ONE_TO_ONE", "KEEP_COMMON", "", "CLOSEST")
    arcpy.Append_management("current_parcel__point", "carre_line__point_sj", "NO_TEST")
    arcpy.PointsToLine_management("carre_line__point_sj", "carre_line__point_sj_line", "id")
    # Buffer the lines by 0.2 (in the input's linear units) and erase the buffers from layer_goned_2_dora2
    arcpy.Buffer_analysis("carre_line__point_sj_line", "carre_line__point_sj_line_buf", 0.2)
    arcpy.Erase_analysis("layer_goned_2_dora2", "carre_line__point_sj_line_buf", "layer_goned_2_dora_erz")
    arcpy.MultipartToSinglepart_management("layer_goned_2_dora_erz", "layer_goned_2_dora_erz_mono")
    # Select the pieces that share a segment with current_parcel
    arcpy.MakeFeatureLayer_management("layer_goned_2_dora_erz_mono", "layer_goned_2_dora_erz_lyr")
    arcpy.SelectLayerByLocation_management("layer_goned_2_dora_erz_lyr", "SHARE_A_LINE_SEGMENT_WITH", "current_parcel", "", "NEW_SELECTION")
    # Delete the layer_goned features that contain those pieces, then append the pieces to layer_goned
    arcpy.SelectLayerByLocation_management("layer_goned_lyr", "CONTAINS", "layer_goned_2_dora_erz_lyr", "", "NEW_SELECTION")
    arcpy.DeleteFeatures_management("layer_goned_lyr")
    arcpy.Append_management("layer_goned_2_dora_erz_lyr", "layer_goned", "NO_TEST")
    arcpy.SelectLayerByAttribute_management("layer_goned_2_dora_erz_lyr", "CLEAR_SELECTION")
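If the script processes one parcel at a time as current_parcel (an assumption; the full script is not shown), each worker can be given its own geodatabase copy and its own share of the parcels. split_round_robin and build_tasks are hypothetical helpers that only do the partitioning:

```python
def split_round_robin(items, n_workers):
    """Deal items out one at a time to n_workers lists so each worker
    gets a similar number of items (hypothetical helper)."""
    chunks = [[] for _ in range(n_workers)]
    for i, item in enumerate(items):
        chunks[i % n_workers].append(item)
    return chunks


def build_tasks(parcel_ids, workspaces):
    """Pair each worker's private geodatabase copy with its share of the
    parcel ids; empty shares are dropped. The result can go to pool.map."""
    chunks = split_round_robin(list(parcel_ids), len(workspaces))
    return [(ws, chunk) for ws, chunk in zip(workspaces, chunks) if chunk]
```

The worker function passed to pool.map would then set arcpy.env.workspace to its task's workspace (and arcpy.env.overwriteOutput = True, since the function recreates fixed output names such as layer_goned22 on every call) before calling dora_layer_goned() for each of its parcels. Layers created with MakeFeatureLayer_management exist only in the process that made them, so names like layer_goned_lyr should not clash between workers.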
Answer
See this blog post; it should cover it:
http://blogs.esri.com/esri/arcgis/2012/09/26/distributed-processing-with-arcgis-part-1/