My question is similar to this one which was answered successfully: Calculating Percentiles in ArcMap?
My addition is how do I batch this solution for multiple files or apply a group-by feature to this code?
I have a large dataset with HUC14 watershed classifications and groundwater recharge values. I would like to determine the quintile that each record is in for each HUC14. So, first the data needs to be grouped by HUC14 and then the percentile calculated. Right now I have the data as a single feature class (feature layer) and also as separate files (shapefiles), so either a batch solution or a grouping solution would be ok.
EDIT: I wanted to add the code that I finally used based heavily on @Farid Cher 's response.
import arcpy
import numpy as np
import os
#loop through all Shapefile in a folder and call the CalcPercentile method
workspace = "X:\Jocelyn\GWR\HUC14b"
walk = arcpy.da.Walk(workspace, datatype="FeatureClass")
for dirpath, dirnames, filenames in walk:
for filename in filenames:
featureClass = os.path.join(dirpath, filename)
#First add the percentile Rank Field
arcpy.AddField_management(featureClass, "PercRank", "LONG", 5, "", "","", "NULLABLE")
inputFeatureClass = featureClass
#Creates a feature layer to allow for the selection and removal (using switch selection) of records that have a GSR code of 999 (water and wetlands) or no soil group associated with it.
#These records should not be used when calculating the percentile rank.
FeatureLayer = arcpy.MakeFeatureLayer_management (inputFeatureClass, "temp_lyr")
FL_Selection1 = arcpy.SelectLayerByAttribute_management (FeatureLayer,"NEW_SELECTION", """ "SoilGroup" = ' ' OR "GSR32_code" = 999""")
FL_Selection2 = arcpy.SelectLayerByAttribute_management (FL_Selection1,"SWITCH_SELECTION")
#Only uses the selected features in the CalcPercentile function
CalcPercentile(inputFeatureClass)
#Deletes the temporary feature layer to avoid cluttering and space issues.
arcpy.Delete_management(FeatureLayer)
def CalcPercentile(inputFeatureClass):
arrp = arcpy.da.FeatureClassToNumPyArray(FL_Selection2, ('GWR_V'))
arr = np.array(arrp,np.float)
#to create 5 ranks
p1 = np.percentile(arr, 20) # rank = 1 (lands that provide the lowest volume recharge within the HUC14)
p2 = np.percentile(arr, 40) # rank = 2
p3 = np.percentile(arr, 60) # rank = 3
p4 = np.percentile(arr, 80) # rank = 4
p5 = np.percentile(arr, 100)+1 # rank = 5 (lands that provide the highest volume recharge within the HUC14)
#Print the quintile breaks that are calculated.
print "p1=%s" % p1
print "p2=%s" % p2
print "p3=%s" % p3
print "p4=%s" % p4
print "p5=%s" % p5
#use cursor to update the new rank field
with arcpy.da.UpdateCursor(inputFeatureClass , ['GWR_V','PercRank']) as cursor:
for row in cursor:
if row[0] < p1:
row[1] = 1 #rank 1
elif p1 <= row[0] and row[0] < p2:
row[1] = 2
elif p2 <= row[0] and row[0] < p3:
row[1] = 3
elif p3 <= row[0] and row[0] < p4:
row[1] = 4
else:
row[1] = 5
cursor.updateRow(row)
Answer
I would recommend the batch solution, because no data selection (Group By) is necessary and will run faster. By integrating the answer of that question and loop over shapefiles you can achieve your goal.
Just modify the code as necessary: - edit the number of ranks, - shapefile folder and so on
Code:
import arcpy
import numpy as np
import os
#loop through all Shapefile in a folder and call the CalcPercentile method
shpFolder = "c:/data/MyShapeFiles"
walk = arcpy.da.Walk(workspace, datatype="FeatureClass")
for dirpath, dirnames, filenames in walk:
for filename in filenames:
featureClass = os.path.join(dirpath, filename)
#First add the percentile Rank Field
arcpy.AddField_management(featureClass, "PerRank", "LONG", 5, "", "","", "NULLABLE")
CalcPercentile(inputFeatureClass)
def CalcPercentile(inputFeatureClass):
arr = arcpy.da.FeatureClassToNumPyArray(inputFeatureClass, ('population_density'))
##to create 3 rank for example
p1 = np.percentile(arr, 33) # rank = 0
p2 = np.percentile(arr, 67) # rank = 1
p3 = np.percentile(arr, 100) # rank = 2
#use cursor to update the new rank field
with arcpy.da.UpdateCursor(inputFeatureClass , ['population_density','PerRank']) as cursor:
for row in cursor:
if row[0] < p1:
row[1] = 0 #rank 0
elif p1 <= row[0] and row[0] < p2:
row[1] = 1
else:
row[1] = 2
cursor.updateRow(row)
No comments:
Post a Comment