Friday 22 July 2016

arcpy - Methods for converting a regular polygon grid to a numpy array


I have a regularly spaced model grid with 5,000 ft by 5,000 ft cells (517,300 cells in total), which I need to convert to a numpy array. I am investigating a couple of different methods, using arcpy, by which I can convert the grid to a numpy array using the values in a particular field. They are as follows:



  1. Polygon to raster conversion followed by a raster to numpy array conversion, and

  2. Iteration over the rows of the grid, pulling row/col indices from each row and assigning values to an empty array.


Neither of these methods seems to be particularly fast; is there a more direct way to use the regular geometry of my features to apply the values to an array more efficiently? I will need to run this process on multiple versions of my model to populate parameter arrays for each of the 9 layers, with multiple parameters per layer, so any processing time I can gain will be helpful.



import os
import arcpy
import datetime
import numpy as np

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r'..\gis\geom.gdb'
arcpy.env.overwriteOutput = True

grid = r'..\gis\grid_5000ft_01.gdb\grid'


out_table = r'in_memory\{}'.format('TOP_sas')
arcpy.sa.ZonalStatisticsAsTable(grid, 'OID', 'geodas_fasft', out_table, 'DATA', 'ALL')

grid_lyr = r'in_memory\grid_lyr'
table_vwe = r'in_memory\table_vwe'
arcpy.MakeFeatureLayer_management(grid, grid_lyr)
arcpy.MakeTableView_management(out_table, table_vwe)
arcpy.AddJoin_management(grid_lyr, 'OID', table_vwe, 'OID_', 'KEEP_ALL')


print [field.name for field in arcpy.ListFields(grid_lyr)]

start = datetime.datetime.now()
arcpy.PolygonToRaster_conversion(grid_lyr, '{}.MEAN'.format('TOP_sas'), r'in_memory\ras', '#', '#', 5000.)
a = arcpy.RasterToNumPyArray(r'in_memory\ras')
print (datetime.datetime.now() - start).total_seconds() / 60.  # elapsed minutes

b = np.zeros((739, 700), dtype=np.float)
start = datetime.datetime.now()
with arcpy.da.SearchCursor(grid_lyr, ['grid.row', 'grid.col', '{}.MEAN'.format('TOP_sas')]) as cur:
    for row in cur:
        (r, c, val) = row
        b[r-1, c-1] = val
print (datetime.datetime.now() - start).total_seconds() / 60.  # elapsed minutes

np.savetxt('{}.ref'.format('TOP_sas_a'), a)
np.savetxt('{}.ref'.format('TOP_sas_b'), b)
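
For reference, a vectorized variant of method 2 can replace the per-row cursor assignment with a single fancy-indexed assignment. This is only a sketch, assuming the joined field names ('grid.row', 'grid.col', 'TOP_sas.MEAN') carry through unchanged to the structured array returned by TableToNumPyArray and that the row/col fields are 1-based:

fields = ['grid.row', 'grid.col', '{}.MEAN'.format('TOP_sas')]
rec = arcpy.da.TableToNumPyArray(grid_lyr, fields, skip_nulls=True)

c = np.full((739, 700), np.nan)         # NaN marks cells with no value
rows = rec[fields[0]].astype(int) - 1   # 1-based grid indices -> 0-based array indices
cols = rec[fields[1]].astype(int) - 1
c[rows, cols] = rec[fields[2]]          # one vectorized assignment, no Python loop over cells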

Answer



Update (5/14/2015)


In a related question, I posted code for a function that computes zonal statistics for a raster given a regular polygon grid and converts that to a numpy array. Here is the relevant bit of code related to the grid-array conversion:



import numpy as np
import arcpy

def grid_to_array(grid, nrow, ncol, row_col_field_names, val_field_name):
    """
    Returns an array of shape (nrow, ncol) from a polygon grid of the same shape.
    The grid must have row/col values specified for each cell.

    grid: name of grid feature class/layer; str
    nrow: number of rows; int
    ncol: number of columns; int
    row_col_field_names: names of fields containing row/col values for each cell; list or tuple
    val_field_name: name of field containing the values of interest; list or tuple,
                    so it can be concatenated with row_col_field_names
    """
    # Read the row/col indices and values from the polygon grid's attribute table.
    a = arcpy.da.TableToNumPyArray(grid, row_col_field_names + val_field_name, skip_nulls=False)

    # Convert to a recarray with simple field names.
    a_1 = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])

    # Sort by row, then column, and reshape into the model grid's shape.
    a_1.sort(order=['row', 'col'])
    b = np.reshape(a_1.val, (nrow, ncol))
    return b
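
A usage example, as a sketch (the layer and field names mirror those used elsewhere in this post; note that val_field_name must be passed as a list or tuple so it can be concatenated with row_col_field_names):

# Hypothetical call: a 739 x 700 model grid with row/col index fields and a
# joined zonal-statistics field holding the values of interest.
top = grid_to_array(r'in_memory\grid_lyr', 739, 700,
                    ['grid.row', 'grid.col'], ['TOP_sas.MEAN'])
np.savetxt(r'ref\TOP_sas.dat', top, fmt='%10.2f')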

Original Answer


The grid I was using was stored as a feature class within a geodatabase; printing cursor iterations over the table revealed that it was reading only about 1,000 records at a time, at which point it would pause for quite a while before reading another 1,000. After 45 minutes the table still had not been fully traversed. By exporting the feature class to a shapefile I was able to process the grid features much more quickly (about 20 seconds). Final code:


# nrow, ncol = dimensions of the model grid (739 rows by 700 columns here)
for surface in arcpy.ListDatasets('*', 'Raster'):

    out_table = r'in_memory\{}'.format(surface)
    arcpy.sa.ZonalStatisticsAsTable(grid, 'FID', surface, out_table, 'DATA', 'MIN_MAX_MEAN')

    grid_lyr = r'in_memory\grid_lyr'
    table_vwe = r'in_memory\table_vwe'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)
    arcpy.MakeTableView_management(out_table, table_vwe)
    arcpy.AddJoin_management(grid_lyr, 'FID', table_vwe, 'FID', 'KEEP_ALL')

    a = arcpy.da.TableToNumPyArray(grid_lyr, ['grid.row', 'grid.col', '{}.MEAN'.format(surface)], skip_nulls=True)

    # OLD CODE - much less efficient
    # b = np.zeros((nrow, ncol), dtype=np.float) * np.nan
    # for idx, (row, col, val) in np.ndenumerate(a):
    #     b[row-1, col-1] = val

    # NEW CODE - much more efficient (updated 5/13/2015)
    a_1 = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a_1.sort(order=['row', 'col'])
    b = np.reshape(a_1.val, (nrow, ncol))

    np.savetxt(r'ref\{}.dat'.format(surface), b, fmt='%5.0f')
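
The export from the geodatabase to a shapefile is not shown in the snippet above; a minimal sketch of that step, assuming a local shp folder for the output (CopyFeatures_management is a standard geoprocessing tool, but the paths here are placeholders):

# Copy the geodatabase feature class out to a shapefile so that cursor/table
# reads are fast; the shapefile path is then used as 'grid' in the loop above.
grid_fc = r'..\gis\grid_5000ft_01.gdb\grid'
grid = r'shp\grid_5000ft_01.shp'
arcpy.CopyFeatures_management(grid_fc, grid)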
