Tuesday, 16 June 2015

arcpy - Efficient ways to read table data from a large feature layer?


I have a feature layer that contains about 460,000 records, and it currently takes about 20 minutes to read that table using the arcpy.da.TableToNumPyArray() function. Is there a more efficient way to read the rows so that I can then manipulate the data? It seems like there should be a more "C-like" way to access the rows. Here is the function in its entirety (a SearchCursor-based reading sketch follows it for reference), though I'm focused on the line near the bottom where I'm calling arcpy.da.TableToNumPyArray() to read out the data:


import arcpy
import numpy as np


def import_surface(surface, grid, nrow, ncol, row_col_fields, field_to_convert):
    """
    References raster surface to model grid.
    Returns an array of size nrow, ncol.
    """
    out_table = r'in_memory\{}'.format(surface)

    grid_oid_fieldname = arcpy.Describe(grid).OIDFieldName

    # Compute the mean surface value for each model cell and output a table (~20 seconds)
    arcpy.sa.ZonalStatisticsAsTable(grid, grid_oid_fieldname, surface, out_table, 'DATA', 'MEAN')

    # Build some layers and views
    grid_lyr = r'in_memory\grid_lyr'
    table_vwe = r'in_memory\table_vwe'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)
    arcpy.MakeTableView_management(out_table, table_vwe)

    grid_lyr_oid_fieldname = arcpy.Describe(grid_lyr).OIDFieldName
    # table_vwe_oid_fieldname = arcpy.Describe(table_vwe).OIDFieldName

    # Join the output zonal stats table with the grid to assign row/col to each value.
    arcpy.AddJoin_management(grid_lyr, grid_lyr_oid_fieldname, table_vwe, 'OID_', 'KEEP_ALL')

    # Take the newly joined grid/zonal stats and read out tuples of (row, col, val) (takes ~20 minutes)
    a = arcpy.da.TableToNumPyArray(grid_lyr, row_col_fields + [field_to_convert], skip_nulls=False)

    # Reshape the 1D array output by TableToNumPyArray into a 2D structured array, sorting by row/col (~0.1 seconds)
    a = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a.sort(order=['row', 'col'])
    b = np.reshape(a.val, (nrow, ncol))

    return b
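
For reference, the closest thing arcpy offers to direct, row-by-row ("C-like") access is arcpy.da.SearchCursor. Below is a minimal sketch of reading the same (row, col, value) tuples that way; the table and field names are hypothetical placeholders, and this illustrates the API rather than a guaranteed speed-up over TableToNumPyArray:

import arcpy
import numpy as np

# Hypothetical inputs -- substitute the joined layer/table and field names from above.
table = r'in_memory\grid_lyr'
fields = ['grid_row', 'grid_col', 'MEAN']

# SearchCursor yields one tuple per row, in the order the fields were requested.
rows = []
with arcpy.da.SearchCursor(table, fields) as cursor:
    for grid_row, grid_col, mean_val in cursor:
        rows.append((grid_row, grid_col, mean_val))

# Build the same structured array the function above gets from TableToNumPyArray.
a = np.rec.fromrecords(rows, names=['row', 'col', 'val'])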

Answer



The main bottleneck in my previous code was in reading from a GDB feature class. Once I converted to a shapefile, my read time dropped to about 10% of the original. I should have remembered that I had the same problem just a few months ago #facepalm. Additionally, in a previous iteration of the current problem, I tried copying my features to an in_memory workspace, which did not yield an improvement in processing time. However, I tried it again, perhaps after resetting the IPython Notebook kernel, and the processing time became much more reasonable.
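
To see where the time goes, a small timing sketch along these lines can be used; the paths and field names below are hypothetical, and the in_memory copy assumes the feature class fits in RAM:

import time
import arcpy

gdb_fc = r'C:\data\model.gdb\grid'         # hypothetical file-geodatabase feature class
mem_fc = r'in_memory\grid_copy'
fields = ['grid_row', 'grid_col', 'MEAN']  # hypothetical field names

# Time the read straight from the file geodatabase.
t0 = time.time()
arcpy.da.TableToNumPyArray(gdb_fc, fields, skip_nulls=False)
print('GDB read: {:.1f} s'.format(time.time() - t0))

# Copy the features to in_memory first, then time the same read.
arcpy.CopyFeatures_management(gdb_fc, mem_fc)
t0 = time.time()
arcpy.da.TableToNumPyArray(mem_fc, fields, skip_nulls=False)
print('in_memory read: {:.1f} s'.format(time.time() - t0))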


Here is the new code, which runs in less than 2 minutes:


import arcpy
import numpy as np


def import_surface(surface, grid, nrow, ncol,
                   zstat_field=None,
                   join_field=None,
                   row_col_fields=('grid_row', 'grid_col'),
                   field_aliasname_to_convert='MEAN'):
    """
    References raster surface to model grid and returns an array of size nrow, ncol.
    vars:
    surface: the raster surface to be sampled; raster
    grid: the vector grid upon which the raster will be summarized; feature class
    nrow: number of rows in the grid; int
    ncol: number of columns in the grid; int
    zstat_field: field in the grid that defines the zones; str
    join_field: field in the resulting features to be used for the join; str
    row_col_fields: names of the fields containing row and column numbers; tuple
    field_aliasname_to_convert: alias of the resulting zstat field to use; str
    """
    if join_field is None:
        join_field = arcpy.Describe(grid).OIDFieldName
    if zstat_field is None:
        zstat_field = join_field

    # Compute the mean surface value for each model cell and output a table
    zstat = r'in_memory\{}'.format(surface)
    arcpy.sa.ZonalStatisticsAsTable(grid, zstat_field, surface, zstat, 'NODATA', 'MEAN')

    # Create feature layers and table views for joining
    grid_lyr = r'in_memory\grid_lyr'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)

    zstat_vwe = r'in_memory\zstat_vwe'
    arcpy.MakeTableView_management(zstat, zstat_vwe)

    # Join the zonal stats table to the grid
    arcpy.AddJoin_management(grid_lyr, join_field, zstat_vwe, join_field, 'KEEP_ALL')

    # Write the joined grid features to a new in_memory feature class
    zstat_grid = r'in_memory\zstat_grid'
    arcpy.CopyFeatures_management(grid_lyr, zstat_grid)

    # Ensure we point to the correct zstat field name in case of truncation
    for idx, alias in enumerate([f.aliasName for f in arcpy.ListFields(zstat_grid)]):
        if alias == field_aliasname_to_convert:
            name = [f.name for f in arcpy.ListFields(zstat_grid)][idx]
            break

    # Convert the regular-grid polygons to an array of (row, col, val) records.
    a = arcpy.da.TableToNumPyArray(zstat_grid, row_col_fields + (name,), skip_nulls=False)

    # Convert to a recarray, sort by row/col, and reshape to the model grid.
    a_1 = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a_1.sort(order=['row', 'col'])
    b = np.reshape(a_1.val, (nrow, ncol))
    return b
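
A usage sketch, assuming a hypothetical workspace containing a raster named 'dem' and a model-grid feature class named 'grid' that carries the default grid_row/grid_col fields:

import arcpy

# ZonalStatisticsAsTable requires the Spatial Analyst extension.
arcpy.CheckOutExtension('Spatial')

# Hypothetical workspace and dataset names; the function builds its in_memory
# output path from the surface name, so a bare name (not a full path) is used.
arcpy.env.workspace = r'C:\data\model.gdb'
surface = 'dem'
grid = 'grid'

# Returns a (100, 150) array of mean surface values, one per model cell.
b = import_surface(surface, grid, nrow=100, ncol=150)
print(b.shape, b.dtype)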
