Tuesday, 16 June 2015

arcpy - Efficient ways to read table data from a large feature layer?


I have a feature layer that contains about 460,000 records, and it currently takes about 20 minutes to read that table using the arcpy.da.TableToNumPyArray() function. Is there a more efficient way to read the rows so that I can then manipulate the data? It seems like there should be a more "C-like" way to access the rows. Here is the function in its entirety (a SearchCursor-based reading sketch follows it for reference), though I'm focused on the line near the bottom where I'm calling arcpy.da.TableToNumPyArray() to read out the data:


import arcpy
import numpy as np


def import_surface(surface, grid, nrow, ncol, row_col_fields, field_to_convert):
    """
    References raster surface to model grid.
    Returns an array of size nrow, ncol.
    """
    out_table = r'in_memory\{}'.format(surface)

    grid_oid_fieldname = arcpy.Describe(grid).OIDFieldName

    # Compute the mean surface value for each model cell and output a table (~20 seconds)
    arcpy.sa.ZonalStatisticsAsTable(grid, grid_oid_fieldname, surface, out_table, 'DATA', 'MEAN')

    # Build some layers and views
    grid_lyr = r'in_memory\grid_lyr'
    table_vwe = r'in_memory\table_vwe'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)
    arcpy.MakeTableView_management(out_table, table_vwe)

    grid_lyr_oid_fieldname = arcpy.Describe(grid_lyr).OIDFieldName
    # table_vwe_oid_fieldname = arcpy.Describe(table_vwe).OIDFieldName

    # Join the output zonal stats table with the grid to assign row/col to each value.
    arcpy.AddJoin_management(grid_lyr, grid_lyr_oid_fieldname, table_vwe, 'OID_', 'KEEP_ALL')

    # Take the newly joined grid/zonal stats and read out tuples of (row, col, val) (takes ~20 minutes)
    a = arcpy.da.TableToNumPyArray(grid_lyr, row_col_fields + [field_to_convert], skip_nulls=False)

    # Reshape the 1D array output by TableToNumPyArray into a 2D structured array, sorting by row/col (~0.1 seconds)
    a = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a.sort(order=['row', 'col'])
    b = np.reshape(a.val, (nrow, ncol))

    return b
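
For reference, the closest thing arcpy offers to direct, row-by-row ("C-like") access is arcpy.da.SearchCursor. Below is a minimal sketch of reading the same (row, col, value) tuples that way; the table and field names are hypothetical placeholders, and this illustrates the API rather than a guaranteed speed-up over TableToNumPyArray:

import arcpy
import numpy as np

# Hypothetical inputs -- substitute the joined layer/table and field names from above.
table = r'in_memory\grid_lyr'
fields = ['grid_row', 'grid_col', 'MEAN']

# SearchCursor yields one tuple per row, in the order the fields were requested.
rows = []
with arcpy.da.SearchCursor(table, fields) as cursor:
    for grid_row, grid_col, mean_val in cursor:
        rows.append((grid_row, grid_col, mean_val))

# Build the same structured array the function above gets from TableToNumPyArray.
a = np.rec.fromrecords(rows, names=['row', 'col', 'val'])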

Answer



The main bottleneck in my previous code was in reading from a GDB feature class. Once I converted to a shapefile, my read time dropped to about 10% of the original. I should have remembered that I had the same problem just a few months ago #facepalm. Additionally, in a previous iteration of the current problem, I tried copying my features to an in_memory workspace, which did not yield an improvement in processing time. However, I tried it again, perhaps after resetting the IPython Notebook kernel, and the processing time became much more reasonable.
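
To see where the time goes, a small timing sketch along these lines can be used; the paths and field names below are hypothetical, and the in_memory copy assumes the feature class fits in RAM:

import time
import arcpy

gdb_fc = r'C:\data\model.gdb\grid'         # hypothetical file-geodatabase feature class
mem_fc = r'in_memory\grid_copy'
fields = ['grid_row', 'grid_col', 'MEAN']  # hypothetical field names

# Time the read straight from the file geodatabase.
t0 = time.time()
arcpy.da.TableToNumPyArray(gdb_fc, fields, skip_nulls=False)
print('GDB read: {:.1f} s'.format(time.time() - t0))

# Copy the features to in_memory first, then time the same read.
arcpy.CopyFeatures_management(gdb_fc, mem_fc)
t0 = time.time()
arcpy.da.TableToNumPyArray(mem_fc, fields, skip_nulls=False)
print('in_memory read: {:.1f} s'.format(time.time() - t0))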


Here is the new code, which runs in less than 2 minutes:


import arcpy
import numpy as np


def import_surface(surface, grid, nrow, ncol,
                   zstat_field=None,
                   join_field=None,
                   row_col_fields=('grid_row', 'grid_col'),
                   field_aliasname_to_convert='MEAN'):
    """
    References raster surface to model grid and returns an array of size nrow, ncol.
    vars:
    surface: the raster surface to be sampled; raster
    grid: the vector grid upon which the raster will be summarized; feature class
    nrow: number of rows in the grid; int
    ncol: number of columns in the grid; int
    zstat_field: field in the grid that defines the zones; str
    join_field: field in the resulting features to be used for the join; str
    row_col_fields: names of the fields containing row and column numbers; tuple
    field_aliasname_to_convert: alias of the resulting zstat field to use; str
    """
    if join_field is None:
        join_field = arcpy.Describe(grid).OIDFieldName
    if zstat_field is None:
        zstat_field = join_field

    # Compute the mean surface value for each model cell and output a table
    zstat = r'in_memory\{}'.format(surface)
    arcpy.sa.ZonalStatisticsAsTable(grid, zstat_field, surface, zstat, 'NODATA', 'MEAN')

    # Create feature layers and table views for joining
    grid_lyr = r'in_memory\grid_lyr'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)

    zstat_vwe = r'in_memory\zstat_vwe'
    arcpy.MakeTableView_management(zstat, zstat_vwe)

    # Join the zonal stats table to the grid
    arcpy.AddJoin_management(grid_lyr, join_field, zstat_vwe, join_field, 'KEEP_ALL')

    # Write the joined grid features to a new in_memory feature class
    zstat_grid = r'in_memory\zstat_grid'
    arcpy.CopyFeatures_management(grid_lyr, zstat_grid)

    # Ensure we point to the correct zstat field name in case of truncation
    for idx, alias in enumerate([f.aliasName for f in arcpy.ListFields(zstat_grid)]):
        if alias == field_aliasname_to_convert:
            name = [f.name for f in arcpy.ListFields(zstat_grid)][idx]
            break

    # Convert the regular-grid polygons to an array of (row, col, val) records.
    a = arcpy.da.TableToNumPyArray(zstat_grid, row_col_fields + (name,), skip_nulls=False)

    # Convert to a recarray, sort by row/col, and reshape to the model grid.
    a_1 = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a_1.sort(order=['row', 'col'])
    b = np.reshape(a_1.val, (nrow, ncol))
    return b
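
A usage sketch, assuming a hypothetical workspace containing a raster named 'dem' and a model-grid feature class named 'grid' that carries the default grid_row/grid_col fields:

import arcpy

# ZonalStatisticsAsTable requires the Spatial Analyst extension.
arcpy.CheckOutExtension('Spatial')

# Hypothetical workspace and dataset names; the function builds its in_memory
# output path from the surface name, so a bare name (not a full path) is used.
arcpy.env.workspace = r'C:\data\model.gdb'
surface = 'dem'
grid = 'grid'

# Returns a (100, 150) array of mean surface values, one per model cell.
b = import_surface(surface, grid, nrow=100, ncol=150)
print(b.shape, b.dtype)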
