arcpy - How to improve performance when using ArcGIS cursors in Python with big tables?

Sunday, 2 December 2018

I have a pretty big point feature class in a file geodatabase (~4 000 000 records). This is a regular grid of points with a 100m resolution.

I need to perform a kind of generalization on this layer. For this, I create a new grid where each point lies in the middle of 4 "old" points:

 *     *     *     *
    o     o     o
 *     *     *     *
    o     o     o
 *     *     *     *

[*] = point of the original grid - [o] = point of the new grid

The attribute value of each new point is calculated as a weighted combination of the values of its 4 neighbors in the old grid. I therefore loop over all the points of the new grid and, for each of them, loop over all the points of the old grid to find the neighbors (by comparing the X and Y values in the attribute table). Once the 4 neighbors have been found, I break out of the inner loop.
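The nested loop described above rescans the old grid for every new point. As a point of comparison, here is a minimal sketch of a dictionary-based lookup, which indexes the old grid once and then finds each neighbor by its coordinates. The function name, the `(x, y, value)` tuple layout, the rounding to 3 decimals and the equal weights are all assumptions made for illustration; they do not come from the original script.

```python
def interpolate_new_grid(old_points, new_points, half_step=50.0):
    """Return {(x, y): value} for each new point that has all 4 neighbors.

    old_points: iterable of (x, y, value) from the original grid.
    new_points: iterable of (x, y) for the new grid.
    half_step:  half the grid spacing (50 m for a 100 m grid).
    """
    # One pass over the old grid: index the values by coordinate.
    # Rounding guards against floating-point noise in stored coordinates.
    lookup = {(round(x, 3), round(y, 3)): v for x, y, v in old_points}

    offsets = [(-half_step, -half_step), (-half_step, half_step),
               (half_step, -half_step), (half_step, half_step)]
    result = {}
    for x, y in new_points:
        neighbors = [lookup.get((round(x + dx, 3), round(y + dy, 3)))
                     for dx, dy in offsets]
        if None not in neighbors:
            # Equal weights are an assumption; substitute the real weighting.
            result[(x, y)] = sum(neighbors) / 4.0
    return result


old = [(0.0, 0.0, 1.0), (100.0, 0.0, 2.0),
       (0.0, 100.0, 3.0), (100.0, 100.0, 6.0)]
print(interpolate_new_grid(old, [(50.0, 50.0)]))  # {(50.0, 50.0): 3.0}
```

With this layout, each new point costs four dictionary lookups instead of a scan of up to 4 000 000 rows.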

There is no methodological complexity here, but my problem is that, based on my first tests, this script would take weeks to complete...

Do you see any way to make it more efficient? A few ideas off the top of my head:

Index the fields X and Y => I did that but didn't notice any significant performance change

Do a spatial query to find the neighbors rather than an attribute-based one. Would that actually help? What spatial function in ArcGIS should do the job? I doubt that, e.g., buffering each new point will prove more efficient

Transform the feature class into a NumPy array. Would that help? I haven't worked much with NumPy so far, and I would rather not dive into it unless someone tells me it could really help reduce the processing time

Anything else?
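On the NumPy idea: if the old grid is complete and can be arranged as a 2-D array of values (rows by Y, columns by X), the four neighbors of every new point are simply four shifted views of that array. The sketch below assumes such an array already exists and uses equal weights, which is an assumption; `midpoints_2x2` is a hypothetical helper name. Getting the data out of the feature class would be a separate step, for example with `arcpy.da.FeatureClassToNumPyArray` followed by a sort and reshape.

```python
import numpy as np


def midpoints_2x2(grid):
    """Average each 2x2 block of neighbors in a 2-D array of old-grid values.

    A (rows, cols) input gives a (rows - 1, cols - 1) output, one value
    per new point lying in the middle of 4 old points.
    """
    grid = np.asarray(grid, dtype=float)
    # Four shifted views: upper-left, upper-right, lower-left, lower-right.
    # Equal weights are an assumption; substitute the real weighting.
    return (grid[:-1, :-1] + grid[:-1, 1:]
            + grid[1:, :-1] + grid[1:, 1:]) / 4.0


old = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [7.0, 8.0, 9.0]])
print(midpoints_2x2(old))
# [[3. 4.]
#  [6. 7.]]
```

This only works when the grid is regular and has no gaps; missing points would need to be filled (for instance with NaN) before reshaping.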

Answer

Thanks everybody for your help!

I finally found a very non-pythonic way to solve this issue... What was actually taking the most computing time was finding the 4 neighbors of each point. Rather than using the X and Y attributes (either with an arcpy cursor or within another data structure, such as a Python dictionary), I ended up using the ArcGIS tool Generate Near Table. I assume this takes advantage of the spatial indexes, and the performance is clearly much better, without me having to implement the index myself.
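For readers who want to try the same route, here is a rough sketch of how Generate Near Table might be called and its output consumed. The exact parameters used in the original solution are not given, so everything here is an assumption: the function names are hypothetical, the 75 m search radius is chosen to sit just above the roughly 70.7 m diagonal distance to the 4 neighbors on a 100 m grid, and the weights are equal.

```python
from collections import defaultdict


def build_near_table(new_fc, old_fc, out_table):
    """Run Generate Near Table, keeping the 4 closest old points per new point."""
    import arcpy  # imported here so the helper below also works without ArcGIS

    arcpy.analysis.GenerateNearTable(
        new_fc, old_fc, out_table,
        search_radius="75 Meters",  # assumed value, see the note above
        closest="ALL",
        closest_count=4,
    )
    # IN_FID is the new point, NEAR_FID one of its old-grid neighbors.
    with arcpy.da.SearchCursor(out_table, ["IN_FID", "NEAR_FID"]) as cursor:
        return [tuple(row) for row in cursor]


def average_neighbors(pairs, old_values):
    """Average old-grid values per new point from (IN_FID, NEAR_FID) pairs."""
    grouped = defaultdict(list)
    for in_fid, near_fid in pairs:
        grouped[in_fid].append(old_values[near_fid])
    # Equal weights are an assumption; keep only points with all 4 neighbors.
    return {fid: sum(v) / 4.0 for fid, v in grouped.items() if len(v) == 4}


# The aggregation step can be checked without ArcGIS:
pairs = [(1, 10), (1, 11), (1, 12), (1, 13)]
print(average_neighbors(pairs, {10: 1.0, 11: 2.0, 12: 3.0, 13: 6.0}))  # {1: 3.0}
```

Here `old_values` would be a dictionary of object ID to attribute value, read once from the old grid with a search cursor.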
