arcpy - Most efficient method to join multiple fields in ArcGIS

Tuesday, 15 January 2019

arcpy - Most efficient method to join multiple fields in ArcGIS

I have as set of shapefiles with approximately 300000 polygons each, as well as corresponding point shapefiles and tables with approximately the same number of row (equal or less rows than the polygons FC).

I need to join about 30 fields from the table of the point FC and from the stand alone table. It works, but it is quite slow.

I've tried different method : AddJoin + claculateField, JoinField and arcpy.da.UpdateCursor. Join Field is the slowest and the Cursor is the fastest but I found it tricky to manage when the two tables are not matching exactly. Nevertheless, I was surprised that a spatial join is sometimes faster than a table join: this makes me think that I a missing something. I saw on a Website that I should work with layer instead of the FC, but I used "makeFeatureLayer" in my script and it didn't help much.

Can I improve this with indexing, with other FC types or with another approach ? Note that this is part of a broader Python script with arcpy, so I need a solution that can be used within this script.

Answer

I tested different methods for joining massive shapefile tables: the fastest way is using cursors.

Read the join table creating a dictionaries with values of join field as keys and the rest of the fields as dictionary values

Create new fields in the target shapefile

Update the target shapefile

Use the if/else statements to join and populate the target fields

import arcpy
targshp = r"C:\dataset\shape1.shp"
joinshp = r"C:\dataset\shape2.shp"

## Step 1     
joinfields = ['Join_Field', 'Field1', 'Field2']
joindict = {}
with arcpy.da.SearchCursor(joinshp, joinfields) as rows:

    for row in rows:
        joinval = row[0]
        val1 = row[1]
        val2 = row[2]
        joindict[joinval]=[val1, val2]
del row, rows

## Step 2
arcpy.AddField_management(targshp, "New_Field1", "LONG", "5")
arcpy.AddField_management(targshp, "New_Field2", "LONG", "5")


## Step 3
targfields = ['Key_Field', 'New_Field1', 'New_Field2']
with arcpy.da.UpdateCursor(targshp, targfields) as recs:
    for rec in recs:
        keyval = rec[0]

        ## Step 4
        if joindict.has_key(keyval):
            rec[1] = joindict[keyval][0]

            rec[2] = joindict[keyval][1]
        else:
            rec[1] = 0
            rec[2] = 0
        recs.updateRow(rec)
del rec, recs

Blog

Tuesday, 15 January 2019

arcpy - Most efficient method to join multiple fields in ArcGIS

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?