Tuesday, 15 January 2019

arcpy - Most efficient method to join multiple fields in ArcGIS



I have as set of shapefiles with approximately 300000 polygons each, as well as corresponding point shapefiles and tables with approximately the same number of row (equal or less rows than the polygons FC).


I need to join about 30 fields from the table of the point FC and from the stand alone table. It works, but it is quite slow.


I've tried different method : AddJoin + claculateField, JoinField and arcpy.da.UpdateCursor. Join Field is the slowest and the Cursor is the fastest but I found it tricky to manage when the two tables are not matching exactly. Nevertheless, I was surprised that a spatial join is sometimes faster than a table join: this makes me think that I a missing something. I saw on a Website that I should work with layer instead of the FC, but I used "makeFeatureLayer" in my script and it didn't help much.


Can I improve this with indexing, with other FC types or with another approach ? Note that this is part of a broader Python script with arcpy, so I need a solution that can be used within this script.



Answer



I tested different methods for joining massive shapefile tables: the fastest way is using cursors.



  1. Read the join table creating a dictionaries with values of join field as keys and the rest of the fields as dictionary values

  2. Create new fields in the target shapefile

  3. Update the target shapefile



  4. Use the if/else statements to join and populate the target fields


    import arcpy
    targshp = r"C:\dataset\shape1.shp"
    joinshp = r"C:\dataset\shape2.shp"

    ## Step 1
    joinfields = ['Join_Field', 'Field1', 'Field2']
    joindict = {}
    with arcpy.da.SearchCursor(joinshp, joinfields) as rows:

    for row in rows:
    joinval = row[0]
    val1 = row[1]
    val2 = row[2]
    joindict[joinval]=[val1, val2]
    del row, rows

    ## Step 2
    arcpy.AddField_management(targshp, "New_Field1", "LONG", "5")
    arcpy.AddField_management(targshp, "New_Field2", "LONG", "5")


    ## Step 3
    targfields = ['Key_Field', 'New_Field1', 'New_Field2']
    with arcpy.da.UpdateCursor(targshp, targfields) as recs:
    for rec in recs:
    keyval = rec[0]

    ## Step 4
    if joindict.has_key(keyval):
    rec[1] = joindict[keyval][0]

    rec[2] = joindict[keyval][1]
    else:
    rec[1] = 0
    rec[2] = 0
    recs.updateRow(rec)
    del rec, recs


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...