I have as set of shapefiles with approximately 300000 polygons each, as well as corresponding point shapefiles and tables with approximately the same number of row (equal or less rows than the polygons FC).
I need to join about 30 fields from the table of the point FC and from the stand alone table. It works, but it is quite slow.
I've tried different method : AddJoin + claculateField, JoinField and arcpy.da.UpdateCursor. Join Field is the slowest and the Cursor is the fastest but I found it tricky to manage when the two tables are not matching exactly. Nevertheless, I was surprised that a spatial join is sometimes faster than a table join: this makes me think that I a missing something. I saw on a Website that I should work with layer instead of the FC, but I used "makeFeatureLayer" in my script and it didn't help much.
Can I improve this with indexing, with other FC types or with another approach ? Note that this is part of a broader Python script with arcpy, so I need a solution that can be used within this script.
Answer
I tested different methods for joining massive shapefile tables: the fastest way is using cursors.
- Read the join table creating a dictionaries with values of join field as keys and the rest of the fields as dictionary values
- Create new fields in the target shapefile
- Update the target shapefile
Use the if/else statements to join and populate the target fields
import arcpy
targshp = r"C:\dataset\shape1.shp"
joinshp = r"C:\dataset\shape2.shp"
## Step 1
joinfields = ['Join_Field', 'Field1', 'Field2']
joindict = {}
with arcpy.da.SearchCursor(joinshp, joinfields) as rows:
for row in rows:
joinval = row[0]
val1 = row[1]
val2 = row[2]
joindict[joinval]=[val1, val2]
del row, rows
## Step 2
arcpy.AddField_management(targshp, "New_Field1", "LONG", "5")
arcpy.AddField_management(targshp, "New_Field2", "LONG", "5")
## Step 3
targfields = ['Key_Field', 'New_Field1', 'New_Field2']
with arcpy.da.UpdateCursor(targshp, targfields) as recs:
for rec in recs:
keyval = rec[0]
## Step 4
if joindict.has_key(keyval):
rec[1] = joindict[keyval][0]
rec[2] = joindict[keyval][1]
else:
rec[1] = 0
rec[2] = 0
recs.updateRow(rec)
del rec, recs
No comments:
Post a Comment