Monday, 14 December 2015

arcpy - Fastest way to Join tables in python automation


My issue comes form the fact that doing Joins manually is actually relatively quick (5mins), but python automation of this process is 6hrs! Here is my code:


import arcpy

GDB_MARK_TO_MARK_VECTOR_VIEW = "Database Connections\\OLE DB Connection.odc\\GDB.MARK_TO_MARK_VECTOR_VIEW"
table = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.mdb\\JOINtable"
feature1 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.mdb\\UNJOINED"
feature = "N:\Geodetic\GIS\BaseLayers\M2Mlines\M2Mlines.shp"


arcpy.env.overwriteOutput = True

arcpy.CopyRows_management(GDB_MARK_TO_MARK_VECTOR_VIEW,table)

arcpy.XYToLine_management(GDB_MARK_TO_MARK_VECTOR_VIEW,feature1,"FROM_REF_UTM_X_M","FROM_REF_UTM_Y_M","TO_REF_UTM_X_M","TO_REF_UTM_Y_M","GEODESIC","M2M_ID", "PROJCS['NAD_1983_UTM_Zone_15N',GEOGCS['GCS_North_American_1983',DATUM['D_North_American_1983',SPHEROID['GRS_1980',6378137.0,298.257222101]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',-93.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]];-5120900 -9998100 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision")

arcpy.JoinField_management(feature1,"M2M_ID",table,"M2M_ID")

arcpy.CopyFeatures_management(feature1,feature)


exit()

The XYToLine_management does not keep the table and therefore the table needs to be reattached. Is there a way to speed this up?


MAJOR BREAK THROUGH... (For me anyway) I honestly didn't think it would work but I checked all 300,000+ lines in my data table vs "the correct way" above, here is my new code that runs in 4 minutes!:


import arcpy,os


GDB_MARK_TO_MARK_VECTOR_VIEW = "Database Connections\\OLE DB Connection.odc\\GDB.MARK_TO_MARK_VECTOR_VIEW"
table = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\JOINtable.dbf"

feature1 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.shp"
table1 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.dbf"
table2 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\JOINtable.dbf.xml"

arcpy.env.overwriteOutput = True

arcpy.CopyRows_management(GDB_MARK_TO_MARK_VECTOR_VIEW,table)
arcpy.XYToLine_management(table,feature1,"FROM_REF_U","FROM_REF_1","TO_REF_UTM","TO_REF_U_1","GEODESIC","M2M_ID", "PROJCS['NAD_1983_UTM_Zone_15N',GEOGCS['GCS_North_American_1983',DATUM['D_North_American_1983',SPHEROID['GRS_1980',6378137.0,298.257222101]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',-93.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]];-5120900 -9998100 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision")
os.remove(table1)
os.remove(table2)

os.rename(table,table1)

exit()

I could still be guided to a new answer yet. I am always looking to improve, so any thoughts are welcome.


I also didn't include my breakthrough as an answer due to the fact that I don't think it answers the real question of faster Joins, it just fixes my specific problem for today.



Answer



@PolyGeo gave me the start to a set of test that ended with "The Code", which is:


import arcpy,os



GDB_MARK_TO_MARK_VECTOR_VIEW = "Database Connections\\OLE DB Connection.odc\\GDB.MARK_TO_MARK_VECTOR_VIEW"
feature = "C:\\VirtualUpdater\\TempFiles\\UnJoined.shp"
layer = "C:\\VirtualUpdater\\TempFiles\\M2Mlines.lyr"
featureFIN = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mline.shp"

arcpy.env.overwriteOutput = True

arcpy.XYToLine_management(GDB_MARK_TO_MARK_VECTOR_VIEW,feature,"FROM_REF_UTM_X_M","FROM_REF_UTM_Y_M","TO_REF_UTM_X_M","TO_REF_UTM_Y_M","GEODESIC","M2M_ID", "PROJCS['NAD_1983_UTM_Zone_15N',GEOGCS['GCS_North_American_1983',DATUM['D_North_American_1983',SPHEROID['GRS_1980',6378137.0,298.257222101]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',-93.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]];-5120900 -9998100 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision")
arcpy.MakeFeatureLayer_management(feature,layer)

arcpy.AddJoin_management(layer,"M2M_ID",GDB_MARK_TO_MARK_VECTOR_VIEW,"M2M_ID","KEEP_ALL")
arcpy.CopyFeatures_management(layer,featureFIN)

exit()

It takes 15 minutes for 300,000+ features, and while it truncates the names of the column headers they are still usable.


When I added an intermediary step of creating a table with FID's to speed up the join (which it did by 7 min!), it created column headers that were ABSOLUTE JUNK.


I also tested this in several other scripts with smaller datasets and I have yet to find one that hasn't sped up by switching to the above process.


So to sum up this whole thing for my dataset which has 300,000+ features, the first process in the question took 6 hours, the second is a gamble (I think) and took 4 min, and lastly the example in the answer took 15 min but gives the same stable result as the first.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...