My issue comes from the fact that doing the joins manually is relatively quick (about 5 minutes), but the Python automation of this process takes 6 hours! Here is my code:
import arcpy
GDB_MARK_TO_MARK_VECTOR_VIEW = "Database Connections\\OLE DB Connection.odc\\GDB.MARK_TO_MARK_VECTOR_VIEW"
table = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.mdb\\JOINtable"
feature1 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.mdb\\UNJOINED"
feature = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.shp"
arcpy.env.overwriteOutput = True
arcpy.CopyRows_management(GDB_MARK_TO_MARK_VECTOR_VIEW,table)
arcpy.XYToLine_management(GDB_MARK_TO_MARK_VECTOR_VIEW,feature1,"FROM_REF_UTM_X_M","FROM_REF_UTM_Y_M","TO_REF_UTM_X_M","TO_REF_UTM_Y_M","GEODESIC","M2M_ID", "PROJCS['NAD_1983_UTM_Zone_15N',GEOGCS['GCS_North_American_1983',DATUM['D_North_American_1983',SPHEROID['GRS_1980',6378137.0,298.257222101]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',-93.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]];-5120900 -9998100 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision")
arcpy.JoinField_management(feature1,"M2M_ID",table,"M2M_ID")
arcpy.CopyFeatures_management(feature1,feature)
exit()
XYToLine_management does not carry the input table's attributes through to the output, so the table needs to be joined back on afterwards. Is there a way to speed this up?
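For context on why the re-join dominates the runtime: field-by-field tools like JoinField write attributes row by row against the target table, whereas a hash (dictionary) join holds the lookup table in memory and enriches each feature with a single O(1) lookup. A minimal pure-Python sketch of the dictionary-join idea (standalone, no arcpy; the table contents and field names here are made up for illustration):

```python
# Dictionary-based (hash) join: build a lookup keyed on the join field once,
# then enrich each feature row with a constant-time dict lookup.
join_table = [
    {"M2M_ID": 1, "DIST_M": 1203.4},
    {"M2M_ID": 2, "DIST_M": 845.1},
]
features = [
    {"M2M_ID": 1, "FROM_X": 500100.0},
    {"M2M_ID": 2, "FROM_X": 500350.0},
]

# One pass over the join table builds the index.
lookup = {row["M2M_ID"]: row for row in join_table}

joined = []
for feat in features:
    extra = lookup.get(feat["M2M_ID"], {})
    merged = dict(feat)
    # Copy every joined field except the duplicate key column.
    merged.update({k: v for k, v in extra.items() if k != "M2M_ID"})
    joined.append(merged)

print(joined[0])  # {'M2M_ID': 1, 'FROM_X': 500100.0, 'DIST_M': 1203.4}
```

In arcpy terms, the same pattern is often implemented with a `da.SearchCursor` to fill the dictionary and a `da.UpdateCursor` to write the fields back, but the sketch above is only the generic technique, not the author's code.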
MAJOR BREAKTHROUGH... (for me, anyway). I honestly didn't think it would work, but I checked all 300,000+ lines in my data table against "the correct way" above. Here is my new code, which runs in 4 minutes!:
import arcpy,os
GDB_MARK_TO_MARK_VECTOR_VIEW = "Database Connections\\OLE DB Connection.odc\\GDB.MARK_TO_MARK_VECTOR_VIEW"
table = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\JOINtable.dbf"
feature1 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.shp"
table1 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mlines.dbf"
table2 = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\JOINtable.dbf.xml"
arcpy.env.overwriteOutput = True
arcpy.CopyRows_management(GDB_MARK_TO_MARK_VECTOR_VIEW,table)
arcpy.XYToLine_management(table,feature1,"FROM_REF_U","FROM_REF_1","TO_REF_UTM","TO_REF_U_1","GEODESIC","M2M_ID", "PROJCS['NAD_1983_UTM_Zone_15N',GEOGCS['GCS_North_American_1983',DATUM['D_North_American_1983',SPHEROID['GRS_1980',6378137.0,298.257222101]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',-93.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]];-5120900 -9998100 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision")
# Remove the old table and its metadata side-car if present,
# then swap the freshly copied table into place.
if os.path.exists(table1):
    os.remove(table1)
if os.path.exists(table2):
    os.remove(table2)
os.rename(table,table1)
exit()
I could still be guided to a better answer yet. I am always looking to improve, so any thoughts are welcome.
I also didn't post my breakthrough as an answer because I don't think it answers the real question of faster joins; it just fixes my specific problem for today.
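One aside on the delete-and-rename trick above: `os.remove` raises `OSError` if the target file is missing (for example on a first run, before `M2Mlines.dbf` exists). A small defensive sketch of that cleanup step, using temporary stand-in paths rather than the real `N:\` shares:

```python
import os
import tempfile

# Stand-in paths for illustration only; the real script points at N:\ shares.
workdir = tempfile.mkdtemp()
table = os.path.join(workdir, "JOINtable.dbf")       # freshly written table
table1 = os.path.join(workdir, "M2Mlines.dbf")       # old table to replace
table2 = os.path.join(workdir, "JOINtable.dbf.xml")  # metadata side-car

open(table, "w").close()  # simulate the CopyRows output existing

# Remove leftovers only if they exist, then swap the new table into place.
for leftover in (table1, table2):
    if os.path.exists(leftover):
        os.remove(leftover)
os.rename(table, table1)

print(os.path.exists(table1))  # True
```

On Python 3, `os.replace(table, table1)` would overwrite the destination atomically and make the explicit remove of `table1` unnecessary.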
Answer
@PolyGeo gave me the start to a set of tests that ended with "The Code", which is:
import arcpy,os
GDB_MARK_TO_MARK_VECTOR_VIEW = "Database Connections\\OLE DB Connection.odc\\GDB.MARK_TO_MARK_VECTOR_VIEW"
feature = "C:\\VirtualUpdater\\TempFiles\\UnJoined.shp"
layer = "C:\\VirtualUpdater\\TempFiles\\M2Mlines.lyr"
featureFIN = "N:\\Geodetic\\GIS\\BaseLayers\\M2Mlines\\M2Mline.shp"
arcpy.env.overwriteOutput = True
arcpy.XYToLine_management(GDB_MARK_TO_MARK_VECTOR_VIEW,feature,"FROM_REF_UTM_X_M","FROM_REF_UTM_Y_M","TO_REF_UTM_X_M","TO_REF_UTM_Y_M","GEODESIC","M2M_ID", "PROJCS['NAD_1983_UTM_Zone_15N',GEOGCS['GCS_North_American_1983',DATUM['D_North_American_1983',SPHEROID['GRS_1980',6378137.0,298.257222101]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',-93.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]];-5120900 -9998100 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision")
arcpy.MakeFeatureLayer_management(feature,layer)
arcpy.AddJoin_management(layer,"M2M_ID",GDB_MARK_TO_MARK_VECTOR_VIEW,"M2M_ID","KEEP_ALL")
arcpy.CopyFeatures_management(layer,featureFIN)
exit()
It takes 15 minutes for 300,000+ features, and while it truncates the column names, they are still usable.
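The truncation comes from the shapefile's dBASE attribute table, which limits field names to 10 characters; clashes are de-duplicated with numeric suffixes, which is exactly how `FROM_REF_UTM_X_M` and `FROM_REF_UTM_Y_M` became `FROM_REF_U` and `FROM_REF_1` in the second script above. A rough pure-Python sketch of that scheme (the exact ArcGIS rules may differ in detail):

```python
def truncate_fields(names, limit=10):
    """Truncate field names to `limit` characters, resolving clashes with
    numeric suffixes roughly the way shapefile exports do."""
    seen = set()
    out = []
    for name in names:
        short = name[:limit]
        n = 1
        while short in seen:
            # Make room for "_1", "_2", ... at the end of the name.
            suffix = "_" + str(n)
            short = name[:limit - len(suffix)] + suffix
            n += 1
        seen.add(short)
        out.append(short)
    return out

fields = ["FROM_REF_UTM_X_M", "FROM_REF_UTM_Y_M", "TO_REF_UTM_X_M", "TO_REF_UTM_Y_M"]
print(truncate_fields(fields))
# ['FROM_REF_U', 'FROM_REF_1', 'TO_REF_UTM', 'TO_REF_U_1']
```

This reproduces the truncated names that appear in the 4-minute script, which is why that version had to reference `FROM_REF_U` etc. instead of the original column names.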
When I added an intermediate step of creating a table keyed on FIDs to speed up the join (which it did, by 7 minutes!), it produced column headers that were ABSOLUTE JUNK.
I also tested this in several other scripts with smaller datasets, and I have yet to find one that didn't speed up after switching to the above process.
So to sum up this whole thing for my dataset of 300,000+ features: the first process in the question took 6 hours; the second (the .dbf rename trick) took 4 minutes but is a gamble, I think; and the process in the answer took 15 minutes and gives the same stable result as the first.