python - Using OGR and Shapely more efficiently?

Sunday, 22 September 2019

python - Using OGR and Shapely more efficiently?

I'm looking for some suggestions about how to make my python code more efficient. Normally efficiency doesn't matter for me but I am now working with a text file of US locations with over 1.5 million points. With the given setup it is taking about 5 seconds to run operations on one point; I need to get this figure way down.

I'm using three different python GIS packages to do a few different operations on the points and output a new delimited text file.

I use OGR to read a county boundary shapefile and get access to the boundary geometry.

Shapely checks to see if a point is within any of these counties.

If it is within one, I use the Python Shapefile Library to pull attribute information from the boundary .dbf.

I then write some information from both sources to a text file.

I suspect that the inefficiency lies in having a 2-3 tiered loop... not quite sure what to do about it. I'm particularly looking for help with someone experienced in using any of these 3 packages, as it is my first time to use any of them.

import os, csv
from shapely.geometry import Point
from shapely.geometry import Polygon
from shapely.wkb import loads
from osgeo import ogr
import shapefile


pointFile = "C:\\NSF_Stuff\\NLTK_Scripts\\Gazetteer_New\\NationalFile_20110404.txt"
shapeFolder = "C:\NSF_Stuff\NLTK_Scripts\Gazetteer_New"
#historicBounds = "C:\\NSF_Stuff\\NLTK_Scripts\\Gazetteer_New\\US_Counties_1860s_NAD"
historicBounds = "US_Counties_1860s_NAD"
writeFile = "C:\\NSF_Stuff\\NLTK_Scripts\\Gazetteer_New\\NewNational_Gazet.txt"

#opens the point file, reads it as a delimited file, skips the first line
openPoints = open(pointFile, "r")
reader = csv.reader(openPoints, delimiter="|")
reader.next()


#opens the write file
openWriteFile = open(writeFile, "w")

#uses Python Shapefile Library to read attributes from .dbf
sf = shapefile.Reader("C:\\NSF_Stuff\\NLTK_Scripts\\Gazetteer_New\\US_Counties_1860s_NAD.dbf")
records = sf.records()
print "Starting loop..."

#This will loop through the points in pointFile    

for row in reader:
    print row
    shpIndex = 0
    pointX = row[10]
    pointY = row[9]
    thePoint = Point(float(pointX), float(pointY))
    #This section uses OGR to read the geometry of the shapefile
    openShape = ogr.Open((str(historicBounds) + ".shp"))
    layers = openShape.GetLayerByName(historicBounds)
    #This section loops through the geometries, determines if the point is in a polygon

    for element in layers:
        geom = loads(element.GetGeometryRef().ExportToWkb())
        if geom.geom_type == "Polygon":
            if thePoint.within(geom) == True:
                print "!!!!!!!!!!!!! Found a Point Within Historic !!!!!!!!!!!!"
                print str(row[1]) + ", " + str(row[2]) + ", " + str(row[5]) + " County, " + str(row[3])
                print records[shpIndex]
                openWriteFile.write((str(row[0]) + "|" + str(row[1]) + "|" + str(row[2]) + "|" + str(row[5]) + "|" + str(row[3]) + "|" + str(row[9]) + "|" + str(row[10]) + "|" + str(records[shpIndex][3]) + "|" + str(records[shpIndex][9]) + "|\n"))
        if geom.geom_type == "MultiPolygon":
            for pol in geom:

                if thePoint.within(pol) == True:
                    print "!!!!!!!!!!!!!!!!! Found a Point Within MultiPolygon !!!!!!!!!!!!!!"
                    print str(row[1]) + ", " + str(row[2]) + ", " + str(row[5]) + " County, " + str(row[3])
                    print records[shpIndex]
                    openWriteFile.write((str(row[0]) + "|" + str(row[1]) + "|" + str(row[2]) + "|" + str(row[5]) + "|" + str(row[3]) + "|" + str(row[9]) + "|" + str(row[10]) + "|" + str(records[shpIndex][3]) + "|" + str(records[shpIndex][9]) + "|\n"))
        shpIndex = shpIndex + 1
    print "finished checking point"
    openShape = None
    layers = None



pointFile.close()
writeFile.close()
print "Done"

Blog

Sunday, 22 September 2019

python - Using OGR and Shapely more efficiently?

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?