Tuesday, 20 September 2016

python - Find a linestring closest to a given linestring


There must be already a solution to this problem - let's say we have five different linestrings. They might intersect or not. Given a sixth linestring, how do we find which one of the original five matches the sixth the closest, both in shape and space? A perfect match here would be if sixth linestring is exactly the same as one of the pre-existing ones. I the use case I'm looking at, one of the five would be almost exactly the same as the sixth one.


Anything in PostGIS, Java, Python would work for me, or even a general concept as well.



Answer



This is a pure pyqgis implementation of Hausdorff Distance, solely for comparing polylines. If you find the wikipedia page hard to understand, try to think of it this way: it is a distance that lies somewhere between the minimum and maximum distance between two lines, but it is not a simple statistical mean or median distance. It tends to be somewhat higher than the mean (min+max)/2, really a neat mathematical way of calculating the "greatest of all the distances from a point in one set[line] to the closest point in the other set[line]". And yes, it's an actual distance between two nodes on both lines!


def hausdorffDistance(geom1,geom2):

'''
Hausdorff Distance calculator, somewhat based on https://github.com/anitagraser/QGIS-Processing-tools/blob/master/1.1/scripts/find_similar_line_feature.py
Uses pure pyqgis instead of numpy
Inputs geom1 and geom2 should be QgsGeometry of type 'QGis.Line'
'''
dist = lambda x1, y1, x2, y2: float((x2-x1)**2+(y2-y1)**2)**(0.5) #Euclidean distance between two coordinates

##Get all possible combinations between coordinates on the first line and second line
combins = [dist(a[0], a[1], b[0], b[1]) for a in geom1.asPolyline() for b in geom2.asPolyline()]


##Find array dimensions
combinSz = len(list(combins))
xArrSize = len(geom1.asPolyline())
yArrSize = combinSz/xArrSize
#print "{0}x{1} array".format(xArrSize, yArrSize)

##Turn 1-dimensional list of distances into 2-dimensional list/array
distAryOne = [[0]*yArrSize for i in range(xArrSize)] #initialize empty 2-dimensional distance array first
for x in range(xArrSize):
for y in range(yArrSize):

distAryOne[x][y]=combins[(x*yArrSize)+y]
distAryTwo = [[0]*xArrSize for i in range(yArrSize)] #flipped order of distAryOne
for y in range(yArrSize):
for x in range(xArrSize):
#print y, x, (y*xArrSize)+x
distAryTwo[y][x]=combins[(y*xArrSize)+x]

##Finally calculates Hausdorff Distance
#Calculate distances between origin and target feature
H1 = max([min([distAryOne[i][j] for i in range(xArrSize)]) for j in range(yArrSize)]) #get the highest minimum (supremum infimum) travelling along axis 1 (y-axis)

H2 = max([min([distAryOne[i][j] for j in range(yArrSize)]) for i in range(xArrSize)]) #get the highest minimum (supremum infimum) travelling along axis 0 (x-axis)
#print H1, H2
#Repeat the calculation in reverse order
H3 = max([min([distAryTwo[j][i] for i in range(xArrSize)]) for j in range(yArrSize)]) #get the highest minimum (supremum infimum) travelling along axis 1 (y-axis)
H4 = max([min([distAryTwo[j][i] for j in range(yArrSize)]) for i in range(xArrSize)]) #get the highest minimum (supremum infimum) travelling along axis 0 (x-axis)
#print H3, H4

hausdorff = max([H1, H2]+[H3, H4])
#print hausdorff


return hausdorff

hDist = hausdorffDistance(featureOne.geometry(), featureTwo.geometry()) #gets Hausdorff Distance between two QgsGeometry

And to answer the OP's question, finding the closest/most-similar linestring would probably work in a loop like so:


linesDict = {}
linesDict[1] = QgsGeometry.fromWkt('LineString (19195738.63976059108972549 -5399938.71016304567456245, 19195567.28721107169985771 -5399646.2101767435669899)')
linesDict[2] = QgsGeometry.fromWkt('LineString (19195731.71763794869184494 -5399947.30879085976630449, 19195560.82387629151344299 -5399647.8986875806003809)')
linesDict[3] = QgsGeometry.fromWkt('LineString (19194766.44169946759939194 -5400376.5755823515355587, 19195214.49306683242321014 -5400105.78012083098292351, 19195258.48439108207821846 -5400084.0807413337752223, 19195306.40671262890100479 -5400070.40886308066546917, 19195384.15196022763848305 -5400066.94008383341133595, 19195487.77627336978912354 -5400067.64584534615278244, 19195521.69570336490869522 -5400061.87798151094466448, 19195547.68196233361959457 -5400050.05497545935213566, 19195731.71763794869184494 -5399947.30879085976630449)')
linesDict[4] = QgsGeometry.fromWkt('LineString (19194770.19089671224355698 -5400382.93258125334978104, 19195210.47202705591917038 -5400113.57951446622610092, 19195239.35679624602198601 -5400096.63180513121187687, 19195254.45391828566789627 -5400088.23555850423872471, 19195264.69389183074235916 -5400085.22728221211582422, 19195293.6455497182905674 -5400079.71067976579070091, 19195308.1098017729818821 -5400078.48953226301819086, 19195374.02861313149333 -5400069.41496563982218504, 19195383.67602141946554184 -5400068.01544827781617641, 19195487.77627336978912354 -5400067.64584534615278244, 19195514.07467959076166153 -5400068.170664357021451, 19195524.17230026796460152 -5400065.01574745960533619, 19195536.3263196162879467 -5400059.96993853803724051, 19195573.85859682410955429 -5400037.65741264726966619, 19195676.50016063824295998 -5399974.75716814957559109, 19195711.39376448467373848 -5399954.18374839331954718, 19195738.63976059108972549 -5399938.71016304567456245)')

linesDict[5] = QgsGeometry.fromWkt('LineString (19195731.71763794869184494 -5399947.30879085976630449, 19196343.75779527798295021 -5399613.46129895187914371, 19196395.72945566475391388 -5399589.81357175391167402, 19196570.48818347230553627 -5399533.5258752852678299)')
sixthLineStr = QgsGeometry.fromWkt('LineString (19195738.63976059108972549 -5399938.71016304567456245, 19196027.92994695901870728 -5399769.77589720394462347, 19196106.32950322702527046 -5399741.08493372611701488, 19196326.46363679319620132 -5399622.75882790051400661, 19196505.00672129169106483 -5399554.61725367326289415, 19196570.48818347230553627 -5399533.5258752852678299)') #The sixth linestring we want to compare to the other five

for lineNo in linesDict.keys():
print lineNo, hausdorffDistance(linesDict[lineNo], sixthLineStr)

closestLineNo = sorted([(hausdorffDistance(linesDict[lineNo], sixthLineStr), lineNo) for lineNo in linesDict.keys()])[0][1]
print "Line number {0} is the closest to the sixth Linestring".format(closestLineNo)

Hausdorff Distance in QGIS



Result:
1 942.18211796
2 1066.25287595
3 1066.25287595
4 1065.47020549
5 352.393932128
Line number 5 is the closest to the sixth Linestring

Credits to Anita Graser's script which the above code was based off, and which can be found on github. The script here is probably more convoluted than Anita's, but it does not depend on scipy.spatial whose installation didn't work for me through OSGEO4W. I would actually recommend the linked numpy method if you can use it, much shorter and cleaner code.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...