I am looking for a fast algorithm that will find the closest point on a line segment in a graph (such as a road network) to my test point.
I'll try to be more specific. I have a road network graph. This may have somewhere in the region of 30 million nodes and links/lines (my US dataset is of this size). I already have a fast routing algorithm up and running, but it works from node to node. The real world does not work with nodes, but with coordinates. I could search for the nearest node to each start/end coordinate, and in the past I have coded a very fast coordinate matching algorithm (a few million nodes, and managing >~1000 test points per second).
Commercial routing algorithms do not work like that: They direct you to the nearest road segment. In other words I need to match a coordinate to the nearest line (graph link). As well as finding the closest line, I need a fraction that tells me how far along the line that closest point is. Then I can route from the closest node and add the fractional line in to the distance/time sum. Or correct accordingly if the route happens to go down the line.
I think it is reasonable to have a limit on the maximum distance to search. E.g. "The closest line that is within 500 metres". And this should help prune the search space.
Coding up an algorithm that searches all of these lines is pretty simple, but it would be slow due to the large number of lines and the geometry calculations. I could limit the search to be limited to those lines within the distance limit. And this could be further optimized by sorting the lines by longitude (say). Of course most lines have a fine longitude "width" which complicates that sort&search.
I still think this will be slow. How do commercial packages do this? Most will find and display a route within half a second or so, and most of that is taken up with the main routing algorithm, and constructing the result (deriving turn directions, etc).
There are a number of similar questions on this site (eg. How to find the nearest point projected on the road network? ) but these refer to specific packages such as PostGIS or ArcGIS. I am looking for an actual algorithm to code up myself. I suspect my main issue is known the words/phrases to search for. Once I've found an algorithm name, then a bit of Googling/etc will quickly find all-sorts of algorithms and papers...
Answer
I see this question had a little bit of activity recently, so I will post my eventual solution. It essentially follows the strategy of my last comment.
There are two types of nodes: those which define junctions between road segments (ie. a graph node); and intermediate nodes which define the shape of the road segment. These are dropped for routing, but I keep them for my closest-location match.
I built a kd tree index of all of these nodes - junctions and intermediates. For the latter I store information about which segment they lie on, and the fraction of the distance along the segment.
Yes this index has to be stored to disk. By using fixed length records, it is possible to search the tree using a binary-chop approach - i.e. without any pointers or internal indexing. The implementation was in C#/.NET, and a cache was also used during reading. Note that the top few levels don't take up much space, and are read the most - so they benefit a lot from the cache. Similarly, it is often the case that successive search locations are in the same part of the tree, so branches 'in that direction' will tend to be cached.
No comments:
Post a Comment