Wednesday, 29 March 2017

postgresql - How can I optimize pgrouting for speed?


I am using pgrouting on a postgis database created through osm2pgrouting. It performs very well on a limited dataset (3.5k ways; all A* shortest-path searches take < 20 ms).


However, since I imported a bigger bounding box (122k ways) from europe.osm, performance has dropped considerably (a shortest-path search takes around 900 ms).


I would think that with A* most of those edges will never be visited, as they are out of the way.


What I have done so far in an attempt to improve the speed:



  • Put an index on the geometry column (no noticeable effect; see the sketch after this list)

  • Increased my memory from 8 GB to 16 GB

  • Changed the PostgreSQL memory settings (shared_buffers, effective_cache_size) from (128 MB, 128 MB) to (1 GB, 2 GB) (no noticeable effect)
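For reference, the indexes look roughly like this (a sketch assuming the default osm2pgrouting schema, i.e. a ways table with a the_geom geometry column and source/target vertex columns; names may differ in other setups):

    -- Spatial index on the geometry column, plus indexes on the
    -- vertex columns the routing queries join/filter on.
    CREATE INDEX ways_geom_idx   ON ways USING GIST (the_geom);
    CREATE INDEX ways_source_idx ON ways (source);
    CREATE INDEX ways_target_idx ON ways (target);
    -- Refresh planner statistics afterwards.
    ANALYZE ways;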


I have a feeling that most of the work is being done in the Boost library, where the graph is built, so optimizing PostgreSQL will not gain me much. Since I make minor changes to the set of rows I select for each A* search, I am a bit afraid that the Boost library cannot cache my graph and has to rebuild all 122k edges every time (even though each query only uses a very limited subset). And I have no idea how much time is spent doing that compared to the actual shortest-path search.
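For reference, the queries I run look roughly like this (a sketch assuming the pgrouting 2.x pgr_astar signature and the osm2pgrouting ways table; the vertex ids are placeholder values):

    -- The inner SQL selects the edge set that is handed to the
    -- Boost graph builder on every call.
    SELECT seq, id1 AS node, id2 AS edge, cost
    FROM pgr_astar(
        'SELECT gid AS id, source, target, cost, reverse_cost,
                x1, y1, x2, y2
         FROM ways',
        1234,   -- source vertex id (placeholder)
        5678,   -- target vertex id (placeholder)
        true,   -- directed graph
        true    -- a reverse_cost column is present
    );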


Do any of you use pgrouting on an OSM dataset of 122k ways or greater? What performance should I expect? What settings affect performance the most?



Answer



When faced with tasks like this, your primary objective is to be rational. Don't change parameters based on 'gut feeling'. While the gut seems to work in Hollywood, it does not for those of us who live in the real world. Well, at least not my gut ;-).


You should:




  1. Establish a usable and repeatable metric (such as the time taken by a pgrouting query); see the timing sketch after this list





  2. Save the metric results in a spreadsheet and average them (discarding the best and worst runs). This will tell you whether the changes you are making are going in the right direction




  3. Monitor your server using top and vmstat (assuming you're on *nix) while queries are running and look for significant patterns: lots of I/O, high CPU, swapping, etc. If the CPU is waiting for I/O, try to improve disk performance (this should be easy, see below). If the CPU is instead at 100% without any significant disk activity, you have to find a way to improve the query (this is probably going to be harder).
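For step 1, a minimal sketch of a repeatable measurement (assuming a database named routing and the routing query saved in astar_query.sql; both names are placeholders):

    # Run the same query ten times and record wall-clock time; average
    # the results after discarding the best and worst runs (step 2).
    for i in $(seq 1 10); do
        /usr/bin/time -f '%e s' psql -d routing -f astar_query.sql -o /dev/null
    done

    # Meanwhile, in another terminal (step 3), watch for io/swap patterns:
    vmstat 1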




For the sake of simplicity, I assume the network is not playing any significant role here.


Improving database performance



Upgrade to the latest Postgres version. Version 9 is so much better than previous versions. It is free, so you have no reason not to.


Read the book I have already recommended here.

You really should read it. I believe the relevant chapters for this case are 5, 6, 10 and 11.


Improving disk performance




  1. Get an SSD drive and put the whole database on it. Read performance will most likely quadruple, and write performance should also improve radically




  2. Assign more memory to Postgres. Ideally you should be able to assign enough memory that the whole database (or at least its hottest part) can be cached in memory, but not so much that swapping occurs. Swapping is very bad. This is covered in the book cited in the previous paragraph





  3. Disable atime on all the disks (add the noatime option to fstab)
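For point 3, the fstab line would look something like this (device, mount point and filesystem are examples; the noatime option is the relevant part):

    # /etc/fstab -- stop the kernel recording access times on the
    # partition that holds the PostgreSQL data directory.
    /dev/sda2  /var/lib/postgresql  ext4  defaults,noatime  0  2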




Improving query performance


Use the tools described in the book cited above to trace your queries and find spots that are worth optimizing.
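For instance, running EXPLAIN ANALYZE on the inner edge-selection query shows how much time is spent just fetching the edges, before the routing algorithm even starts (it cannot look inside the C routine itself):

    -- Times the SQL part of the routing call in isolation.
    EXPLAIN ANALYZE
    SELECT gid AS id, source, target, cost, reverse_cost,
           x1, y1, x2, y2
    FROM ways;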


Update


After the comments, I have looked at the source code of the stored procedure:


https://github.com/pgRouting/pgrouting/blob/master/core/src/astar.c



and it seems that once the query has been tuned there is not much more room for improvement, as the algorithm runs entirely in memory (and, unfortunately, on only one CPU). I'm afraid your only solution is to find a better/faster algorithm, or one that can run multithreaded, and then integrate it with Postgres either by creating a library like pgrouting or by using some middleware to retrieve the data (and maybe cache it) and feed it to the algorithm.
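One cheap mitigation in the meantime is to hand the algorithm fewer edges in the first place, by restricting the inner SQL to a bounding box around the start and end points (a sketch; the envelope coordinates and the 0.1 degree margin are placeholder values, and it assumes the GiST index on the_geom mentioned earlier):

    -- Only edges intersecting the expanded envelope reach the Boost
    -- graph builder, so the graph rebuilt per query stays small.
    SELECT seq, id1 AS node, id2 AS edge, cost
    FROM pgr_astar(
        'SELECT gid AS id, source, target, cost, reverse_cost,
                x1, y1, x2, y2
         FROM ways
         WHERE the_geom && ST_Expand(
             ST_MakeEnvelope(4.85, 52.30, 4.95, 52.40, 4326), 0.1)',
        1234, 5678, true, true
    );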


HTH

