Friday 19 June 2015

pgrouting - PostGIS: How can I improve the response time for getting the routed distance between addresses?


I need the ability to calculate the driving distance between two addresses for possibly hundreds of thousands of records. I have decided on PostGIS because it can geocode and perform routing all in one system.


Using the following statement, it takes 23 seconds for the response of 45.9 miles:


 SELECT  ( SUM(w.length_m)*0.000621371 )::numeric(10,1) As miles
FROM pgr_dijkstra('SELECT gid AS id, source, target, cost FROM public.ways as d,(SELECT ST_Expand(ST_Extent(the_geom),0.001) as box FROM public.ways as l1 WHERE l1.target =(SELECT id FROM public.ways_vertices_pgr ORDER BY the_geom <-> ST_SetSRID( (SELECT g.geomout FROM geocode(
normalize_address(''2727 South Rockford Rd, Tulsa,OK''),1) AS g), 4326) limit 1 ) OR l1.source = (SELECT id FROM public.ways_vertices_pgr ORDER BY the_geom <-> ST_SetSRID( (SELECT g.geomout FROM geocode(
normalize_address(''1925 Woolaroc Ranch Rd, Bartlesville,OK''),1) AS g), 4326) limit 1 )) as box WHERE d.the_geom && box.box',

(SELECT id FROM public.ways_vertices_pgr ORDER BY the_geom <-> ST_SetSRID( (SELECT g.geomout FROM geocode(
normalize_address('2727 South Rockford Rd, Tulsa,OK'),1) AS g), 4326) limit 1 ) ,(SELECT id FROM public.ways_vertices_pgr ORDER BY the_geom <-> ST_SetSRID( (SELECT g.geomout FROM geocode(
normalize_address('1925 Woolaroc Ranch Rd, Bartlesville,OK'),1) AS g), 4326) limit 1), false) AS r
LEFT JOIN public.ways AS w ON r.edge = w.gid;

This statement returns the driving miles by reducing the calculated area with ST_Extent and geocoding (first normalizing) two addresses passed into pgr_dijkstra.


I have a fresh windows install of PostgreSQL 9.5 and PostGIS 2.2 with Tiger Geocoder extension. I have loaded the oklahoma-latest.osm for pgr with osm2pgrouting and I loaded the Oklahoma tiger data for TigerGeocoder into the same database.


I have added indexes to public.ways.gid and public.ways_vertices_pgr.id after checking other indexes exist (based on other posts), although I don't know if that was necessary.


I have no experience with analyzing SQL performance or indexes, though I have run the postgresql explain/analyze:


 "Aggregate  (cost=7803.02..7803.04 rows=1 width=8) (actual time=23396.253..23396.254 rows=1 loops=1)"

" InitPlan 2 (returns $1)"
" -> Limit (cost=12.79..12.89 rows=1 width=40) (actual time=4558.944..4558.945 rows=1 loops=1)"
" InitPlan 1 (returns $0)"
" -> Function Scan on geocode g (cost=2.50..12.50 rows=1000 width=32) (actual time=4558.659..4558.660 rows=1 loops=1)"
" -> Index Scan using ways_vertices_pgr_gdx on ways_vertices_pgr (cost=0.29..76571.82 rows=783446 width=40) (actual time=4558.941..4558.941 rows=1 loops=1)"
" Order By: (the_geom <-> st_setsrid($0, 4326))"
" InitPlan 4 (returns $3)"
" -> Limit (cost=12.79..12.89 rows=1 width=40) (actual time=7175.975..7175.976 rows=1 loops=1)"
" InitPlan 3 (returns $2)"
" -> Function Scan on geocode g_1 (cost=2.50..12.50 rows=1000 width=32) (actual time=7175.686..7175.687 rows=1 loops=1)"

" -> Index Scan using ways_vertices_pgr_gdx on ways_vertices_pgr ways_vertices_pgr_1 (cost=0.29..76571.82 rows=783446 width=40) (actual time=7175.973..7175.973 rows=1 loops=1)"
" Order By: (the_geom <-> st_setsrid($2, 4326))"
" -> Nested Loop Left Join (cost=0.68..7774.75 rows=1000 width=8) (actual time=23393.245..23395.995 rows=241 loops=1)"
" -> Function Scan on pgr_dijkstra r (cost=0.25..10.25 rows=1000 width=8) (actual time=23393.207..23393.401 rows=241 loops=1)"
" -> Index Scan using ways_gid on ways w (cost=0.42..7.75 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=241)"
" Index Cond: (r.edge = gid)"
"Planning time: 15.967 ms"
"Execution time: 23396.365 ms"

I'm not sure I am on the right path because I don't fully understand the term "cost" and how it might be used for my purposes. Regarding the statement above, I'd probably not use it for 100000 records but rather for individual "manual" calculations. But 23 seconds is still a long time.



Next, I am looking at using pgr_dijkstra to get the driving miles for a matrix of source (array) and target (array) addresses.



Answer



I followed this guide to ensure that my osm2po data was properly/fully indexed. There were no indexes on the Geometry columns after a fresh load of data.


Most of my undesired performance issues were related to the Tiger Geocoder taking ~13 secs to provide a result after a fresh load of tiger data with 8 states.


I followed this guide for missing indexes and vacuum/analyze which halved the response time to ~6 secs:


And then this guide on configuring postgresql for pretty good performance which got the response time under 2 secs.


I think that value is going to rise though because I will eventually load all the states, so I'll have to also implement restricting the data to the area of interest.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...