Thursday 23 April 2015

st dwithin - Improve performance on a st_dwithin query (in PostGIS)


At the momenent I am working on a query, as described in an earlier question. I have the two tables



  1. testme (tracked GPS-profile with Point Geometry) and

  2. roads (geometry roads shapefile)


Besides the distance of each tracked point (row) in table pt in want to find the closest point using a combination of ST_Distance and ST_DWithin.


DROP TABLE IF EXISTS raw_2015_processed;
EXPLAIN ANALYZE

CREATE TABLE raw_2015_processed AS
SELECT pt.id,
pt."DeliveryID",
pt."VehicleID",
pt."TrackID",
pt."Longitude",
pt."Latitude",
pt."Altitude",
pt."Heading",
pt."Speed",

pt."Satelites",
pt."HDOP",
pt."VDOP",
pt."Xfcd",
pt.ts,
pt.received,
pt.the_geom,
(SELECT ST_ClosestPoint(line.geom,pt.the_geom) AS closest_geom
FROM roads AS line
WHERE ST_DWithin(line.geom,pt.the_geom, 0.5) LIMIT 1),

(SELECT ST_Distance(line.geom,pt.the_geom) AS distance
FROM roads AS line
ORDER BY pt.the_geom <#> line.geom LIMIT 1),
(SELECT ST_AsText(ST_ClosestPoint(line.geom, pt.the_geom)) AS closest_coordinates
FROM roads AS line
ORDER BY pt.the_geom <#> line.geom LIMIT 1)
FROM raw_2015 AS pt
ORDER by pt.id;`

Working with a reduced file 'Testme' of 144 rows EXPLAIN ANALYZE returns following Query Plan:



"Sort  (cost=84207.04..84207.40 rows=144 width=131) (actual time=11797.677..11797.685 rows=144 loops=1)"
" Sort Key: pt.id"
" Sort Method: quicksort Memory: 54kB"
" -> Seq Scan on testme pt (cost=0.00..84201.87 rows=144 width=131) (actual time=82.458..11797.268 rows=144 loops=1)"
" SubPlan 1"
" -> Limit (cost=9.46..578.16 rows=1 width=155) (actual time=81.508..81.508 rows=1 loops=144)"
" -> Bitmap Heap Scan on roads line (cost=9.46..578.16 rows=1 width=155) (actual time=81.388..81.388 rows=1 loops=144)"
" Recheck Cond: (geom && st_expand(pt.the_geom, 0.5::double precision))"
" Rows Removed by Index Recheck: 0"
" Filter: ((pt.the_geom && st_expand(geom, 0.5::double precision)) AND _st_dwithin(geom, pt.the_geom, 0.5::double precision))"

" -> Bitmap Index Scan on geom_index_roads (cost=0.00..9.46 rows=139 width=0) (actual time=79.722..79.722 rows=450402 loops=144)"
" Index Cond: (geom && st_expand(pt.the_geom, 0.5::double precision))"
" SubPlan 2"

However, when I run the query for a larger dataset with more data points (< 5 Million points) , it gets very slow (i.e. several hours / days). Do you guys see a way to increase the speed of the query? Is there an an alternative option to st_dwithin or a different query structure which can proof to be helpful?




No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...