st dwithin - Improve performance on a st_dwithin query (in PostGIS)

Thursday, 23 April 2015

st dwithin - Improve performance on a st_dwithin query (in PostGIS)

At the momenent I am working on a query, as described in an earlier question. I have the two tables

testme (tracked GPS-profile with Point Geometry) and

roads (geometry roads shapefile)

Besides the distance of each tracked point (row) in table pt in want to find the closest point using a combination of ST_Distance and ST_DWithin.

DROP TABLE IF EXISTS raw_2015_processed;
EXPLAIN ANALYZE

CREATE TABLE raw_2015_processed AS
SELECT pt.id, 
    pt."DeliveryID",
    pt."VehicleID",
    pt."TrackID",
    pt."Longitude",
    pt."Latitude",
    pt."Altitude",
    pt."Heading",
    pt."Speed",

    pt."Satelites",
    pt."HDOP",
    pt."VDOP",
    pt."Xfcd",
    pt.ts,
    pt.received,
    pt.the_geom,
(SELECT ST_ClosestPoint(line.geom,pt.the_geom) AS closest_geom
    FROM roads AS line 
    WHERE ST_DWithin(line.geom,pt.the_geom, 0.5) LIMIT 1),

(SELECT ST_Distance(line.geom,pt.the_geom) AS distance
    FROM roads AS line
    ORDER BY pt.the_geom <#> line.geom LIMIT 1),
(SELECT ST_AsText(ST_ClosestPoint(line.geom, pt.the_geom)) AS closest_coordinates
    FROM roads AS line
    ORDER BY pt.the_geom <#> line.geom LIMIT 1)
FROM raw_2015 AS pt
ORDER by pt.id;`

Working with a reduced file 'Testme' of 144 rows EXPLAIN ANALYZE returns following Query Plan:

"Sort  (cost=84207.04..84207.40 rows=144 width=131) (actual time=11797.677..11797.685 rows=144 loops=1)"
"  Sort Key: pt.id"
"  Sort Method: quicksort  Memory: 54kB"
"  ->  Seq Scan on testme pt  (cost=0.00..84201.87 rows=144 width=131) (actual time=82.458..11797.268 rows=144 loops=1)"
"        SubPlan 1"
"          ->  Limit  (cost=9.46..578.16 rows=1 width=155) (actual time=81.508..81.508 rows=1 loops=144)"
"                ->  Bitmap Heap Scan on roads line  (cost=9.46..578.16 rows=1 width=155) (actual time=81.388..81.388 rows=1 loops=144)"
"                      Recheck Cond: (geom && st_expand(pt.the_geom, 0.5::double precision))"
"                      Rows Removed by Index Recheck: 0"
"                      Filter: ((pt.the_geom && st_expand(geom, 0.5::double precision)) AND _st_dwithin(geom, pt.the_geom, 0.5::double precision))"

"                      ->  Bitmap Index Scan on geom_index_roads  (cost=0.00..9.46 rows=139 width=0) (actual time=79.722..79.722 rows=450402 loops=144)"
"                            Index Cond: (geom && st_expand(pt.the_geom, 0.5::double precision))"
"        SubPlan 2"

However, when I run the query for a larger dataset with more data points (< 5 Million points) , it gets very slow (i.e. several hours / days). Do you guys see a way to increase the speed of the query? Is there an an alternative option to st_dwithin or a different query structure which can proof to be helpful?

Blog

Thursday, 23 April 2015

st dwithin - Improve performance on a st_dwithin query (in PostGIS)

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?