Monday 26 March 2018

sql - Queries returning very big datasets in PostGIS


I have a PostGIS query that will return several million rows:


SELECT
  t1.id AS id1,
  t2.id AS id2,
  ABS(t1.mean_h - t2.mean_h) AS h_diff,
  ST_Distance(t1.the_geom, t2.the_geom) AS dist
FROM tas_ponds AS t1, tas_ponds AS t2
WHERE
  t1.gid > t2.gid AND
  ST_DWithin(t1.the_geom, t2.the_geom, 17000)

When run in psql, I get an "out of memory for query result" error.


Googling suggests that this is an error within psql rather than Postgres/PostGIS. Would amending the query into the form SELECT ... INTO x FROM ... fix the problem? Are there any other recommended approaches for dealing with very large datasets?




Answer



Some poking around does confirm this is a Postgres client problem, independent of spatial or server considerations: the client has a limited amount of memory to buffer the results before displaying them on the screen, which you're exceeding.
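As an aside, psql itself has a workaround for this buffering behaviour: if you set its FETCH_COUNT variable, the client retrieves and displays rows through a cursor in batches of that size instead of holding the whole result set in memory. A minimal sketch, reusing the query above:

\set FETCH_COUNT 1000
SELECT t1.id, t2.id
FROM tas_ponds AS t1, tas_ponds AS t2
WHERE t1.gid > t2.gid
  AND ST_DWithin(t1.the_geom, t2.the_geom, 17000);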


The recommended way to handle this is to use DECLARE / FETCH to access the data in blocks smaller than the total result set. You could also create a view with components of the query (e.g. distance) precomputed, to cut down on the memory needed for the query operation itself. A sketch of the cursor approach follows.
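Here is a minimal sketch of DECLARE / FETCH using the original query; the cursor name pond_pairs and the batch size of 10000 are arbitrary choices, and note that a cursor must be declared inside a transaction:

BEGIN;

DECLARE pond_pairs CURSOR FOR
  SELECT
    t1.id AS id1,
    t2.id AS id2,
    ABS(t1.mean_h - t2.mean_h) AS h_diff,
    ST_Distance(t1.the_geom, t2.the_geom) AS dist
  FROM tas_ponds AS t1, tas_ponds AS t2
  WHERE t1.gid > t2.gid
    AND ST_DWithin(t1.the_geom, t2.the_geom, 17000);

-- Pull the result down in manageable chunks;
-- repeat the FETCH until it returns zero rows.
FETCH 10000 FROM pond_pairs;
FETCH 10000 FROM pond_pairs;

CLOSE pond_pairs;
COMMIT;

Each FETCH only materialises one batch on the client, so memory use stays bounded no matter how many millions of rows the full result contains.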

