Monday, 6 May 2019

Poor performance with storing large rasters in PostGIS and visualising in QGIS


my question concerns the use and performance of several software tools in conjunction, namely PostgreSQL, PostGIS, QGIS, and GDAL.


I'm a long-time ArcGIS, Python, and R user who is interested in diversifying into the free open source GIS ecosystem and Linux as well. Recently I've been very interested in using QGIS (ver 2.8) together with PostgreSQL (ver 9.4) and PostGIS (ver 2.1), and I've installed the software on a computer with Windows 8.1 x64 (the computer specs in brief: ThinkPad X200s with a 2.1GHz Core 2, 8GB RAM, and a 240GB SSD). Once I learn how to manage my spatial data (~100GB worth), I'd like to run Ubuntu on this machine.


At the moment, I'm simply trying to reliably store and retrieve shapefiles and rasters. So far I've been successful in loading shapefiles into PostGIS, but rasters are proving more problematic. I have successfully completed single and batch imports of small geoTIFF and GRID files, but larger rasters (say, a 15619x14655 cell IMG or TIFF file 870MB in size on disk) take forever to load into PostGIS. I've read and configured the raster2pgsql tool to build spatial indices and load rasters by tiles using these parameters:


raster2pgsql -s 3161 -C -I D:\PostGIS_data\dem.img -t auto raster.dem | psql -h localhost -U postgres -p 5432 -d postgres

Performance in importing is still very poor, and the hardware is not the problem. Visualisation of PostGIS rasters in QGIS is even worse, slowly loading small rasters at best or freezing altogether. Large rasters like the one I mentioned are impossible to visualise in QGIS. From the documentation and forum discussions, this shortcoming appears to be due to GDAL's PostGIS raster driver and not QGIS itself. Forum discussions mention this problem briefly and some even suggest that rasters shouldn't be stored in PostGIS (what is the point in a spatial database that doesn't handle rasters smoothly?). Yet I routinely use ESRI's file geodatabase to store, visualise, and analyse quite large (~70GB) rasters quickly and easily, and ArcGIS 10.1 never freezes or slows due to such routine operations. These performance roadblocks are disappointing and leave me unimpressed with FOSS GIS.



Is there something I'm missing here, a bottleneck I haven't addressed? Does PostgreSQL need tuning to realise the performance benefits of PostGIS? Am I missing a version of GDAL that I need to hunt down and compile? How do I improve PostGIS performance and visualisation in QGIS of shapefiles and rasters especially? How can I enjoy the glory of comprehensive and speedy spatial data management via a Linux terminal? Any help on this issue would be welcome!




I followed this guide by a Duncan Golicher: https://duncanjg.wordpress.com/2012/11/20/the-basics-of-postgis-raster/


I was using tiles with an automatic setting originally, but I reset the tiling to 100x100 cells per row and then included the pyramids as shown in the guide like so:


raster2pgsql -s 3161 -d -C -I -M -l 4 D:\PostGIS_data\dem.img -t 100x100 raster.dem100 | psql -h localhost -U postgres -p 5432 -d postgres

I was able to successfully import the 870MB IMG raster in a good time and display it in QGIS without slowing or crashing the application. The rendering time is not as snappy as I'd expect, but it is acceptable. I will read further on the -l parameter to use it properly.


Incidentally, in importing the dem.img file as the dem100 table another raster table was created called "o_4_dem100". When I import it as a layer in QGIS, it has a value range of between 201 to 524, while the dem100 layer has a range of 36 to 524. Am I right in assuming that this extra table is the pyramid table that has a narrower value range as a result of being aggregated to a lower resolution?




I don't think inadequate hardware is the problem. Here's a brief summary of what I've found so far.



GDAL's PostGIS raster driver has had past performance issues (see here as well). Although these problems were noted in 2012, I wonder whether GDAL 1.11.2 found in QGIS 2.8 still has this problem. Surely there are others using QGIS and PostGIS for raster visualisation and storage?


On a possible related note, I have also had performance issues with opening PostGIS attribute tables in QGIS with tables of ~4.7m records. After a few suggestions in that thread and without fixing the problem, I ultimately filed a bug report with QGIS that was eventually closed and linked to the following similar bug report. Although the bug report is closed, it doesn't seem to be fixed...


To sum up my efforts so far:



  • I have optimised the PostgreSQL server for spatial data.

  • I have built spatial indices for geometry tables and performed a VACUUM.

  • QGIS behaviour for opening large attribute tables (~4.7m records) seems to try reading all records rather than returning a subset for instantaneous viewing. This leads to poor performance.


  • Performance in rendering large PostGIS geometry tables does not seem to be a problem.





  • With raster2pgsql, rasters were indexed, tiled, and imported as raster tables with pyramids in PostGIS.



  • Rasters of any reasonable size are still incredibly slow to import into PostGIS, let alone open and pan around in QGIS.


It's also worth noting that when importing large rasters or opening large attribute tables with PostGIS, memory consumption for raster2pgsql and qgis-bin are over 1GB. As @Michael and @Paul have mentioned in response to my initial question, it appears that PostGIS isn't meant to bring much if any benefits to storing rasters. However, at that point I question why I would run QGIS+PostGIS at all for my GIS needs, especially when ESRI fileGDBs enable raster attributes, mosaic datasets, and other raster operations facilitated by the geodatabase. So maybe either I am really missing something or QGIS and PostGIS do not meet my GIS needs. I find the latter hard to believe.




No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...