Monday 9 January 2017

How should I manage PostGIS Raster data with different projections?


I have a requirement to store and manage archaeological geophysics data that is collected as a rectangular array of samples -- a raster image.



  • Each raster will usually 20x20 or 30x30 floating-point samples, typically sampled at 1m intervals.

  • A survey will consist of one or more of these images in a given location.

  • It is possible that two different surveys may take place in different countries, or areas that use different projections, but each survey will use one and only one projection.


  • They're never likely to be viewed together, each survey will usually sit by itself.

  • The data will only be accessed by a custom front-end, so there will be no users getting direct control of it through psql or similar.

  • Each sample needs to be stored as it was collected, so I can't reproject it into a common CRS such as Web Mercator because one sample could end up covering more or less area than in the original projection, and analysis will need to be performed on the data.


How should I best store the data in a PostGIS Raster database? The options I have come up with are:



  1. Ignore SRID constraints and store all the data in one table, writing my front-end code to deal with manipulating the data in a consistent manner.

  2. Store all the data in one table, and rewrite the SRID constraint as a compound of SRID and survey ID.

  3. Through table inheritance, create a new table for each new SRID.

  4. Through table inheritance, create a new table for each survey.



1 and 2 break some of the nice automated parts of PostGIS, but will be otherwise hidden in front-end code. But queries will probably take slightly longer.


3 and 4 could end up with an explosion of tables that would make it harder to manage FK constraints and so on.


Practically, the number of rasters per survey is anywhere from 1 to 100 or more, and the number of surveys is likely to run into the hundreds. But the number of distinct projections is likely to remain very low, which favours 3.



Answer



I have given your question a thought and ultimately came to the conclusion that I would store each survey into its own database.


NOTE: by database I mean a database created inside a single postgres database cluster as per the postgres terminology given here, not an entirely separate postgres process with its own users,template1,etc.


While this might sound overkill it, in fact, offers several advantages:





  • manageability: each survey has only one raster table with its srid which allows you to adhere as much as possible to the postgis standards of managing data (ie: no messing with the raster_columns table or FKs or constraints. All postgis functions still work as expected)




  • simplicity: as long as you adopt and enforce a coherent naming strategy like: call each db as srvy_name and then reuse the same name (ie surveydata) for all the raster tables and columns. If you're so keen (I know I would ;-)) you could also add a metadata table to each database describing what kind of data is stored into that database, when it was last updated and so on. Scripting/querying a database structure with such coherent naming would be easy (and pleasant).




  • it works as per you requirements, as long as each survey uses its own srid




  • scalability: it scales because you could move databases (by allocating them on different tablespaces) onto different spindles (or disks, pools, lun, depending on the storage vendor) so that I/O can be parallelized. It would be much more difficult to move out tables from the same database to different disks





  • security: you can grant different permissions to different surveys by exploiting database security (as an additional layer above application)




  • tested: there have been reports of postgres handling thousands of databases on a single instance, see this for a reference




  • [this has to be tested, I know it works for geometries, don't know for rasters] you can still query (and transform) all rasters at once by creating views like the following:





create or replace view v_all_surveys_as_wgs84 as select ST_Transform(raster, 4326) as raster_wgs84 from srvy_number1.rasterdata union all select ST_Transform(raster, 4326) as raster_wgs84 from srvy_number2.rasterdata [...]


One possible argument against is that this setup is complex, but I would argue back that it is instead very simple to replicate once the first database has been established and then it can be completely managed in scripting if proper naming policy is put in place.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...