Tuesday 30 August 2016

postgis - What are the implications of invalid geometries



I've imported some data into a PostGIS database, and some of the geometries are reported as invalid (ST_IsValidReason reports self-intersection or ring self-intersection).
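
For reference, a check along these lines is one way to list the affected rows. This is only a sketch: the table name my_table and the columns id and geom are hypothetical placeholders, while ST_IsValid and ST_IsValidReason are standard PostGIS functions.

```sql
-- Hypothetical table my_table(id, geom); adjust the names to your schema.
-- Lists every row whose geometry fails the validity check, with the reason.
SELECT id,
       ST_IsValidReason(geom) AS reason
FROM my_table
WHERE NOT ST_IsValid(geom);
```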


The queries I am performing don't seem to be affected by the invalidity of these geometries (I'm only using ST_Distance queries).


What are the things that break when geometries are invalid?


Is fixing these geometries "automatically" (ST_Buffer(geom, 0) or ST_SimplifyPreserveTopology(geom, 0.0001)) an option?



Answer



Keeping malformed data is a bad idea, because you can never predict when and where a failure will occur. Moreover, malformed data can cause Heisenbugs, the most vicious and elusive type of bug.


I think it is a bit pointless to enumerate every possible outcome of storing invalid geometries. That said, the consequences can include:



  • Wrong results (that is, ST_Distance may return inaccurate or plainly wrong figures).

  • Database performance issues: keeping malformed data can seriously degrade database performance and produce huge log files, because each function call that fails on an invalid geometry writes an error to the log and disrupts normal database work.


  • Database crashes.

  • Application crashes - caused either by receiving malformed data from the database or by receiving an unreasonable result (a negative distance, for example).

  • Phantom behaviour (see link above). This is the worst consequence of all. You'll have strange things happening: slowdowns, data loss, crashes, unreasonable results, long pauses, unresponsiveness and many other curses. You might not be able to spot or reproduce them, because they all fall under the "undefined" category in the documentation.


My advice - if small buffers do not significantly alter your data, use them to prevent any of the above from happening. Keep your data valid.
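
As a rough sketch of what such a repair could look like, the statements below apply the zero-width buffer from the question only to the invalid rows and then re-check them. The table my_table and the columns id and geom are hypothetical. Note that a zero-width buffer can change the geometry type (for example Polygon to MultiPolygon) or drop part of a self-intersecting shape, so it is worth reviewing the result before committing. ST_MakeValid (PostGIS 2.0 and later) is an alternative to try if the buffer result is not acceptable.

```sql
-- Hypothetical table my_table(id, geom); adjust the names to your schema.
-- Run inside a transaction so the repair can be reviewed and rolled back.
BEGIN;

-- Rebuild only the geometries that fail the validity check.
UPDATE my_table
SET geom = ST_Buffer(geom, 0)
WHERE NOT ST_IsValid(geom);

-- Anything still invalid after the repair is listed here.
SELECT id,
       ST_IsValidReason(geom) AS reason
FROM my_table
WHERE NOT ST_IsValid(geom);

COMMIT;  -- or ROLLBACK if the repaired geometries are not acceptable
```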

