I've imported some data into a PostGIS database, and some of the geometries are reported as invalid (ST_IsValidReason reports self-intersection or ring self-intersection).
The queries I am running don't seem to be affected by the invalidity of these geometries (I'm only using ST_Distance queries).
What are the things that break when geometries are invalid?
Is fixing these geometries "automatically" (buffer(geom, 0) or ST_SimplifyPreserveTopology(geom, 0.0001)) an option?
Answer
Keeping malformed data is a bad idea, because you can never predict when and where a failure will occur. Moreover, malformed data can cause Heisenbugs, the most vicious and elusive type of bug.
I think it is a bit pointless to try to enumerate every possible outcome of storing invalid geometries. That said, the consequences can include:
- Wrong results (that is, ST_Distance may return inaccurate or plain wrong figures).
- Database performance issues: keeping malformed data can seriously degrade database performance and create huge log files, because each failing function call writes an error to the log and disrupts ordinary database work.
- Database crashes.
- Application crashes, caused either by receiving malformed data from the database or by receiving an unreasonable result (a negative distance, for example).
- Phantom behaviour (the Heisenbugs mentioned above). This is the worst consequence of all. You'll have strange things happening: slowdowns, data loss, crashes, unreasonable results, long pauses, unresponsiveness and many other curses. You might not be able to spot or reproduce them, because they all fall under the "undefined" category in every piece of documentation.
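To make the "wrong results" item concrete, here is a minimal sketch using a hand-made "bowtie" polygon whose ring crosses itself at (1 1). The geometry is invented for illustration and is not taken from the asker's data; ST_MakeValid is assumed to be available (PostGIS 2.0 or later). Comparing the two area columns shows how a measurement on the stored geometry can disagree with the same measurement on a repaired copy.

```sql
-- Hypothetical test geometry: a ring that crosses itself at (1 1),
-- forming two triangular lobes of area 1 each.
WITH bowtie AS (
    SELECT ST_GeomFromText('POLYGON((0 0, 2 2, 2 0, 0 2, 0 0))') AS geom
)
SELECT ST_IsValid(geom)               AS is_valid,          -- validity flag
       ST_IsValidReason(geom)         AS reason,            -- why it is invalid
       ST_Area(geom)                  AS area_as_stored,    -- computed on the invalid ring
       ST_Area(ST_MakeValid(geom))    AS area_after_repair  -- computed on the repaired geometry
FROM bowtie;
```

The two lobes are wound in opposite directions, so an area computed from the raw ring should not be trusted; the query returns without raising an error, which is exactly why this kind of problem tends to go unnoticed.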
My advice: if small buffers do not significantly harm your data consistency, use them to prevent any of the above from happening. Keep your data valid.
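A cautious way to apply this advice might look like the sketch below. The table name `parcels` and the columns `id` and `geom` are hypothetical placeholders, the data is assumed to be polygonal (so ST_Area is a meaningful check), and ST_MakeValid is shown next to the zero-width buffer as an alternative repair, assuming PostGIS 2.0 or later.

```sql
-- 1. List the invalid rows and the reason each one is invalid.
SELECT id, ST_IsValidReason(geom) AS reason
FROM parcels
WHERE NOT ST_IsValid(geom);

-- 2. Preview both repairs before changing anything. A large difference
--    between the columns suggests that a repair altered or dropped part
--    of the shape and the row deserves a manual look.
SELECT id,
       ST_Area(geom)                 AS area_before,
       ST_Area(ST_Buffer(geom, 0))   AS area_buffer0,
       ST_Area(ST_MakeValid(geom))   AS area_makevalid
FROM parcels
WHERE NOT ST_IsValid(geom);

-- 3. Apply the chosen repair inside a transaction so it can be undone.
--    Note: the repaired geometry may have a different type (for example
--    a multi-geometry), which a type-constrained column may reject.
BEGIN;

UPDATE parcels
SET geom = ST_MakeValid(geom)
WHERE NOT ST_IsValid(geom);

-- Should return 0 if every row was repaired.
SELECT count(*) AS still_invalid
FROM parcels
WHERE NOT ST_IsValid(geom);

COMMIT;  -- or ROLLBACK; if the check above is not satisfactory
```

Running the preview in step 2 first is a design choice rather than a requirement: automatic repairs decide on their own how to resolve a self-intersection, so it is worth confirming that the repaired shapes still represent the features you imported.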