Friday, 4 September 2015

gdal - How to delete duplicate features from shapefile/OSM XML using code/command line tools only? (NOT QGIS/ArcGis)



I have shapefile with lines, some of them are perfectly overlaps - duplicate, although each one have unique id etc.



  • When I view the file in QGIS I see they overlap.

  • When I convert the file with ogr2osm to osm XML & import to JOSM, and running validation I also get warning of duplicate ways (the lines converted to ways...).


How can I "programmatically" detect and remove such situations? I don't mind doing it in the shapefile or later in the OSM XML.


Note 1 - The attributes (tags) are unique - the actual spatial geometry is duplicate.


Note 2 - I want to use something like ogrinfo, ogr2ogr etc GDAL libraries either as commandline tools or as import library for python/java code. If other code tools exits it's perfect also.



Answer



It might be possible to do with ogrinfo and SQLite/SpatiaLite dialect http://www.gdal.org/ogr_sql_sqlite.html directly with shapefiles but it may be hard to handle the fids/rowids with that route. But if you convert your data into SpatiaLite you can use SQL for finding and deleting the duplicates. With spatialite command line tools https://www.gaia-gis.it/fossil/spatialite-tools/index you can use SQL in scripts.



Convert data from shapefile into SpatiaLite with ogr2ogr


ogr2ogr -f SQLite -dsco spatialite=yes my_data.sqlite my_data.shp

A query for finding the duplicates


SELECT a.ogc_fid AS a_fid, b.ogc_fid AS b_fid 
FROM equal_lines AS a, equal_lines AS b
WHERE ST_Equals(a.geometry,b.geometry)
AND a.ogc_fid!=b.ogc_fid
AND a.ogc_fid


A result will be like


a_fid   b_fid
1 3
5 6
5 7
6 7

Thus 1=3, 5=6, 5=7, and 6=7 (means that 5=6=7). Delete rows which have their ogc_fid listed in the right hand column and you have done.


DELETE FROM equal_lines WHERE ogc_fid IN
(SELECT b.ogc_fid

FROM equal_lines AS a, equal_lines AS b
WHERE ST_Equals(a.geometry,b.geometry)
AND a.ogc_fid!=b.ogc_fid
AND a.ogc_fid

If you really need shapefiles you can export the processed table also with SQL by using the ExportSHP function https://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...