My basic script for iterating recursively over sub-folders and merging all shapefiles into a single one is:
#!/bin/bash
consolidated_file="./consolidated.shp"
# -print0 / read -d '' keep paths with spaces intact; skip the output file itself
find . -name '*.shp' ! -path "$consolidated_file" -print0 |
while IFS= read -r -d '' i; do
    if [ ! -f "$consolidated_file" ]; then
        # first file - create the consolidated output file
        ogr2ogr -f "ESRI Shapefile" "$consolidated_file" "$i"
    else
        # update the output file with new file content
        ogr2ogr -f "ESRI Shapefile" -update -append "$consolidated_file" "$i"
    fi
done
However, in virtually all the examples around the web I noticed that, when updating the output file, the -nln option is added, for example:
ogr2ogr -f "ESRI Shapefile" -update -append "$consolidated_file" "$i" -nln merged
According to the documentation, -nln does the following:
"Assign an alternate name to the new layer"
I also noticed that it creates a temporary shapefile called "merged", and at the end of the loop that file is identical to the last shapefile I merged.
I don't understand why I need this option, because I managed to merge the files successfully without it.
Answer
In GDAL, data lives in datastores, which contain layers. Some datastores, such as databases or GML files, can hold several layers, while others, such as shapefiles, can contain only one layer.
You can test what happens when you do not use the -nln switch with a datastore that can contain many layers, for example with the GeoPackage driver:
ogr2ogr -f gpkg merged.gpkg a.shp
ogr2ogr -f gpkg -append -update merged.gpkg b.shp
ogrinfo merged.gpkg
INFO: Open of `merged.gpkg'
using driver `GPKG' successful.
1: a (Polygon)
2: b (Polygon)
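For comparison, the same merge can pass a common layer name with -nln on every call, so that b.shp is appended to the existing layer instead of becoming a second one. The helper below is a minimal sketch: the function name merge_layers is hypothetical, the layer name merged is taken from the examples in the question, and it assumes the output datastore does not exist yet and that all inputs share a compatible attribute schema and geometry type.

```shell
#!/bin/bash
# Hypothetical helper: merge_layers DRIVER DEST LAYER SRC [SRC ...]
# Writes every SRC into the single layer LAYER of datastore DEST.
# Assumes DEST does not exist yet and all SRC files share a compatible schema.
merge_layers() {
    local driver="$1" dest="$2" layer="$3"
    shift 3
    local src first=1
    for src in "$@"; do
        if [ "$first" -eq 1 ]; then
            # first input: create the datastore and the named layer
            ogr2ogr -f "$driver" "$dest" "$src" -nln "$layer" || return 1
            first=0
        else
            # later inputs: open DEST for update and append to the same named layer
            ogr2ogr -f "$driver" -update -append "$dest" "$src" -nln "$layer" || return 1
        fi
    done
}
```

For example, running merge_layers GPKG merged.gpkg merged a.shp b.shp and then ogrinfo merged.gpkg should list one layer, merged, instead of the two layers a and b shown above.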
The shapefile driver does not necessarily need the layer name: if you give it a datastore name such as "a.shp", the driver treats it as a single layer named after the basename of the shapefile (here "a"). Therefore you can add data to "merged.shp" with the commands:
ogr2ogr -f "ESRI Shapefile" merged.shp a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged.shp b.shp
However, the shapefile driver also treats a datastore whose name is given without the .shp extension as a multi-layer datastore. In practice this means a directory that contains one or more shapefiles, each of them a layer. You can test what happens with the commands:
ogr2ogr -f "ESRI Shapefile" merged a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged b.shp
Alternatively, you can test it by changing one line of your script to:
consolidated_file="./consolidated"
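Putting the two ideas together, the script from the question could be rewritten roughly as follows: it writes into a directory datastore and passes -nln on every call, so every input lands in one layer. This is a sketch rather than a tested recipe; the output directory ./consolidated comes from the line above, while the function name consolidate and the layer name merged are illustrative assumptions, and all shapefiles are assumed to share a compatible schema.

```shell
#!/bin/bash
# Hypothetical wrapper: consolidate SRC_ROOT OUT_DIR LAYER
# Recursively merges every *.shp under SRC_ROOT into the layer OUT_DIR/LAYER.shp.
consolidate() {
    local src_root="$1" out_dir="$2" layer="$3"
    local src
    # -print0 / read -d '' keep paths with spaces intact;
    # the -path test skips anything already written into OUT_DIR
    find "$src_root" -name '*.shp' ! -path "$out_dir/*" -print0 |
    while IFS= read -r -d '' src; do
        if [ ! -f "$out_dir/$layer.shp" ]; then
            # first file: create the directory datastore and the named layer
            ogr2ogr -f "ESRI Shapefile" "$out_dir" "$src" -nln "$layer"
        else
            # later files: append to the same layer instead of creating a new one
            ogr2ogr -f "ESRI Shapefile" -update -append "$out_dir" "$src" -nln "$layer"
        fi
    done
}
```

Called as consolidate . ./consolidated merged, it should leave a single layer in ./consolidated/merged.shp. OUT_DIR should be written in the same form that find prints (here with the leading ./), otherwise the -path exclusion will not match.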
If you want to append data with ogr2ogr, the -nln switch is compulsory with some drivers, including a few that don't support multiple layers. With other drivers it is not strictly necessary, but using -nln is always safe, and fortunately the examples you found use it. Otherwise there would be a bunch of questions about why merging into shapefiles succeeds while merging into other formats just creates new layers.