Friday 25 November 2016

gdal - ogr2ogr merge multiple shapefiles: What is the purpose of -nln tag?


The basic script in order to iterate recursively over sub-folders and merge all shapefiles into single one is:



#!/bin/bash
consolidated_file="./consolidated.shp"
for i in $(find . -name '*.shp'); do
if [ ! -f "$consolidated_file" ]; then
# first file - create the consolidated output file
ogr2ogr -f "ESRI Shapefile" $consolidated_file $i
else
# update the output file with new file content
ogr2ogr -f "ESRI Shapefile" -update -append $consolidated_file $i
fi

done

Hoverer in vertaully all examples around the web I noticed that for the case where I update the output file, -nln tag is added, for example:


ogr2ogr -f "ESRI Shapefile" -update -append $consolidated_file $i -nln merged

According to the documentation it says:



Assign an alternate name to the new layer



And I noticed it creates a temporary shapefile called "merged", and in the end of the loop the file is identical to the last shapefile I merged.



I don't understand why I need this? Because I succeeded to merge successfully without this tag.



Answer



For GDAL there are datastores which contain layers. Some datastores, like the database ones or GML, can hold several layers but some others like shapefiles can only contain one layer.


You can test with for example GeoPackage driver what happens if you do not use the -nln switch with a datastore that can contain many layers.


ogr2ogr -f gpkg merged.gpkg a.shp
ogr2ogr -f gpkg -append -update merged.gpkg b.shp

ogrinfo merged.gpkg
INFO: Open of `merged.gpkg'
using driver `GPKG' successful.

1: a (Polygon)
2: b (Polygon)

The shapefile driver does not necessarily need the layer name because if you give the datastore name "a.shp" the driver has logic to see a single layer, named by the basename of the shapefile. Therefore you can add data to "merged.shp" with command:


ogr2ogr -f "ESRI Shapefile" merged.shp a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged.shp b.shp

However, shapefile driver has also another logic to consider a datastore which name is given without .shp extension as a multi-layer datastore. Practically this means a directory that contains one or more shapefiles as layers. You can test what happens with a command


ogr2ogr -f "ESRI Shapefile" merged a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged b.shp


Or then you can edit your script slightly to have


consolidated_file="./consolidated"

If you want to append data with ogr2ogr it is compulsory to use the -nln switch with some drivers, including a few which don't support multiple layers. For some other drivers it is not strictly necessary, but using -nln is always safe and fortunately it is used in the examples which you have found. Otherwise we would have a bunch of questions about why merging into shapefiles is successful but merging to other formats just creates new layers.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...