gdal - ogr2ogr merge multiple shapefiles: What is the purpose of -nln tag?

Friday, 25 November 2016

The basic script to iterate recursively over sub-folders and merge all shapefiles into a single one is:

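#!/bin/bash
consolidated_file="./consolidated.shp"
# NUL-delimited paths keep file names with spaces intact;
# the output file itself is excluded so it is never appended to itself
find . -name '*.shp' ! -path "$consolidated_file" -print0 |
while IFS= read -r -d '' i; do
    if [ ! -f "$consolidated_file" ]; then
        # first file - create the consolidated output file
        ogr2ogr -f "ESRI Shapefile" "$consolidated_file" "$i"
    else
        # update the output file with new file content
        ogr2ogr -f "ESRI Shapefile" -update -append "$consolidated_file" "$i"
    fi
done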

However, in virtually all the examples around the web, I noticed that the -nln option is added in the case where I update the output file, for example:

ogr2ogr -f "ESRI Shapefile" -update -append $consolidated_file $i -nln merged

According to the documentation, this option does the following:

Assign an alternate name to the new layer

I also noticed that it creates a temporary shapefile called "merged", and at the end of the loop that file is identical to the last shapefile I merged.

I don't understand why I need this, because I managed to merge successfully without this option.

Answer

In GDAL, datastores contain layers. Some datastores, such as databases or GML, can hold several layers, while others, such as shapefiles, can contain only one layer.

Using the GeoPackage driver as an example, you can test what happens if you do not use the -nln switch with a datastore that can contain many layers:

ogr2ogr -f gpkg merged.gpkg a.shp
ogr2ogr -f gpkg -append -update merged.gpkg b.shp

ogrinfo merged.gpkg
INFO: Open of `merged.gpkg'
      using driver `GPKG' successful.

1: a (Polygon)
2: b (Polygon)
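For contrast, here is a minimal sketch of the same merge with -nln, so that every source is written into one named layer instead of one layer per source. It assumes the inputs share a geometry type and attribute schema. The names run, DRY_RUN and merge_to_gpkg_layer are hypothetical helpers made up for this illustration, not part of GDAL.

```shell
#!/bin/bash
# Sketch only: run, DRY_RUN and merge_to_gpkg_layer are hypothetical names,
# not part of GDAL. With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# usage: merge_to_gpkg_layer TARGET.gpkg LAYER_NAME SOURCE.shp...
merge_to_gpkg_layer() {
    local target="$1" layer="$2" src created=0
    shift 2
    if [ -e "$target" ]; then created=1; fi
    for src in "$@"; do
        if [ "$created" -eq 0 ]; then
            # first source: create the GeoPackage with an explicitly named layer
            run ogr2ogr -f GPKG "$target" "$src" -nln "$layer" || return 1
            created=1
        else
            # later sources: append to that same layer instead of adding a new one
            run ogr2ogr -f GPKG -update -append "$target" "$src" -nln "$layer" || return 1
        fi
    done
}
```

Calling merge_to_gpkg_layer merged.gpkg merged a.shp b.shp should leave a single layer named "merged" rather than the two layers "a" and "b" listed above, which can be checked with ogrinfo merged.gpkg. Prefixing the call with DRY_RUN=1 only prints the ogr2ogr commands.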

The shapefile driver does not necessarily need the layer name, because if you give the datastore name "a.shp" the driver has logic to see a single layer named after the basename of the shapefile. Therefore you can add data to "merged.shp" with the commands:

ogr2ogr -f "ESRI Shapefile" merged.shp a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged.shp b.shp

However, the shapefile driver also has other logic that treats a datastore whose name is given without the .shp extension as a multi-layer datastore. In practice this means a directory that contains one or more shapefiles as layers. You can test what happens with the commands:

ogr2ogr -f "ESRI Shapefile" merged a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged b.shp

Or you can edit your script slightly to have:

consolidated_file="./consolidated"

If you want to append data with ogr2ogr, the -nln switch is compulsory with some drivers, including a few that do not support multiple layers. With other drivers it is not strictly necessary, but using -nln is always safe, and fortunately it is used in the examples you have found. Otherwise we would have a bunch of questions about why merging into shapefiles succeeds while merging into other formats just creates new layers.
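Putting this together, the merge script from the question could be sketched with -nln as follows. This is only an illustration under stated assumptions: the layer name is taken from the basename of the output shapefile, which matches how the shapefile driver names its single layer, and all inputs are assumed to share a geometry type and attribute schema. The names run, DRY_RUN and merge_shapefiles are hypothetical, not part of GDAL.

```shell
#!/bin/bash
# Sketch only: run, DRY_RUN and merge_shapefiles are hypothetical names,
# not part of GDAL. With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# usage: merge_shapefiles SOURCE_DIR OUTPUT.shp
merge_shapefiles() {
    local src_dir="$1" out="$2" layer shp created=0
    # a shapefile's single layer is named after the file's basename
    layer="$(basename "$out" .shp)"
    if [ -e "$out" ]; then created=1; fi
    # NUL-delimited paths keep names with spaces intact; sort gives a stable order
    while IFS= read -r -d '' shp; do
        # never feed the output file back into itself
        if [ "$shp" -ef "$out" ]; then continue; fi
        if [ "$created" -eq 0 ]; then
            # first file: create the output with an explicit layer name
            run ogr2ogr -f "ESRI Shapefile" "$out" "$shp" -nln "$layer" || return 1
            created=1
        else
            # later files: append to the same named layer
            run ogr2ogr -f "ESRI Shapefile" -update -append "$out" "$shp" -nln "$layer" || return 1
        fi
    done < <(find "$src_dir" -name '*.shp' -print0 | sort -z)
}
```

A call such as merge_shapefiles . ./consolidated.shp would then use "consolidated" as the layer name for every append. Changing only the -f value and the output name is where the explicit -nln starts to matter, since a multi-layer format would otherwise get one layer per input file.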
