Wednesday, 25 December 2019

gdal - gdal_merge'd files take more space


I have two 3.6 MB GeoTIFF compressed with LZW. When I merge them with:


gdal_merge.py *.tif -o ~/out.tif  -co COMPRESS=LZW -co BIGTIFF=YES

The files is 10x larger at 31 MB.


Is there some reason the filesize would increase when merging? They are a continuous region (all touching each other), in EPSG 3413 projection.



This question was asked here (How to merge raster scenes while maintaining small file size?) but not answered. That OP selected the "use compression" answer, but my compressed file is larger. I'm posting gdalinfo for one input file (first) and the final file (below).


NOTE Oddly, when I do this for >2 files, or the whole folder which is 800 MB, it is 8.4 GB uncompress, and LARGER, or 11 GB with LZW compression.


Driver: GTiff/GeoTIFF
Files: gimpdem0_0_v1.1.tif
Size is 8310, 15000
Coordinate System is:
PROJCS["unnamed",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,

AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]],
PROJECTION["Polar_Stereographic"],
PARAMETER["latitude_of_origin",70],
PARAMETER["central_meridian",-45],
PARAMETER["scale_factor",1],
PARAMETER["false_easting",1],

PARAMETER["false_northing",1],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]]]
Origin = (-640000.000000000000000,-2905550.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND

Corner Coordinates:
Upper Left ( -640000.000,-2905550.000) ( 57d25'19.50"W, 63d 1'28.29"N)
Lower Left ( -640000.000,-3355550.000) ( 55d47'53.80"W, 59d11'57.96"N)
Upper Right ( -390700.000,-2905550.000) ( 52d39'30.45"W, 63d24'19.24"N)
Lower Right ( -390700.000,-3355550.000) ( 51d38'28.63"W, 59d31'30.22"N)
Center ( -515350.000,-3130550.000) ( 54d20'53.46"W, 61d18'11.32"N)
Band 1 Block=8310x1 Type=Int16, ColorInterp=Gray

And the merged file:


Driver: GTiff/GeoTIFF

Files: /Users/kdm/Desktop/20171011-204436.tif
Size is 8310, 30000
Coordinate System is:
PROJCS["unnamed",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],

UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]],
PROJECTION["Polar_Stereographic"],
PARAMETER["latitude_of_origin",70],
PARAMETER["central_meridian",-45],
PARAMETER["scale_factor",1],
PARAMETER["false_easting",1],
PARAMETER["false_northing",1],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]]]

Origin = (-640000.000000000000000,-2455550.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( -640000.000,-2455550.000) ( 59d36'29.72"W, 66d52'51.74"N)
Lower Left ( -640000.000,-3355550.000) ( 55d47'53.80"W, 59d11'57.96"N)

Upper Right ( -390700.000,-2455550.000) ( 54d 2'25.84"W, 67d20' 7.02"N)
Lower Right ( -390700.000,-3355550.000) ( 51d38'28.63"W, 59d31'30.22"N)
Center ( -515350.000,-2905550.000) ( 55d 3'28.16"W, 63d14'13.92"N)
Band 1 Block=8310x1 Type=Int16, ColorInterp=Gray

Answer



To put the comments into an answer:


It looks like the processing was doing a bad job at predicting the optimal compression? Coupled with no tiling there was really inefficient storage.


The solution was to use the geotiff creation options PREDICTOR=2 & TILED=YES.


The final command to call is then


gdal_merge.py *.tif -o ~/out.tif  \

-co COMPRESS=LZW -co BIGTIFF=YES -co PREDICTOR=2 -co TILED=YES

EDIT


An explanation about what happens, taken from http://www.fileformat.info/format/tiff/corion-lzw.htm



The TIFF predictor algorithm


The idea is to make use of the fact that many continuous tone images rarely vary much in pixel value from one pixel to the next. In such images, if we replace the pixel values by differences between consecutive pixels, many of the differences should be 0, plus or minus 1, and so on. This reduces the apparent information content, and thus allows LZW to encode the data more compactly.



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...