Saturday, 16 February 2019

gdal - File size inflation normal with gdalwarp?


After using gdalwarp to project and align-to-grid (via -tap) a number of rasters, I noticed that the output rasters were significantly larger than the originals. A fairly thorough web search turned up this Trac issue:




Frank Warmerdam explained the reason:


"On careful review, the difference in the file in question is because gdal_translate uses the TIFFWriteScanline() interface to write the output file from within GTiffDataset::CreateCopy(), and this only writes as much of the final 'strip' of the file as is required to complete image area. But gdalwarp goes through the blockio interface which writes the complete final strip, even the portion that falls off the end of the file."

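To make the quoted explanation concrete, here is a rough sketch of the strip arithmetic it describes. The image height and rows-per-strip values are hypothetical, chosen only for illustration, and the function name is made up for this example; it is not part of GDAL or libtiff.

```python
def final_strip_rows(height, rows_per_strip):
    """Return (rows needed in the final strip, extra rows if the strip is written in full)."""
    # Number of strips needed to cover the image (integer ceiling division).
    n_strips = (height + rows_per_strip - 1) // rows_per_strip
    # Rows of real image data that fall in the last strip.
    rows_needed = height - (n_strips - 1) * rows_per_strip
    # Rows past the end of the image when the complete final strip is written.
    extra_rows = rows_per_strip - rows_needed
    return rows_needed, extra_rows


# Hypothetical raster: 1000 rows stored in strips of 64 rows each.
needed, extra = final_strip_rows(1000, 64)
print(f"final strip holds {needed} image rows; a full strip adds {extra} more")
```

Under this reading, the extra data is limited to part of a single strip, so it would only account for a small difference in file size.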


This Trac issue is ~7 years old, however, and I know some changes to the GDAL utilities, including gdalwarp, have been made since. I'd like to know if the above reasoning still holds and if the file size inflation I'm seeing is "normal." By "normal" I mean unsurprising or expected but, more importantly: is there anything that can be done to mitigate the effect, i.e. to reduce the output raster file size? Below is a table of the file size inflation I'm experiencing.


Input File Size (bytes)     Output File Size (bytes)    Inflation
1437380431                  1698334217                  18%
1428001178                  1698334433                  19%
41683165                    137036637                   228%

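For reference, the inflation column can be recomputed from the byte counts with a few lines of Python. This is only a sketch of the arithmetic, taking inflation to mean the size increase relative to the input; the function name is made up for this example. The third row comes out between 228% and 229%.

```python
# Input/output sizes in bytes, copied from the table above.
sizes = [
    (1437380431, 1698334217),
    (1428001178, 1698334433),
    (41683165, 137036637),
]


def inflation_percent(input_bytes, output_bytes):
    """Return the size increase of the output relative to the input, in percent."""
    return (output_bytes - input_bytes) / input_bytes * 100.0


for input_bytes, output_bytes in sizes:
    pct = inflation_percent(input_bytes, output_bytes)
    print(f"{input_bytes:>12} {output_bytes:>12} {pct:7.1f}%")
```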
The input TIFF files were created in ArcGIS and thus have external world files, XML and DBF files, but these do not make up the difference in file size. Here is a sample gdalwarp call as I've used it in all of these cases; the actual execution was handled by a Python subprocess (subprocess.Popen):



$ gdalwarp -tap -tr 30 30 -t_srs "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs" -co "COMPRESS=LZW" input_file.tif output_file.tif

I understand that in rare cases compression makes a larger file, but the effect is the same without the LZW compression. The ratios in the table are with LZW compression.



Answer



It's a well-known and longstanding issue that gdalwarp doesn't deal with compression well. The solution is to run gdalwarp without compression, then run gdal_translate with compression.


To avoid two lengthy processes, warp to a VRT first (this is really quick), then run gdal_translate with the -co compress=lzw option.


i.e.


$ gdalwarp -tap -tr 30 30 -t_srs "etc..." -of vrt input_file.tif output_file.vrt
$ gdal_translate -co compress=LZW output_file.vrt output_file.tif

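Since the question runs gdalwarp from Python, the same two steps could be driven through subprocess along these lines. This is a minimal sketch, not a tested recipe: the function names, the default 30 m resolution (taken from the question's command), and the choice to place the VRT next to the output file are all assumptions made for illustration.

```python
import os
import subprocess


def build_two_step_commands(src, dst, t_srs, resolution=30):
    """Build argument lists for: gdalwarp to VRT, then gdal_translate with LZW."""
    # Assumption: write the intermediate VRT next to the output, e.g. output_file.vrt.
    vrt_path = os.path.splitext(dst)[0] + ".vrt"
    warp = [
        "gdalwarp", "-tap", "-tr", str(resolution), str(resolution),
        "-t_srs", t_srs, "-of", "VRT", src, vrt_path,
    ]
    translate = ["gdal_translate", "-co", "COMPRESS=LZW", vrt_path, dst]
    return warp, translate


def warp_then_compress(src, dst, t_srs, resolution=30):
    """Run both steps in order; raises CalledProcessError if either one fails."""
    warp, translate = build_two_step_commands(src, dst, t_srs, resolution)
    subprocess.run(warp, check=True)       # quick: only writes the VRT description
    subprocess.run(translate, check=True)  # does the actual warp and compression
```

Calling warp_then_compress("input_file.tif", "output_file.tif", "EPSG:32611") would then run the equivalent of the two commands above, provided the GDAL utilities are on the PATH.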

If using GDAL 2.x, you can combine this into a single operation by writing the VRT to /vsistdout/, piping that to gdal_translate, and specifying /vsistdin/ as the input. For example:


$ gdalwarp -q -t_srs EPSG:32611 -of vrt input_file.tif /vsistdout/ | gdal_translate -co compress=lzw /vsistdin/ output_file.tif

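The shell pipe can also be reproduced from Python by connecting two Popen objects. Again, this is only a sketch under assumptions: the function names are hypothetical, and it presumes both GDAL utilities are on the PATH and that the installed GDAL supports /vsistdout/ and /vsistdin/ as described above.

```python
import subprocess


def build_piped_commands(src, dst, t_srs):
    """Build the two argument lists used on either side of the pipe."""
    warp = ["gdalwarp", "-q", "-t_srs", t_srs, "-of", "VRT", src, "/vsistdout/"]
    translate = ["gdal_translate", "-co", "COMPRESS=LZW", "/vsistdin/", dst]
    return warp, translate


def run_piped(warp, translate):
    """Run `warp | translate` and raise if either process exits non-zero."""
    producer = subprocess.Popen(warp, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(translate, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"pipeline failed: gdalwarp={producer_rc}, gdal_translate={consumer_rc}"
        )
```

A call such as run_piped(*build_piped_commands("input_file.tif", "output_file.tif", "EPSG:32611")) would mirror the one-liner above without writing an intermediate VRT to disk.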