Saturday 10 June 2017

Should GDAL be set to produce GeoTIFF files with compression? Which algorithm should be used?


I have a folder of GIS data that consists mainly of GeoTIFF files. The whole set weighs in at about 1.2 GB. I noticed that if I pack the contents into a tarball, it shrinks to about 82 MB. I would like to check the set into a revision control system so it can be worked on by other people, and it looks like there is some space that can be squeezed out.


The GDAL GeoTIFF driver page lists many options that may be used to create compressed GeoTIFF files, as well as options that affect the way each algorithm works.


The help page does a good job of describing the options but doesn't explain how to select an algorithm or the trade-offs associated with different levels of compression. This leads to the following questions:





  • The main pro of using compression is a dramatic saving in space. What are the cons? Is information lost when the image is compressed?




  • How should one go about choosing an algorithm and compression level? Do some types of images lend themselves to a certain algorithm?





Answer



To select a compression method, use a command like:


gdal_translate -co "COMPRESS=method" src_dataset dst_dataset
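As a minimal sketch, assuming GDAL's command-line tools are on the PATH, the command above can be wrapped in a small shell function. The name `compress_geotiff` is hypothetical. `COMPRESS`, `PREDICTOR` and `ZLEVEL` are GTiff driver creation options; the values here (DEFLATE with horizontal differencing at the strongest zlib level) are just one plausible lossless choice for integer rasters, not something the answer prescribes.

```shell
# Hypothetical helper: losslessly recompress one raster as a DEFLATE GeoTIFF.
# Usage: compress_geotiff SRC DST
compress_geotiff() {
    # PREDICTOR=2 (horizontal differencing) is meant for integer data;
    # floating-point rasters usually use PREDICTOR=3 instead.
    # ZLEVEL ranges from 1 (fastest) to 9 (smallest); the default is 6.
    gdal_translate -of GTiff \
        -co COMPRESS=DEFLATE \
        -co PREDICTOR=2 \
        -co ZLEVEL=9 \
        "$1" "$2"
}
```

After running `compress_geotiff input.tif output.tif`, `gdalinfo output.tif` should report `COMPRESSION=DEFLATE` under its image structure metadata.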


When you use compression, the biggest trade-off is the extra processing time required to decompress the image each time it is read; once decompressed, the image still consumes the same amount of memory. Regarding information loss, there are two basic types of compression:



  • lossless - preserves the original data values

  • lossy - degrades the data to save even more space
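One quick way to confirm that a lossless method really preserved the values is to compare the per-band checksums that `gdalinfo -checksum` prints. The sketch below assumes GDAL is installed; the function name `same_pixels` is hypothetical. Matching checksums are a sanity check rather than a proof (a checksum can collide), while a lossy JPEG copy will normally show different values.

```shell
# Hypothetical helper: compare the per-band checksums of two rasters.
# Returns 0 and prints "checksums match" when every band checksum agrees.
same_pixels() {
    # gdalinfo -checksum prints one "Checksum=N" line per band.
    a=$(gdalinfo -checksum "$1" | grep 'Checksum=')
    b=$(gdalinfo -checksum "$2" | grep 'Checksum=')
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "checksums match"
        return 0
    else
        echo "checksums differ"
        return 1
    fi
}
```

For example, `same_pixels original.tif original_deflate.tif` should report a match for any of the lossless methods.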


You would use lossless algorithms when the original data values must be preserved, as with DEMs or raster features. Algorithms like PACKBITS, DEFLATE and LZW are lossless and can be ordered according to compression ratio:



  1. LZW - highest compression ratio, highest processing power

  2. DEFLATE

  3. PACKBITS - lowest compression ratio, lowest processing power


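This ordering is only a rough guide: the compression ratio still depends on the data, and on many rasters DEFLATE actually produces smaller files than LZW. PACKBITS is a simple run-length encoding, so it yields good results when the data contain long runs of identical values.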
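Because the ratio depends on the data, a practical approach is to measure it on a representative file from your own set. The sketch below, assuming GDAL is installed, writes one copy per lossless method (plus an uncompressed baseline) and prints each file size in bytes. The function name `compare_methods` and the `_METHOD.tif` output naming are hypothetical choices. It measures size only, not read or write speed.

```shell
# Hypothetical helper: write SRC once per method and print "METHOD BYTES".
# Copies are written next to SRC, e.g. dem.tif -> dem_LZW.tif.
compare_methods() {
    src=$1
    for method in NONE PACKBITS LZW DEFLATE; do
        dst="${src%.tif}_${method}.tif"
        # -q suppresses gdal_translate's progress output.
        gdal_translate -q -co "COMPRESS=$method" "$src" "$dst" || return 1
        # wc -c may pad its output with spaces on some systems.
        size=$(wc -c < "$dst" | tr -d ' ')
        echo "$method $size"
    done
}
```

Running `compare_methods dem.tif` on a few typical files should indicate which method pays off for this particular data set.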


In contrast, you would use lossy algorithms like JPEG to compress rasters that don't have to return exact values. For instance, orthophotos or satellite imagery can be compressed using lossy algorithms.
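The guidance above can be sketched as a small lookup from a rough kind of raster to GTiff creation options. The function name `co_for`, its category labels and the `JPEG_QUALITY=85` value are illustrative assumptions (GDAL's default JPEG quality is 75). `PHOTOMETRIC=YCBCR` generally only applies to 3-band 8-bit RGB imagery with pixel interleaving, which is GDAL's default layout.

```shell
# Hypothetical helper: print GTiff creation options for a kind of raster.
# Categories and values are illustrative, not an official recommendation.
co_for() {
    case $1 in
        dem-float)
            # Lossless; floating-point predictor for float elevations.
            echo "-co COMPRESS=DEFLATE -co PREDICTOR=3" ;;
        dem-int)
            # Lossless; horizontal differencing for integer data.
            echo "-co COMPRESS=DEFLATE -co PREDICTOR=2" ;;
        rgb-photo)
            # Lossy; orthophotos and other 3-band 8-bit imagery.
            echo "-co COMPRESS=JPEG -co JPEG_QUALITY=85 -co PHOTOMETRIC=YCBCR" ;;
        *)
            # Lossless fallback for anything else.
            echo "-co COMPRESS=LZW" ;;
    esac
}
```

The output is meant to be expanded unquoted so the shell splits it into separate arguments, e.g. `gdal_translate $(co_for rgb-photo) ortho.tif ortho_jpeg.tif`; the same call inside a `for f in *.tif` loop would process a whole folder.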

