Thursday 25 February 2016

python - Raster diff: how to check if images have identical values?


Is there a means to check to see if any 2 given raster layers have identical content?


We have a problem on our corporate shared storage volume: it's now so big that it takes over 3 days to conduct a full back up. Preliminary investigation reveals one of the biggest space consuming culprits are on/off rasters that really should be stored as 1-bit layers with CCITT compression.


a typical present/not-present raster


This sample image is currently 2bit (so 3 possible values) and saved as LZW compressed tiff, 11 MB in file system. After converting to 1bit (so 2 possible values) and applying CCITT Group 4 compression we get it down to 1.3 MB, almost a full order of magnitude of savings.



(This is actually a very well behaved citizen, there are others stored as 32bit float!)


This is fantastic news! However there are almost 7,000 images to apply this too. It would be straightforward to write a script to compress them:


for old_img in [list of images]:
convert_to_1bit_and_compress(old_img)
remove(old_img)
replace_with_new(old_img, new_img)

...but it's missing a vital test: is the newly compressed version content-identical?


  if raster_diff(old_img, new_img) == "Identical":
remove(old_img)

rename(new_img, old_img)

Is there a tool or method which can automatically (dis)prove the content of Image-A is value-identical to the content of Image-B?


I have access to ArcGIS 10.2 and QGIS, but am also open to most anything else than can obviate the need to inspect all these images manually to ensure correctness before overwriting. It would be horrible to mistakenly convert and overwrite an image that really did have more than on/off values in it. Most cost thousands of dollars to gather and generate.


a very bad result


update: The biggest offenders are 32bit floats that range up to 100,000px to a side, so ~30GB uncompressed.



Answer



Try converting your rasters to numpy arrays and then check to see if they have the same shape and elements with array_equal. If they are the same, the result should be True:


ArcGIS:


import arcpy, numpy


raster1 = r'C:\path\to\raster.tif'
raster2 = r'C:\path\to\raster.tif'

r1 = arcpy.RasterToNumPyArray(raster1)
r2 = arcpy.RasterToNumPyArray(raster2)

d = numpy.array_equal(r1,r2)

if d == False:

print "They differ"

else:
print "They are the same"

GDAL:


import numpy
from osgeo import gdal

raster1 = r'C:\path\to\raster.tif'

raster2 = r'C:\path\to\raster.tif'

ds1 = gdal.Open(raster1)
ds2 = gdal.Open(raster2)

r1 = numpy.array(ds1.ReadAsArray())
r2 = numpy.array(ds2.ReadAsArray())

d = numpy.array_equal(r1,r2)


if d == False:
print "They differ"

else:
print "They are the same"

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...