Friday 29 July 2016

raster - Cannot properly import MODIS MOD07_L2 .hdf files into R using rgdal


I've been trying to accurately import and process some .hdf files from the MODIS Atmospheric Profile product (MOD07_L2) in R for several days now. Still, there's something going wrong during data import. The below code can be reproduced using one exemplary .hdf file (MOD07_L2.A2013001.0835.006.2013001192145.hdf) that can be downloaded from Dropbox.


library(rgdal)

# Extraction of metadata via `GDALinfo`
filename <- "MOD07_L2.A2013001.0835.006.2013001192145.hdf"
gdalinfo <- GDALinfo(filename, returnScaleOffset = FALSE)

metadata <- attr(gdalinfo, "subdsmdata")

# Extraction of SDS string for parameter 'Skin_Temperature' (formerly 'Surface_Temperature')
sds <- metadata[grep("Skin_Temperature", metadata)[1]]
sds <- sapply(strsplit(sds, "="), "[[", 2)

# Raster import via `readGDAL`
sds.rg <- readGDAL(sds)

So far, so good, but here comes the confusing part:



> summary(sds.rg$band1)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-14870 -14850 -14850 -14840 -14840 -14820 53529

Considering the fact that Skin_Temperature has an official valid range from 150 to 350 K (see MOD07_L2:Format & Content), the mean would inherit a value of


> (-14840 - (-15000)) * 0.01
[1] 1.6

after considering the corresponding add_offset (-15000) and scale_factor (0.01). Note that we're still talking about Kelvin, not °C. Extracting SDS No. 8, i.e. Skin_Temperature, using


library(gdalUtils)

gdal_translate(filename, dst_dataset = "tmp.tif", sd_index = 8)

and opening the resulting file called "tmp.tif" in QGIS resulted in seemingly reliable values centered around 15000, i.e. roughly 27 °C. However, importing "tmp.tif" back into R using raster again resulted in values comparable to the ones shown above:


> summary(raster("tmp.tif"))
tmp
Min. -14867.31
1st Qu. -14848.13
Median -14845.89
3rd Qu. -14840.53
Max. -14819.93

NA's 0.00

I've been searching the internet and stumbled across similar problems related to rgdal. However, when I tried to cast toUnSigned on band 1 of my previously generated 'SpatialGridDataFrame', I received the following error message:


> toUnSigned(sds.rg$band1, 16)
Error in toUnSigned(sds.rg$band1, 16) : band not integer

Apparently, the data imported into R is not even of type integer (what it is supposed to be), but numeric:


> sds.rg$band1[1:5]
[1] NA NA -14839.40 -14840.25 -14839.26


Is there an apparent mistake in my code, or is there any point I miss when importing the .hdf and .tif files using rgdal? I would be extremely grateful for any kind of help.



Answer



The answer is surprisingly simple.


sgr_lst <- readGDAL(sds, as.is = TRUE)

solves the issue. The only thing left to do then is transform the resulting SpatialGridDataFrame to a proper Raster* object using raster(). For this purpose, it is necessary to retrieve the west, east, south, and north bounding coordinates (see ?extent: xmin, xmax, ymin, ymax) from the metadata e.g. via


meta <- attr(gdalinfo, "mdata")

## string patterns
crd_str <- paste0(c("WEST", "EAST", "SOUTH", "NORTH"), "BOUNDINGCOORDINATE")

## search for patterns in metadata
crd_id <- sapply(crd_str, function(i) grep(i, meta))
## extract information
crd <- meta[crd_id]
## create 'extent' object
crd <- as.numeric(sapply(strsplit(crd, "="), "[[", 2))
ext <- extent(crd)

The thus determined extent (together with the EPSG code) can then be passed on to the finally created RasterLayer which, after applying the scale factor and offset, definitely looks fine now.


## rasterize, apply offset and scale factor

rst_lst <- (raster(sgr_lst) - -1.5e+04) * 1.0e-02
## set extent and coordinate reference system
extent(rst_lst) <- crd
projection(rst_lst) <- "+init=epsg:4326"

lst


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...