I'd like to parse MOD13Q1 VI quality layers in R. My original code must have been incorrect, because it gave lots of "deep ocean" flags in the middle of Brazil. After the changes suggested by Mikkel, that's no longer the case. However I do see some VI usefulness values that don't appear in the documentation. I'm not sure whether that's a problem or whether the documentation is incomplete -- will explain further below. Here's my code, after the change suggested by Mikkel:
library(raster)
library(rgdal)
## See http://modis-land.gsfc.nasa.gov/MODLAND_grid.html: h11v09 should be entirely non-ocean
filename <- "MOD13Q1.A2013001.h11v09.005.2013018041108.250m_16_days_VI_Quality.tif"
qual <- raster(file.path("data", "modis", filename))
range(qual[]) # 2061 to 65535; only bits 0-15 have meaning and sum(2^(0:15)) = 65535
first_k_bits <- function(int, k=16, reverse=T) {
## https://lpdaac.usgs.gov/products/modis_products_table/mod13q1 TABLE 2:
## MOD13Q1 VI Quality: "Bit 0 is the least significant (read bit words right to left)"
integer_vector <- as.integer(intToBits(int))[1:k]
if(reverse) integer_vector <- rev(integer_vector)
return(paste(as.character(integer_vector), collapse=""))
}
df <- data.frame(bits=sapply(qual[], function(x) first_k_bits(x, k=16, reverse=T)))
df$bits[qual[] == 65535] <- NA # sum(2^(0:15)) = 65535, i.e. all bits are 1
df$quality <- substring(df$bits, 15, 16)
df$usefulness <- substring(df$bits, 11, 14)
df$land_water <- substring(df$bits, 3, 5)
table(df$quality)
round(table(df$quality) / nrow(df), 3) # 0.46 "Pixel produced, but most probably cloudy"
table(df$usefulness) # Zero cases of "0000" i.e. "highest quality"
usefulness_in_pdf <- c("0000", # Highest quality
"0001",
"0010",
## 0011 does not appear in table
"0100",
## 0101 does not appear in table
## 0110 does not appear in table
## 0111 does not appear in table
"1000",
"1001",
"1010",
## 1011 does not appear in table
"1100", # Lowest quality
"1101", # Quality so low that it is not useful
"1110", # L1B data faulty
"1111") # Not useful for any other reason/not processed
observed_usefulness <- unique(df$usefulness)
observed_usefulness[!observed_usefulness %in% usefulness_in_pdf] # NA 0011 0101 0110 0111
mean(!df$usefulness %in% usefulness_in_pdf) # 0.57
mean(!df$usefulness %in% usefulness_in_pdf & !is.na(df$usefulness)) # 0.47
## Look at land/water mask bits -- should be no "deep ocean" in the middle of Brazil
table(df$land_water) # Looks reasonable
round(table(df$land_water) / nrow(df), 2) # .89 land, .01 coastlines, .01 shallow water
land_raster <- raster(extent(qual))
dim(land_raster) <- dim(qual)
land_raster[] <- df$land_water == "001" # Land (nothing else but land)
png("land_raster.png")
plot(land_raster)
dev.off()
For the documentation (which is where I get the usefulness_in_pdf values), see pages 15-16 of http://vip.arizona.edu/documents/MODIS/MODIS_VI_UsersGuide_01_2012.pdf, or https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod13q1 TABLE 2: MOD13Q1 VI Quality.
The values of land / water look correct. Here's the land raster I generate at the end of the code:
The only worrying thing is that I see values of VI usefulness in the data that don't appear in the documentation. VI usefulness has 4 bits, hence 16 possible values, but the tables in the documentation (https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod13q1, click on layers, look at TABLE 2: MOD13Q1 VI Quality) list only 11 values. Why is that?
Edit: I'm on a little-endian machine:
Following https://serverfault.com/questions/163487/linux-how-to-tell-if-system-is-big-endian-or-little-endian, it looks like my machine is little-endian:
$ echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c6
1
Edit: a specific example in case it's helpful:
> qual[10^6]
2062
> intToBits(qual[10^6])
[1] 00 01 01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
[26] 00 00 00 00 00 00 00
> as.integer(intToBits(qual[10^6]))
[1] 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> first_k_bits(qual[10^6], k=16, reverse=F)
[1] "0111000000010000"
> sum(2^(0:15) * as.integer(intToBits(qual[10^6]))[1:16])
[1] 2062
Answer
Just a tiny error, as far as I can see. Your substrings are incorrect. This can be seen by comparing the result from a 'which(df$bits="0000100001000100")' with a number of observed unique values, which can be seen in ArcGIS when colouring the tif-file by unique values. 00001000 01000100 = 2116, and there are 3891233 of that number in both ArcGIS and R. This tells us that everything up to, and including the construction of the data.frame went well.
Next step is where it goes wrong. The problem arises from combining the description of the QA layer and the substring function. The order of the bits are as one would expect, but the numbering in the description reads from right to left, while substring works from left to right.
From your own example: (2062)Base10 = (0000100000001110)Base2 - not (0111000000010000)Base2. You have to reverse it or adapt your mind to reverse the information in the MODIS-QC layer. If you don't reverse it, and adapt your substring, you will have to consider that the structure of the string doesn't match that of the MODIS-QC information. Main thing becomes that the structure of multi-bit flags are reversed.
Easier solution is to reverse and adapt the substring as below:
Go from:
df$quality <- substring(df$bits, 1, 2)
df$usefulness <- substring(df$bits, 3, 6)
df$land_water <- substring(df$bits, 12, 14)
To:
df$quality <- substring(df$bits, 15, 16)
df$usefulness <- substring(df$bits, 11, 14)
df$land_water <- substring(df$bits, 3, 5)
No comments:
Post a Comment