Monday, 11 February 2019

qgis - Reading d NetCDF4 format climate data observation in R?



I downloaded yearly Temp/Precip observation by the globe from here: Free open source climate data archive, and I download those for my research case: Terrestrial Air Temperature: 1900-2014 Gridded Monthly Time Series (V 4.01) and Terrestrial Precipitation: 1900-2014 Gridded Monthly Time Series (V 4.01). I need to clip out all data observation record of all German weather stations. However, I intend to extract average annual temperature and annual precipitation for each of these station coordinates of Germany.


But, when I tried to load those data and read them in R, I ran into problems. Respective data archive's website can be found look here. I think data format could be NetCDF4 format based on the description from above web link 4. I installed raster and ncdf4 package to read the data in R, but such operations were failed.


Here is how data look like:


list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)

[1] "air_temp.1980" "air_temp.1981" "air_temp.1982" "air_temp.1983"
[5] "air_temp.1984" "air_temp.1985" "air_temp.1986" "air_temp.1987"
[9] "air_temp.1988" "air_temp.1989" "air_temp.1990" "air_temp.1991"
[13] "air_temp.1992" "air_temp.1993" "air_temp.1994" "air_temp.1995"
[17] "air_temp.1996" "air_temp.1997" "air_temp.1998" "air_temp.1999"

[21] "air_temp.2000" "air_temp.2001" "air_temp.2002" "air_temp.2003"
[25] "air_temp.2004" "air_temp.2005" "air_temp.2006" "air_temp.2007"
[29] "air_temp.2008" "air_temp.2009" "air_temp.2010" "air_temp.2011"
[33] "air_temp.2012" "air_temp.2013" "air_temp.2014"

Here is what I did in R:


library(ncdf4)
library(raster)
list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)


nc_open("stella/data/air_temp_1980_2014/air_temp.1980")
raster("stella/data/air_temp_1980_2014/air_temp.1980")

Here is error that raised by Rstudio:


> nc_open("stella/data/air_temp_1980_2014/air_temp.1980")
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open("stella/data/air_temp_1980_2014/air_temp.1980") :
Error in nc_open trying to open file stella/data/air_temp_1980_2014/air_temp.1980



> raster("stella/data/air_temp_1980_2014/air_temp.1980")
Error in .local(.Object, ...) : Couldn't determine X spacing

Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer", :
Cannot create a RasterLayer object from this file.

FYI, here I put the data on the fly for the test on your machine, files can be found on the fly, Here and Here.


How can I read those data in R?


Plus, I only need to pull out all station coordinates of germany from yearly Temp observation for the globe?


How can I make this happen in R?



Any workaround to do this?


Any more thoughts?




How can I read those data correctly in R (as data.frame object in R) ?


How to deal with this data?


Any thoughts?



Answer



These data are not in a netcdf format but, rather a space delimited ASCII format. This data is a bit difficult to deal with because of the lack of headers and any type of unique station identifier.


Returns a vector of file names on disk and pulls associate years from file name.


f <- list.files(getwd())

y <- as.numeric(unlist(lapply(strsplit(f, "[.]"), function(x) {x[2]})))

Reads files, adds associated year and combines into single data.frame.


stations <- data.frame()
for(i in 1:length(f)) {
d <- read.table(f[1], sep = "" , header = FALSE, na.strings = "")
d[,"year"] <- y[i]
names(d) <- c("long","lat", format(ISOdate(2000, 1:12, 1), "%b"), "year")
stations <- rbind(stations, d)
}

stations$ID <- paste(stations[,1], stations[,2], sep="_")
str(stations)

Since there is no unique identifier for a given station, it is difficult to aggregate the duplicate coordinates to each given unique station location. I had to get creative and use a unique concatenated vector of the coordinate pairs. Once you have the data subset to the desired region, you can use merge() or which(x %in% y) to create a subset index, with the ID column, containing the unique coordinate pairs to relate back to the data. However, since the data is by year, in a wide format, you cannot store it as a spatial class object without duplicating the feature geometry. Please give some though on how you are going to analyze this data before just jumping in.


library(sp) 
xy <- unique( paste(stations[,1], stations[,2], sep="_") )
stations.xy <- data.frame(long = as.numeric(unlist(lapply(strsplit(xy, "_"), function(x) {x[1]}))),
lat = as.numeric(unlist(lapply(strsplit(xy, "_"), function(x) {x[2]}))))
stations.xy$ID <- xy
coordinates(stations.xy) <- ~long+lat


str(stations.xy@data)
plot(stations.xy)

I am skipping any overlay/subset analysis because it is covered in your previous question. However, this can easily be accomplished using raster::intersect, spatialEco:point.in.poly, sp::over, rgeos::gIntersects, etc...


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...