I downloaded yearly Temp/Precip observation by the globe from here: Free open source climate data archive, and I download those for my research case: Terrestrial Air Temperature: 1900-2014 Gridded Monthly Time Series (V 4.01) and Terrestrial Precipitation: 1900-2014 Gridded Monthly Time Series (V 4.01). I need to clip out all data observation record of all German weather stations. However, I intend to extract average annual temperature and annual precipitation for each of these station coordinates of Germany.
But, when I tried to load those data and read them in R, I ran into problems. Respective data archive's website can be found look here. I think data format could be NetCDF4
format based on the description from above web link 4
. I installed raster
and ncdf4
package to read the data in R, but such operations were failed.
Here is how data look like:
list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)
[1] "air_temp.1980" "air_temp.1981" "air_temp.1982" "air_temp.1983"
[5] "air_temp.1984" "air_temp.1985" "air_temp.1986" "air_temp.1987"
[9] "air_temp.1988" "air_temp.1989" "air_temp.1990" "air_temp.1991"
[13] "air_temp.1992" "air_temp.1993" "air_temp.1994" "air_temp.1995"
[17] "air_temp.1996" "air_temp.1997" "air_temp.1998" "air_temp.1999"
[21] "air_temp.2000" "air_temp.2001" "air_temp.2002" "air_temp.2003"
[25] "air_temp.2004" "air_temp.2005" "air_temp.2006" "air_temp.2007"
[29] "air_temp.2008" "air_temp.2009" "air_temp.2010" "air_temp.2011"
[33] "air_temp.2012" "air_temp.2013" "air_temp.2014"
Here is what I did in R:
library(ncdf4)
library(raster)
list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)
nc_open("stella/data/air_temp_1980_2014/air_temp.1980")
raster("stella/data/air_temp_1980_2014/air_temp.1980")
Here is error that raised by Rstudio:
> nc_open("stella/data/air_temp_1980_2014/air_temp.1980")
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open("stella/data/air_temp_1980_2014/air_temp.1980") :
Error in nc_open trying to open file stella/data/air_temp_1980_2014/air_temp.1980
> raster("stella/data/air_temp_1980_2014/air_temp.1980")
Error in .local(.Object, ...) : Couldn't determine X spacing
Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer", :
Cannot create a RasterLayer object from this file.
FYI, here I put the data on the fly for the test on your machine, files can be found on the fly
, Here and Here.
How can I read those data in R?
Plus, I only need to pull out all station coordinates of germany from yearly Temp observation for the globe?
How can I make this happen in R?
Any workaround to do this?
Any more thoughts?
How can I read those data correctly in R (as data.frame object in R) ?
How to deal with this data?
Any thoughts?
Answer
These data are not in a netcdf format but, rather a space delimited ASCII format. This data is a bit difficult to deal with because of the lack of headers and any type of unique station identifier.
Returns a vector of file names on disk and pulls associate years from file name.
f <- list.files(getwd())
y <- as.numeric(unlist(lapply(strsplit(f, "[.]"), function(x) {x[2]})))
Reads files, adds associated year and combines into single data.frame.
stations <- data.frame()
for(i in 1:length(f)) {
d <- read.table(f[1], sep = "" , header = FALSE, na.strings = "")
d[,"year"] <- y[i]
names(d) <- c("long","lat", format(ISOdate(2000, 1:12, 1), "%b"), "year")
stations <- rbind(stations, d)
}
stations$ID <- paste(stations[,1], stations[,2], sep="_")
str(stations)
Since there is no unique identifier for a given station, it is difficult to aggregate the duplicate coordinates to each given unique station location. I had to get creative and use a unique concatenated vector of the coordinate pairs. Once you have the data subset to the desired region, you can use merge() or which(x %in% y) to create a subset index, with the ID column, containing the unique coordinate pairs to relate back to the data. However, since the data is by year, in a wide format, you cannot store it as a spatial class object without duplicating the feature geometry. Please give some though on how you are going to analyze this data before just jumping in.
library(sp)
xy <- unique( paste(stations[,1], stations[,2], sep="_") )
stations.xy <- data.frame(long = as.numeric(unlist(lapply(strsplit(xy, "_"), function(x) {x[1]}))),
lat = as.numeric(unlist(lapply(strsplit(xy, "_"), function(x) {x[2]}))))
stations.xy$ID <- xy
coordinates(stations.xy) <- ~long+lat
str(stations.xy@data)
plot(stations.xy)
I am skipping any overlay/subset analysis because it is covered in your previous question. However, this can easily be accomplished using raster::intersect, spatialEco:point.in.poly, sp::over, rgeos::gIntersects, etc...
No comments:
Post a Comment