Saturday 18 June 2016

python - Convert timeseries stack of GTiff raster to single NetCDF


Moving from gdal-dev mailing list:


On Mon, Sep 2, 2013 at 7:09 PM, David Shean wrote:


Hi list, I'm trying to package a timeseries of GTiff rasters with identical projection/extent/resolution as a single NetCDF file for distribution. I've spent the past hour consulting the online doc and playing with gdal_translate, gdalbuildvrt and gdalwarp without any success.


Is there an easy way to do this using existing gdal command line utilities? I figured I'd ask before resorting to a custom solution using the NetCDF Python API.



Thanks. -David


On Tue, Sep 3, 2013 at 10:15 AM, Etienne Tourigny wrote:


What you want is probably outside the scope of gdal. It would require some clever metadata management so that gdal_translate puts them in a single file...


I would advise you convert them all to netcdf using gdal_translate and then use python-netcdf4 (not the one from numpy/scipy) to stack them in the temporal dimension.


On Tue, Sep 3, 2013, at 7:55 AM, "Signell, Richard" wrote:


David, if you post your question on the GIS stackexchange group https://gis.stackexchange.com/ I will provide some example code that should be helpful.


-Rich


====================


Update 9/3/13 17:04 PDT


Here is gdalinfo output for one of my input datasets:




gdalinfo 20120901T2024_align_x+22.19_y+3.68_z+14.97_warp.tif

Driver: GTiff/GeoTIFF
Files: 20120901T2024_align_x+22.19_y+3.68_z+14.97_warp.tif
Size is 10666, 13387
Coordinate System is:
PROJCS["unnamed",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Polar_Stereographic"],
    PARAMETER["latitude_of_origin",70],
    PARAMETER["central_meridian",-45],
    PARAMETER["scale_factor",1],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Origin = (-211346.063781524338992,-2245136.291794800199568)
Pixel Size = (5.000000000000000,-5.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( -211346.064,-2245136.292) ( 50d22'39.70"W, 69d23'55.59"N)
Lower Left ( -211346.064,-2312071.292) ( 50d13'22.38"W, 68d48'10.75"N)
Upper Right ( -158016.064,-2245136.292) ( 49d 1'33.33"W, 69d26'16.42"N)
Lower Right ( -158016.064,-2312071.292) ( 48d54'35.06"W, 68d50'27.28"N)
Center ( -184681.064,-2278603.792) ( 49d38' 1.32"W, 69d 7'17.04"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  NoData Value=-32767


Following up on Luke's suggested approach.


The vrt generation works fine:


gdalbuildvrt -separate newtest.vrt *warp.tif


PROJCS["unnamed",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433],AUTHORITY["EPSG","4326"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",70],PARAMETER["central_meridian",-45],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]]]
-2.1134606378152434e+05, 5.0000000000000000e+00, 0.0000000000000000e+00, -2.2451362917948002e+06, 0.0000000000000000e+00, -5.0000000000000000e+00

-3.27670000000000E+04


20110619T2024_align_x+15.51_y+1.15_z+12.10_warp.tif
1



-32767



-3.27670000000000E+04


20110802T2024_align_x+16.33_y+2.14_z+12.02_warp.tif
1



-32767


...


But when I attempt to translate to nc, I get the following error:



gdal_translate -of netcdf newtest.vrt newtest.nc

Input file size is 10666, 13387
Warning 1: Variable has 0 dimension(s) - not supported.
0...10...20...30...40...50ERROR 1: netcdf error #-62 : NetCDF: One or more variable sizes violate format constraints .
at (netcdfdataset.cpp,SetDefineMode,1574)


ERROR 1: netcdf error #-39 : NetCDF: Operation not allowed in define mode .
at (netcdfdataset.cpp,IWriteBlock,1435)

ERROR 1: netCDF scanline write failed: NetCDF: Operation not allowed in define mode
ERROR 1: An error occured while writing a dirty block
...ERROR 1: netcdf error #-39 : NetCDF: Operation not allowed in define mode .
at (netcdfdataset.cpp,IWriteBlock,1435)

ERROR 1: netCDF scanline write failed: NetCDF: Operation not allowed in define mode
ERROR 1: netcdf error #-62 : NetCDF: One or more variable sizes violate format constraints .

at (netcdfdataset.cpp,~netCDFDataset,1548)

So upon closer inspection, it appears that gdal is unhappy with the polar stereographic projection I'm using (EPSG:3413). See lines 1570-1582 of netcdfdataset.cpp:


https://code.vpac.org/gitorious/gdal-netcdf-testing/gdal-netcdf-driver/blobs/8fa3582669969ad4d55e461f5846b3ed33727f63/gdal/frmts/netcdf/netcdfdataset.cpp


My projection specifies a latitude_of_origin but no standard parallels, which the netcdf driver expects.
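
To see which projection parameters the WKT actually carries, one can pull the PARAMETER entries out of the string. The sketch below does that with a regular expression on an abridged copy of the WKT from the gdalinfo output above (AUTHORITY entries dropped for brevity). The helper name `wkt_parameters` is made up for illustration; a real script would more likely query the spatial reference through GDAL itself.

```python
import re

# Abridged copy of the projection WKT reported by gdalinfo above
# (AUTHORITY entries removed to keep the example short).
WKT = (
    'PROJCS["unnamed",GEOGCS["WGS 84",DATUM["WGS_1984",'
    'SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],'
    'PROJECTION["Polar_Stereographic"],'
    'PARAMETER["latitude_of_origin",70],'
    'PARAMETER["central_meridian",-45],'
    'PARAMETER["scale_factor",1],'
    'PARAMETER["false_easting",0],'
    'PARAMETER["false_northing",0],'
    'UNIT["metre",1]]'
)


def wkt_parameters(wkt):
    """Return {name: value} for every PARAMETER["name",value] entry.

    Hypothetical helper, not part of GDAL.
    """
    pairs = re.findall(r'PARAMETER\["([^"]+)",\s*([-+0-9.eE]+)\]', wkt)
    return {name: float(value) for name, value in pairs}


params = wkt_parameters(WKT)
for name in sorted(params):
    print(name, params[name])

# Any standard_parallel_* entries would show up in params; this WKT has none.
has_standard_parallel = any(k.startswith('standard_parallel') for k in params)
print('standard parallel present:', has_standard_parallel)
```

For this WKT the listing contains latitude_of_origin, central_meridian, scale_factor and the two false offsets, and no standard parallel.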



Answer



Here's some Python code that does what you want: it reads GDAL-readable files, each representing data at a specific time, and writes them to a single CF-compliant NetCDF file.


#!/usr/bin/env python
'''
Convert a bunch of GDAL readable grids to a NetCDF Time Series.

Here we read a bunch of files that have names like:
/usgs/data0/prism/1890-1899/us_tmin_1895.01
/usgs/data0/prism/1890-1899/us_tmin_1895.02
...
/usgs/data0/prism/1890-1899/us_tmin_1895.12
'''

import datetime as dt
import os
import re

import netCDF4
import numpy as np
from osgeo import gdal

# read one grid to get the array shape and the geotransform
ds = gdal.Open('/usgs/data0/prism/1890-1899/us_tmin_1895.01')
a = ds.ReadAsArray()
nlat, nlon = np.shape(a)

b = ds.GetGeoTransform()  # bbox, interval
lon = np.arange(nlon)*b[1] + b[0]
lat = np.arange(nlat)*b[5] + b[3]

basedate = dt.datetime(1858, 11, 17, 0, 0, 0)

# create NetCDF file
nco = netCDF4.Dataset('time_series.nc', 'w', clobber=True)

# chunking is optional, but can improve access a lot:
# (see: http://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_choosing_shapes)
chunk_lon = 16
chunk_lat = 16
chunk_time = 12

# create dimensions, variables and attributes:
nco.createDimension('lon', nlon)
nco.createDimension('lat', nlat)
nco.createDimension('time', None)  # unlimited dimension

timeo = nco.createVariable('time', 'f4', ('time',))
timeo.units = 'days since 1858-11-17 00:00:00'
timeo.standard_name = 'time'

lono = nco.createVariable('lon', 'f4', ('lon',))
lono.units = 'degrees_east'
lono.standard_name = 'longitude'

lato = nco.createVariable('lat', 'f4', ('lat',))
lato.units = 'degrees_north'
lato.standard_name = 'latitude'

# create container variable for CRS: lon/lat WGS84 datum
crso = nco.createVariable('crs', 'i4')
crso.long_name = 'Lon/Lat Coords in WGS84'
crso.grid_mapping_name = 'latitude_longitude'
crso.longitude_of_prime_meridian = 0.0
crso.semi_major_axis = 6378137.0
crso.inverse_flattening = 298.257223563

# create short integer variable for temperature data, with chunking
tmno = nco.createVariable('tmn', 'i2', ('time', 'lat', 'lon'),
                          zlib=True,
                          chunksizes=[chunk_time, chunk_lat, chunk_lon],
                          fill_value=-9999)
tmno.units = 'degC'
tmno.scale_factor = 0.01
tmno.add_offset = 0.00
tmno.long_name = 'minimum monthly temperature'
tmno.standard_name = 'air_temperature'
tmno.grid_mapping = 'crs'
# store the values as read from the grids, without automatic scaling/masking
tmno.set_auto_maskandscale(False)

nco.Conventions = 'CF-1.6'

# write lon,lat
lono[:] = lon
lato[:] = lat

pat = re.compile(r'us_tmin_[0-9]{4}\.[0-9]{2}')
itime = 0

# step through data, writing time and data to NetCDF
for root, dirs, files in os.walk('/usgs/data0/prism/1890-1899/'):
    dirs.sort()
    files.sort()
    for f in files:
        if re.match(pat, f):
            # read the time values by parsing the filename
            year = int(f[8:12])
            mon = int(f[13:15])
            date = dt.datetime(year, mon, 1, 0, 0, 0)
            print(date)
            dtime = (date - basedate).total_seconds()/86400.
            timeo[itime] = dtime
            # min temp
            tmn_path = os.path.join(root, f)
            print(tmn_path)
            tmn = gdal.Open(tmn_path)
            a = tmn.ReadAsArray()  # data
            tmno[itime, :, :] = a
            itime = itime + 1

nco.close()


GDAL and NetCDF4 Python can be a bit of a pain to build, but the good news is that they are part of most scientific Python distributions (Python(x,y), Enthought Python Distribution, Anaconda, ...).
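
The script above takes each time stamp from a PRISM-style file name. For file names like the ones in the question, the same idea might look like the sketch below. It assumes the leading `YYYYMMDDTHHMM` token is the acquisition time, which the question does not state explicitly, and `days_since_basedate` is a hypothetical helper name.

```python
import datetime as dt
import re

BASEDATE = dt.datetime(1858, 11, 17, 0, 0, 0)  # same base date as the script above

# Assumption: file names start with a YYYYMMDDTHHMM time stamp.
STAMP = re.compile(r'^(\d{8}T\d{4})_')


def days_since_basedate(filename):
    """Parse the leading time stamp and return fractional days since BASEDATE."""
    m = STAMP.match(filename)
    if m is None:
        raise ValueError('no time stamp in %r' % filename)
    date = dt.datetime.strptime(m.group(1), '%Y%m%dT%H%M')
    return (date - BASEDATE).total_seconds() / 86400.0


names = [
    '20120901T2024_align_x+22.19_y+3.68_z+14.97_warp.tif',
    '20110619T2024_align_x+15.51_y+1.15_z+12.10_warp.tif',
    '20110802T2024_align_x+16.33_y+2.14_z+12.02_warp.tif',
]

# Sort by time so the records land in chronological order along the time axis.
for name in sorted(names, key=days_since_basedate):
    print(name, days_since_basedate(name))
```

Because these values carry a fractional day, storing them in the 'f4' time variable used above would round them to within a few minutes; an 'f8' variable may be a safer choice.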


Update: I haven't done polar stereographic in CF-compliant NetCDF yet, but it should look something like this. Here I've assumed that central_meridian and latitude_of_origin in GDAL are the same as straight_vertical_longitude_from_pole and latitude_of_projection_origin in CF:


#!/usr/bin/env python
'''
Convert a bunch of GDAL readable grids to a NetCDF Time Series.

Here we read a bunch of files that have names like:
/usgs/data0/prism/1890-1899/us_tmin_1895.01
/usgs/data0/prism/1890-1899/us_tmin_1895.02
...
/usgs/data0/prism/1890-1899/us_tmin_1895.12
'''

import datetime as dt
import os
import re

import netCDF4
import numpy as np
from osgeo import gdal

# read one grid to get the array shape and the geotransform
ds = gdal.Open('/usgs/data0/prism/1890-1899/us_tmin_1895.01')
a = ds.ReadAsArray()
ny, nx = np.shape(a)

b = ds.GetGeoTransform()  # bbox, interval
x = np.arange(nx)*b[1] + b[0]
y = np.arange(ny)*b[5] + b[3]

basedate = dt.datetime(1858, 11, 17, 0, 0, 0)

# create NetCDF file
nco = netCDF4.Dataset('time_series.nc', 'w', clobber=True)

# chunking is optional, but can improve access a lot:
# (see: http://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_choosing_shapes)
chunk_x = 16
chunk_y = 16
chunk_time = 12

# create dimensions, variables and attributes:
nco.createDimension('x', nx)
nco.createDimension('y', ny)
nco.createDimension('time', None)  # unlimited dimension

timeo = nco.createVariable('time', 'f4', ('time',))
timeo.units = 'days since 1858-11-17 00:00:00'
timeo.standard_name = 'time'

xo = nco.createVariable('x', 'f4', ('x',))
xo.units = 'm'
xo.standard_name = 'projection_x_coordinate'

yo = nco.createVariable('y', 'f4', ('y',))
yo.units = 'm'
yo.standard_name = 'projection_y_coordinate'

# create container variable for CRS: x/y WGS84 datum
crso = nco.createVariable('crs', 'i4')
crso.grid_mapping_name = 'polar_stereographic'
# assumption stated above: GDAL's central_meridian and latitude_of_origin
# map onto the next two CF attributes; check against the CF conventions
crso.straight_vertical_longitude_from_pole = -45.
crso.latitude_of_projection_origin = 70.
crso.scale_factor_at_projection_origin = 1.0
crso.false_easting = 0.0
crso.false_northing = 0.0
crso.semi_major_axis = 6378137.0
crso.inverse_flattening = 298.257223563

# create short integer variable for temperature data, with chunking
tmno = nco.createVariable('tmn', 'i2', ('time', 'y', 'x'),
                          zlib=True,
                          chunksizes=[chunk_time, chunk_y, chunk_x],
                          fill_value=-9999)
tmno.units = 'degC'
tmno.scale_factor = 0.01
tmno.add_offset = 0.00
tmno.long_name = 'minimum monthly temperature'
tmno.standard_name = 'air_temperature'
tmno.grid_mapping = 'crs'
# store the values as read from the grids, without automatic scaling/masking
tmno.set_auto_maskandscale(False)

nco.Conventions = 'CF-1.6'

# write x,y
xo[:] = x
yo[:] = y

pat = re.compile(r'us_tmin_[0-9]{4}\.[0-9]{2}')
itime = 0

# step through data, writing time and data to NetCDF
for root, dirs, files in os.walk('/usgs/data0/prism/1890-1899/'):
    dirs.sort()
    files.sort()
    for f in files:
        if re.match(pat, f):
            # read the time values by parsing the filename
            year = int(f[8:12])
            mon = int(f[13:15])
            date = dt.datetime(year, mon, 1, 0, 0, 0)
            print(date)
            dtime = (date - basedate).total_seconds()/86400.
            timeo[itime] = dtime
            # min temp
            tmn_path = os.path.join(root, f)
            print(tmn_path)
            tmn = gdal.Open(tmn_path)
            a = tmn.ReadAsArray()  # data
            tmno[itime, :, :] = a
            itime = itime + 1

nco.close()
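
Both scripts build the coordinate arrays straight from the geotransform origin, which is the outer corner of the upper-left cell. If cell-center coordinates are wanted instead (as far as I know the usual choice for CF coordinate variables, though the answer does not say so), a half-pixel shift does it. The sketch below uses the origin, pixel size and raster size from the gdalinfo output in the question; `cell_centers` is a hypothetical helper, and plain lists stand in for NumPy arrays.

```python
def cell_centers(origin, step, count):
    """Center coordinate of each cell along one axis of a north-up raster."""
    return [origin + (i + 0.5) * step for i in range(count)]


# Values from the gdalinfo output in the question.
# GetGeoTransform() order: (x origin, x pixel size, 0, y origin, 0, y pixel size)
gt = (-211346.063781524338992, 5.0, 0.0, -2245136.291794800199568, 0.0, -5.0)
nx, ny = 10666, 13387

x = cell_centers(gt[0], gt[1], nx)
y = cell_centers(gt[3], gt[5], ny)

print(x[0], x[-1])  # half a pixel inside the left and right edges
print(y[0], y[-1])  # half a pixel inside the top and bottom edges
```

The first and last values sit 2.5 m inside the corner coordinates that gdalinfo reports, which is a quick way to check the shift.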
