Thursday, 26 February 2015

python - How to extract specific information from GRIB files?


How to extract fetched GRIB data with Python and gribapi package (link goes to official ECMWF Wiki)? I tried to follow few examples from their "documentation" and do it myself, but I just cannot figure out how to retrieve only specific parameter (e.g. surface temperature or wind) for given latitude and longitude. I don't need to plot the data or visualizations, I just need to pass few parameters when reading the file and get numbers back.


So far, I tried to download original GRIB2 file from www.ftp.ncep.noaa.gov and run few Python API example scripts from mentioned ECMWF Wiki page. Since Python API documentation doesn't exist, I have some hard time to understand which methods should I use in order to:



  • read the file,

  • select only specific "message" (e.g. surface temperature only) if there's more "messages" in a file,


  • print out the value for a specific location (using grib_find_nearest method).


Example code which I have so far, mostly copied from ECMWF examples and this answer from Getting time dimension from GRIB file?:


import traceback
import sys

# hopefully the only external package I need
from gribapi import *

# file with all possible 'messages'

INPUT = '2015121406_gfs.t06z.pgrb2.0p25.f000'
VERBOSE=1 # verbose error reporting

def example():
f = open(INPUT)

# location we are interested in
lat, lon = 64.1353, -21.8952

while 1:

# STEP 1:
# open downloaded GRIB2 file
gid = grib_new_from_file(f)
if gid is None: break

# define the iterator (which is throughout the program the same?)
iterid = grib_iterator_new(gid,0)

# get the result for the nearest location
nearest = grib_find_nearest(gid, lat, lon)[0]


while 1:
# STEP 2:
# ???
#
# the loop goes through the whole file
# instead just selecting messages we need beforehand...
# fictional function which selects
# 'TMP' (temperature) on 'surface' messages only
# without need to iterate all of them:

#
# result = grib_select_specific_message(gid, 'TMP', 'surface')

result = grib_iterator_next(iterid)
if not result: break

# STEP 3:
# profit!
# result variable returns numbers which I process in any way I need, yay!


# more undocumented stuff,
# put here just because examples do it the same way
grib_iterator_delete(iterid)
grib_release(gid)

f.close()

# main program function
def main():
try:

example()
except GribInternalError,err:
if VERBOSE:
traceback.print_exc(file=sys.stderr)
else:
print >>sys.stderr,err.msg

return 1

# run the program

if __name__ == "__main__":
sys.exit(main())

The problem with this code is that it iterates trough the whole file, all the messages and all possible coordinates (whole planet, 0.25 degrees precision). If file is 300+ MB it takes a while to read it. I feel like using "iterator" here (whatever that is) is a bit wrong since I need to "cherry-pick" only specific information (few weather parameters for a particular location only) and grib_find_nearest() function doesn't need an iterator at all to select only part of the data.
This could be half-solved with filtering the data I need before downloading the file (results in much smaller file) but I'm still not sure how to do it (part of Downloading GRIB GFS files with specific filters?), but still I'd like to figure this out since it might happen that I will occasionally have to download "full" files.



Answer



I eventually ended up with ditching gribapi and switching to Met Office's Iris Python package which solves my problem in a very elegant way. Although it was a bit of a pain to install it (dependencies could be really tricky) at least the documentation is good and it is really easy to use it.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...