Question

I need help with reading multiple netCDF files; despite the few examples on here, none of them works properly for me. I am using Python(x,y) 2.7.5 with these packages: netcdf4 1.0.7-4, matplotlib 1.3.1-4, numpy 1.8, pandas 0.12, basemap 1.0.2...

There are a few things I am used to doing in GrADS that I need to start doing in Python. I have 2 meter temperature data (four time steps per day, one file per year, from ECMWF); each file contains 2 meter temp data with Xsize=480, Ysize=241, Zsize(level)=1, and Tsize(time)=1460, or 1464 for leap years. The file names look like: t2m.1981.nc, t2m.1982.nc, t2m.1983.nc, etc.

Based on this page (Loop through netcdf files and run calculations - Python or R), here is where I am now:

from pylab import *
import netCDF4 as nc
from netCDF4 import *
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

f = nc.MFDataset('d:/data/ecmwf/t2m.????.nc') # as '????' being the years
t2mtr = f.variables['t2m']

ntimes, ny, nx = shape(t2mtr)
temp2m = zeros((ny,nx),dtype=float64)
print ntimes
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:] #I'm not sure how to slice this, just wanted to get the 00Z values.
      # is it possible to assign to a new array,...
      #... (for eg.) the average values of  00z for January only from 1981-2000? 

#creating a NetCDF file
nco = nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

temp2m_v = nco.createVariable('t2m', 'i4',  ( 'y', 'x'))
temp2m_v.units='Kelvin'
temp2m_v.long_name='2 meter Temperature'
temp2m_v.grid_mapping = 'Lambert_Conformal' # can it be something else or ..
#... eliminated?).This is straight from the solution on that webpage.

lono = nco.createVariable('longitude','f8')
lato = nco.createVariable('latitude','f8')
xo = nco.createVariable('x','f4',('x')) #not sure if this is important
yo = nco.createVariable('y','f4',('y')) #not sure if this is important
lco = nco.createVariable('Lambert_Conformal','i4') #not sure

#copy all the variable attributes from original file
for var in ['longitude','latitude']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono=f.variables['longitude'][:]
lato=f.variables['latitude'][:]
#xo[:]=f.variables['x']
#yo[:]=f.variables['y']

#  write the temp at 2 m data
temp2m_v[:,:]=temp2m

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6' #not sure what this is.
nco.close()

#attempt to plot the 00zJan mean
file=nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','r')
t2mtr=file.variables['t2m'][:]
lon=file.variables['longitude'][:]
lat=file.variables['latitude'][:]
clevs=np.arange(0,500.,10.)
map =   Basemap(projection='cyl',llcrnrlat=0.,urcrnrlat=10.,llcrnrlon=97.,urcrnrlon=110.,resolution='i')
x,y=map(*np.meshgrid(lon,lat))
cs = map.contourf(x,y,t2mtr,clevs,extend='both')
map.drawcoastlines()
map.drawcountries()
plt.plot(cs)
plt.show()

My first question is about temp2m += t2mtr[i,:,:]. I am not sure how to slice the data to get only the 00Z values (say, for January only) across all files.

Second, while running the test, an error occurred at cs = map.contourf(x,y,t2mtr,clevs,extend='both'): "shape does not match that of z: found (1,1) instead of (241,480)". I suspect there is an error in the output data, probably in how the values were recorded, but I can't figure out what or where.

Thanks for your time. I hope this is not confusing.


Solution

So t2mtr is a 3D array:

ntimes, ny, nx = shape(t2mtr)

This sums all values along the first (time) axis:

for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:]

A better way to do this is:

temp2m = np.sum(t2mtr, axis=0)
temp2m = t2mtr.sum(axis=0) # equivalent method form

If you want the average, use np.mean instead of np.sum.
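To see that the vectorized call matches the question's loop, here is a self-contained sketch on a toy array (the shapes here are made up for illustration, not the real 241x480 grid):

```python
import numpy as np

# Toy stand-in for t2mtr: 4 time steps on a 2x3 grid.
t2mtr = np.arange(24, dtype=np.float64).reshape(4, 2, 3)

# Loop version from the question...
temp2m = np.zeros((2, 3), dtype=np.float64)
for i in range(t2mtr.shape[0]):
    temp2m += t2mtr[i, :, :]

# ...and the vectorized equivalents.
total = t2mtr.sum(axis=0)   # same result as the loop
mean = t2mtr.mean(axis=0)   # average over time instead of sum

print(np.allclose(temp2m, total))  # True
print(mean[0, 0])                  # (0 + 6 + 12 + 18) / 4 = 9.0
```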

To average across a subset of the times, jan_times, use an expression like:

jan_avg = np.mean(t2mtr[jan_times,:,:], axis=0)

This is simplest if you want just a simple range, e.g. the first 31 times. For simplicity I'm assuming the data are daily and all years have the same length; you can adjust for the four-samples-per-day frequency and for leap years.

t2mtr[0:31,:,:]
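For instance, to pick out only the 00Z steps of January for one year, assuming four time steps per day starting at 00Z (an assumption about the file layout, based on Tsize=1460), the index arithmetic would be:

```python
import numpy as np

steps_per_day = 4   # assumed: 00Z, 06Z, 12Z, 18Z
jan_days = 31

# Index of every 00Z step in January of the first year.
jan_00z = np.arange(jan_days) * steps_per_day
print(jan_00z[:4])   # [ 0  4  8 12]
print(len(jan_00z))  # 31

# With the real data loaded:
# jan_00z_avg = t2mtr[jan_00z, :, :].mean(axis=0)
```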

A simplistic way on getting Jan data for several years is to construct an index like:

yr_starts = np.arange(0,3)*365 # can adjust for leap years
jan_times = (yr_starts[:,None]+ np.arange(31)).flatten()
# array([  0,   1,   2, ...  29,  30, 365, ..., 756, 757, 758, 759, 760])
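If the leap years need to be handled, the year offsets can be computed from the calendar module instead of a fixed 365 (the start year 1981 is taken from the file names in the question; four steps per day is again an assumption):

```python
import calendar
import numpy as np

steps_per_day = 4   # assumed sampling frequency
years = range(1981, 1985)

# Cumulative offset (in time steps) of each year's first record.
days = [366 if calendar.isleap(y) else 365 for y in years]
yr_starts = np.concatenate(([0], np.cumsum(days[:-1]))) * steps_per_day
print(yr_starts)  # [   0 1460 2920 4380]

# All January time steps (every sample, not just 00Z) for those years:
jan_times = (yr_starts[:, None] + np.arange(31 * steps_per_day)).flatten()
```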

Another option is to reshape t2mtr (this doesn't work well for leap years):

t2mtr.reshape(nyrs, 365, ny, nx)[:,0:31,:,:].mean(axis=1)

You could test the time sampling with something like:

np.arange(5*365).reshape(5,365)[:,0:31].mean(axis=1)

Doesn't the dataset have a time variable? You might be able to extract the desired time indices from that. I worked with ECMWF data a number of years ago, but I don't remember many of the details.
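If the files do carry a time coordinate, decoding it is more robust than index arithmetic; netCDF4.num2date can turn the raw time values into datetimes. The selection would then look like this sketch, which fakes the decoded datetimes with the standard library (four steps per day is, as before, an assumption):

```python
import numpy as np
from datetime import datetime, timedelta

# Fake two non-leap years of decoded timestamps, 6 h apart (4 per day).
# With real files these would come from netCDF4.num2date(time[:], time.units).
start = datetime(1981, 1, 1, 0)
times = [start + timedelta(hours=6 * i) for i in range(2 * 365 * 4)]

# Boolean mask: January, 00Z only.
mask = np.array([t.month == 1 and t.hour == 0 for t in times])
jan_00z_idx = np.nonzero(mask)[0]
print(len(jan_00z_idx))  # 31 days x 2 years = 62

# With the real data loaded:
# jan_00z_avg = t2mtr[jan_00z_idx, :, :].mean(axis=0)
```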

As for your contourf error: check the shapes of the three main arguments, x, y, and t2mtr; they should match. I haven't worked with Basemap.
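One way a (1,1) shape can arise here: in the writing code, longitude and latitude are created without dimensions (i.e. as scalar variables), and the later lono = f.variables['longitude'][:] lines rebind the Python names instead of writing into the new file. Reading a scalar coordinate back and passing it through meshgrid yields (1,1) arrays, which matches the reported error. A quick diagnostic sketch (the coordinate values here are hypothetical):

```python
import numpy as np

# meshgrid of proper 1-D coordinate arrays -> grids matching the data.
lon = np.linspace(97.0, 110.0, 480)
lat = np.linspace(0.0, 10.0, 241)
x, y = np.meshgrid(lon, lat)
print(x.shape)  # (241, 480)

# meshgrid of scalars (what a dimensionless netCDF variable reads back as)
# collapses to (1, 1), which contourf rejects against a (241, 480) z.
x_bad, y_bad = np.meshgrid(5.0, 3.0)
print(x_bad.shape)  # (1, 1)
```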

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow