Question

Using R, I am trying to open all the NetCDF files I have in a single folder (e.g. 20 files), read a single variable from each, and create a single data.frame combining the values from all files. I have been using RNetCDF to read the files. For a single file, I read the variable with the following commands:

library('RNetCDF')
nc = open.nc('file.nc')
lw = var.get.nc(nc,'LWdown',start=c(414,315,1),count=c(1,1,240))

where 414 and 315 are the longitude and latitude indices of the grid cell I would like to extract and 240 is the number of timesteps.

I have found this thread which explains how to open multiple files. Following it, I have managed to open the files using:

 filenames= list.files('/MY_FOLDER/',pattern='*.nc',full.names=TRUE)
 ldf = lapply(filenames,open.nc)

but now I'm stuck. I tried

  var1= lapply(ldf, var.get.nc(ldf,'LWdown',start=c(414,315,1),count=c(1,1,240)))

but it doesn't work. The added complication is that every nc file has a different number of timesteps. So I have 2 questions:

1: How can I open all files, read the variable in each file and combine all values in a single data frame?
2: How can I set the last dimension in count to vary for all files?

Solution

Following @mdsummer's comment, I tried a for loop instead and managed to do everything I needed:

library('RNetCDF')

# Declare data frame
df <- NULL

# List all files (pattern is a regular expression, not a shell glob)
files <- list.files('MY_FOLDER/', pattern='\\.nc$', full.names=TRUE)

# Loop over files
for (i in seq_along(files)) {
  nc <- open.nc(files[i])

  # Read the whole variable to get the length of the varying dimension
  # (here the 3rd dimension, time)
  lw <- var.get.nc(nc, 'LWdown')
  x <- dim(lw)

  # Read the single grid cell, with the time count set for this file
  lw <- var.get.nc(nc, 'LWdown', start=c(414,315,1), count=c(1,1,x[3]))

  # Add the values from each file to a single data.frame
  df <- rbind(df, data.frame(lw))

  # Close the file before moving on to the next one
  close.nc(nc)
}

There may be a more elegant way but it works.
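
One possibly tidier variant, as a sketch only: wrap the per-file read in a small function and stack the results with do.call(rbind, ...). It assumes, going by my reading of the RNetCDF documentation, that an NA entry in count means "read from start to the end of that dimension", which would avoid reading the whole variable just to get its length. The helper name read_point and its parameters lon_idx and lat_idx are made up for illustration, and the file column is an addition of mine.

```r
library(RNetCDF)

# Hypothetical helper: read one grid cell's full time series from one file.
read_point <- function(path, varname = "LWdown", lon_idx = 414, lat_idx = 315) {
  nc <- open.nc(path)
  on.exit(close.nc(nc))  # close the file even if the read fails

  # Assumption: NA in `count` reads from `start` to the end of that dimension
  values <- var.get.nc(nc, varname,
                       start = c(lon_idx, lat_idx, 1),
                       count = c(1, 1, NA))

  # Keep the source file name next to the values
  data.frame(file = basename(path), lw = as.vector(values))
}

# Pattern is a regular expression; MY_FOLDER is the placeholder from the question
files <- list.files("MY_FOLDER", pattern = "\\.nc$", full.names = TRUE)

# One data.frame per file, stacked row-wise
df <- do.call(rbind, lapply(files, read_point))
```

Building the list first and calling rbind once also avoids growing df inside the loop, which can get slow with many files.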

OTHER TIPS

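You're passing the additional function parameters wrong. lapply expects a function as its second argument, but you are passing the result of calling var.get.nc. Pass the function itself and supply the extra arguments through lapply's ... argument. Here's a simple example of how to pass na.rm to mean.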

x.var <- 1:10
x.var[5] <- NA
x.var <- list(x.var)
x.var[[2]] <- 1:10
lapply(x.var, FUN = mean)
lapply(x.var, FUN = mean, na.rm = TRUE)

edit

For your specific example, this would be something along the lines of

var1 <- lapply(ldf, FUN = var.get.nc, variable = 'LWdown', start = c(414, 315, 1), count = c(1, 1, 240))

though this is untested.
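
The fixed 240 above does not cover the second question, where the count differs per file. A minimal base R sketch of the difference, using toy vectors in place of the NetCDF data (the names series and n_steps are invented for the example): lapply forwards the same extra argument to every call, while Map walks several inputs in parallel so each element can get its own value.

```r
# Toy stand-ins for the per-file data: series of different lengths
series <- list(a = 1:10, b = 1:4, c = 1:7)

# A fixed extra argument is forwarded by lapply() to every call
first_two <- lapply(series, head, n = 2)

# A per-element argument needs Map() (or mapply()),
# which pairs each series with its own n
n_steps <- c(3, 4, 5)
firsts <- Map(function(x, n) head(x, n), series, n_steps)

# Stack the pieces into one data frame, keeping track of the source
df <- do.call(rbind, Map(function(x, id) data.frame(id = id, value = x),
                         firsts, names(firsts)))
```

For the NetCDF case the same pattern would pair each open file in ldf with its own number of timesteps and build count = c(1, 1, n) inside the function; that part is untested here.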

I think this is much easier to do with CDO as you can select the varying timestep easily using the date or time stamp, and pick out the desired nearest grid point. This would be an example bash script:

# I don't know how your time axis is
# you may need to use a date with a time stamp too if your data is not e.g. daily
# see the CDO manual for how to define dates. 
date=20090101 
lat=10
lon=50

for file in MY_FOLDER/*.nc ; do
  # select the nearest grid point and the date slice desired:
  # %.nc strips the .nc suffix from the file name
  cdo seldate,$date -remapnn,lon=$lon/lat=$lat "$file" "${file%.nc}_${lat}_${lon}_${date}.nc"
done
# R script here to read in the files

It is possible to merge all the new files with cdo, but you would need to be careful if the time stamp is the same. You could try cdo merge or cdo cat - that way you can read in a single file to R, rather than having to loop and open each file separately.

Licensed under: CC-BY-SA with attribution