Extending a series of nonuniform netcdf data in a numpy array

https://stackoverflow.com/questions/2642951

27-09-2019

Problem

I am new to python, apologies if this has been asked already.

Using python and numpy, I am trying to gather data across many netcdf files into a single array by iteratively calling append().

Naively, I am trying to do something like this:

from numpy import *
from pupynere import netcdf_file

x = array([])
y = [...some list of files...]

for file in y:
    ncfile = netcdf_file(file,'r')
    xFragment = ncfile.variables["varname"][:]
    ncfile.close()
    x = append(x, xFragment)

I know that under normal circumstances this is a bad idea, since it reallocates new memory on each append() call. But two things discourage preallocation of x:

1) The files are not necessarily the same size along axis 0 (but should be the same size along subsequent axes), so I would need to read the array sizes from each file beforehand to precalculate the final size of x.

However...

2) From what I can tell, pupynere (and other netcdf modules) load the entire file into memory upon opening it, rather than just holding a reference (as many netcdf modules in other environments do). So to preallocate, I'd have to open the files twice.

There are many (>100) large (>1GB) files, so overallocating and reshaping is not practical, from what I can tell.

My first question is whether I am missing some intelligent way to preallocate.

My second question is more serious. The above snippet works for a single-dimension array. But if I try to load in a matrix, then initialisation becomes a problem. I can append a one-dimensional array to an empty array:

append( array([]), array([1, 2, 3]) )

but I cannot append a matrix to an empty array along axis 0:

append( array([]), array([ [1, 2], [3, 4] ]), axis=0)

Something like x.extend(xFragment) would work, I believe, but I don't think numpy arrays have this functionality. I could also avoid the initialisation problem by treating the first file as a special case, but I'd prefer to avoid that if there's a better way to do it.
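To make the initialisation problem concrete, here is a minimal sketch with made-up data. It assumes the trailing shape of the fragments is known in advance; the names fragment and failed are only for illustration. An empty array created with zero rows and a matching trailing shape can be appended to along axis 0, whereas array([]) has shape (0,) and cannot.

```python
import numpy as np

# Illustrative stand-in for one file's data.
fragment = np.array([[1, 2], [3, 4]])

# array([]) is 1-D, so joining it with a 2-D array along axis 0 fails.
try:
    np.append(np.array([]), fragment, axis=0)
    failed = False
except ValueError:
    failed = True

# An empty array with zero rows and the same trailing shape works.
x = np.empty((0,) + fragment.shape[1:], dtype=fragment.dtype)
x = np.append(x, fragment, axis=0)
```

This still reallocates on every append, so it only addresses the initialisation issue, not the memory cost.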

If anyone can offer help or a suggestion, or can identify a problem with my approach, then I'd be grateful. Thanks

Solution

You can solve both problems by first loading the arrays from the files into a list of arrays, and then using concatenate to join them all at once. Something like this:

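from numpy import concatenate
from pupynere import netcdf_file

x = []  # a normal python list, not np.array
y = [...some list of files...]

for file in y:
    ncfile = netcdf_file(file, 'r')
    # take this file's data for the variable of interest
    xFragment = ncfile.variables["varname"][:]
    ncfile.close()
    x.append(xFragment)

# join all fragments along axis 0 in a single call
combined_array = concatenate(x, axis=0)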
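As a quick check that this handles fragments of different lengths along axis 0, here is a small sketch using synthetic arrays in place of the netcdf data (the fragments list and its shapes are invented for illustration):

```python
import numpy as np

# Stand-ins for per-file fragments: different sizes along axis 0,
# identical sizes along the remaining axis.
fragments = [np.zeros((3, 2)), np.ones((5, 2)), np.full((1, 2), 7.0)]

# One allocation for the result, regardless of how many fragments there are.
combined = np.concatenate(fragments, axis=0)
```

Note that concatenate builds a new array, so while the list and the result both exist, memory holds roughly two copies of the data.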

License: CC-BY-SA with attribution

Not affiliated with StackOverflow