netCDF to *.csv without Loops(!)

https://stackoverflow.com/questions/20610620

02-09-2022

Question

I have a performance and 'ugly code' problem, and maybe some of you can help. I have to export data from netCDF files to *.csv, and I wrote some Python code for this. Take a 3-dimensional netCDF file:

def to3dim_csv():
  var = ncf.variables['H2O'] #e.g. data for 'H2O' values
  one,two,three = var.shape #variable dimension shape e.g. (551,42,94)
  dim1,dim2,dim3 = var.dimensions #dimensions e.g. (time,lat,lon)

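  if crit is not None:
    bool1 = foo(dim1,crit,ncf) #boolean mask per dimension: True = value is wanted
    bool2 = foo(dim2,crit,ncf)
    bool3 = foo(dim3,crit,ncf)
  else: #no criterion given: keep every value
    bool1,bool2,bool3 = [True]*one,[True]*two,[True]*three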

  writer.writerow([dim1,dim2,dim3,varn])
  for i in range(one):
    for k in range(two):
      for l in range(three):
        if bool1[i] and bool2[k] and bool3[l]:
          writer.writerow([
                        ncf.variables[dim1][i],
                        ncf.variables[dim2][k],
                        ncf.variables[dim3][l],
                        var[i,k,l],
                        ])
  ofile.close()

  # Sample csv output is like:
  # time,lat,lon,H2O
  # 1,90,10,100
  # 1,90,11,90
  # 1,91,10,101

I want to remove the nested for ... in range(...): blocks, perhaps by using a recursive function, like:

var = ncf.variables['H2O']
dims = [d for d in var.dimensions]
shapes = [ncf.variables[d].shape for d in dims]
bools = [bool_table(d,crit,ncf) for d in dims]
dims.append('H2O')
writer.writerow(dims)
magic_function(data)

def magic_function(data):

   [enter code]

   writer.writerow(data)
   magic_function(left_data)

Update: For anyone who is interested, this works instantaneously:

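import csv

import numpy as np
import xarray as xr

def data_to_table(dataset, var):
    assert isinstance(dataset,xr.Dataset), 'Dataset must be xarray.Dataset'
    obj = getattr(dataset, var)
    # one row per cell: column 0 holds the value, the other columns the coordinates
    table = np.zeros((obj.data.size, obj.data.ndim+1), dtype=np.object_)
    table[:,0] = obj.data.flat
    for i,d in enumerate(obj.dims):
        # int() because np.prod of an empty shape returns the float 1.0
        repeat = int(np.prod(obj.data.shape[i+1:]))
        tile = int(np.prod(obj.data.shape[:i]))
        dim = getattr(dataset, d)
        dimdata = dim.data
        # repeat each coordinate across the trailing axes, then tile across the leading ones
        dimdata = np.repeat(dimdata, repeat)
        dimdata = np.tile(dimdata, tile)
        table[:,i+1] = dimdata.flat
    return table

def export_to_csv(dataset, var, filename, size=None):
    obj = getattr(dataset, var)
    header = [var] + [x for x in obj.dims]
    tabular = data_to_table(dataset, var)
    # size limits the number of rows written; None (or 0) writes all rows
    size = slice(None,size,None) if size else slice(None,None,None)
    # newline='' is what the csv module expects for files opened in text mode
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f,dialect=csv.excel)
        writer.writerow(header)
        writer.writerows(tabular[size])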
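
To see why the repeat/tile pairing in data_to_table lines each coordinate up with the flattened data, here is a minimal sketch on a 2x3 array. It assumes C (row-major) order, which is what .flat and .ravel() yield by default; the array values and coordinate values are made up for illustration.

```python
import numpy as np

data = np.array([[10, 11, 12],
                 [20, 21, 22]])   # shape (2, 3), dims (lat, lon), made-up values
lat = np.array([50, 51])
lon = np.array([7, 8, 9])

# First axis: each coordinate repeats once per element of the trailing axes.
lat_col = np.repeat(lat, data.shape[1])
# Last axis: the whole coordinate vector is tiled once per leading index.
lon_col = np.tile(lon, data.shape[0])

# Value first, then one column per dimension, as in data_to_table.
table = np.column_stack([data.ravel(), lat_col, lon_col])
print(table)
```

Each row of table then holds one value together with the lat and lon it belongs to, so the whole table can be handed to writerows in a single call.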

Solution

Something like this: get the indices where bol1, bol2 and bol3 are true and combine them while fetching the relevant values. Here a is the data array and dim1, dim2, dim3 are the coordinate arrays.

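    import csv
    import numpy as np

    # on Python 3 the csv module expects text mode with newline=''
    with open('numpy.csv', 'w', newline='') as f:
        out_csv = csv.writer(f)
        header = ['dim1','dim2','dim3','varn']
        out_csv.writerow(header)
        # positions where each boolean mask is true
        bol1_indices = np.nonzero(bol1)[0]
        bol2_indices = np.nonzero(bol2)[0]
        bol3_indices = np.nonzero(bol3)[0]
        # rows follow the header order: coordinates first, value last
        out_csv.writerows([dim1[i], dim2[k], dim3[l], a[i, k, l]]
                          for i in bol1_indices for k in bol2_indices for l in bol3_indices)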
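
The same idea extends to any number of dimensions without writing one for clause per axis. What follows is only a sketch using the standard library: it assumes the masks and coordinates are plain sequences and that the variable accepts a tuple of integers as index, as NumPy arrays and netCDF4 variables do. The names rows_for and TinyVar are hypothetical and exist only for this illustration.

```python
import csv
import io
from itertools import compress, product


def rows_for(var, coords, masks):
    """Yield one CSV row per selected cell of an N-dimensional variable.

    var:    object indexable with a tuple of ints
    coords: one sequence of coordinate values per dimension
    masks:  one sequence of booleans per dimension (True = keep)
    """
    # Keep only the positions whose mask entry is true, per dimension.
    kept = [list(compress(range(len(m)), m)) for m in masks]
    # product() walks all index combinations, last dimension fastest.
    for idx in product(*kept):
        yield [c[i] for c, i in zip(coords, idx)] + [var[idx]]


class TinyVar:
    """Hypothetical nested-list stand-in that accepts tuple indices."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        value = self.data
        for i in idx:
            value = value[i]
        return value


var = TinyVar([[[100, 90], [101, 95]]])         # shape (1, 2, 2), made-up values
coords = [[1], [90, 91], [10, 11]]              # time, lat, lon
masks = [[True], [True, True], [True, False]]   # drop lon == 11

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["time", "lat", "lon", "H2O"])
writer.writerows(rows_for(var, coords, masks))
print(buffer.getvalue())
```

Because rows_for is a generator, rows are produced one at a time as writerows consumes them, so the full index product is never held in memory.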

OTHER TIPS

Doing this in Python will always be slow because the raw data is not stored in the layout you want to save: Python has to build the index combinations and write one value per line. What do you need the CSV for? I recommend ncdump, which converts very quickly to a simple text file. If you must have CSV, you can use the nc2text utility from the FAN language utilities (see e.g. this page).

Licensed under: CC-BY-SA with attribution
