Question

I am trying to convert a .csv file to netCDF4 with Python, but I am having trouble figuring out how to store information from a .csv table format in a netCDF file. My main concern is how to declare the variables from the columns in a workable netCDF4 format. Everything I have found covers extracting information from a netCDF4 file to .csv or ASCII, not the other way around. I have provided the sample data, my sample code, and the error I get when declaring the arrays. Any help would be much appreciated.

The sample table is below:

Station Name  Country  Code   Lat    Lon     mn.yr   temp1  temp2  temp3  hpa
Somewhere     US       12340  35.52  23.358  1.19    -8.3   -13.1  -5     69.5
Somewhere     US       12340                 2.1971  -10.7  -13.9  -7.9   27.9
Somewhere     US       12340                 3.1971  -8.4   -13    -4.3   90.8

My sample code is:

#!/usr/bin/env python

import scipy
import numpy
import netCDF4
import csv

from numpy import arange, dtype 

#Declare empty arrays

v1 = []
v2 = []
v3 = []
v4 = []
v5 = []

# Open csv file and declare variable for arrays for each heading

with open('station_data.csv', 'r') as csvfile:
    f = csvfile.readlines()

for line in f[1:]:
    fields = line.split(',')
    v1.append(fields[0]) #station
    v2.append(fields[1])#country
    v3.append(int(fields[2]))#code
    v4.append(float(fields[3]))#lat
    v5.append(float(fields[4]))#lon
#more variables included but this is just an abridged list
print(v1)
print(v2)
print(v3)
print(v4)

#convert to netcdf4 framework that works as a netcdf

ncout = netCDF4.Dataset('station_data.nc','w') 

# latitudes and longitudes. Include NaN for missing numbers

lats_out = -25.0 + 5.0*arange(v4,dtype='float32')
lons_out = -125.0 + 5.0*arange(v5,dtype='float32')

# output data.

press_out = 900. + arange(v4*v5,dtype='float32') # 1d array
press_out.shape = (v4,v5) # reshape to 2d array
temp_out = 9. + 0.25*arange(v4*v5,dtype='float32') # 1d array
temp_out.shape = (v4,v5) # reshape to 2d array

# create the lat and lon dimensions.

ncout.createDimension('latitude',v4)
ncout.createDimension('longitude',v5)

# Define the coordinate variables. They will hold the coordinate information

lats = ncout.createVariable('latitude',dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',dtype('float32').char,('longitude',))

# Assign units attributes to coordinate var data. This attaches a text attribute to each of the coordinate variables, containing the units.

lats.units = 'degrees_north'
lons.units = 'degrees_east'

# write data to coordinate vars.

lats[:] = lats_out
lons[:] = lons_out

# create the pressure and temperature variables

press = ncout.createVariable('pressure',dtype('float32').char,('latitude','longitude'))
temp = ncout.createVariable('temperature',dtype('float32').char,('latitude','longitude'))

# set the units attribute.

press.units =  'hPa'
temp.units = 'celsius'

# write data to variables.

press[:] = press_out
temp[:] = temp_out

ncout.close()

error:

Traceback (most recent call last):
  File "station_data.py", line 33, in <module>
    v4.append(float(fields[3]))#lat
ValueError: could not convert string to float: 

Solution 3

If you look at your input file, there is no value for the Lat column in the second row. When you read the csv file, that value, i.e. fields[3], is stored as an empty string "". That's why you are getting a ValueError. Instead of calling the built-in float function directly, you can define a new function that handles this error:

def str_to_float(value):
    try:
        number = float(value)
    except ValueError:
        # assign whichever value suits your requirement instead of 0.0
        number = 0.0
    return number

Now you can use this function in place of the built-in float function like this:

v4.append(str_to_float(fields[3]))
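
For example, the reading loop from the question might then look like this. This is just a sketch: it assumes the column order from the sample table and keeps 0.0 as the fill value, though something like float('nan') may be a better choice for missing coordinates:

import csv

v1, v2, v3, v4, v5 = [], [], [], [], []  # station, country, code, lat, lon

with open('station_data.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for fields in reader:
        v1.append(fields[0])                # station
        v2.append(fields[1])                # country
        v3.append(int(fields[2]))           # code
        v4.append(str_to_float(fields[3]))  # lat (may be empty)
        v5.append(str_to_float(fields[4]))  # lon (may be empty)

Using the csv module instead of splitting each line by hand also copes with quoted fields that contain commas.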

OTHER TIPS

This is a perfect job for xarray, a Python package whose Dataset object represents the netCDF common data model. Here's an example you can try:

import pandas as pd
import xarray as xr

url = 'http://www.cpc.ncep.noaa.gov/products/precip/CWlink/'

ao_file = url + 'daily_ao_index/monthly.ao.index.b50.current.ascii'
nao_file = url + 'pna/norm.nao.monthly.b5001.current.ascii'

kw = dict(sep=r'\s*', parse_dates={'dates': [0, 1]},
          header=None, index_col=0, squeeze=True, engine='python')

# read into Pandas Series
s1 = pd.read_csv(ao_file, **kw)
s2 = pd.read_csv(nao_file, **kw)

s1.name = 'AO'
s2.name = 'NAO'

# concatenate two Pandas Series into a Pandas DataFrame
df = pd.concat([s1, s2], axis=1)

# create xarray Dataset from Pandas DataFrame
xds = xr.Dataset.from_dataframe(df)

# add variable attribute metadata
xds['AO'].attrs = {'units': '1', 'long_name': 'Arctic Oscillation'}
xds['NAO'].attrs = {'units': '1', 'long_name': 'North Atlantic Oscillation'}

# add global attribute metadata
xds.attrs = {'Conventions': 'CF-1.0', 'title': 'AO and NAO',
             'summary': 'Arctic and North Atlantic Oscillation Indices'}

# save to netCDF
xds.to_netcdf('/usgs/data2/notebook/data/ao_and_nao.nc')

Then running ncdump -h ao_and_nao.nc produces:

netcdf ao_and_nao {
dimensions:
        dates = 782 ;
variables:
        double dates(dates) ;
                dates:units = "days since 1950-01-06 00:00:00" ;
                dates:calendar = "proleptic_gregorian" ;
        double NAO(dates) ;
                NAO:units = "1" ;
                NAO:long_name = "North Atlantic Oscillation" ;
        double AO(dates) ;
                AO:units = "1" ;
                AO:long_name = "Arctic Oscillation" ;

// global attributes:
                :title = "AO and NAO" ;
                :summary = "Arctic and North Atlantic Oscillation Indices" ;
                :Conventions = "CF-1.0" ;

Note that you can install xarray using pip, but if you are using the Anaconda Python Distribution, you can install it from the conda-forge channel on Anaconda.org with:

conda install -c conda-forge xarray
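
Applied to the station data from the question, a minimal sketch along the same lines might look like the following. It assumes the file is comma-separated and uses the column names from the sample table (mn.yr, temp1, hpa), which you may need to adjust:

import pandas as pd
import xarray as xr

# read the csv; empty cells (e.g. missing Lat/Lon) become NaN
df = pd.read_csv('station_data.csv')

# use the month.year column as the record dimension
df = df.set_index('mn.yr')

# build an xarray Dataset and attach minimal metadata
xds = xr.Dataset.from_dataframe(df)
xds['temp1'].attrs = {'units': 'celsius'}
xds['hpa'].attrs = {'units': 'hPa'}

# save to netCDF
xds.to_netcdf('station_data.nc')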

While xarray mentioned above is a great tool, it is also worth looking at the UK Met Office's Iris library. A key advantage of Iris is that it helps you create netCDF files that follow the Climate and Forecast (CF) conventions, by providing helper functions to define standard names, units, coordinate systems, and other metadata. It also provides plotting, subsetting, and analysis utilities.

For earth science data such as this, CF is the recommended standard for netCDF files.

As an example of its use, this Python notebook re-implements the AO/NAO example above.
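
For instance, a minimal Iris sketch for one station's monthly temperatures might look like this. The data values are taken from the sample table; the time points, standard names, and units are illustrative assumptions:

import numpy as np
import iris
from iris.coords import AuxCoord, DimCoord
from iris.cube import Cube

# three months of temp1 values for the single station in the sample table
temps = np.array([-8.3, -10.7, -8.4], dtype='float32')

# a CF-style time coordinate (approximate mid-month offsets for Jan-Mar 1971)
time = DimCoord([14.0, 45.0, 73.0], standard_name='time',
                units='days since 1971-01-01')

# scalar latitude/longitude coordinates for the station
lat = AuxCoord(35.52, standard_name='latitude', units='degrees_north')
lon = AuxCoord(23.358, standard_name='longitude', units='degrees_east')

# build a cube with CF metadata and write it out as netCDF
cube = Cube(temps, standard_name='air_temperature', units='celsius',
            dim_coords_and_dims=[(time, 0)])
cube.add_aux_coord(lat)
cube.add_aux_coord(lon)

iris.save(cube, 'station_temps.nc')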

Licensed under: CC-BY-SA with attribution