Question

I would like to read from and write to a single netCDF-4 file from R. The file will be accessed by many processes at the same time (~100s for development, ~1000s for production).

What is the best way to access parallel I/O features in netCDF from within R?

What I have found:

  • It appears from the Unidata page that all I need to do is compile with parallel features enabled (--enable-parallel). Is that really all I need to do?
  • I cannot find any mention of parallel I/O in the ncdf4 package description.
  • Given that I/O is the bottleneck in my computation, any hints on how to optimize it would be welcome: are there circumstances when it would be better to write to multiple files during computation (e.g., locally), and combine the files later (e.g., using nco)?

Solution

Information related to using parallel I/O with Unidata NetCDF may be found here:

https://www.unidata.ucar.edu/software/netcdf/docs/parallel_io.html

The --enable-parallel flag is no longer necessary when configuring netCDF (I will check the documentation and update it if need be). The flag is still necessary when building the hdf5 library, however.

In order to use parallel I/O with netCDF-4, you need to make sure that it was built against an hdf5 library with parallel I/O enabled. At configure time, netCDF will query the hdf5 library to see whether or not the parallel I/O symbols are present.

  • If they are, parallel I/O for netCDF-4 is assumed.
  • If they are not, parallel I/O for netCDF-4 files is turned off.

If you are installing the netCDF library yourself, you can specify the --enable-parallel-tests flag at configure time; parallel tests will then be run by make check. You can also scan the output in config.log to see whether parallel I/O functionality was found in the hdf5 library; there should be a message noting whether or not it was enabled.

Note that there are some limitations to Parallel I/O with netCDF-4, specifically:

NetCDF-4 provides access to HDF5 parallel I/O features for netCDF-4/HDF5 files. NetCDF classic and 64-bit offset format may not be opened or created for use with parallel I/O. (They may be opened and created, but parallel I/O is not available.)

Assuming that the underlying netCDF library has parallel I/O enabled, and you are operating on the correct type of file, the standard API call invoked by ncdf4 should leverage parallel I/O automatically.
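In other words, no special calls are needed from the R side. A minimal sketch of that standard usage, assuming a pre-existing netCDF-4 file named output.nc containing a variable named temperature (both names are hypothetical, for illustration only):

```r
# Standard ncdf4 usage. If the underlying netCDF library was built
# against a parallel-enabled hdf5, these same calls can take advantage
# of parallel I/O on netCDF-4/HDF5 files without any code changes.
library(ncdf4)

nc <- nc_open("output.nc", write = TRUE)  # hypothetical file name
temp <- ncvar_get(nc, "temperature")      # hypothetical variable name

# ... compute on temp ...

ncvar_put(nc, "temperature", temp)        # write results back
nc_close(nc)
```

Whether these reads and writes actually go through the parallel I/O path depends entirely on how the netCDF and hdf5 libraries underneath ncdf4 were built, not on the R code.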

OTHER TIPS

There is one more R package dedicated to parallel handling of NetCDF files, called pbdNCDF4.
It is based on the standard ncdf4 package, so the syntax is very similar to the "traditional" approach. Further information is available on CRAN: https://cran.r-project.org/web/packages/pbdNCDF4/vignettes/pbdNCDF4-guide.pdf
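As a rough sketch of the SPMD style described in that vignette: the script below assumes an existing netCDF-4 file data.nc with a variable x (both names hypothetical), a parallel-enabled netCDF build, and launching under MPI, e.g. with mpiexec -np 4 Rscript script.R.

```r
# SPMD-style parallel access with pbdNCDF4; every MPI rank runs this
# same script. Requires pbdMPI and a parallel-enabled netCDF/hdf5.
library(pbdMPI, quietly = TRUE)
library(pbdNCDF4, quietly = TRUE)
init()

# Open the file collectively across all ranks.
nc <- nc_open_par("data.nc")     # hypothetical file name

# Request collective parallel access for the variable.
nc_var_par_access(nc, "x")       # hypothetical variable name

# Each rank could read just its own hyperslab via start/count;
# this simple sketch reads the whole variable on every rank.
vals <- ncvar_get(nc, "x")

nc_close(nc)
finalize()
```

The familiar ncdf4 functions (ncvar_get, ncvar_put, nc_close) are reused as-is; only the open call and the per-variable access mode differ from the serial workflow.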

Ward gave a fine answer. I wanted to add that there is another way to get parallel I/O features out of Unidata NetCDF-4.

NetCDF-4 has an architecture that separates the API from the back end storage implementation. Commonly, that's the NetCDF API on an HDF5 back end. But, here's the neat thing: you can also have the NetCDF API on the Northwestern/Argonne "Parallel-NetCDF" (http://cucis.ece.northwestern.edu/projects/PnetCDF/ and http://www.mcs.anl.gov/parallel-netcdf) back end.

This approach will give you a parallel I/O method to classic and 64-bit offset formatted datasets.

Both Ward and Rob gave fine answers! ;-)

But there is yet another way to get parallel I/O on classic and 64-bit offset files, through the standard netCDF API.

When netCDF is built with --enable-pnetcdf, the parallel-netcdf library is used behind the scenes to perform parallel I/O on classic, 64-bit offset, and CDF5 files (though I have not tested parallel I/O with that last format).

When opening the file, pass the NC_PNETCDF flag in the mode argument to indicate that you want to use parallel I/O for that file.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow