Question

I am just picking up HDF5 and I am a bit confused about creating a datatype for memory versus creating a datatype for the file. What's the difference?

In this example, a single compound datatype is created for the in-memory data and then used both to create the dataset and to write the data to the file:

/*
 * Create the memory datatype.
 */
s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);

/* 
 * Create the dataset.
 */
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT);

/*
 * Write data to the dataset.
 */
status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);

However, in another example, the author also creates a separate compound datatype for the file, and it uses different member types. For instance, in the datatype for memory, serial_no uses H5T_NATIVE_INT, but in the datatype for the file, serial_no uses H5T_STD_I64BE. Why does he do this?

/*
 * Create the compound datatype for memory.
 */
memtype = H5Tcreate (H5T_COMPOUND, sizeof (sensor_t));
status = H5Tinsert (memtype, "Serial number",
            HOFFSET (sensor_t, serial_no), H5T_NATIVE_INT);
status = H5Tinsert (memtype, "Location", HOFFSET (sensor_t, location),
            strtype);
status = H5Tinsert (memtype, "Temperature (F)",
            HOFFSET (sensor_t, temperature), H5T_NATIVE_DOUBLE);
status = H5Tinsert (memtype, "Pressure (inHg)",
            HOFFSET (sensor_t, pressure), H5T_NATIVE_DOUBLE);

/*
 * Create the compound datatype for the file.  Because the standard
 * types we are using for the file may have different sizes than
 * the corresponding native types, we must manually calculate the
 * offset of each member.
 */
filetype = H5Tcreate (H5T_COMPOUND, 8 + sizeof (hvl_t) + 8 + 8);
status = H5Tinsert (filetype, "Serial number", 0, H5T_STD_I64BE);
status = H5Tinsert (filetype, "Location", 8, strtype);
status = H5Tinsert (filetype, "Temperature (F)", 8 + sizeof (hvl_t),
            H5T_IEEE_F64BE);
status = H5Tinsert (filetype, "Pressure (inHg)", 8 + sizeof (hvl_t) + 8,
            H5T_IEEE_F64BE);

/*
 * Create dataspace.  Setting maximum size to NULL sets the maximum
 * size to be the current size.
 */
space = H5Screate_simple (1, dims, NULL);

/*
 * Create the dataset and write the compound data to it.
 */
dset = H5Dcreate (file, DATASET, filetype, space, H5P_DEFAULT, H5P_DEFAULT,
            H5P_DEFAULT);
status = H5Dwrite (dset, memtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);

What is the difference between these two methods?

Solution

From http://www.hdfgroup.org/HDF5/doc/UG/UG_frame11Datatypes.html:

H5T_NATIVE_INT corresponds to a C int type. On an Intel based PC, this type is the same as H5T_STD_I32LE, while on a MIPS system this would be equivalent to H5T_STD_I32BE.

That is to say, H5T_NATIVE_INT maps to a different concrete layout on different kinds of processors. If your data is only used in memory, meaning it will never leave this machine, you may want to use H5T_NATIVE_INT for better performance.

But if your data will be saved to a file that will be used by different systems, you should specify a concrete integer type, e.g. H5T_STD_I64BE or H5T_STD_I32LE, so that your data can be read correctly everywhere. If you use H5T_NATIVE_INT and you create a data file on an Intel-based PC, the numbers will be saved as H5T_STD_I32LE. When this file is used on a MIPS system, it will read the numbers as H5T_STD_I32BE, which is not what you expect.
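
For example, here is a minimal sketch of that advice, assuming the HDF5 1.8+ API; the file name "example.h5" and the dataset name "serials" are made up for illustration:

#include "hdf5.h"

int main(void)
{
    int     data[4] = {1, 2, 3, 4};   /* in-memory buffer described as native int */
    hsize_t dims[1] = {4};

    hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);

    /* The dataset is stored as 64-bit big-endian regardless of the machine. */
    hid_t dset = H5Dcreate(file, "serials", H5T_STD_I64BE, space,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* The library converts the native ints to the file type during the write. */
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}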

OTHER TIPS

The other answer here is missing some key ideas and makes using HDF5 datatypes seem harder than it is.

To begin with, the NATIVE types are simply aliases for what the C types map to on that platform (this is detected when the HDF5 library is built). If you use them in your code and look at the file you created with the h5dump tool, you will not see the NATIVE datatype but will instead see the real datatype (H5T_STD_I32LE or whatnot). These NATIVE types are admittedly a little confusing, but they are convenient for mapping between C types and HDF5 datatypes without having to know the byte order of the system you are on.
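
If you want to see what H5T_NATIVE_INT aliases on your own machine without dumping a file, a quick runtime check with H5Tequal works; this is a minimal sketch with no assumptions beyond a working HDF5 install:

#include <stdio.h>
#include "hdf5.h"

int main(void)
{
    /* On a little-endian x86 machine this prints the first line; on a
       big-endian machine H5T_NATIVE_INT aliases H5T_STD_I32BE instead
       (assuming a 32-bit C int, which is typical). */
    if (H5Tequal(H5T_NATIVE_INT, H5T_STD_I32LE) > 0)
        printf("H5T_NATIVE_INT is H5T_STD_I32LE here\n");
    else
        printf("H5T_NATIVE_INT is not H5T_STD_I32LE here\n");
    return 0;
}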

The other misconception I want to clear up is the idea that you must convert types yourself: the library will convert types for you when it is reasonable to do so. If a dataset contains H5T_STD_I32BE values and you declare the I/O buffer to be of H5T_NATIVE_INT on a little-endian system, the HDF5 library will convert the big-endian dataset integers to in-memory little-endian integers for you. You should not need to perform byte swapping on your own.
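
As a minimal sketch of that, suppose a file "be_data.h5" holds a dataset "counts" of four H5T_STD_I32BE values (both names are hypothetical):

#include "hdf5.h"

int main(void)
{
    hid_t file = H5Fopen("be_data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen(file, "counts", H5P_DEFAULT);

    int data[4];
    /* The buffer is declared as native int, so on a little-endian machine
       the library byte-swaps the big-endian file values during the read. */
    H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}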

Here is a simple way to think about it:

  • You declare a dataset's storage datatype when you call H5Dcreate().
  • You declare the I/O buffer's datatype when you call H5Dread() and H5Dwrite().

Again, if these differ and type conversions are reasonable, the data will be converted during the read/write calls.

Note that this type conversion could have performance implications in time-critical applications. If the platforms where data will be written and read differ in byte order or word size, you might want to explicitly set the datatype instead of using the NATIVE aliases so you can force the conversion to take place on the less important platform.

Example: Suppose you have a BE writer and an LE reader, and that the data arrive slowly but reads have to be as fast as possible. In this case, you would want to explicitly create your dataset to store H5T_STD_I32LE data so the datatype conversions happen on the writer.
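
The writer's side of that scenario might look like the following fragment (just a sketch; "samples" is a made-up dataset name, and file, space, and data are assumed to be set up as in the earlier write example):

/* Running on the big-endian writer: store the data little-endian so the
   time-critical little-endian reader gets a straight memory copy. */
hid_t dset = H5Dcreate(file, "samples", H5T_STD_I32LE, space,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

/* The BE-to-LE conversion happens here, on the writer, during the write. */
H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);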

One last thing: it's better to use the HOFFSET(s,m) macro instead of calculating offsets by hand when constructing compound types. It's more maintainable and your code will look nicer.
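
For instance, a compound memory type built with HOFFSET might look like this (a sketch; the struct loosely echoes the sensor_t from the question):

typedef struct {
    int    serial_no;
    double temperature;
} sensor_t;

/* HOFFSET asks the compiler for each member's actual offset, padding
   included, so reordering or adding members never silently breaks it. */
hid_t memtype = H5Tcreate(H5T_COMPOUND, sizeof(sensor_t));
H5Tinsert(memtype, "Serial number",   HOFFSET(sensor_t, serial_no),   H5T_NATIVE_INT);
H5Tinsert(memtype, "Temperature (F)", HOFFSET(sensor_t, temperature), H5T_NATIVE_DOUBLE);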

If you want more information about HDF5 datatypes, check out chapter 6 of the user's guide here: https://support.hdfgroup.org/HDF5/doc/UG/HDF5_Users_Guide-Responsive%20HTML5/index.html

You can also check out the H5T API docs in the reference manual here: https://support.hdfgroup.org/HDF5/doc/RM/RM_H5Front.html

Licensed under: CC-BY-SA with attribution