From https://docs.h5py.org/en/stable/special.html:
In HDF5, data in VL format is stored as arbitrary-length vectors of a
base type. In particular, strings are stored C-style in
null-terminated buffers. NumPy has no native mechanism to support
this. Unfortunately, this is the de facto standard for representing
strings in the HDF5 C API, and in many HDF5 applications.
Thankfully, NumPy has a generic pointer type in the form of the
“object” (“O”) dtype. In h5py, variable-length strings are mapped to
object arrays. A small amount of metadata attached to an “O” dtype
tells h5py that its contents should be converted to VL strings when
stored in the file.
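
The metadata mentioned above can be inspected directly. The following is a minimal sketch, assuming h5py 2.10 or later, where h5py.string_dtype() and h5py.check_string_dtype() are available (string_dtype() is the newer spelling of the special_dtype(vlen=str) call used in the example further down). The variable names are illustrative only.

```python
import h5py
import numpy as np

# Build the tagged object dtype that h5py uses for variable-length strings.
# Assumption: h5py >= 2.10, where string_dtype() exists.
dt = h5py.string_dtype(encoding="utf-8")

# The dtype is an ordinary NumPy object dtype as far as NumPy is concerned.
kind = dt.kind

# h5py recognises the attached metadata; length None means variable length.
tagged = h5py.check_string_dtype(dt)

# A plain object dtype carries no such tag, so h5py reports nothing for it.
plain = h5py.check_string_dtype(np.dtype("O"))

print(kind, tagged, plain)
```

The point of the sketch is that the "O" dtype alone is not enough: h5py relies on the attached metadata to decide that the contents are VL strings.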
Existing VL strings can be read and written to with no additional
effort; Python strings and fixed-length NumPy strings can be
auto-converted to VL data and stored.
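Example:
In [25]: import h5py
In [26]: h5File = h5py.File('vlen_demo.h5', 'w')  # assumed setup; the original does not show how h5File was opened
In [27]: dt = h5py.special_dtype(vlen=str)
In [28]: dset = h5File.create_dataset('vlen_str', (100,), dtype=dt)
In [29]: dset[0] = 'the change of water into water vapour'
In [30]: dset[0]
Out[30]: 'the change of water into water vapour'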
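
For a version that runs as a script, here is a hedged sketch of the same round trip. It assumes h5py 2.10 or later. The file name vlen_demo.h5 and the temporary directory are made up for illustration. The decoding step is there because h5py 3.x returns bytes when reading VL strings, whereas older releases returned str.

```python
import os
import tempfile

import h5py

# Hypothetical scratch location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "vlen_demo.h5")

# Variable-length UTF-8 string dtype (newer spelling of special_dtype(vlen=str)).
dt = h5py.string_dtype(encoding="utf-8")

# Write two Python strings of different lengths into one dataset.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("vlen_str", (2,), dtype=dt)
    dset[0] = "short"
    dset[1] = "a considerably longer string"

# Read them back; decode only if this h5py version hands back bytes.
with h5py.File(path, "r") as f:
    dset = f["vlen_str"]
    dtype_kind = dset.dtype.kind
    info = h5py.check_string_dtype(dset.dtype)
    values = [v.decode("utf-8") if isinstance(v, bytes) else v for v in dset[:]]

print(dtype_kind, info.length, values)
```

On h5py 3.x, dset.asstr()[:] is an alternative to decoding by hand.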