The h5py documentation (http://www.h5py.org/docs/high/dataset.html) says the following:

Importantly, h5py does not use NumPy to do broadcasting before the write...

>>> dset2 = f.create_dataset("MyDataset", (1000,1000,1000), 'f')
>>> data = np.arange(1000*1000, dtype='f').reshape((1000,1000))
>>> dset2[:] = data  # Does NOT allocate 3.8 G of memory

What does broadcasting refer to in this case?

Solution

Here, broadcasting means conceptually repeating the (1000,1000) array 1000 times so that it matches the (1000,1000,1000) shape of the dataset.

h5py does not build the full (1000,1000,1000) array in memory before writing to disk. Instead, it writes the (1000,1000) array 1000 times, producing the correct array on disk while using only about 1/1000 of the memory.
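To make the idea concrete, here is a small NumPy-only sketch. It is not h5py's actual implementation: the `target` array is just a stand-in I made up for the on-disk dataset, and the shapes are shrunk to (4,4,4) and (4,4) so it runs instantly. It shows that filling the target one 2-D slab at a time gives the same result as the broadcast array, without ever materializing a second full-size copy of the data.

```python
import numpy as np

# Small stand-ins for the (1000,1000,1000) dataset and the (1000,1000) source.
n = 4
data = np.arange(n * n, dtype='f').reshape((n, n))

# What broadcasting means here: the 2-D array is treated as if it were
# repeated along a new leading axis. broadcast_to returns a read-only view,
# so no full-size copy is made at this point.
expanded = np.broadcast_to(data, (n, n, n))

# Slab-by-slab write, loosely mimicking the behaviour described above.
# `target` is a hypothetical stand-in for the dataset on disk.
target = np.empty((n, n, n), dtype='f')
for i in range(n):
    target[i] = data  # only one (n, n) slab is handled per step

# Both approaches describe the same 3-D contents.
same = np.array_equal(target, expanded)
```

With a real h5py dataset, the plain `dset2[:] = data` assignment from the question is all you need to write; the loop above is only there to illustrate what "writing the small array many times" amounts to.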

You can read more about the rules of NumPy broadcasting in the NumPy documentation.
