Question

I have a large data frame (126041 obs. of 604 variables). I'm new to the HDF5 format. I save the HDF5 file as follows:

writeH5DataFrame(myData,"C:/myDir/myHDF5.h5",overwrite=T)

  1. How can I read the data frame back? There doesn't appear to be a readH5DataFrame or loadH5DataFrame function.

  2. Also, writeH5DataFrame takes an incredibly long time, probably because of the large number of columns (604 in this case). The documentation mentions that "the data for each column is stored in a separate H5Dataset"; I'm not sure whether this is the reason for the long time taken. Is there any way to speed up writing a data frame in HDF5 format?


Answer

I don't know which package you are using, but with the rhdf5 package, writing and reading HDF5 files looks very easy.

## uncomment the 2 lines below to install the package
## (biocLite has been replaced by BiocManager in current Bioconductor releases)
## install.packages("BiocManager")
## BiocManager::install("rhdf5")
library(rhdf5)
## create an empty HDF5 file (the "database")
h5createFile("myhdf5file.h5")
## create the group hierarchy (groups will hold the tables or datasets)
h5createGroup("myhdf5file.h5","group1")
h5createGroup("myhdf5file.h5","group2")

## save a matrix
A = matrix(1:10, nrow=5, ncol=2)
h5write(A, "myhdf5file.h5","group1/A")

## save an array with attribute 
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
attr(B, "scale") <- "liter"
h5write(B, "myhdf5file.h5","group2/B")
## list the contents of the file
h5ls("myhdf5file.h5")

   group   name       otype  dclass       dim
0       / group1   H5I_GROUP                  
1 /group1      A H5I_DATASET INTEGER     5 x 2
2       / group2   H5I_GROUP                  
3 /group2      B H5I_DATASET   FLOAT 5 x 2 x 2

## read A and B
D = h5read("myhdf5file.h5","group1/A")
E = h5read("myhdf5file.h5","group2/B")
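
The same two calls can be tried on a data frame, which is what the question asks about. The following is only a minimal sketch, assuming that h5write accepts a data.frame (to my knowledge rhdf5 stores it as a single compound dataset rather than one dataset per column) and that h5read returns it as a data.frame again. The temporary file, the dataset name "df", and the small stand-in data frame are made up for illustration, and I have not timed this on a 126041 x 604 data frame.

```r
library(rhdf5)

## hypothetical temporary file, used only for this sketch
f <- tempfile(fileext = ".h5")
h5createFile(f)

## small stand-in for the real data frame
df <- data.frame(id = 1:5,
                 value = seq(0, 1, length.out = 5),
                 label = c("ab", "cde", "fghi", "a", "s"),
                 stringsAsFactors = FALSE)

## write the whole data frame as one dataset named "df"
h5write(df, f, "df")

## read it back; this should give a data.frame again
df2 <- h5read(f, "df")
str(df2)

## close any HDF5 handles that are still open
h5closeAll()
```

If every column is numeric, converting with as.matrix() and writing one matrix, as in the example with A above, may also be worth trying, but whether it is faster for this data is an assumption to be measured.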