Question

I am learning the ILNumerics HDF5 API. I really like the option to set up a complex HDF5 file in one expression using C# object initializers. I created the following file:

using (var f = new H5File("myFile.h5")) {

    f.Add(new H5Group("myTopNode") {
        new H5Dataset("dsNo1", ILMath.vec<float>(1,200)),  // no attributes 
        new H5Group("myGroup") {
            new H5Dataset("dsYes", ILMath.rand(100,200)) { // matching dataset
                Attributes = { 
                    { "att1", 1 },
                    { "att2", 2 } 
                }
            }, 
            new H5Dataset("dsNo2") {  // attributes but wrong name
                Attributes = { 
                    { "wrong1", -100 },
                    { "wrong2", -200 } 
                }
            }
        }
    });
}

Now I am looking for a clever way to iterate over the file and filter for datasets with specific properties. I want to find all datasets that have at least one attribute with "att" in its name, then collect and return their content. This is what I have so far:

IList<ILArray<double>> list = new List<ILArray<double>>();
using (var f = new H5File("myFile.h5")) {
    var groups = f.Groups;
    foreach (var g in groups) {
        foreach (var obj in g) {
            if (obj.H5Type == H5ObjectTypes.Dataset && obj.Name.Contains("ds")) {
                var ds = obj as H5Dataset;
                // look for an attribute whose name contains "att"
                foreach (var att in ds.Attributes) {
                    if (att.Name.Contains("att")) {
                        list.Add(ds.Get<double>());
                        break; // add each dataset only once, even if several attributes match
                    }
                }
            }
        }
    }
}
return list; 

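But it does not work recursively: it only inspects the direct children of the top-level groups, so nested datasets such as "dsYes" are never reached. I could adapt it, but ILNumerics claims to be convenient, so there must be a better way. Is there something similar to h5py in Python?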

Solution

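H5Group provides the Find<T> method, which does just what you are looking for. It iterates over the whole subtree and returns the objects of type T that satisfy an arbitrary predicate. It can be called directly on the file object (Any requires using System.Linq):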

var matches = f.Find<H5Dataset>(
                 predicate: ds => ds.Attributes.Any(a => a.Name.Contains("att")));
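
To make the idea of a predicate-driven subtree search concrete, here is a minimal sketch of a depth-first walk over a tree shaped like the file in the question. Node, GroupNode, DatasetNode and TreeSearch.FindAll are hypothetical stand-ins invented for this illustration; they are not ILNumerics types, and this is not a claim about how Find<T> is implemented internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for an HDF5-like tree. These are NOT ILNumerics types.
abstract class Node {
    public string Name { get; }
    protected Node(string name) { Name = name; }
}

sealed class DatasetNode : Node {
    public List<string> AttributeNames { get; } = new List<string>();
    public DatasetNode(string name, params string[] attributeNames) : base(name) {
        AttributeNames.AddRange(attributeNames);
    }
}

sealed class GroupNode : Node {
    public List<Node> Children { get; } = new List<Node>();
    public GroupNode(string name, params Node[] children) : base(name) {
        Children.AddRange(children);
    }
}

static class TreeSearch {
    // Depth-first walk: yields every node of type T below root that satisfies the predicate.
    public static IEnumerable<T> FindAll<T>(GroupNode root, Func<T, bool> predicate) where T : Node {
        foreach (var child in root.Children) {
            if (child is T typed && predicate(typed)) yield return typed;
            if (child is GroupNode sub) {
                // descend into nested groups, which the loop in the question does not do
                foreach (var hit in FindAll(sub, predicate)) yield return hit;
            }
        }
    }
}

static class Program {
    static void Main() {
        // same layout as the file created in the question
        var root = new GroupNode("/",
            new GroupNode("myTopNode",
                new DatasetNode("dsNo1"),
                new GroupNode("myGroup",
                    new DatasetNode("dsYes", "att1", "att2"),
                    new DatasetNode("dsNo2", "wrong1", "wrong2"))));

        var names = TreeSearch
            .FindAll<DatasetNode>(root, ds => ds.AttributeNames.Any(a => a.Contains("att")))
            .Select(ds => ds.Name);

        Console.WriteLine(string.Join(", ", names)); // prints "dsYes"
    }
}
```

Because the predicate is evaluated once per dataset, a dataset with several matching attributes is still reported only once.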

Why not make your function return an ILCell instead of a List? This integrates more nicely with the ILNumerics memory management (there will be no storage lying around, waiting for the garbage collector to come by):

using (var f = new H5File("myFile.h5")) {
    // create container for the dataset contents
    // (cell, size and end are ILMath members; this assumes the enclosing class derives from ILMath)
    ILCell c = cell(size(1, 1)); // one element init

    // retrieve datasets filtered
    var matches = f.Find<H5Dataset>(predicate: ds => {
        if (ds.Attributes.Any(a => a.Name.Contains("att"))) {
            c[end + 1] = ds.Get<double>();
            return true; 
        }
        return false; 
    });
    return c; 
}

Some links:

http://ilnumerics.net/hdf5-interface.html

http://ilnumerics.net/Cells.html

http://ilnumerics.net/GeneralRules.html

Licensed under: CC-BY-SA with attribution