Question

My C library generates a very big array of POD structs. What is the most efficient way to pass it to the Ruby side? On the Ruby side, a raw array of values is fine for me.

My current solution stores each element and each field separately, and it is very slow. Profiling showed that this function takes about 15% of program time on average data, and it is not even the computational part.

I've read about Data_Wrap_Struct, but I'm not sure that I need it. If I pass the raw void* buffer as a string and then unpack it on the Ruby side, will it be much faster?

struct SPacket
{
    uint32_t field1;
    uint32_t field2;
    uint16_t field3;
    uint8_t field4;
};

VALUE rb_GetAllData(VALUE self) // SLOOOW
{
    size_t count = 0;

    struct SPacket* packets = GetAllData(&count);

    VALUE arr = rb_ary_new2(count);

    for(size_t i = 0; i < count; i++)
    {
        VALUE sub_arr = rb_ary_new2(4);

        rb_ary_store(sub_arr, 0, UINT2NUM(packets[i].field1));
        rb_ary_store(sub_arr, 1, UINT2NUM(packets[i].field2));
        rb_ary_store(sub_arr, 2, UINT2NUM(packets[i].field3));
        rb_ary_store(sub_arr, 3, UINT2NUM(packets[i].field4));

        rb_ary_store(arr, i, sub_arr);
    }

    return arr;
}

Solution

Your method copies your C array into a Ruby array. You could avoid this by creating a Ruby collection class that wraps the C array using Data_Wrap_Struct and acts directly on it.

Data_Wrap_Struct is a macro that takes a Ruby class and a C struct (and optionally a couple of pointers to functions for memory management that I’m deliberately omitting) and creates an instance of that class that has the struct “attached”. In the functions that implement this class’s methods, you then use Data_Get_Struct to “unwrap” the struct so that you can access it in the function.

In this case, something like this:

// declare a variable for the new class
VALUE rb_cSPacketCollection;

// a struct that will be wrapped by the class
struct SPacketDataStruct {
    struct SPacket * data;
    size_t count; // size_t, because GetAllData() takes a size_t*
};

VALUE rb_GetAllData(VALUE self) {
    // sizeof *wrapper always matches the type the pointer refers to
    struct SPacketDataStruct* wrapper = malloc(sizeof *wrapper);
    wrapper->data = GetAllData(&wrapper->count);
    // no mark or free functions yet (the two 0 arguments)
    return Data_Wrap_Struct(rb_cSPacketCollection, 0, 0, wrapper);
}

and in your Init_whatever() method you’ll need to create the class:

rb_cSPacketCollection = rb_define_class("SPacketCollection", rb_cObject);

This alone isn’t much use; you need to define some methods on this new class. As an example, you could create a [] method to allow access to the individual SPackets:

VALUE SPacketCollection_get(VALUE self, VALUE index) {
    // unwrap the struct
    struct SPacketDataStruct* wrapper;
    Data_Get_Struct(self, struct SPacketDataStruct, wrapper);

    int i = NUM2INT(index);

    // bounds check (a negative index is rejected as well)
    if (i < 0 || (size_t)i >= wrapper->count) {
        rb_raise(rb_eIndexError, "Index out of bounds");
    }

    // just return an array in this example
    VALUE arr = rb_ary_new2(4);

    rb_ary_store(arr, 0, UINT2NUM(wrapper->data[i].field1));
    rb_ary_store(arr, 1, UINT2NUM(wrapper->data[i].field2));
    rb_ary_store(arr, 2, UINT2NUM(wrapper->data[i].field3));
    rb_ary_store(arr, 3, UINT2NUM(wrapper->data[i].field4));

    return arr;
}

and then in your Init_ method, after creating the class you define the method:

rb_define_method(rb_cSPacketCollection, "[]", SPacketCollection_get, 1);

Note that Data_Get_Struct is a macro and its usage is slightly odd, in that it doesn’t return the unwrapped struct. Instead, it assigns the pointer to the variable you pass as the third argument.
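
If you would rather have [] accept negative indices the way Ruby’s Array does, the index check can be factored into a small helper. This is only a sketch: normalize_index is a hypothetical name of my own, not part of the Ruby C API, and it assumes the element count is held as a size_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Ruby C API.
 * Maps a Ruby-style index (negative values count from the end) onto
 * the range [0, count). Returns true and writes the array offset to
 * *out when the index is valid; returns false when it is out of range. */
static bool normalize_index(long index, size_t count, size_t *out)
{
    assert(out != NULL);

    if (index < 0) {
        /* -1 is the last element, -count is the first.
         * -(index + 1) cannot overflow, unlike -index for LONG_MIN. */
        size_t from_end = (size_t)(-(index + 1)) + 1;
        if (from_end > count) {
            return false;
        }
        *out = count - from_end;
        return true;
    }

    if ((size_t)index >= count) {
        return false;
    }
    *out = (size_t)index;
    return true;
}
```

In the [] method you would then convert the argument with NUM2LONG, call the helper, and raise rb_eIndexError when it returns false.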

Since you’ve started using Data_Wrap_Struct by this stage, you could go further and create a new class that wraps an individual SPacket struct and operates directly on it:

// declare a variable for the new class
VALUE rb_cSPacket;

//and a function to get a field value
// you'll need to create more methods to access the other fields
// (and possibly to set them)
VALUE SPacket_field1(VALUE self) {
    struct SPacket* packet;
    Data_Get_Struct(self, struct SPacket, packet);

    return UINT2NUM(packet->field1);
}

In your Init_ function, create it and define the methods:

rb_cSPacket = rb_define_class("SPacket", rb_cObject);
rb_define_method(rb_cSPacket, "field1", SPacket_field1, 0);

This may entail a bit of work to create all the getters and setters for the fields; how much will depend on how you’re using it. Something like ffi could help here, but I don’t know how ffi would deal with the collection class. It would probably be worth looking into.

Now change your [] function to return an instance of this new class:

VALUE SPacketCollection_get(VALUE self, VALUE index) {
    //unwrap the struct
    struct SPacketDataStruct* wrapper;
    Data_Get_Struct(self, struct SPacketDataStruct, wrapper);

    int i = NUM2INT(index);

    //bounds check (a negative index is rejected as well)
    if (i < 0 || (size_t)i >= wrapper->count) {
        rb_raise(rb_eIndexError, "Index out of bounds");
    }

    //create an instance of the new class, and wrap it around the 
    //struct in the array
    struct SPacket* packet = &wrapper->data[i];
    return Data_Wrap_Struct(rb_cSPacket, 0, 0, packet);
}

With this you can now do something like this in Ruby:

c = get_all_data # in my testing I just made this a global method
c[2].field1 # returns the value of field1 of the third SPacket in the array

It might be worth creating an each method on the collection class, and then you can include the Enumerable module and make available a load of methods:

VALUE SPacketCollection_each(VALUE self) {
    //unwrap the struct as before
    struct SPacketDataStruct* wrapper;
    Data_Get_Struct(self, struct SPacketDataStruct, wrapper);
    int i;

    for(i = 0; i < wrapper->count; i++) {
        //create a new instance of the SPacket class
        // wrapping this entry
        struct SPacket* packet = &wrapper->data[i];
        rb_yield(Data_Wrap_Struct(rb_cSPacket, 0, 0, packet));
    }
    return self;
}

in Init_whatever:

rb_define_method(rb_cSPacketCollection, "each", SPacketCollection_each, 0);
rb_include_module(rb_cSPacketCollection, rb_mEnumerable);

In this example I haven’t been concerned about things like object identity and memory management. With everything backed by the same array, you could have multiple objects that all share the same data, so you’ll have to consider whether this is okay for your use. Also, you may have noticed that I’ve called malloc but never free. You’ll need to determine who “owns” the data array and make sure you don’t introduce any memory leaks. You can pass a function to Data_Wrap_Struct that will be called when the object is garbage collected to free the memory.
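
As a rough sketch of what such a free function could look like: the code below assumes that the array returned by GetAllData was allocated with malloc and that the wrapper takes ownership of it. Nothing in the question confirms that, so check it against your library. SPacketCollection_wrap and SPacketCollection_free are hypothetical names, and count is held as a size_t to match GetAllData.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct SPacket {
    uint32_t field1;
    uint32_t field2;
    uint16_t field3;
    uint8_t  field4;
};

struct SPacketDataStruct {
    struct SPacket *data;
    size_t count;
};

/* Hypothetical helper: builds the wrapper struct around an existing array.
 * ASSUMPTION: data was allocated with malloc and ownership passes to the
 * wrapper. Returns NULL if the wrapper itself cannot be allocated. */
static struct SPacketDataStruct *SPacketCollection_wrap(struct SPacket *data,
                                                        size_t count)
{
    assert(data != NULL || count == 0);

    struct SPacketDataStruct *wrapper = malloc(sizeof *wrapper);
    if (wrapper == NULL) {
        return NULL;
    }
    wrapper->data = data;
    wrapper->count = count;
    return wrapper;
}

/* Free function with the void (*)(void *) shape that Data_Wrap_Struct
 * accepts. Releases the array first, then the wrapper that pointed at it. */
static void SPacketCollection_free(void *ptr)
{
    struct SPacketDataStruct *wrapper = ptr;
    if (wrapper == NULL) {
        return;
    }
    free(wrapper->data);
    free(wrapper);
}
```

You would pass it as the third argument when wrapping the collection, i.e. Data_Wrap_Struct(rb_cSPacketCollection, 0, SPacketCollection_free, wrapper). The individual SPacket objects should keep 0 there, since they point into the array rather than owning memory; that also means they must not be used after the collection has been collected, unless you arrange for something to keep it alive.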

If you haven’t already seen it, the Pickaxe book has a good chapter on C extensions, and is now available online.

Licensed under: CC-BY-SA with attribution