Question

I am trying to read a binary file into memory, and then use it like so:

struct myStruct {
    std::string mystring; // is 40 bytes long
    uint myint1; // is 4 bytes long
};

typedef unsigned char byte;

byte *filedata = ReadFile(filename); // reads file into memory, closes the file
myStruct aStruct;
aStruct.mystring = filedata.????

I need a way of accessing the binary file with an offset, and getting a certain length at that offset. This is easy if I store the binary file data in a std::string, but i figured that using that to store binary data is not as good way of doing things. (filedata.substr(offset, len))

Reasonably extensive (IMO) searching hasn't turned anything relevant up, any ideas? I am willing to change storage type (e.g. to std::vector) if you think it is necessary.

Was it helpful?

Solution

If you're not going to use a serialization library, then I suggesting adding serialization support to each class:

struct My_Struct
{
    std::string my_string;
    unsigned int my_int;
    void Load_From_Buffer(unsigned char const *& p_buffer)
    {
        my_string = std::string(p_buffer);
        p_buffer += my_string.length() + 1; // +1 to account for the terminating nul character.
        my_int = *((unsigned int *) p_buffer);
        p_buffer += sizeof(my_int);
    }
};

unsigned char * const buffer = ReadFile(filename);
unsigned char * p_buffer = buffer;
My_Struct my_variable;
my_variable.Load_From_Buffer(p_buffer);

Some other useful interface methods:

unsigned int Size_On_Stream(void) const; // Returns the size the object would occupy in the stream.
void Store_To_Buffer(unsigned char *& p_buffer); // Stores object to buffer, increments pointer.

With templates you can extend the serialization functionality:

void Load_From_Buffer(std::string& s, unsigned char *& p_buffer)
{
    s = std::string((char *)p_buffer);
    p_buffer += s.length() + 1;
}

void template<classtype T> Load_From_Buffer(T& object, unsigned char *& p_buffer)
{
  object.Load_From_Buffer(p_buffer);
}

Edit 1: Reason not to write structure directly

In C and C++, the size of a structure may not be equal to the sum of the size of its members.
Compilers are allowed to insert padding, or unused space, between members so that the members are aligned on an address.

For example, a 32-bit processor likes to fetch things on 4 byte boundaries. Having one char in a structure followed by an int would make the int on relative address 1, which is not a multiple of 4. The compiler would pad the structure so that the int lines up on relative address 4.

Structures may contain pointers or items that contain pointers.
For example, the std::string type may have a size of 40, although the string may contain 3 characters or 300. It has a pointer to the actual data.

Endianess.
With multibyte integers some processors like the Most Significant Byte (MSB), a.k.a. Big Endian, first (the way humans read numbers) or the Least Significant Byte first, a.k.a. Little Endian. The Little Endian format takes less circuitry to read than the Big Endian.

Edit 2: Variant records

When outputting things like arrays and containers, you must decide whether you want to output the full container (include unused slots) or output only the items in the container. Outputting only the items in the container would use a variant record technique.

Two techniques for outputting variant records: quantity followed by items or items followed by a sentinel. The latter is how C-style strings are written, with the sentinel being a nul character.

The other technique is to output the quantity of items, followed by the items. So if I had 6 numbers, 0, 1, 2, 3, 4, 5, the output would be:
6 // The number of items
0
1
2
3
4
5

In the above Load_From_Buffer method, I would create a temporary to hold the quantity, write that out, then follow with each item from the container.

OTHER TIPS

You could overload the std::ostream output operator and std::istream input operator for your structure, something like this:

struct Record {
    std::string name;
    int value;
};

std::istream& operator>>(std::istream& in, Record& record) {
    char name[40] = { 0 };
    int32_t value(0);
    in.read(name, 40);
    in.read(reinterpret_cast<char*>(&value), 4);
    record.name.assign(name, 40);
    record.value = value;
    return in;
}

std::ostream& operator<<(std::ostream& out, const Record& record) {
    std::string name(record.name);
    name.resize(40, '\0');
    out.write(name.c_str(), 40);
    out.write(reinterpret_cast<const char*>(&record.value), 4);
    return out;
}

int main(int argc, char **argv) {
    const char* filename("records");
    Record r[] = {{"zero", 0 }, {"one", 1 }, {"two", 2}};
    int n(sizeof(r)/sizeof(r[0]));

    std::ofstream out(filename, std::ios::binary);
    for (int i = 0; i < n; ++i) {
        out << r[i];
    }
    out.close();

    std::ifstream in(filename, std::ios::binary);
    std::vector<Record> rIn;
    Record record;
    while (in >> record) {
        rIn.push_back(record);
    }
    for (std::vector<Record>::iterator i = rIn.begin(); i != rIn.end(); ++i){
        std::cout << "name: " << i->name << ", value: " << i->value
                  << std::endl;
    }
    return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top