Strategy for Binary File Format Description to C++ Implementation

https://softwareengineering.stackexchange.com/questions/322867

19-12-2020
Question

I am dealing with a lot of legacy, reverse-engineered binary file formats, often with lost source code, and the reading/writing of these files needs to be recoded in C++.

I am wondering if there are good examples or ideas on simplifying the process of converting documentation of the file format into code, with the goal being to load data into a class that can be loaded/saved/processed.

From my current investigation into the issue, I think Boost.Serialization may be one of the best options ( http://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/ ), although I am not sure whether there is a simpler way using just C++ and the STL.

I am mostly concerned about the ease of describing the data, and minimizing rework for each new type of binary file format being worked on.

Answer

I am wondering if there are good examples or ideas on simplifying the process of converting documentation of the file format into code, with the goal being to load data into a class that can be loaded/saved/processed.

This can be solved at multiple levels:

- You can use boost::spirit parsing, or a custom serializer/deserializer (as suggested in the comments).
- You can hide the implementation behind a custom set of boost::iostreams device and stream buffer types.
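As a rough illustration of the "custom serializer/deserializer" option using only C++ and the standard library, here is a minimal sketch. It assumes a made-up format whose header stores a 32-bit magic number, a 16-bit version and a 16-bit record count, all little-endian; `ExampleHeader`, `read_le`, `write_le` and the field names are hypothetical and not taken from any real format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <type_traits>

// Read an unsigned integer stored least-significant byte first,
// independent of the byte order of the host machine.
template <typename T>
T read_le(std::istream& in) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer type expected");
    unsigned char bytes[sizeof(T)];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof(T))) {
        throw std::runtime_error("unexpected end of input");
    }
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        value = static_cast<T>(value | (static_cast<T>(bytes[i]) << (8 * i)));
    }
    return value;
}

// Write an unsigned integer least-significant byte first.
template <typename T>
void write_le(std::ostream& out, T value) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer type expected");
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical header layout; the field order mirrors the (assumed) format description.
struct ExampleHeader {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t record_count;
};

ExampleHeader read_example_header(std::istream& in) {
    ExampleHeader h;
    h.magic = read_le<std::uint32_t>(in);
    h.version = read_le<std::uint16_t>(in);
    h.record_count = read_le<std::uint16_t>(in);
    return h;
}

void write_example_header(std::ostream& out, const ExampleHeader& h) {
    write_le(out, h.magic);
    write_le(out, h.version);
    write_le(out, h.record_count);
}

// Usage: a header written to an in-memory buffer reads back unchanged.
void example_round_trip() {
    std::stringstream buffer;
    write_example_header(buffer, ExampleHeader{0x12345678u, 2, 3});
    const ExampleHeader h = read_example_header(buffer);
    assert(h.magic == 0x12345678u && h.version == 2 && h.record_count == 3);
}
```

Each new format then only needs its own struct plus a pair of read/write functions that list the fields in file order, which keeps the per-format rework small. When reading from or writing to real files, the streams should be opened with `std::ios::binary`.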

I am mostly concerned about the ease of describing the data, and minimizing rework for each new type of binary file format being worked on.

I would do this by creating some custom types that map the raw I/O bytes to semantic information, transparently to the user:

#include <cstdint>

/// Maps the raw file header word onto BlaBla information.
class BlaBlaHeaderField
{
public:
     explicit BlaBlaHeaderField(std::uint32_t binary_header)
         : binary_header(binary_header) { /* ... */ }

     /// custom property (interprets individual bits of the header word)
     int BlaBlaParity() const { return (binary_header & 0x01); }

private:
     std::uint32_t binary_header;
};

This way, the code itself will later serve as near-complete documentation of the format.
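To show how such a type can read almost like the format description, here is a small sketch built around one bit-extraction helper. Only the parity bit at position 0 comes from the example above; the `kind` and `length` fields, their positions, and the names `extract_bits` and `ExampleHeaderField` are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Return `width` bits of `word`, starting at bit `offset` (bit 0 = least significant).
inline std::uint32_t extract_bits(std::uint32_t word, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    const std::uint32_t mask =
        (width == 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << width) - 1u);
    return (word >> offset) & mask;
}

// Hypothetical 32-bit header word:
//   bit  0      parity
//   bits 1..7   kind   (assumed field)
//   bits 8..31  length (assumed field)
class ExampleHeaderField {
public:
    explicit ExampleHeaderField(std::uint32_t raw) : raw_(raw) {}

    bool parity() const { return extract_bits(raw_, 0, 1) != 0; }
    std::uint32_t kind() const { return extract_bits(raw_, 1, 7); }
    std::uint32_t length() const { return extract_bits(raw_, 8, 24); }
    std::uint32_t raw() const { return raw_; }

private:
    std::uint32_t raw_;
};
```

Because every accessor names its offset and width, a change in the reverse-engineered layout only touches one line per field.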

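You can also use a union and overlay the fields with an integer type of the same size. Keep in mind that reading a union member other than the one last written is undefined behavior in standard C++ (some compilers support it as an extension), and that the result depends on the host's byte order and struct padding.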
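A sketch of the same overlay idea done with `std::memcpy` instead of a union, which is well-defined in standard C++ for trivially copyable types. The `ExampleRecord` layout and the `overlay_record` helper are hypothetical, and the sketch assumes the file's byte order and field alignment match the host, which has to be checked for each real format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-size record as it appears in the file.
struct ExampleRecord {
    std::uint16_t id;
    std::uint16_t flags;
    std::uint32_t size;
};

// Guard the assumptions the overlay relies on.
static_assert(std::is_trivially_copyable<ExampleRecord>::value,
              "memcpy overlay needs a trivially copyable type");
static_assert(sizeof(ExampleRecord) == 8, "unexpected padding in ExampleRecord");

// Copy raw bytes over the struct; the fields are interpreted in host byte order.
ExampleRecord overlay_record(const unsigned char* bytes, std::size_t count) {
    assert(count >= sizeof(ExampleRecord));
    ExampleRecord record;
    std::memcpy(&record, bytes, sizeof record);
    return record;
}
```

If the file's byte order differs from the host's, each field still needs to be byte-swapped after the copy, so reading field by field is often the more portable choice.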

Licensed under: CC-BY-SA with attribution
