Question

I want to keep a bunch of simple structures (just 3 ints per structure at the moment) in a file, and be able to read back just one of those structures at any given time.

As a first step, I'm trying to output them to a file, then read them back using boost::serialization. Currently I'm doing this, which crashes:

std::array<Patch, 3> outPatches;

outPatches[0].ZOrigin = 0;
outPatches[0].XOrigin = 0;
outPatches[0].Resolution = 64;

outPatches[1].ZOrigin = 1;
outPatches[1].XOrigin = 5;
outPatches[1].Resolution = 3;

outPatches[2].ZOrigin = 123;
outPatches[2].XOrigin = 546;
outPatches[2].Resolution = 6;

std::ofstream ofs("testing.sss", std::ios::binary);

for (auto const& patch : outPatches)
{
    std::cout << "start archive: " << ofs.tellp() << std::endl;
    {
        boost::archive::binary_oarchive oa(ofs);
        std::cout << "start patch: " << ofs.tellp() << std::endl;

        oa << patch;
    }
}

ofs.close();


std::array<Patch, 3> inPatches;

std::ifstream ifs("testing.sss", std::ios::binary);

for (auto& patch : inPatches)
{
    std::cout << "start archive: " << ifs.tellg() << std::endl;
    {
        boost::archive::binary_iarchive ia(ifs); // <-- crash here on second patch

        std::cout << "start patch: " << ifs.tellg() << std::endl;

        ia >> patch;
    }
}

ifs.close();

for (int i = 0; i != 3; ++i)
    std::cout << "check: " << (inPatches[i] == outPatches[i]) << std::endl;

I was planning on using tellp to build an index of where each structure starts, and seekg to skip to that structure on load. Is this a reasonable approach to take? I don't know much about streams beyond the basics.
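
To make the index idea concrete, here is a minimal sketch that leaves boost::serialization out entirely and writes the raw bytes of each structure. It assumes Patch is a trivially copyable struct of three ints (the layout below is a guess, since only the member names appear above), and writePatches and readPatchAt are hypothetical helper names, not library functions:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Assumed layout: the question only says "3 ints per structure".
struct Patch
{
    int ZOrigin;
    int XOrigin;
    int Resolution;
};

// Writes each patch as raw bytes and returns the offset at which each one starts.
// (Hypothetical helper for illustration.)
std::vector<std::int64_t> writePatches(std::string const& path,
                                       std::vector<Patch> const& patches)
{
    std::ofstream ofs(path, std::ios::binary);
    std::vector<std::int64_t> offsets;
    offsets.reserve(patches.size());

    for (auto const& patch : patches)
    {
        // Record where this patch begins before writing it.
        offsets.push_back(static_cast<std::int64_t>(ofs.tellp()));
        ofs.write(reinterpret_cast<char const*>(&patch), sizeof(Patch));
    }
    return offsets;
}

// Seeks to a recorded offset and reads back just that one patch.
Patch readPatchAt(std::string const& path, std::int64_t offset)
{
    std::ifstream ifs(path, std::ios::binary);
    ifs.seekg(offset);

    Patch patch{};
    ifs.read(reinterpret_cast<char*>(&patch), sizeof(Patch));
    assert(ifs.gcount() == static_cast<std::streamsize>(sizeof(Patch)));
    return patch;
}
```

With fixed-size records like these, the offset of patch i is simply i * sizeof(Patch), so the index only earns its keep once records vary in size. Raw bytes are also not portable across compilers or byte orders, which is one reason to want a serialization library on top.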

I've tried putting all the patches in one o/iarchive instead, which works fine for reading everything sequentially. However, seeking on the stream didn't work.

I've found another question, "read part of a file with iostreams", which might be what I want, but I have no idea what it's doing, how to use it, or whether it would work with boost::serialization.

I'd probably be willing to switch to another serialization method if necessary, since I've not got very far with this.

Edit 3: Moved edits 1 and 2 to an answer.

Solution 2

Boost serialization

It doesn't seem possible to skip around inside a boost serialization archive. The best I've got so far is to use multiple archives on one stream:

static const int numPatches = 5000;

std::vector<int> indices(numPatches, 0);
std::iota(indices.begin(), indices.end(), 0);

std::vector<Patch> outPatches(numPatches, Patch());

std::for_each(outPatches.begin(), outPatches.end(), 
    [] (Patch& p)
    {
        p.ZOrigin = rand();
        p.XOrigin = rand();
        p.Resolution = rand();
    });


std::vector<int64_t> offsets(numPatches, 0);

std::ofstream ofs("testing.sss", std::ios::binary);

for (auto i : indices)
{
    offsets[i] = ofs.tellp();

    boost::archive::binary_oarchive oa(ofs, 
        boost::archive::no_header | boost::archive::no_tracking);
    oa << outPatches[i];
}

ofs.close();


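// std::random_shuffle was removed in C++17; std::shuffle (needs <random>) replaces it.
std::shuffle(indices.begin(), indices.end(), std::mt19937{std::random_device{}()});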


std::vector<Patch> inPatches(numPatches, Patch());

std::ifstream ifs("testing.sss", std::ios::binary);

for (auto i : indices)
{
    ifs.seekg(offsets[i]);

    boost::archive::binary_iarchive ia(ifs,
        boost::archive::no_header | boost::archive::no_tracking);
    ia >> inPatches[i];

    ifs.clear();
}

std::cout << std::all_of(indices.begin(), indices.end(), 
    [&] (int i) { return inPatches[i] == outPatches[i]; }) << std::endl;

Unfortunately, this is very slow, so I don't think I can use it. Next up is testing protobuf.


google::protobuf

I've got something working with protobuf. It required a bit of fiddling around (apparently I have to use the LimitingInputStream type, and store the size of each object), but it's a lot faster than the boost::serialization version:

static const int numPatches = 500;

std::vector<int> indices(numPatches, 0);
std::iota(indices.begin(), indices.end(), 0);

std::vector<Patch> outPatches(numPatches, Patch());

std::for_each(outPatches.begin(), outPatches.end(), 
    [] (Patch& p)
    {
        p.ZOrigin = rand();
        p.XOrigin = rand();
        p.Resolution = 64;
    });


std::vector<int64_t> streamOffset(numPatches, 0);
std::vector<int64_t> streamSize(numPatches, 0);

std::ofstream ofs("testing.sss", std::ios::binary);

PatchBuffer buffer;

for (auto i : indices)
{
    buffer.Clear();

    WriteToPatchBuffer(buffer, outPatches[i]);

    streamOffset[i] = ofs.tellp();
    // ByteSize() is deprecated in protobuf 3.x; ByteSizeLong() is its replacement there.
    streamSize[i] = buffer.ByteSizeLong();

    buffer.SerializeToOstream(&ofs);
}

ofs.close();

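// std::random_shuffle was removed in C++17; std::shuffle (needs <random>) replaces it.
std::shuffle(indices.begin(), indices.end(), std::mt19937{std::random_device{}()});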

std::vector<Patch> inPatches(numPatches, Patch());

std::ifstream ifs("testing.sss", std::ios::binary);

for (auto i : indices)
{
    ifs.seekg(streamOffset[i]);

    buffer.Clear();

    google::protobuf::io::IstreamInputStream iis(&ifs);
    google::protobuf::io::LimitingInputStream lis(&iis, streamSize[i]);
    buffer.ParseFromZeroCopyStream(&lis);

    ReadFromPatchBuffer(inPatches[i], buffer);

    ifs.clear();
}

std::cout << std::all_of(indices.begin(), indices.end(), 
    [&] (int i) { return inPatches[i] == outPatches[i]; }) << std::endl;

OTHER TIPS

I once had a similar case (with boost::serialization). What I did back then, and it was quite efficient if I remember correctly, was to map the file into virtual memory and write a streamer that operates on memory buffers instead of files. For each part I wanted to read, I gave the streamer the appropriate offset and length as its buffer, then initialized the iarchive with that streamer, so the serialization library treated each object as if it were in a separate file.

Of course, adding to the file required a re-map. Looking back at this, it seems a bit weird, but it was efficient, as far as I recall.
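
A rough sketch of the "streamer over a memory buffer" part, under some assumptions: the memory mapping itself is platform-specific and left out, so a std::vector<char> stands in for the mapped file, and RangeBuf and readSlice are hypothetical names. The idea is a read-only std::streambuf whose get area is exactly one slice of the buffer, so anything reading through it hits end-of-stream at the end of that slice:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// A read-only streambuf over an existing byte range. It neither copies nor owns the data.
class RangeBuf : public std::streambuf
{
public:
    RangeBuf(char const* begin, std::size_t length)
    {
        // setg takes non-const pointers; the get area is only ever read from here.
        char* b = const_cast<char*>(begin);
        setg(b, b, b + length);
    }
};

// Reads `length` bytes starting at `offset` from an in-memory image of a file,
// treating that slice as a complete stream of its own. (Hypothetical helper.)
std::string readSlice(std::vector<char> const& image,
                      std::size_t offset, std::size_t length)
{
    assert(offset + length <= image.size());

    RangeBuf buf(image.data() + offset, length);
    std::istream in(&buf);

    std::string out(length, '\0');
    in.read(&out[0], static_cast<std::streamsize>(length));
    return out;
}
```

Since boost::archive::binary_iarchive can be constructed from a std::istream, an istream built on a buffer like this could presumably be handed to it in place of the ifstream, one slice per object. That combination is untested here.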

Licensed under: CC-BY-SA with attribution