Question

I'm building a distributed C++ application that needs to do lots of serialization and deserialization of simple data structures that are being passed between different processes and computers.

I'm not interested in serializing complex class hierarchies, but rather in sending structures with a few simple members such as numbers, strings and data vectors. The data vectors can often be many megabytes large. I'm worried that text/XML-based approaches are too slow, and I really don't want to write this myself, since problems like string encoding and number endianness can make it much more complicated than it looks on the surface.

I've been looking a bit at Protocol Buffers and Boost.Serialization. According to the documentation, Protocol Buffers care a lot about performance. Boost seems somewhat more lightweight in the sense that you don't need an external language for specifying the data format, which I find quite convenient for this particular project.

So my question comes down to this: does anyone know whether Boost.Serialization is fast for the typical use case I described above?

Also, if there are other libraries that might be right for this, I'd be happy to hear about them.


Solution

I would strongly suggest Protocol Buffers. They're incredibly simple to use, offer great performance, and take care of issues like endianness and backwards compatibility. To make them even more attractive, serialized data is language-independent thanks to the numerous language implementations.

OTHER TIPS

ACE and ACE TAO come to mind, but you might not like their size and scope: http://www.cs.wustl.edu/~schmidt/ACE.html

Regarding your question about whether Boost is "fast": that is a subjective term, and without knowing your requirements (throughput, etc.) it is difficult to answer for you. Not that I have any benchmarks for the Boost stuff myself...

There are messaging layers you can use, but those are probably slower than Boost. I'd say you identified a good solution in Boost, but I've only used ACE and other proprietary communication/messaging products.

My guess is that Boost is fast enough. I have used it in previous projects to serialize data to and from disk, and its performance never even came up as an issue.

My answer here talks about serialization in general, which may be helpful to you beyond which serialization library you choose to use.

Having said that, it looks like you know most of the main trouble spots with serialization (endianness, string encoding). You did leave out versioning and forwards/backwards compatibility. If time is not critical, I recommend writing your own serialization code. It is an enlightening experience, and the lessons you learn are invaluable. I will warn you, though: it will tend to make you hate XML-based protocols for their bloat. :)
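As a taste of what writing your own involves, here is a minimal sketch of a hand-written serializer that puts a version header first, so that a newer reader can still accept older data. It assumes a hypothetical `Sample` record whose version 1 held only an `id` and whose version 2 added a `std::vector<double>` payload, IEEE 754 doubles, and little-endian wire order; all names are illustrative, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE 754 doubles");

// Hypothetical record: version 1 had only 'id'; version 2 added 'values'.
struct Sample {
    std::uint32_t id = 0;
    std::vector<double> values;
};

constexpr std::uint32_t kCurrentVersion = 2;

// Append the low 'n' bytes of v in little-endian order, whatever the host order is.
void put_le(std::vector<std::uint8_t>& out, std::uint64_t v, int n) {
    for (int i = 0; i < n; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Read 'n' little-endian bytes starting at pos and advance pos.
std::uint64_t get_le(const std::vector<std::uint8_t>& in, std::size_t& pos, int n) {
    if (in.size() - pos < static_cast<std::size_t>(n)) throw std::runtime_error("truncated");
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    pos += n;
    return v;
}

std::vector<std::uint8_t> serialize(const Sample& s) {
    std::vector<std::uint8_t> out;
    put_le(out, kCurrentVersion, 4);   // version header always comes first
    put_le(out, s.id, 4);
    put_le(out, s.values.size(), 4);   // assumes fewer than 2^32 elements
    for (double d : s.values) {
        std::uint64_t bits;
        std::memcpy(&bits, &d, sizeof bits);  // copy out the IEEE 754 bit pattern
        put_le(out, bits, 8);
    }
    return out;
}

Sample deserialize(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    auto version = static_cast<std::uint32_t>(get_le(in, pos, 4));
    if (version == 0 || version > kCurrentVersion) throw std::runtime_error("unsupported version");
    Sample s;
    s.id = static_cast<std::uint32_t>(get_le(in, pos, 4));
    if (version >= 2) {  // version 1 data simply leaves 'values' empty
        std::uint64_t count = get_le(in, pos, 4);
        // Reject counts the buffer cannot hold before reserving memory.
        if ((in.size() - pos) / 8 < count) throw std::runtime_error("truncated values");
        s.values.reserve(static_cast<std::size_t>(count));
        for (std::uint64_t i = 0; i < count; ++i) {
            std::uint64_t bits = get_le(in, pos, 8);
            double d;
            std::memcpy(&d, &bits, sizeof d);
            s.values.push_back(d);
        }
    }
    return s;
}
```

The design choice is that the reader, not the writer, carries the compatibility logic: old data is upgraded with defaults, and data from a newer, unknown version is rejected instead of being misread.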

Whichever path you choose, good luck with your project.

Also check out ONC-RPC (the old Sun RPC).

Boost.Serialization doesn't take care of string encodings or endianness for you. If that matters to you, you'll be just as well off not using it.
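For comparison, handling byte order and string encoding yourself is not much code once you fix a wire format. Copying an integer with `memcpy` yields the host's byte order, so two machines can disagree; building the bytes with shifts gives the same result everywhere. A minimal sketch, assuming little-endian wire order, 32-bit length prefixes, and strings the caller has already converted to UTF-8; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Append v in little-endian order, independent of the host's byte order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Read a little-endian uint32 starting at pos and advance pos.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (in.size() - pos < 4) throw std::runtime_error("truncated u32");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    pos += 4;
    return v;
}

// A string is a byte-length prefix followed by its raw bytes.
// Assumption: 's' already holds UTF-8; no transcoding happens here.
void put_string(std::vector<std::uint8_t>& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::string get_string(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint32_t len = get_u32(in, pos);
    if (in.size() - pos < len) throw std::runtime_error("truncated string");
    std::string s(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    return s;
}
```

Because the length prefix counts bytes rather than characters, a multi-byte UTF-8 character such as "é" (two bytes) round-trips unchanged.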

You might want to look into ICE from ZeroC: http://www.zeroc.com/

It works similarly to CORBA, except that it's entirely specified and defined by one company. The upside is that the implementations work as intended, since there aren't many of them. The downside is that if you're using a language they don't support, you're out of luck.

If you are only sending well-defined data structures, then perhaps you should be looking at ASN.1 as an encoding methodology?
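To give a feel for what an ASN.1 encoding looks like on the wire, here is a hand-written sketch of the DER encoding of a single non-negative INTEGER: tag byte 0x02, a length byte, then the minimal big-endian two's-complement content. It only covers unsigned 64-bit values (so the short length form always suffices); in practice you would generate this code with an ASN.1 compiler rather than write it by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DER encoding of a non-negative ASN.1 INTEGER (universal tag 2).
// Content is big-endian two's complement using the fewest bytes possible.
std::vector<std::uint8_t> der_encode_uint(std::uint64_t v) {
    std::vector<std::uint8_t> content;
    do {
        content.insert(content.begin(), static_cast<std::uint8_t>(v & 0xFF));
        v >>= 8;
    } while (v != 0);
    // A leading byte with its high bit set would read as negative,
    // so prepend a zero byte to keep the value positive.
    if (content.front() & 0x80) content.insert(content.begin(), std::uint8_t{0});

    std::vector<std::uint8_t> out;
    out.push_back(0x02);                                       // tag: INTEGER
    out.push_back(static_cast<std::uint8_t>(content.size()));  // short-form length (at most 9 here)
    out.insert(out.end(), content.begin(), content.end());
    return out;
}
```

For example, 127 encodes as `02 01 7F`, while 128 needs the extra zero byte and encodes as `02 02 00 80`.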

There's also Thrift, which looks like an alpha project but is used and developed by Facebook, so it has at least a few users.

Or good old DCE, which was the standard Microsoft decided to use for COM. It's now open source: 20 years too late, but better than never.

Don't pre-emptively optimize. Measure first and optimize second.
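In that spirit, a minimal sketch of a timing harness that runs one candidate serializer on a payload of the size you actually expect. The serializer shown is only a placeholder (a raw `memcpy` of a `std::vector<double>`, which is host-order and therefore not portable); swap in Boost.Serialization, Protocol Buffers, or your own code and compare the numbers on your own hardware. The names `serialize_raw` and `time_serializer` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Placeholder serializer: copies the raw bytes in host byte order.
// Replace this with the library you want to evaluate.
std::vector<std::uint8_t> serialize_raw(const std::vector<double>& data) {
    std::vector<std::uint8_t> out(data.size() * sizeof(double));
    if (!data.empty()) std::memcpy(out.data(), data.data(), out.size());
    return out;
}

struct Timing {
    double seconds;               // average wall-clock time per run
    double megabytes_per_second;  // output MiB produced per second
};

// Run 'fn' on 'data' 'reps' times and report averages.
template <typename Fn>
Timing time_serializer(Fn fn, const std::vector<double>& data, int reps) {
    std::size_t bytes = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        bytes += fn(data).size();  // accumulate output size so each result is consumed
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    double per_run = elapsed.count() / reps;
    double mib = static_cast<double>(bytes) / reps / (1024.0 * 1024.0);
    return {per_run, per_run > 0 ? mib / per_run : 0.0};
}
```

For example, `time_serializer(serialize_raw, data, 10)` with `data` sized like one of your multi-megabyte vectors gives a per-run time and throughput you can hold against your actual requirements before deciding anything needs optimizing.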

Licensed under: CC-BY-SA with attribution