In-process call performance of frameworks like CORBA (e.g. TAO), Thrift, D-Bus, ICE

Question 1

What most communication frameworks I'm aware of have in common is that they always serialize, send, and deserialize. That will always cost more than passing references to other threads and accessing the data directly (with or without a mutex). The cost need not be dramatic if responsibilities are assigned wisely to minimize communication.

Note that with this sort of architectural choice, performance is only one of the aspects to consider. Others include security, stability, flexibility, deployment, maintainability, and licensing.
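To make the difference concrete, here is a minimal sketch that contrasts the two paths for the same read. The `Sample` type, the toy wire format, and all function names are made up for illustration; real frameworks use their own encodings and add transport costs on top.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message type, used only for this illustration.
struct Sample {
    std::int32_t id;
    std::string payload;
};

// Framework-style path, step 1: flatten the message into a byte buffer.
// (Toy format: 4 bytes of id followed by the raw payload bytes.)
std::vector<std::uint8_t> serialize(const Sample& s) {
    std::vector<std::uint8_t> buf(sizeof(s.id) + s.payload.size());
    std::memcpy(buf.data(), &s.id, sizeof(s.id));
    std::memcpy(buf.data() + sizeof(s.id), s.payload.data(), s.payload.size());
    return buf;
}

// Framework-style path, step 2: rebuild a fresh object from the bytes.
Sample deserialize(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() >= sizeof(std::int32_t));
    Sample s;
    std::memcpy(&s.id, buf.data(), sizeof(s.id));
    s.payload.assign(reinterpret_cast<const char*>(buf.data()) + sizeof(s.id),
                     buf.size() - sizeof(s.id));
    return s;
}

// Direct path: the consumer reads the caller's object through a reference.
// Nothing is copied and nothing is allocated.
std::size_t payload_size_direct(const Sample& s) {
    return s.payload.size();
}

// Same question answered via the wire format: one buffer allocation,
// two copies, and a second string allocation for the rebuilt payload.
std::size_t payload_size_via_wire(const Sample& s) {
    return deserialize(serialize(s)).payload.size();
}
```

Both functions return the same answer; the second simply does far more work to get there, which is the overhead a serializing framework pays even when caller and callee share an address space.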

Question 2

omniORB has long had a co-located shortcut that makes direct calls, but starting with version 4 it has a proprietary POA policy that bypasses even more of the required CORBA behavior, making a call almost as fast as a direct virtual call. See the omniORB Wiki and search for "Shortcut local calls." Unfortunately this doesn't seem to be in the official docs, at least not that I could find.

Question 3

From ZeroMQ / Learn the basics:

In 2011, CERN (the European Organization for Nuclear Research) compared CORBA, Ice, Thrift, ZeroMQ, YAMI4, RTI, and Qpid (AMQP). Read their analysis and conclusions. (PDF)

Which might just be the comparison you were after. (Found thanks to Matthieu Rouget's comment.)

I'd also add that, while some ORBs allow you to skip the marshalling, you still can't avoid the dynamic memory allocation, which is what really matters for performance. (Today's CPUs are insanely fast, memory access is slow, and asking the OS to allocate a memory page is really slow.)

So where in C++ you might just return a const string &, CORBA's C++ binding will force you to dynamically allocate and free a string or data structure (whether by return type or out parameter). This isn't significant if the call crosses a process or network boundary anyway, but in-process it becomes quite significant compared to plain C++.
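A minimal sketch of that difference follows. It deliberately avoids any ORB headers: `sketch_string_dup` and `sketch_string_free` are hypothetical stand-ins that only mimic the ownership contract of `CORBA::string_dup` / `CORBA::string_free` (the callee allocates, the caller must free). The `Account` class is likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for CORBA::string_dup: heap-allocates a copy.
char* sketch_string_dup(const char* src) {
    assert(src != nullptr);
    const std::size_t n = std::strlen(src) + 1;  // include the terminator
    char* copy = new char[n];
    std::memcpy(copy, src, n);
    return copy;
}

// Hypothetical stand-in for CORBA::string_free: releases such a copy.
void sketch_string_free(char* s) {
    delete[] s;
}

class Account {
public:
    explicit Account(std::string name) : name_(std::move(name)) {}

    // Plain C++: hand out a reference to the existing member.
    // No allocation, no copy.
    const std::string& name() const { return name_; }

    // CORBA-style contract: every call allocates a fresh copy that the
    // caller owns and must release.
    char* name_corba_style() const { return sketch_string_dup(name_.c_str()); }

private:
    std::string name_;
};
```

With the first accessor, a thousand calls touch the allocator zero times; with the second, a thousand calls mean a thousand allocate/free pairs, plus the risk of leaking whenever a caller forgets the free.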

Another 'gotcha' we were burnt by is that you can't define mutually recursive structures (i.e. struct 'A' includes a 'B' which includes an 'A' again). This meant we had to convert those to interfaces, which allocates a CORBA servant "server side" (in-process) per structure, which is very memory heavy. I gather there are advanced tricks to avoid actually creating servants, but ultimately we just want to get away from CORBA altogether, not dig ourselves in deeper.

Especially in C++, memory management is very fragile and difficult to program correctly. (See The Rise and Fall of CORBA, section 'complexity'.) I attribute many person-years of additional effort to this technology choice.

I'd be curious to hear how you got on & what you adopted.

Question 4

One of several reasons for IBM System Object Model creation was CORBA. IBM SOM is "local CORBA" and IBM DSOM is an implementation of CORBA.

You should probably evaluate somFree.

Another option is UNO (from OpenOffice.org). I can't say I like UNO. It's worse, but more mature than the long-forgotten SOM. UNO's local (in-process) ecosystem is separated into partitions by programming language; C++ and Java are the most common partitions. There is no serialization, but the preferred mechanism for inter-partition interaction is late binding (Java Proxy->Java Dispatch->C++ Dispatch->C++ object), somewhat like IDispatch in OLE, although direct bindings can also be made (Java Proxy->C++ object).

Question 5

ICE from ZeroC definitely supports collocated invocation, in which marshalling of data is avoided. Details are in the documentation on their site (http://doc.zeroc.com/display/Ice/Location+Transparency). A collocated call still has some overhead compared with a virtual method call. Unfortunately I do not have actual numbers, and it also depends on conditions, i.e. how many servants are registered in a particular adapter, etc.
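As a rough illustration of where that remaining overhead could come from, the sketch below models a collocated call as "look up the servant in the adapter, then make the virtual call". This is not Ice's actual implementation or API; `SketchAdapter`, `Servant`, `Calculator`, and `collocated_add` are all hypothetical names invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical servant base class: the target of the eventual virtual call.
struct Servant {
    virtual ~Servant() = default;
    virtual int add(int a, int b) = 0;
};

struct Calculator : Servant {
    int add(int a, int b) override { return a + b; }
};

// Hypothetical adapter: maps object identities to servants.
class SketchAdapter {
public:
    void add_servant(const std::string& identity, std::shared_ptr<Servant> s) {
        assert(s != nullptr);
        servants_[identity] = std::move(s);
    }

    // Returns an empty pointer when no servant has that identity.
    std::shared_ptr<Servant> find(const std::string& identity) const {
        auto it = servants_.find(identity);
        if (it == servants_.end()) {
            return std::shared_ptr<Servant>();
        }
        return it->second;
    }

private:
    std::unordered_map<std::string, std::shared_ptr<Servant>> servants_;
};

// A collocated call in this model: no marshalling, but still an identity
// lookup (hashing a string, probing the table) before the virtual dispatch.
int collocated_add(const SketchAdapter& adapter, const std::string& identity,
                   int a, int b) {
    std::shared_ptr<Servant> s = adapter.find(identity);
    if (!s) {
        throw std::runtime_error("no such servant: " + identity);
    }
    return s->add(a, b);
}
```

In this model the lookup cost is paid on every call and depends on the table's contents, which is one plausible reason the overhead would vary with the number of registered servants.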