Question

I know there are differences in the source code between C and C++ programs - this is not what I'm asking about.

I also know this will vary from CPU to CPU and OS to OS, depending on compiler.

I'm teaching myself C++ and I've seen numerous references to libraries that can be used by both languages. This has started me thinking - are there significant differences between the binary executables of the two languages?

For libraries to be easily used by both, I would think they'd have to be similar on an executable level.

Are there many situations where a person could examine an executable file and tell whether it was created from C or C++ source code? Or would the binaries be pretty similar?

Solution

In most cases, yes, it's pretty easy. Here are just a few clues that I've seen often enough to remember them easily:

  1. A C++ program will typically end up with at least a few visible symbols that have been mangled.
  2. A C++ program will typically contain at least a few calls to virtual functions, which look quite distinct from the code you'll usually see in C.
  3. Many C++ compilers implement a calling convention for C++ that gives special consideration to passing the this pointer into member functions. Since the this pointer simply doesn't exist in C, you'll rarely see a direct analog (though in some cases a compiler will use the same convention to pass some other pointer, so you need to be careful about this one).

Other tips

An executable is an executable is an executable, no matter what language it's written in. If it's built for the target architecture, it'll run on that architecture.

The (arguably) most important difference between C- and C++-compiled code, and the one relevant to libraries that can be linked against both C and C++ executables, is name mangling. Basically: when a library is compiled, it exports a set of symbols (function names, exported variables, etc.) that executables linked against the library can use. How these symbols are named is fairly compiler- and linker-specific, and if the executable is later linked with a toolchain that uses an incompatible convention, the symbols won't resolve correctly. In addition, C and C++ have slightly different conventions. The Wikipedia article on name mangling has more of the details; suffice it to say that, when declaring exported symbols in a header file, you'll usually see a construction like:

#ifdef __cplusplus
extern "C" {
#endif

/* exported declarations here */

#ifdef __cplusplus
}
#endif

__cplusplus is a preprocessor macro that is only defined when compiling C++ code. The idea is that, when the header is used from C++, the compiler is instructed to use the C way of naming exported symbols (for everything inside the extern "C" { /* foo */ } block), so the library can be linked correctly from both C and C++.

I think I could tell if something is C++ or C from reading the disassembled binary code [for processor architectures that I'm familiar with: x86, x86_64 and ARM]. But in reality, there isn't much difference; you'd have to look pretty hard to know for sure.

Signs to look for are "indirect calls" (function pointer calls via a table) and this-pointers. Although C can have pointer-to-struct arguments and will often use function pointers, they're not usually set up the way C++ does it. Also, you'll sometimes notice that the compiler takes a pointer to a struct and adds a small offset: that's stripping off the outer layer of a derived class to get at one of its base classes. This CAN happen in C as well, but it won't be as common or distinctive.

Looking just at the binary [unless you can "do disassembly in your head"] would be a lot harder, especially if it's been stripped of symbols. That's like the guy who could tell you what piece of classical music was on an old vinyl record from looking at the tracks [with the label hidden]: not something most people can do, even if they are "good".

In practice, a C program (or a C++ program) is rarely only pure standard C (or C++); for instance, the C99 standard has no means to scan a directory. So programs use additional libraries.

On Linux, most binaries are dynamically linked. Use the ldd command to find out.

If the binary is linked to the libstdc++ library, the source code is likely C++.

If only the libc.so library is linked, the source code is probably only C (but libstdc++.a could have been linked statically).

You can also use tools that work on binary files (e.g. objdump, readelf, strings, nm on Linux) to find out more about them.

The code generated by C and C++ compilers is generally the same code. There are two important differences:

  • Name mangling: Each function and global variable becomes a symbol at compile time. In C, these symbols' names are the same as their names in your source code. In C++, they are mangled to encode extra information such as namespaces and parameter types, which is what allows for overloaded functions.
  • Calling conventions: If you call a method in C++, the this-pointer is passed as a hidden first parameter. Other conventions might also differ, for example for call by reference, which does not exist in C.

You can use a block such as this to make the C++ compiler generate code compatible with C:

extern "C" {
    /* code */
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow