Question

We have a large codebase with > 40 projects (in VS lingo) creating several DLLs/SOs (~15) and an EXE.

There are a few Utility projects which are statically linked to create the EXE and also used by most of the DLLs. Ideally, we'd want these Utility projects to be DLLs too, so that the code isn't duplicated in each of the DLLs that depend on them.

Are there any tools that do a binary analysis of the DLLs to show how much duplication exists (code + data)? Getting an estimate of this would help.


Solution 2

Well, on a Unix/Linux/OSX system you'd do something like

for eachfile in *.exe *.dll ; do
    nm "$eachfile" | awk '{ print $NF }' | sort -u > "$eachfile.symbols.txt"
done

cat *.symbols.txt | sort | uniq -c > count-duplicate-symbols.txt

sort -rn count-duplicate-symbols.txt | less

The first three lines say "Dump the symbols out of each .exe and .dll file in the current directory, keeping only the symbol name (the last field of each nm line, since the addresses differ from one binary to the next); store each dump in a separate file. By the way, if the same name appears multiple times in a single file, just store it once."

The line beginning with cat says "Count the number of times each line appears across all the files we just produced. Write a new file named count-duplicate-symbols.txt that contains every symbol with its count."

The final line says "Sort this file by the number of duplicates (in decreasing order), and pipe it to a pager so I can read it."
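Since uniq -c also lists symbols that occur only once, it can help to drop those before reading the report. A minimal sketch, assuming the counts file has the count as its first field (the function name only_duplicates is made up for illustration):

```shell
# Hypothetical helper: keep only entries that occur in more than one dump.
# $1 is a file produced by "uniq -c", so the count is the first field.
only_duplicates() {
    # Keep lines whose count is greater than 1, largest counts first.
    awk '$1 > 1' "$1" | sort -rn
}
```

It could be used as: only_duplicates count-duplicate-symbols.txt | less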

If you wanted to see which binaries contained the offending duplicate symbols, you could grep the .symbols.txt files for them.
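One way to do that lookup is sketched below. It assumes the dumps are named as in the loop above (binary name plus .symbols.txt) and that the symbol name is the last field of each line; the function name which_binaries is hypothetical. It uses awk rather than grep so that mangled names containing regex metacharacters are compared literally.

```shell
# Hypothetical helper: list the binaries whose symbol dump contains a symbol.
# $1 is the exact symbol name; the remaining arguments are the dump files.
which_binaries() {
    symbol=$1
    shift
    # Compare the last field of every line against the symbol name,
    # print each matching dump once, then strip the ".symbols.txt" suffix.
    awk -v sym="$symbol" '$NF == sym { print FILENAME }' "$@" \
        | sort -u \
        | sed 's/\.symbols\.txt$//'
}
```

For example: which_binaries my_util_function *.symbols.txt (where my_util_function stands for whatever symbol showed up as duplicated).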

Note that this approach probably won't work for static (file-local) functions and variables, and it may produce false positives for things like inline functions, which are supposed to appear everywhere. You could filter out symbols appearing in linkonce sections, pretty-print the output with c++filt, and so on.

Some of these tools are definitely available for Windows. I don't know if they all are.

Other suggestions

No tools, just the one between your ears. Focus on the projects that link a static library, and find the cases where the same static library is linked into more than one of them. That's the starting point for assuming that a function can be linked in more than once.

Then you can use the linker's /VERBOSE option, which shows you which functions are getting linked in from the static library. That option produces a lot of output, but it is terse and easy to parse.

As an alternative, consider using the linker's /MAP option to generate a .map file, which shows in detail which functions got linked into the final executable. Having the same function appear in more than one .map file is your lead that it might be beneficial to put it in a DLL instead. Writing a little program in your favorite scripting language that processes the /VERBOSE output or the .map files and finds matches is feasible.
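A rough sketch of such a script for .map files follows. It is not a definitive parser: it assumes that each public symbol line starts with a segment:offset address followed by the symbol name (for example " 0001:00000010  ?foo@@YAXXZ  00401010 f  util:foo.obj"), and that section-table lines can be recognised by a hex length ending in H in the second field. The function name count_map_symbols is made up for illustration; check the layout of your own .map files before relying on it.

```shell
# Hypothetical helper: report symbols that appear in more than one .map file.
# Arguments are the .map files to compare.
count_map_symbols() {
    for map in "$@"; do
        # Assumed layout: field 1 is "segment:offset", field 2 is the symbol.
        # Section-table lines have a length such as "00001000H" in field 2,
        # so they are skipped. Each symbol is printed once per map file.
        awk '$1 ~ /^[0-9A-Fa-f]+:[0-9A-Fa-f]+$/ &&
             $2 !~ /^[0-9A-Fa-f]+H$/ &&
             NF >= 4 { print $2 }' "$map" | sort -u
    done | sort | uniq -c | sort -rn | awk '$1 > 1'
}
```

Running count_map_symbols *.map prints each shared symbol preceded by the number of .map files it occurs in, which gives a first list of candidates for moving into a DLL.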

Licensed under: CC-BY-SA with attribution