Question

How do I diff two binary files?

I have two versions of a program, version 1 and version 2. I made a small number of changes between the two versions, but unfortunately I haven't been backing up regularly, so although I have the source for version 2, I only have the binary of version 1. I need to find out exactly what I changed between the two versions. I've tried creating an objdump of each version and then using diff to find the changes, but that doesn't work because the offsets are different, so diff considers almost every line to have changed.

For example, one line might be bgez v0,4074d0<daemonize+0xd4> in version 1, and bgez v0,4073d4<daemonize+0xd4> in version 2. These are copied directly from the dump files. You can see the two lines do the same thing, but diff can't tell that they are equivalent. The files are too big for me to examine every line manually. How do I detect functionality changes while ignoring differences in offset?


Solution

I eventually solved this by removing the raw instructions and offset markers so I only had the assembly, then using sed to strip out every digit, and filtering diff to ignore changes consisting of only 1 line. I was a little surprised that it worked, but it did.
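The steps described above can be sketched in Python roughly as follows. This is only an illustration, not the exact commands that were used: it assumes tab-separated objdump -d style lines (address column, raw bytes, then the assembly), and the helper names normalize and multi_line_changes are made up for this example. "Ignoring changes of only 1 line" is interpreted here as dropping any changed region that covers at most one line on each side.

```python
import difflib
import re

def normalize(dump_text):
    """Keep only the assembly column of each line and strip every digit."""
    out = []
    for line in dump_text.splitlines():
        parts = line.split("\t")
        # Assumed layout: "  addr:<TAB>raw bytes<TAB>mnemonic<TAB>operands"
        if len(parts) < 3 or not parts[0].strip().endswith(":"):
            continue  # skip headers, labels, and blank lines
        asm = " ".join(p.strip() for p in parts[2:])
        out.append(re.sub(r"\d", "", asm))
    return out

def multi_line_changes(old, new):
    """Return changed regions that span more than one line on either side."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if max(i2 - i1, j2 - j1) <= 1:
            continue  # a lone changed line is treated as offset noise
        changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes
```

Stripping digits leaves the hex letters a to f in place, so two addresses can still differ after normalization. That is probably why the single-line filter is needed: a stray leftover letter shows up as an isolated one-line change, while a real code edit usually touches several adjacent lines.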

Other tips

It is possible. I am currently working on a project that can locate function and memory pointer addresses from one compiled file inside a new or modified binary. It supports Windows PE and ELF binaries on x86 and x86_64. There is also a paper describing the approach. It works well for my reversing project, where I have to update all hooks and memory addresses frequently whenever the binary is updated, but there are other use cases as well.

Check it out here.

The trick is that it does not rely on weak text comparisons. Instead, it disassembles the binaries and compares all functions by measuring the geometric distance between them using code metrics.
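To give a rough idea of what matching by distance over code metrics can look like, here is a minimal sketch. It is not taken from the project mentioned above: the metric vector (instruction count, branch count, call count) and the function closest_function are hypothetical, and a real tool would compute its metrics from a disassembler.

```python
import math

def closest_function(target_metrics, candidates):
    """Return the name of the candidate whose metric vector is nearest.

    target_metrics: tuple of numbers describing one function in the old binary.
    candidates: dict mapping function name -> metric tuple in the new binary.
    """
    return min(
        candidates,
        key=lambda name: math.dist(target_metrics, candidates[name]),
    )

# Hypothetical metrics: (instruction count, branch count, call count)
new_binary = {
    "daemonize": (40, 6, 3),
    "main": (120, 15, 9),
}
match = closest_function((41, 6, 3), new_binary)
```

Because the comparison uses counts rather than text, shifted addresses do not affect the result. A function that was actually edited ends up slightly further from its best match than an untouched one.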

Short answer: you can't.

Long answer: write your own diff that can ignore one or both operands of an instruction when they are numeric immediate values.

You could use sed or awk (or perl or...) to write a filter to make all the offsets identical before running diff. Writing such a filter is left as an exercise to the reader. :-P
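One possible shape for such a filter, written in Python rather than sed or awk, is sketched below. It assumes objdump -d style lines in which a branch target is a hex address followed by a <symbol+0xNN> annotation, as in the question. The name mask_offsets and the ADDR placeholder are made up for this example. Raw instruction bytes are not removed here; GNU objdump can leave them out with --no-show-raw-insn.

```python
import re

# Leading "  4074c0:" address column at the start of a line.
ADDRESS_COLUMN = re.compile(r"^\s*[0-9a-f]+:\s*")
# A hex address immediately followed by a <symbol...> annotation.
ANNOTATED_TARGET = re.compile(r"\b[0-9a-f]+(?=\s*<[^>]+>)")

def mask_offsets(line):
    """Drop the address column and replace annotated targets with ADDR."""
    line = ADDRESS_COLUMN.sub("", line)
    return ANNOTATED_TARGET.sub("ADDR", line)
```

With this filter, the two lines from the question both become bgez v0,ADDR<daemonize+0xd4>, so diff no longer reports them as different. The symbol-relative part is kept on purpose, since a change there does point to a real difference in the code.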

Surely something like bsdiff would do the job?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow