Question

So my employer has this old .NET / C# program that needs to be rewritten and for which the source code has been lost. It was developed by an ex-employee who hasn't been here in years. Maybe losing the source was an oversight on their part or maybe it wasn't - at this point it doesn't really matter.

Anyway, so I'm trying to figure out what it does and that's kinda gotten me thinking about the limitations of .NET decompilers.

Is trying to decompile .NET like trying to make a minified JS file readable? With a minified JS file you could indent the code to some pre-determined coding standard, and you could rename variables to match the names of the functions that assign values to them, but you'd still be losing a lot of info. You'd be losing the actual variable names and any comments that the developer had made.

Is that a fair analogy?

It seems that that's either what's going on in my case or else the dev really didn't leave any comments and he really did name half of the variables based on their type and not their application (which would be consistent with systems hungarian I guess).


Solution 2

Local variable names are not required for reflection, so they get tossed out of the assembly. At the IL level, locals are just numbered, typed slots with no names; the original names survive only in the debug symbols (the .pdb file), if you still have them. The same goes for comments: they are not retained.

You can use the MSIL Disassembler (Ildasm.exe) to see what actually remains in the executable. Names like strNN are generated by the decompiler in an attempt to help you a bit with recovering the logic of the code.

OTHER TIPS

Note: Most of what I'm saying is based on Java, but as I understand it, CLR operates in pretty much the same way.

Basically, the way it works is that the compiler converts your source code into a format known as bytecode, which can then be executed by a VM. Generally, compilers that target a VM do little optimization of the code they generate, because it will be optimized at runtime by the VM anyway. So if the code was compiled by a standard compiler and not obfuscated, the translation to bytecode is very direct and predictable, meaning that you can decompile it into reasonable-looking source.

However, you will still lose anything that is basically syntactic sugar, because the compiler only includes what is necessary for execution. Luckily, reflection support (and debug information, if enabled) means that a lot of source-level information, such as type, method, and field names, is preserved in the bytecode as metadata. But things like whitespace and comments are not accessible even with reflection, so there is no way to recover them.
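To make the reflection point concrete, here is a minimal Java sketch (Java because this answer is based on the JVM; in .NET the equivalent would go through System.Reflection). The class InvoiceCalculator and all of its member names are hypothetical, invented only for this illustration. The sketch asks reflection which names it can still see in the compiled class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class ReflectionSurvivors {

    // Hypothetical class standing in for "the lost program".
    static class InvoiceCalculator {
        private double taxRate = 0.2; // comments like this one are not stored in the class file

        double totalWithTax(double netAmount) {
            // taxAmount is a local variable; reflection offers no way to ask for its name
            double taxAmount = netAmount * taxRate;
            return netAmount + taxAmount;
        }
    }

    // Names of the fields declared directly on the given class, as seen by reflection.
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic()) { // skip compiler-generated members
                names.add(f.getName());
            }
        }
        return names;
    }

    // Names of the methods declared directly on the given class, as seen by reflection.
    static List<String> methodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("fields:  " + fieldNames(InvoiceCalculator.class));
        System.out.println("methods: " + methodNames(InvoiceCalculator.class));
    }
}
```

The field name taxRate and the method name totalWithTax come back, because the runtime needs them as metadata. The local taxAmount and both comments appear nowhere in what reflection reports, which matches what a decompiler has to work with when no debug information is available.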

The analogy with minified JS is not exact, but it is still useful. In the case of JavaScript, the source files are the input to the VM, so there is no visible intermediate bytecode stage. Minification is the result of an optimizer going through and reformatting the source code, but the output is still source code. On the other hand, in both cases the information is lost because a tool did not preserve it, as it was unnecessary for execution.

If the files were obfuscated, then all that goes out the window. Obfuscators deliberately mess up the patterns introduced by the compiler and will remove all the optional metadata that they can. You can often still decompile obfuscated code, but the result will be a mess and will lack even the helpful information that normally survives compilation, such as meaningful type and member names.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow