Delphi: Access violation when putting a string in an editbox?

https://stackoverflow.com/questions/3277572

17-09-2020

Question

Well, I am studying some inline assembly in Delphi, and the assembly crypto routine is going great until I try to put the ShortString into the edit box.

The violation I get is as follows:

The full code is here:

procedure TForm2.Button1Click(Sender: TObject);
var
  len, keylen: integer;
  name, key: ShortString;
begin
  name := ShortString(Edit1.Text);
  key := '_r <()<1-Z2[l5,^';
  len := Length(name);
  keylen := Length(key);

  nameLen := len;
  serialLen := keyLen;

  asm
    XOR EAX,EAX
    XOR ESI,ESI
    XOR EDX,EDX
    XOR ECX,ECX

  @loopBegin:
    MOV EAX,ESI
    PUSH $019
    CDQ
    IDIV DWORD PTR DS:[serialLen]
    MOV EAX,ESI
    POP EBX
    LEA ECX,DWORD PTR DS:[key+EDX]
    CDQ
    IDIV DWORD PTR DS:[nameLen]
    LEA EAX,DWORD PTR DS:[name]
    MOVZX EAX,BYTE PTR DS:[name+EDX]
    MOVZX EDX,BYTE PTR DS:[ECX]
    XOR EAX,EDX
    CDQ
    IDIV EBX
    ADD DL,$041
    INC ESI
    CMP ESI,DWORD PTR DS:[serialLen]
    MOV BYTE PTR DS:[ECX],DL
    JL @loopBegin
  end;

  edit2.Text := TCaption(key);
end;

If I place a breakpoint on the line "edit2.Text:= TCaption(key);", I can see that the ShortString "key" has indeed been properly encrypted, but it has a lot of weird characters after it, too.

The first 16 characters are the real encryption.

encryption http://img831.imageshack.us/img831/365/29944312.png

thanks!

Solution

What the code does

For those of you that don't speak assembler, this is what the code is probably supposed to do, in Pascal. "Probably" because the original contains some bugs:

procedure TForm14.Button1Click(Sender: TObject);
var KeyLen:Integer;
    Name, Key:ShortString;
    i:Integer;
    CurrentKeyByte:Byte;
    CurrentNameByte:Byte;
begin
  Name := ShortString(Edit1.Text);
  Key := '_r <()<1-Z2[l5,^';
  keyLen := Length(key);

  asm int 3 end; // This is here so I can inspect the assembler output in the IDE
                 // for the "Optimised" version of the code

  for i:=1 to Length(Name) do
  begin
    CurrentKeyByte := Byte(Key[((i - 1) mod KeyLen) + 1]); // wrap within 1..KeyLen; Key[0] is the length byte
    CurrentNameByte := Byte(Name[i]);
    CurrentNameByte := ((CurrentKeyByte xor CurrentNameByte) mod $019) + $041;
    Name[i] := AnsiChar(CurrentNameByte);
  end;

  Caption := Name;

end;

With optimizations turned on, the assembler code the compiler generates for this is actually shorter than the proposed code. It contains no redundant instructions, and I'm willing to bet it is faster. Here are a few optimizations I noticed in the Delphi-generated code, compared to the assembler code proposed by the OP:

Delphi reversed the loop (downto 0). This saves one "CMP" instruction because the compiler can simply "DEC ESI" and loop on the zero flag.
It used "XOR EDX,EDX" and "DIV EBX" for the second division. This is an unsigned division, which is safe because the XOR result is never negative, and it saves a tiny number of cycles.

Why is the provided assembler code failing?

Here's the original assembler code, with comments. The bug is at the end of the routine, at the "CMP" instruction: it compares ESI to the length of the KEY, not to the length of the NAME. If the KEY is longer than the NAME, the "encryption" goes on past the end of NAME and overwrites other data. This is what makes the debugger show funny chars after the correct chars.

Overwriting EBX and ESI without restoring them is not allowed either. However, in my test this was not what caused the AV, probably because the surrounding Delphi code didn't use EBX or ESI.

asm
  XOR EAX,EAX // Wasteful, the first instruction in the loop overwrites EAX
  XOR ESI,ESI
  XOR EDX,EDX // Wasteful, the first CDQ instruction in the loop overwrites EDX
  XOR ECX,ECX // Wasteful, the first LEA instruction overwrites ECX

@loopBegin:
  // Entering the loop, ESI holds the index of the next char to be
  // encrypted.

  MOV EAX,ESI // Load EAX with the index of the next char, because
              // we intend to do some divisions (setting up the IDIV)
  PUSH $019   // ? Pushing this here, so we can pop it 3 lines later... obfuscation
  CDQ         // Sign-extend EAX into EDX (required for IDIV)
  IDIV DWORD PTR DS:[serialLen] // Divide EAX by the length of the key.
  MOV EAX,ESI // Load the index back into EAX, we're planning on another IDIV. Why???
  POP EBX     // Remember the PUSH $019?
  LEA ECX,DWORD PTR DS:[key+EDX] // EDX is the result of "ESI mod serialLen", so this
                                 // loads the address of the current char in the
                                 // encryption key into ECX. Dividing by serialLen
                                 // is supposed to make sure we "wrap around" at the
                                 // end of the key.
  CDQ // Yet more obfuscation. We're now extending EAX into EDX in preparation for IDIV.
      // This is obfuscation because the "MOV EAX,ESI" instruction could be written right
      // here, just before the CDQ.
  IDIV DWORD PTR DS:[nameLen] // We divide the current index by the length of the text
                              // to be encrypted. Once more the code only uses the remainder,
                              // but why would one do this? Isn't ESI (the index) always supposed
                              // to be LESS THAN nameLen? This is the first sign of trouble.
  LEA EAX,DWORD PTR DS:[name] // EAX now holds the address of NAME (wasteful: the next
                              // MOVZX overwrites EAX).
  MOVZX EAX,BYTE PTR DS:[name+EDX] // EAX holds the current character in name
  MOVZX EDX,BYTE PTR DS:[ECX]      // EDX holds the current character in key
  XOR EAX,EDX // Aha!!!! So this is an obfuscated XOR loop! EAX holds "name[ESI] xor key[ESI]"
  CDQ         // We're extending EAX (the XOR result) in preparation for a divide
  IDIV EBX    // Divide EAX by EBX (EBX = $019). Why????
  ADD DL,$041 // EDX now holds the remainder of our previous XOR after the division by $019.
              // This is a number from $000 to $018. Adding $041 turns it into a number from
              // $041 to $059 (ASCII chars from "A" to "Y"). Now I get it. This is not encryption,
              // this is a HASH function! One can't un-encrypt this (information is thrown away
              // at the division).
  INC ESI     // Prep for the next char

  // !!! BUG !!!
  //
  // This is what's causing the algorithm to generate the AV. At this step the code is
  // comparing ESI (the current char index) to the length of the KEY and loops back if
  // "ESI < serialLen". If NAME is shorter than KEY, encryption will encrypt stuff beyond
  // the end of NAME (up to the length of KEY). If NAME is longer than KEY, only Length(Key)
  // bytes would be encrypted and the rest of "Name" would be ignored.
  //
  CMP ESI,DWORD PTR DS:[serialLen]

  MOV BYTE PTR DS:[ECX],DL // Obfuscation again. This is where the mangled char is written
                           // back, to the address in ECX set by the LEA above. MOV does not
                           // change the flags, so the JL below still tests the CMP result.

  JL @loopBegin            // Repeat the loop.
end;
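For comparison, here is a minimal plain-Pascal sketch of the same hash. The loop is bounded by Length(Name), and the code is written as a standalone function so it can be tried without a form. The function name "HashName" and the console program around it are hypothetical; they do not come from the question. The key is indexed 1-based with wraparound, so Key[0] (the ShortString length byte) is never read.

```pascal
program HashNameDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: the "(key xor name) mod $19 + 'A'" hash discussed above.
// The loop bound is the length of NAME, never the length of KEY.
function HashName(const Name, Key: ShortString): ShortString;
var
  i: Integer;
  K, N: Byte;
begin
  Result := Name;                  // same length as Name; every char is replaced below
  if Length(Key) = 0 then
    Exit;                          // no key: return Name unchanged (avoids "mod 0")
  for i := 1 to Length(Name) do
  begin
    K := Byte(Key[((i - 1) mod Length(Key)) + 1]); // 1..Length(Key), wraps around
    N := Byte(Name[i]);
    Result[i] := AnsiChar(((K xor N) mod $19) + $41); // always 'A'..'Y'
  end;
end;

begin
  WriteLn(HashName('abc', '_r <()<1-Z2[l5,^'));
end.
```

With the question's key, the name 'abc' hashes to 'MQR'. With a short key such as 'ab', the third character wraps back to Key[1].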

My 2 cents worth of advice

Assembler should be used for SPEED optimizations and nothing else. It looks to me as if the OP tried to use assembler to obfuscate what the code is doing. It didn't help: it only took me a few minutes to figure out exactly what the code does, and I'm NOT an assembler expert.

OTHER TIPS

First off, you need to preserve EBX, ESI and EDI, and leave ESP and EBP as you found them. Only EAX, EDX and ECX can be used without preservation, unless your own code still needs the values they held.

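Try adding PUSH/POP pairs around your code for the registers it uses. Your code changes ESI and EBX, so put PUSH EBX, PUSH ESI before it and POP ESI, POP EBX after it, in reverse order.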
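A minimal sketch of that pattern follows (Win32 Delphi only, since it uses the 32-bit inline assembler). It is a hypothetical example, not the OP's algorithm, and the name "XorAll" is made up for illustration. The asm block uses ESI and EBX, so it saves them first and restores them in reverse order before the Pascal code resumes. EAX, ECX and EDX are used without saving.

```pascal
program PreserveRegsDemo;

{$APPTYPE CONSOLE}

// Win32 (32-bit x86) Delphi only: uses the 32-bit inline assembler.
// Hypothetical example: XOR every character of a ShortString with Mask.
function XorAll(const S: ShortString; Mask: Byte): ShortString;
var
  Buf: ShortString;
  Len: Integer;
  M: Byte;
begin
  Buf := S;            // work on a local copy
  Len := Length(Buf);
  M := Mask;           // copy the parameter into a local the asm block can address
  asm
    PUSH EBX                        // EBX and ESI belong to the compiler:
    PUSH ESI                        // save them before using them
    MOV  BL, M                      // BL  = mask
    MOV  ECX, Len                   // ECX = chars left (ECX may be used freely)
    MOV  ESI, 1                     // ESI = 1-based index; Buf[0] is the length byte
    TEST ECX, ECX
    JZ   @done                      // empty string: nothing to do
  @loop:
    XOR  BYTE PTR [Buf + ESI], BL   // Buf[ESI] := Buf[ESI] xor M
    INC  ESI
    DEC  ECX
    JNZ  @loop
  @done:
    POP  ESI                        // restore in reverse order of the pushes
    POP  EBX
  end;
  Result := Buf;
end;

begin
  WriteLn(XorAll('AB', $20));       // 'A' xor $20 = 'a', 'B' xor $20 = 'b'
end.
```

The locals are addressed exactly as in the question ("[Buf + ESI]" like "[key+EDX]"). The only change is that ESP, EBX and ESI are back to their original values when the asm block ends.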

You cannot simply co-opt the registers for your own purposes in inline ASM without preserving (saving and restoring) the register contents.

In your code you are trampling over EAX (which holds "self") and EDX (which - with the default register calling convention - most likely holds "Sender").

And as I understand it other registers may also be used for local variables.

Hint: what if ESI or EAX or whatever holds Self? Your assembler code trashes it. On the next line you're trying to use Edit2, which requires access to Self, which is... well, no longer with us.

Both the compiler and you use registers. You need to play nice and cooperate with the compiler, which means saving and restoring the registers' values.

Otherwise, I think you need to move the assembler code into a separate routine, so that there is no Pascal code in it that may use registers too. Note that you will still need to follow the calling convention: not all registers can be used freely.
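Here is a minimal sketch of such a separate routine (Win32 Delphi only). It is a pure "asm ... end" function with no Pascal code inside. It follows Delphi's default register convention: the first parameter arrives in EAX, the second in EDX, and an ordinal result is returned in EAX/AL. Because it only touches EAX, ECX and EDX, nothing needs to be saved. "HashByte" is a hypothetical name. It performs one step of the hash discussed above, ((K xor N) mod $19) + $41, so the loop over the string can stay in ordinary Pascal.

```pascal
program HashByteDemo;

{$APPTYPE CONSOLE}

// Win32 (32-bit x86) Delphi only. Pure assembler routine, register convention:
// K arrives in AL, N in DL, and the Byte result is returned in AL.
// Only EAX, ECX and EDX are modified, so no registers have to be preserved.
function HashByte(K, N: Byte): Byte;
asm
  XOR   AL, DL            // AL := K xor N (upper bits of EAX/EDX are undefined)
  MOVZX EAX, AL           // zero-extend the XOR result to 32 bits
  MOV   ECX, $19          // divisor 25
  XOR   EDX, EDX          // unsigned division: clear the high dword
  DIV   ECX               // EAX := quotient, EDX := remainder (0..24)
  LEA   EAX, [EDX + $41]  // result := remainder + 'A', i.e. 'A'..'Y'
end;

begin
  WriteLn(Chr(HashByte(Ord('_'), Ord('a'))));  // ($5F xor $61) mod 25 = 12 -> 'M'
end.
```

Keeping the routine free of Pascal statements means the compiler never has its own values in registers while the assembler runs. The only contract to honor is the calling convention itself.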

Licensed under: CC-BY-SA with attribution
