Question

With Delphi XE4, try the following code:

procedure TForm3.Button1Click(Sender: TObject);
var
  myStr: string;
begin
  Edit1.Text := TPerlRegEx.EscapeRegExChars('test');
end;

The result (Edit1.Text) is empty.

Is this a bug, or am I missing something? I previously had no problem with TPerlRegEx.EscapeRegExChars in the pre-Delphi XE version from regular-expressions.info.

Update 2: I'm just upgrading an app written in D2010 and ran into this bug, and I wonder how such an obvious bug can exist for so long. Now I'm seriously considering making my code compatible with Free Pascal, but I really like anonymous methods.

Update 1: I'm using Delphi XE4 Update 1.


Solution

It appears to be a bug. If that's the case, both the XE4 and XE5 versions contain it. I've opened a QC report covering XE4..XE6.

The problem appears to be with the last line of the function:

Result.Create(Tmp, 0, J);

Stepping through in the debugger shows that Tmp (a TCharArray) correctly contains 't','e','s','t', #0, #0, #0, #0 at that point. Yet a breakpoint on the end; following that line shows that Result still contains '', and it is still '' when the function returns.

A replacement version in a class helper, with one minor change to actually store the return value of the call to Create, fixes the problem:

type
  TPerlRegExHelper = class helper for TPerlRegEx
  public
    class function EscapeRegExCharsEx(const S: string): string; static;
  end;

class function TPerlRegExHelper.EscapeRegExCharsEx(const S: string): string;
var
  I, J: Integer;
  Tmp: TCharArray;
begin
  SetLength(Tmp, S.Length * 2);
  J := 0;
  for I := Low(S) to High(S) do
  begin
    case S[I] of
      '.', '[', ']', '(', ')', '?', '*', '+', '{', '}', '^', '$', '|', '\':
        begin
          Tmp[J] := '\';
          Inc(J);
          Tmp[J] := S[I];
        end;
      #0:
        begin
          Tmp[J] := '\';
          Inc(J);
          Tmp[J] := '0';
        end;
      else
        Tmp[J] := S[I];
    end;
    Inc(J);
  end;
  { Result.Create(Tmp, 0, J); }  // The problem code from the original
  Result := String.Create(Tmp, 0, J);
end;

The XE3 version (and the open-source version you mention) implements the logic quite differently. It uses the more standard approach of manipulating Result directly: the function begins with Result := S; and then calls System.Insert as needed to make room for the escape characters.
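
The Insert-based approach described above could look roughly like the sketch below. It is not the XE3 RTL source: the function name EscapeRegExCharsViaInsert is made up for illustration, and the code assumes 1-based string indexing, as on the desktop compilers.

program EscapeWithInsert;

{$APPTYPE CONSOLE}

// Hypothetical name; a sketch of the technique, not the RTL implementation.
function EscapeRegExCharsViaInsert(const S: string): string;
var
  I: Integer;
begin
  Result := S;
  // Walk backwards so that inserting a character does not shift
  // the positions that still have to be examined.
  I := Length(Result);
  while I > 0 do
  begin
    case Result[I] of
      '.', '[', ']', '(', ')', '?', '*', '+', '{', '}', '^', '$', '|', '\':
        Insert('\', Result, I);
      #0:
        begin
          // A literal #0 becomes the two characters \0.
          Result[I] := '0';
          Insert('\', Result, I);
        end;
    end;
    Dec(I);
  end;
end;

begin
  Writeln(EscapeRegExCharsViaInsert('a.b*c'));  // a\.b\*c
  Writeln(EscapeRegExCharsViaInsert('test'));   // test
end.

Because Result is assigned on the first line, this style cannot silently return an unassigned string, whatever else may go wrong.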

OTHER TIPS

This is a bug introduced in the XE4 release that is still present in XE6. Previous versions were fine. It looks like the changes were made in readiness for some future switch to immutable strings.

Rather ironically the bug is caused by the string never being assigned a value at all. It's one thing to set out not to mutate a string, but quite another never to initialize it!

So to the analysis of the bug. The method in question is TPerlRegEx.EscapeRegExChars, defined in the System.RegularExpressionsCore unit. This is a class function that returns a string. Its signature is:

class function EscapeRegExChars(const S: string): string;

The XE4 implementation makes just one reference to the result variable:

Result.Create(Tmp, 0, J);

Here, Tmp is an array of char containing the escaped text to be returned, and J is the length of that text.

So, it seems clear that the author intended for this code to assign to the function return variable Result. Sadly that does not occur. Why not? Well, the Create method being called is defined in the helper for string. This is TStringHelper defined in the System.SysUtils unit. There are three Create overloads and the one in play here is:

class function Create(const Value: array of Char; StartIndex: Integer; 
  Length: Integer): string; overload; static;

Note that this is a class static function. That means that it is not an instance method and has no Self pointer. So when called like this:

Result.Create(Tmp, 0, J);

It is simply a function call whose return value is ignored. It might appear that the result variable would be set, but because this Create is a class static method, there is no instance for it to act on. The compiler uses Result only for its type, to resolve the method. The code is equivalent to:

string.Create(Tmp, 0, J);

Nothing more exciting than a call to a function whose return value is simply ignored. Defeated by the extended syntax that allows us to ignore function return values.
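
The effect can be reproduced in isolation. The following is a minimal sketch, assuming a compiler with TStringHelper (XE3 or later) and the default extended syntax. The function names Broken and Fixed are made up for illustration, and for brevity it uses the Create(C: Char; Count: Integer) overload rather than the array overload involved in the bug.

program StaticHelperPitfall;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

function Broken: string;
begin
  Result := 'unchanged';
  // Compiles, but Result is only used to locate TStringHelper.Create.
  // The newly created string is returned by Create and then discarded.
  Result.Create('x', 3);
end;

function Fixed: string;
begin
  // Assign the return value of the static function explicitly.
  Result := string.Create('x', 3);
end;

begin
  Writeln(Broken);  // unchanged
  Writeln(Fixed);   // xxx
end.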

The fix to the code is simple enough. Replace that final line with

Result := string.Create(Tmp, 0, J);

You could apply the fix in a copy of the unit, and include that unit in your code. An alternative to that, my preferred option, is to use a code hook. Like this:

unit FixTPerlRegExEscapeRegExChars;

interface

implementation

uses
  System.SysUtils, Winapi.Windows, System.RegularExpressionsCore;

procedure PatchCode(Address: Pointer; const NewCode; Size: Integer);
var
  OldProtect: DWORD;
begin
  if VirtualProtect(Address, Size, PAGE_EXECUTE_READWRITE, OldProtect) then
  begin
    Move(NewCode, Address^, Size);
    FlushInstructionCache(GetCurrentProcess, Address, Size);
    VirtualProtect(Address, Size, OldProtect, @OldProtect);
  end;
end;

type
  PInstruction = ^TInstruction;
  TInstruction = packed record
    Opcode: Byte;
    Offset: Integer;
  end;

procedure RedirectProcedure(OldAddress, NewAddress: Pointer);
var
  NewCode: TInstruction;
begin
  NewCode.Opcode := $E9; // JMP rel32 (jump relative)
  // The offset is measured from the end of the 5-byte jump instruction.
  NewCode.Offset := NativeInt(NewAddress) - NativeInt(OldAddress) - SizeOf(NewCode);
  PatchCode(OldAddress, NewCode, SizeOf(NewCode));
end;

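// Replacement for the class function. The first parameter stands in for
// the hidden Self parameter that the original receives.
function EscapeRegExChars(Self: TPerlRegEx; const S: string): string;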
var
  I, J: Integer;
  Tmp: TCharArray;
begin
  SetLength(Tmp, S.Length * 2);
  J := 0;
  for I := Low(S) to High(S) do
  begin
    case S[I] of
      '.', '[', ']', '(', ')', '?', '*', '+', '{', '}', '^', '$', '|', '\':
        begin
          Tmp[J] := '\';
          Inc(J);
          Tmp[J] := S[I];
        end;
      #0:
        begin
          Tmp[J] := '\';
          Inc(J);
          Tmp[J] := '0';
        end;
      else
        Tmp[J] := S[I];
    end;
    Inc(J);
  end;
  Result := string.Create(Tmp, 0, J);
end;

initialization
  RedirectProcedure(@TPerlRegEx.EscapeRegExChars, @EscapeRegExChars);

end.

Add this unit to your project and the calls to TPerlRegEx.EscapeRegExChars will start working again.

{$APPTYPE CONSOLE}

uses
  System.RegularExpressionsCore,
  FixTPerlRegExEscapeRegExChars in 'FixTPerlRegExEscapeRegExChars.pas';

begin
  Writeln(TPerlRegEx.EscapeRegExChars('test'));
  Readln;
end.

Output

test
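
The offset that RedirectProcedure writes into the jump can be checked by hand. The sketch below repeats the same arithmetic with made-up addresses: OldAddress and NewAddress here are hypothetical constants, not real code locations.

program JumpOffsetDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Same layout as in the hook unit: 1 opcode byte + 4 offset bytes.
  TInstruction = packed record
    Opcode: Byte;
    Offset: Integer;
  end;

const
  OldAddress = $00401000;  // hypothetical address of the original routine
  NewAddress = $00405000;  // hypothetical address of the replacement

var
  Offset: Integer;
begin
  // A relative jump is taken from the address of the next instruction,
  // which is OldAddress + 5, so the instruction size is subtracted.
  Offset := NewAddress - OldAddress - SizeOf(TInstruction);
  Writeln(SizeOf(TInstruction));  // 5
  Writeln(IntToHex(Offset, 8));   // 00003FFB
end.

Since the offset field is a 32-bit Integer, this style of hook relies on the two routines lying within 2 GB of each other. That would normally hold for code in the same module.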

QC#124091

Licensed under: CC-BY-SA with attribution