This is a bug introduced in the XE4 release that is still present in XE6. Previous versions were fine. It looks like the changes were made in readiness for some future switch to immutable strings.
Rather ironically the bug is caused by the string never being assigned a value at all. It's one thing to set out not to mutate a string, but quite another never to initialize it!
So to the analysis of the bug. The method in question in TPerlRegEx.EscapeRegExChars
defined in the System.RegularExpressionsCore
unit. This is a class function that returns a string. Its signature is:
class function EscapeRegExChars(const S: string): string;
The XE4 implementation makes but one reference to the result variable. As follows:
Result.Create(Tmp, 0, J);
Here, Tmp
is an array of char
containing the escaped text to be returned, and J
is the length of that text.
So, it seems clear that the author intended for this code to assign to the function return variable Result
. Sadly that does not occur. Why not? Well, the Create
method being called is defined in the helper for string
. This is TStringHelper
defined in the System.SysUtils
unit. There are three Create
overloads and the one in play here is:
class function Create(const Value: array of Char; StartIndex: Integer;
Length: Integer): string; overload; static;
Note that this is a class static function. That means that it is not an instance method and has no Self
pointer. So when called like this:
Result.Create(Tmp, 0, J);
It is simply a function call whose return value is ignored. It might appear that the result variable would be set but remember that this Create
is a class static method. It therefore has no instance. The compiler simply uses the type of Result
to resolve the method. The code is equivalent to:
string.Create(Tmp, 0, J);
Nothing more exciting than a call to a function whose return value is simply ignored. Defeated by the extended syntax that allows us to ignore function return values.
The fix to the code is simple enough. Replace that final line with
Result := string.Create(Tmp, 0, J);
You could apply the fix in a copy of the unit, and include that unit in your code. An alternative to that, my preferred option, is to use a code hook. Like this:
unit FixTPerlRegExEscapeRegExChars;
interface
implementation
uses
System.SysUtils, Winapi.Windows, System.RegularExpressionsCore;
procedure PatchCode(Address: Pointer; const NewCode; Size: Integer);
var
OldProtect: DWORD;
begin
if VirtualProtect(Address, Size, PAGE_EXECUTE_READWRITE, OldProtect) then
begin
Move(NewCode, Address^, Size);
FlushInstructionCache(GetCurrentProcess, Address, Size);
VirtualProtect(Address, Size, OldProtect, @OldProtect);
end;
end;
type
PInstruction = ^TInstruction;
TInstruction = packed record
Opcode: Byte;
Offset: Integer;
end;
procedure RedirectProcedure(OldAddress, NewAddress: Pointer);
var
NewCode: TInstruction;
begin
NewCode.Opcode := $E9;//jump relative
NewCode.Offset := NativeInt(NewAddress)-NativeInt(OldAddress)-SizeOf(NewCode);
PatchCode(OldAddress, NewCode, SizeOf(NewCode));
end;
function EscapeRegExChars(Self: TPerlRegEx; const S: string): string;
var
I, J: Integer;
Tmp: TCharArray;
begin
SetLength(Tmp, S.Length * 2);
J := 0;
for I := Low(S) to High(S) do
begin
case S[I] of
'.', '[', ']', '(', ')', '?', '*', '+', '{', '}', '^', '$', '|', '\':
begin
Tmp[J] := '\';
Inc(j);
Tmp[J] := S[I];
end;
#0:
begin
Tmp[J] := '\';
Inc(j);
Tmp[J] := '0';
end;
else
Tmp[J] := S[I];
end;
Inc(J);
end;
Result := string.Create(Tmp, 0, J);
end;
initialization
RedirectProcedure(@TPerlRegEx.EscapeRegExChars, @EscapeRegExChars);
end.
Add this unit to your project and the calls to TPerlRegEx.EscapeRegExChars
will start working again.
{$APPTYPE CONSOLE}
uses
System.RegularExpressionsCore,
FixTPerlRegExEscapeRegExChars in 'FixTPerlRegExEscapeRegExChars.pas';
begin
Writeln(TPerlRegEx.EscapeRegExChars('test'));
Readln;
end.
Output
test
QC#124091