Question

I am trying to construct a (test) WideString of:

á (U+00E1 LATIN SMALL LETTER A WITH ACUTE)

but using its decomposed form:

LATIN SMALL LETTER A (U+0061) followed by COMBINING ACUTE ACCENT (U+0301)

So I have the code fragment:

var
    test: WideString;
begin
   test := #$0061#$0301;
   MessageBoxW(0, PWideChar(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

Except it doesn't appear to work:

[screenshot of the resulting message box]

This could be a bug in MessageBox, but I'm going to go ahead and say that it's more likely the bug is in my code.

Some other variations I have tried:

test := WideString(#$0061#$0301);


const
    SmallLetterLatinAWithAcuteDecomposed: WideString = #$0061#$0301;
test := SmallLetterLatinAWithAcuteDecomposed;


test := #$0061+#$0301;  // doesn't compile: incompatible types


test := WideString(#$0061)+WideString(#$0301);  // doesn't compile: crashes compiler


test := 'a'+WideString(#$0301);  // doesn't compile: crashes compiler


//Arnauld's thought:
test := #$0301#$0061;


Solution

Best answer:

const
    n: WideString = '';  //n=Nothing

s := n+#$0061+#$0301;

This fixes all the cases I list below that otherwise fail.


The only variant that works is to declare it as a constant:

AccentAcute: WideString = #$0301;
AccentAcute: WideString = WideChar($0301);
AccentAcute: WideString = WideChar(#$0301);
AccentAcute: WideString = WideString(#$0301);

Sample Usage:

s := 'Pasta'+AccentAcute;

Constant-based syntaxes that do not work:

  • AccentAcute: WideString = $0301;
    incompatible types
  • AccentAcute: WideString = #0301;
    gives: [screenshot of the output]
  • AccentAcute: WideString = WideString($0301);
    invalid typecast
  • AccentAcute: WideString = WideString(#$0301);
    invalid typecast
  • AccentAcute: WideChar = WideChar(#0301); gives Pastai
  • AccentAcute: WideChar = WideChar($0301); gives Pasta´

Other syntaxes that fail

  • 'Pasta'+WideChar($0301)
    gives Pasta´
  • 'Pasta'+#$0301
    gives Pasta´
  • WideString('Pasta')+#$0301
    gives: [screenshot of the output]

Summary of all the constant-based syntaxes I could think up:

AccentAcute: WideString =            #$0301;   //works
AccentAcute: WideString =   WideChar(#$0301);  //works
AccentAcute: WideString = WideString(#$0301);  //works
AccentAcute: WideString =             $0301;   //incompatible types
AccentAcute: WideString =    WideChar($0301);  //works
AccentAcute: WideString =  WideString($0301);  //invalid typecast

AccentAcute: WideChar =            #$0301;     //fails, gives Pasta´
AccentAcute: WideChar =   WideChar(#$0301);    //fails, gives Pasta´
AccentAcute: WideChar = WideString(#$0301);    //incompatible types
AccentAcute: WideChar =             $0301;     //incompatible types
AccentAcute: WideChar =    WideChar($0301);    //fails, gives Pasta´
AccentAcute: WideChar =  WideString($0301);    //invalid typecast

Rearranging the WideChar expressions can work, as long as you only append to a variable:

//Works
t := '0123401234012340123';
t := t+WideChar(#$D840);
t := t+WideChar(#$DC00);

//fails
t := '0123401234012340123'+WideChar(#$D840);
t := t+WideChar(#$DC00);

//fails
t := '0123401234012340123'+WideChar(#$D840)+WideChar(#$DC00);

//works
t := '0123401234012340123';
t := t+WideChar(#$D840)+WideChar(#$DC00);

//works
t := '';
t := t+WideChar(#$D840)+WideChar(#$DC00);

//fails; gives junk
t := ''+WideChar(#$D840)+WideChar(#$DC00);

//crashes compiler
t := WideString('')+WideChar(#$D840)+WideChar(#$DC00);

//doesn't compile
t := WideChar(#$D840)+WideChar(#$DC00);

Definitely hitting against compiler nonsense; cases that weren't tested fully. Yes, I know David, we should upgrade.
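The cases marked as working above share one pattern: they start from an existing variable and append to it. A minimal sketch of that pattern follows, wrapped in a hypothetical helper. AppendCodePoint is not from the original. The sketch assumes the compiler handles run-time WideChar casts and single-character appends correctly, and it has not been verified on Delphi 5/7 here. Code points above U+FFFF are split into a UTF-16 surrogate pair, so U+20000 becomes the $D840 $DC00 pair used in the tests above.

```pascal
program AppendCodePointDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: appends one Unicode code point to S as UTF-16.
// Each statement appends a single WideChar to an existing variable.
procedure AppendCodePoint(var S: WideString; CodePoint: Cardinal);
begin
  if CodePoint < $10000 then
    S := S + WideChar(CodePoint)
  else
  begin
    // Split into high and low surrogates.
    Dec(CodePoint, $10000);
    S := S + WideChar($D800 + (CodePoint shr 10));
    S := S + WideChar($DC00 + (CodePoint and $3FF));
  end;
end;

var
  t: WideString;
begin
  t := '0123401234012340123';        // 19 characters
  AppendCodePoint(t, $20000);        // adds two UTF-16 code units
  Writeln(Length(t));
  Writeln(IntToHex(Ord(t[20]), 4));  // high surrogate
  Writeln(IntToHex(Ord(t[21]), 4));  // low surrogate
end.
```

Because the values are computed at run time, no #$xxxx literal or constant concatenation is involved, which is where the failures above seem to come from.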

OTHER TIPS

This works in Delphi 5/7:

var
  test: WideString;
begin

   test := WideChar($0061);
   test := test + WideChar($0301);

   MessageBoxW(0, PWideChar(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

In short:

  • In Delphi 5 and Delphi 7, concatenating WideChars to a WideString does not appear to work with #$xxxx-form literals.
  • # doesn't seem to work as you'd expect for Unicode literals.

  • You can't just add two or more widechars in a single expression, like this:

    test := WideChar(a)+WideChar(b);  // won't compile in D5/D7.
    

Did you try #$0301#$0061 (i.e. diacritic first)?

OK.

So #$.... only handles 8-bit character constants in this version.

You can work around this at the memory level:

type
    TWordArray  = array[1..MaxInt div SizeOf(word)-2] of word;
    // start at [1], just as WideStrings
    // or: TWordArray  = array[0..MaxInt div SizeOf(word)-1] of word;
    PWordArray = ^TWordArray;

var
  test: WideString;
begin
  test := '12'; // or SetLength(test,2);
  PWordArray(test)[1] := $61; 
  PWordArray(test)[2] := $301;
  MessageBoxW(0, pointer(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

This will always work since you don't play with chars/widechars and such.

And it will also work as expected with the Unicode versions of Delphi.
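A variation on the same idea that avoids the pointer cast is to fill the WideString one element at a time from a list of 16-bit code units. This is only a sketch: WideStringFromCodeUnits is a hypothetical helper, not part of the answer above, and it has not been checked against Delphi 5/7 here.

```pascal
program CodeUnitsDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: builds a WideString from raw UTF-16 code units,
// so no #$xxxx literal or WideChar constant expression is needed.
function WideStringFromCodeUnits(const Units: array of Word): WideString;
var
  i: Integer;
begin
  SetLength(Result, High(Units) + 1);
  for i := 0 to High(Units) do
    Result[i + 1] := WideChar(Units[i]);  // WideStrings are 1-based
end;

var
  test: WideString;
begin
  test := WideStringFromCodeUnits([$0061, $0301]);
  Writeln(Length(test));
  Writeln(IntToHex(Ord(test[1]), 4));
  Writeln(IntToHex(Ord(test[2]), 4));
end.
```

The resulting string can then be passed to MessageBoxW as PWideChar(test), as in the question.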

Licensed under: CC-BY-SA with attribution