Question

Is it possible to convert XML to UTF-8 encoding in Delphi 6?
Currently, this is what I am doing:

  • Fill the TXMLDocument with AnsiString data
  • At the end, convert the data to UTF-8 using WideStringVariable := AnsiToUtf8(Doc.XML.Text);
  • Save the value of WideStringVariable to a file using TFileStream, adding the UTF-8 BOM at the beginning of the file.

CODE:

Procedure SaveAsUTF8( const Name:String; Data: TStrings );

const
  cUTF8 = $BFBBEF;
var
  W_TXT: WideString;
  fs: TFileStream;
  wBOM: Integer;
begin
  if TRIM(Data.Text) <> '' then begin    
    W_TXT:= AnsiToUTF8(Data.Text);
    fs:= Tfilestream.create( Name, fmCreate );
    try
      wBOM := cUTF8;
      fs.WriteBUffer( wBOM, sizeof(wBOM)-1);
      fs.WriteBuffer( W_TXT[1], Length(W_TXT)*Sizeof( W_TXT[1] ));
    finally
      fs.free
    end;
  end;
end;

If I open the file in Notepad++ or another editor that detects encoding, it shows UTF-8 with BOM. However, the text does not seem to be properly encoded.

What is wrong and how can I fix it?

UPDATE: XML Properties:

XMLDoc.Version := '1.0';
XMLDoc.Encoding := 'UTF-8';
XMLDoc.StandAlone := 'yes';

Solution

You can save the file using the standard SaveToFile method of the TXMLDocument variable: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/XMLDoc_TXMLDocument_SaveToFile.html

Whether the resulting file is actually UTF-8 is something you have to check with local tools such as the aforementioned Notepad++, a hex editor, or anything else.


If you insist on using an intermediate string and a file stream, you should use the proper variable type. AnsiToUTF8 returns the UTF8String type, and that is what should be used. Compiling `WideStringVar := AnsiStringSource` would issue a compiler warning, and it is a proper warning.

Googling for "Delphi WideString" - or reading the Delphi manuals on the topic - shows that WideString, aka Microsoft OLE BSTR, keeps data in UTF-16 format. http://delphi.about.com/od/beginners/l/aa071800a.htm Thus assigning an 8-bit source to a UTF-16 string necessarily converts the data, so dumping the contents of a WideString cannot produce UTF-8 text, by the very definition of WideString.

Procedure SaveAsUTF8( const Name:String; Data: TStrings );
const
  cUTF8: array [1..3] of byte = ($EF,$BB,$BF);
var
  W_TXT: UTF8String;
  fs: TFileStream;
  Trimmed: AnsiString;
begin
  Trimmed := TRIM(Data.Text);
  if Trimmed <> '' then begin    
    W_TXT:= AnsiToUTF8(Trimmed);
    fs:= TFileStream.Create( Name, fmCreate );
    try
      fs.WriteBuffer( cUTF8[1], sizeof(cUTF8) );
      fs.WriteBuffer( W_TXT[1], Length(W_TXT)*Sizeof( W_TXT[1] ));
    finally
      fs.free
    end;
  end;
end;
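
To make the point about WideString concrete, here is a minimal console sketch (the program name is made up for illustration) that compares the number of bytes each string type would hand to WriteBuffer. It uses plain ASCII text on purpose, so the byte counts do not depend on the system code page:

```pascal
program Utf8VsWide;
{$APPTYPE CONSOLE}

var
  U: UTF8String;
  W: WideString;
begin
  U := AnsiToUtf8('abc');
  W := U;  // implicit conversion: W now holds UTF-16 code units

  // Number of bytes that Length(S) * SizeOf(S[1]) passes to WriteBuffer
  Writeln(Length(U) * SizeOf(U[1]));  // 3 (one byte per ASCII character)
  Writeln(Length(W) * SizeOf(W[1]));  // 6 (two bytes per character)
end.
```

Writing the WideString buffer after a UTF-8 BOM therefore produces a file whose marker says UTF-8 while the payload is UTF-16.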

BTW, this code of yours would not create even an empty file if the source data was empty. That looks rather suspicious, though it is up to you to decide whether it is an error with respect to the rest of your program.
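
If a file should be created even for empty input, one possible variant is sketched below. The name SaveAsUTF8AlwaysCreate and the file name 'empty.txt' are hypothetical, and writing a BOM into an otherwise empty file is an assumption made here, not something the question asks for. Note the guard around the second WriteBuffer: indexing an empty string with [1] is invalid, so the text is written only when there is some.

```pascal
program SaveUtf8Sketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical variant: always creates the file, even for empty input.
procedure SaveAsUTF8AlwaysCreate(const Name: string; Data: TStrings);
const
  cUTF8: array [1..3] of Byte = ($EF, $BB, $BF);
var
  Utf8Text: UTF8String;
  fs: TFileStream;
begin
  Utf8Text := AnsiToUtf8(Trim(Data.Text));
  fs := TFileStream.Create(Name, fmCreate);
  try
    fs.WriteBuffer(cUTF8[1], SizeOf(cUTF8));
    // Only touch Utf8Text[1] when the string is not empty.
    if Length(Utf8Text) > 0 then
      fs.WriteBuffer(Utf8Text[1], Length(Utf8Text));
  finally
    fs.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Empty list: the resulting file contains only the 3-byte BOM.
    SaveAsUTF8AlwaysCreate('empty.txt', Lines);
  finally
    Lines.Free;
  end;
end.
```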


Properly "uploading" the received file or stream to the web is yet another issue (to be asked as a separate question on a Q&A site like SO), related to testing conformance with HTTP. As a starting point, you can read some hints at "WWW server reports error after POST Request by Internet Direct components in Delphi".

Other tips

In order to have the correct encoding inside the document, you should set it by using the Encoding property in your XML Document, like this:

myXMLDocument.Encoding := 'UTF-8';

I hope this helps.

You simply need to call the SaveToFile method of the document:

XMLDoc.SaveToFile(FileName);

Since you specified the encoding already, the component will use that encoding.

This won't include a BOM, but that's generally what you want for an XML file. The content of the file will specify the encoding.
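
A minimal console sketch of this approach follows. It assumes the XMLDoc and XMLIntf units with the default COM-based MSXML vendor (hence the CoInitialize call, which a VCL form application normally does not need); the node name, text, and file name are made up for illustration, and unit names may need scope prefixes in newer Delphi versions.

```pascal
program XmlSaveSketch;
{$APPTYPE CONSOLE}

uses
  ActiveX, XMLIntf, XMLDoc;

var
  Doc: IXMLDocument;
begin
  CoInitialize(nil);  // assumption: the MSXML DOM vendor needs COM initialized
  try
    Doc := NewXMLDocument;  // version defaults to '1.0'
    try
      Doc.Encoding := 'UTF-8';
      Doc.StandAlone := 'yes';
      Doc.AddChild('root').Text := 'sample';  // hypothetical content
      Doc.SaveToFile('sample.xml');           // hypothetical file name
    finally
      Doc := nil;  // release the document before COM is shut down
    end;
  finally
    CoUninitialize;
  end;
end.
```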


As for your SaveAsUTF8 method, it is not needed, but it is easy to fix, and doing so may be instructive.

The problem is that you are converting to UTF-16 when you assign to a WideString variable. You should instead put the UTF-8 text into an AnsiString variable. Changing the type of the variable that you named W_TXT to AnsiString is enough.

The function might look like this:

Procedure SaveAsUTF8(const Name: string; Data: TStrings);
const    
  UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
var
  utf8: AnsiString;
  fs: TFileStream;
begin
  utf8 := AnsiToUTF8(Data.Text);
  fs:= Tfilestream.create(Name, fmCreate);
  try
    fs.WriteBuffer(UTF8BOM, SizeOf(UTF8BOM));
    fs.WriteBuffer(Pointer(utf8)^, Length(utf8));
  finally
    fs.free;
  end;
end;

Another solution (TStreamWriter and TEncoding were introduced in Delphi 2009, so this will not compile in Delphi 6):

procedure SaveAsUTF8(const Name: string; Data: TStrings);
var
  fs: TFileStream;
  vStreamWriter: TStreamWriter;
begin
  fs := TFileStream.Create(Name, fmCreate);
  try
    vStreamWriter := TStreamWriter.Create(fs, TEncoding.UTF8);
    try
      vStreamWriter.Write(Data.Text);
    finally
      vStreamWriter.Free;
    end;
  finally
    fs.free;
  end;
end;
License: CC-BY-SA with attribution
Not affiliated with StackOverflow