Question

I've written a function that downloads the source code of a web page given its URL:

function GetWebPage(const url: string): tStringList;
var
  idHttp: TidHttp;
begin
  Result := tStringList.Create;
  idHttp := TidHttp.Create(nil);

  // set params
  idHttp.Request.UserAgent := 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';
  idHttp.Request.AcceptLanguage := 'ru, en';
  idHttp.Response.KeepAlive := True;
  idHttp.HandleRedirects := True;
  idHttp.ConnectTimeout := 5000;
  idHttp.ReadTimeout := 5000;

  try
    try
      Result.values['responce'] := idHttp.Get(url);
    except
      Result.values['responce'] := '';
    end;

  finally
    Result.values['code'] := IntToStr(idHttp.ResponseCode);
    FreeAndNil(idHttp);
  end;
end;

It works perfectly with English URLs, but when I specify a URL like президент.рф, Indy internally transforms it into ?????????.?? (as seen in an HTTP Analyzer screenshot).

I've found this solution for my problem:

idHttp.IOHandler.DefStringEncoding := TEncoding.Ansi; 
// also tried - TEncoding.Unicode, TEncoding.UTF8

But it does not work. When I try to call my function, I get an error (screenshot of the error message omitted).

So, how can I make this function work with Cyrillic addresses?

Thank you.

Solution

URLs can only contain ASCII characters. You need to encode the non-ASCII characters in the URL before passing it to TIdHTTP. You can use the TIdURI.URLEncode() method for that purpose, e.g.:

Result.values['responce'] := idHttp.Get(TIdURI.URLEncode(url));
GetWebPage('http://президент.рф');

UTF-8 is commonly used for URL encoding, so it is the default encoding used by TIdURI. Not all servers use UTF-8, however, so if you need a different encoding, TIdURI.URLEncode() has an optional AByteEncoding parameter for that purpose.
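
As a rough illustration of the optional parameter, the sketch below encodes the same URL twice: once with the default encoding and once with an explicit Windows-1251 encoding. It rests on some assumptions: the URL is a made-up example, and IndyTextEncoding() from IdGlobal is assumed to be available, which depends on your Indy 10 version (older releases expose encodings through TIdTextEncoding instead).

```pascal
program UrlEncodeDemo;

{$APPTYPE CONSOLE}

uses
  IdGlobal, IdURI;

const
  // Hypothetical URL, for illustration only
  SampleUrl = 'http://example.com/страница?q=тест';

begin
  // No AByteEncoding given: the default (UTF-8) is used
  WriteLn(TIdURI.URLEncode(SampleUrl));

  // Explicit byte encoding: Windows-1251 (Cyrillic).
  // Assumes IndyTextEncoding(codepage) exists in your Indy version.
  WriteLn(TIdURI.URLEncode(SampleUrl, IndyTextEncoding(1251)));
end.
```

The two calls should differ only in the percent-encoded bytes they produce for the Cyrillic characters. Which encoding is correct depends on what the target server expects.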

With that said, international resources are better serviced using IRIs instead of URLs, but Indy does not natively support IRIs yet (that will be implemented in Indy 11).

Licensed under: CC-BY-SA with attribution