How to solve encoding issue while getting a page's source code?

https://stackoverflow.com/questions/23407237

13-07-2023

Question

I was getting a page's source code with:

    Uri url = new Uri(urlAddress);
    WebClient client = new WebClient();
    client.Encoding = System.Text.Encoding.UTF8; // decode the response as UTF-8
    string html = client.DownloadString(url);

but it produces garbled characters for kickass.to (a torrent site), even though the page declares

     <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

in its source code.

I also tried the method described at http://www.tech-recipes.com/rx/1954/get_web_page_contents_in_code_with_csharp/ to get the source code, but it didn't work either.

Example source code: http://pastebin.com/ycBjWLRi

How can I get the source code properly?

Solution

I noticed something about forcing character encoding in a recent article I read over at:

Html Agility Pack - Massive Information Extraction

It says you should set it up like this:

    HtmlWeb htmlWeb = new HtmlWeb() {
      AutoDetectEncoding = false,
      OverrideEncoding = Encoding.GetEncoding("iso-8859-2")
    };

This uses Html Agility Pack, which you tagged your question with but don't seem to have actually used, either in your code example above or in the tech-recipes.com article you linked to.
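
To illustrate why forcing the encoding matters, here is a minimal, self-contained sketch that uses only the .NET base class library rather than Html Agility Pack. The `EncodingDemo` class and its `Decode` helper are hypothetical names introduced for illustration, and the in-memory byte array is an assumption standing in for what a call such as `WebClient.DownloadData` would return. It decodes the same UTF-8 bytes twice, once with the matching charset and once with a mismatched one:

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: decode raw response bytes with an explicitly chosen charset.
    public static string Decode(byte[] raw, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(raw);
    }

    static void Main()
    {
        // "Łódź" written with escapes so the source file's own encoding does not matter.
        string original = "<p>\u0141\u00F3d\u017A</p>";

        // Assumption: these bytes stand in for the body of a UTF-8 encoded page.
        byte[] raw = Encoding.UTF8.GetBytes(original);

        string right = Decode(raw, "utf-8");      // charset matches the bytes
        string wrong = Decode(raw, "iso-8859-1"); // charset does not match the bytes

        Console.WriteLine(right == original); // True
        Console.WriteLine(wrong == original); // False: multi-byte characters are split up
    }
}
```

If the text only looks right with one particular charset, that is probably the value to pass to `OverrideEncoding` (or to assign to `client.Encoding` when using `WebClient`).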

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow