Question

I'm trying to scrape a web page from a C# application, but the request keeps failing with

"The remote server returned an error: (404) Not Found."

The web page is accessible through a browser, but the app keeps failing. Any help is appreciated.

var d = DateTime.UtcNow.Date;
var AddressString = @"http://www.booking.com/searchresults.html?src=searchresults&si=ai%2Cco%2Cci%2Cre%2Cdi&ss={0}&checkin_monthday={1}&checkin_year_month={2}&checkout_monthday={3}&checkout_year_month={4}";
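var checkout = d.AddDays(1); // d.Day + 1 would give an invalid day (e.g. 32) on the last day of a month
var URi = String.Format(AddressString, "Prague", d.Day, d.Year + "-" + d.Month, checkout.Day, checkout.Year + "-" + checkout.Month);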
var request = (HttpWebRequest)WebRequest.Create(URi);
request.Timeout = 5000;
request.UserAgent = "Fiddler"; // I set the next three properties so that they are not null
request.Credentials = CredentialCache.DefaultCredentials;
request.Proxy = WebProxy.GetDefaultProxy();
try
{
    var response = (HttpWebResponse)request.GetResponse();
}
catch(WebException e)
{
    var response = (HttpWebResponse)e.Response; // e.Response contains the web page, but it is incomplete
    StreamReader sr = new StreamReader(response.GetResponseStream());
    HtmlDocument doc = new HtmlDocument();
    doc.Load(sr);
    var a = doc.DocumentNode.SelectNodes("div[@class='resut-details']"); // fails, because not all of the desired nodes are in the response
}

EDIT:

Hi guys, thanks for the suggestions.

I added the header "Accept-Encoding: gzip,deflate,sdch" as suggested in David Martin's reply, but that did not help on its own.

I used Fiddler to try to get more information about the problem, but it was my first time using that tool and it did not make me any wiser. On the other hand, I changed request.UserAgent to the value my browser sends ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36") and voila, I no longer get the 404 exception. However, the document is not readable, because it is filled with characters like these: ¿½O~���G�. I tried setting request.TransferEncoding = "UTF-8", but to enable this property, request.SendChunked must be set to true, which ends in

ProtocolViolationException

Additional information: Content-Length or Chunked Encoding cannot be set for an operation that does not write data.

EDIT 2: I'm missing something and I can't figure out what. The response I get is encoded in some way, and I need to decode it before I can read it correctly. Even in Fiddler, when I want to view the response, I have to confirm decoding before I can inspect the result. Once I decode it in Fiddler, I get exactly what I want to get in my application...


Solution

So, after trying the suggestions from Jon Skeet and David Martin, I got a bit further and found the relevant answer under a different question in another topic. In case anyone else is looking for something similar, the answer is here:

.NET: Is it possible to get HttpWebRequest to automatically decompress gzip'd responses?
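
For anyone who lands here, this is roughly what that approach looks like. Treat it as a minimal sketch and not as the exact code from the linked answer: the class and method names (GzipScrapeSketch, Fetch, DecodeGzip) are made up for illustration, and reading the body as UTF-8 is an assumption about the page. The key line is AutomaticDecompression, which makes HttpWebRequest send the Accept-Encoding header itself and unpack gzip/deflate responses before you read them. DecodeGzip shows the manual alternative for the case where you already hold the compressed bytes. As far as I know, HttpWebRequest cannot decode sdch, so it is probably safer not to advertise it in a hand-written Accept-Encoding header.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

public static class GzipScrapeSketch
{
    // Sends a GET request and returns the response body as text.
    // url and userAgent are supplied by the caller (hypothetical parameters).
    public static string Fetch(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 5000;
        request.UserAgent = userAgent;

        // Adds "Accept-Encoding: gzip, deflate" to the request and
        // transparently decompresses the response stream.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var response = (HttpWebResponse)request.GetResponse())
        // Assumption: the page is UTF-8 encoded.
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Manual alternative: decompress a gzip-encoded body yourself.
    public static string DecodeGzip(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling GzipScrapeSketch.Fetch(URi, "<browser user agent string>") with the URL built in the question should then return readable HTML that can be passed to HtmlDocument.LoadHtml, assuming the server only uses gzip or deflate.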

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow