Question

I'm using HttpWebRequest to scrape Wikipedia.org. Pages often link to topics that have since been consolidated, so those links redirect you to the correct page.

For example,

http://en.wikipedia.org/wiki/Polish_prisoners_of_war_in_Soviet_Union_(after_1939)

redirects you to the correct topic which is

http://en.wikipedia.org/wiki/Polish_prisoners_of_war_in_the_Soviet_Union_(after_1939)

Notice the addition of the word "the".

I need to determine at this point whether or not a redirect has happened. Can anyone suggest how I might do this?

Thanks!

UPDATE

I marked the response below as the answer because, technically, that is how you tell whether you have been redirected. The problem I am having is that Wikipedia is not actually doing a hard redirect with an HTTP 3xx response code. It does a soft redirect, which serves different content under the same URL. I'll have to find another solution.


Solution

Try this:

if (response.ResponseUri != request.RequestUri) {
    // The final URI differs from the requested one, so an HTTP redirect was followed.
}

OTHER TIPS

There is a property called "AllowAutoRedirect" on the HttpWebRequest object. If you set it to false, you can follow the redirects yourself.

You could also try checking the HttpWebResponse.ResponseUri.
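Following the redirects yourself, as suggested above, could look roughly like the sketch below. The class name RedirectProbe, its method names, and the maxHops limit are made up for illustration; the set of status codes treated as redirects is also an assumption you may want to adjust. It only detects real HTTP 3xx redirects, not the soft redirects described in the question's update.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer.
static class RedirectProbe
{
    // True for the status codes this sketch treats as redirects (an assumption).
    public static bool IsRedirect(HttpStatusCode status)
    {
        switch ((int)status)
        {
            case 301: case 302: case 303: case 307: case 308:
                return true;
            default:
                return false;
        }
    }

    // Follows redirects by hand, one request per hop, and returns the last URI reached.
    // 'hops' reports how many redirects were followed; 0 means no HTTP redirect happened.
    public static Uri Resolve(Uri start, int maxHops, out int hops)
    {
        Uri current = start;
        for (hops = 0; hops < maxHops; hops++)
        {
            var request = (HttpWebRequest)WebRequest.Create(current);
            request.AllowAutoRedirect = false; // hand 3xx responses back to us

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                if (!IsRedirect(response.StatusCode))
                    return current;

                string location = response.Headers["Location"];
                if (string.IsNullOrEmpty(location))
                    return current;

                // Location may be relative, so resolve it against the current URI.
                current = new Uri(current, location);
            }
        }
        return current;
    }
}
```

Calling RedirectProbe.Resolve(new Uri(url), 10, out hops) and checking hops > 0 would then tell you whether any HTTP redirect occurred, and the returned Uri is where the chain ended.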

Use the HttpWebRequest.Address property, which is explicitly defined as "the URI after any redirections that happen during the request are complete".

Note that this should be used instead of the similar HttpWebResponse.ResponseUri, as its documentation says:

Applications that need to access the last redirected ResponseUri should use the HttpWebRequest.Address property rather than ResponseUri, since the use of ResponseUri property may open security vulnerabilities.
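A minimal sketch of that check, assuming automatic redirects are left on (the default). The class name AddressCheck and its method names are hypothetical. As with the other approaches, this only notices HTTP-level redirects.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer.
static class AddressCheck
{
    // Compares the URI that was asked for with the URI that finally answered.
    public static bool WasRedirected(Uri requested, Uri responded)
    {
        return !requested.Equals(responded);
    }

    // Issues the request, then reads Address once the response has arrived.
    public static bool FetchWasRedirected(string url, out Uri finalUri)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Address is only meaningful after any redirections are complete.
            finalUri = request.Address;
            return WasRedirected(request.RequestUri, finalUri);
        }
    }
}
```

AddressCheck.FetchWasRedirected(url, out finalUri) returns true when the responding URI differs from the requested one, and finalUri holds the address that actually answered.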

Licensed under: CC-BY-SA with attribution