Question

I'm curious: why do I get a 404 error when running this line?

urllib2.urlopen("http://localhost/new-post#comment-29")

Meanwhile, opening http://localhost/new-post#comment-29 in any browser works fine.

Does urlopen not handle URLs that contain "#"?

Does anybody know?


Solution

In HTTP, the fragment (everything from # onwards) is not sent to the server: the browser keeps it locally and, once the server's response has been received, uses it to locate the spot in the page to show as "current". For an HTML page, for example, this is done by parsing the HTML and looking for the first suitable target, such as an element whose id matches the fragment or an <a> tag whose name matches it. The 404 suggests that urllib2 passed the URL along with the fragment still attached, so the server was asked for a path it does not have.

So, the procedure is: strip the fragment, e.g. via urlparse.urlparse (or urlparse.urldefrag, which splits it off directly); fetch the resource using the rest of the URL; parse the response according to its Content-Type header; then locate the retained fragment within the parsed resource and take whatever action your program needs for that "current spot".
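
As a rough sketch of those steps (not a definitive implementation): the helper names fetch_without_fragment, AnchorFinder and find_anchor are made up for illustration, and the try/except imports are only there so the same code loads on both Python 2 (as in the question) and Python 3. It assumes the response is HTML and that the target is either an element with a matching id or an <a> with a matching name.

```python
try:  # Python 3
    from urllib.parse import urldefrag
    from urllib.request import urlopen
    from html.parser import HTMLParser
except ImportError:  # Python 2, as used in the question
    from urlparse import urldefrag
    from urllib2 import urlopen
    from HTMLParser import HTMLParser


def fetch_without_fragment(url):
    """Hypothetical helper: fetch the resource, keeping the fragment local."""
    base, fragment = urldefrag(url)[:2]
    body = urlopen(base).read()  # only the fragment-free URL goes to the server
    return body, fragment


class AnchorFinder(HTMLParser):
    """Hypothetical helper: remember where the first matching target starts."""

    def __init__(self, fragment):
        HTMLParser.__init__(self)
        self.fragment = fragment
        self.position = None  # (line, column) of the first match, if any

    def handle_starttag(self, tag, attrs):
        if self.position is not None:
            return  # keep only the first suitable target
        attrs = dict(attrs)
        if attrs.get("id") == self.fragment or (
            tag == "a" and attrs.get("name") == self.fragment
        ):
            self.position = self.getpos()


def find_anchor(html, fragment):
    """Return the (line, column) of the fragment's target, or None."""
    finder = AnchorFinder(fragment)
    finder.feed(html)
    finder.close()
    return finder.position


# Step 1 on the URL from the question: split off the fragment.
base, fragment = urldefrag("http://localhost/new-post#comment-29")[:2]
```

Here base is what would be handed to urlopen, while fragment stays on the client and is later passed to find_anchor along with the decoded page text.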

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow