Question

I have a problem that looks very simple, but I just can't figure it out. I'm trying to download the content of http://cspsp.gshi.org/ (accessing it via www.cspsp.gshi.org takes you to the wrong page).

This is what I run in PowerShell:
(New-Object System.Net.WebClient).DownloadFile( 'http://cspsp.gshi.org/', 'save.htm' )

I can access the website with Firefox and download its contents easily, but PowerShell always outputs something like this:
The remote server returned an error: (404) Not Found. (translated from German)

I'm not sure what I'm doing wrong here. Other websites, such as Google, work fine.

Thanks for any help!


Solution

It appears that the site relies on the User-Agent request header being sent by HTTP clients, and that System.Net.WebClient doesn't send one by default (at least, it didn't when I tested it against my own local servers).

Either way, this worked for me:

# Create the client and give it a browser-like User-Agent header before downloading.
$client = New-Object System.Net.WebClient
$client.Headers['User-Agent'] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.40 Safari/537.17"
# Save the response body to saved.html.
$client.DownloadFile('http://cspsp.gshi.org/', 'saved.html')
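
As an alternative sketch, assuming PowerShell 3.0 or later is available, the Invoke-WebRequest cmdlet takes the user agent directly through its -UserAgent parameter. The user-agent string and the output file name here are only illustrative values, and I haven't checked how this particular site responds to them:

```powershell
# Assumption: PowerShell 3.0+ (Invoke-WebRequest does not exist in 2.0).
# The user-agent string is an arbitrary browser-like value, not one the site is known to require.
$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36'

# -OutFile writes the response body to disk instead of returning it to the pipeline.
Invoke-WebRequest -Uri 'http://cspsp.gshi.org/' -UserAgent $userAgent -OutFile 'saved.html'
```

Invoke-WebRequest sends its own default User-Agent when none is given, so the explicit value may not even be needed; passing it just makes the header visible in the script.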

Hope this helps. :D

Licensed under: CC-BY-SA with attribution