Question

I'm having a small problem that looks very simple, but I just don't get it! I'm trying to download the content of http://cspsp.gshi.org/ (if you access it via www.cspsp.gshi.org you end up on the wrong page).

To do this, I use the following in PowerShell:
(New-Object System.Net.WebClient).DownloadFile( 'http://cspsp.gshi.org/', 'save.htm' )

I can access the website with Firefox and download its content easily, but PowerShell always fails with an error like this:
The remote server returned an error: (404) Not Found. (translated from German)

I'm not sure what I'm doing wrong here. Other websites like Google just work fine.

Thanks for all help!


Solution

It appears that the site relies on the User-Agent request header that HTTP clients normally send, and System.Net.WebClient doesn't send one by default (at least, it didn't when I tested it against my own local servers).

Either way, this worked for me:

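$request = New-Object System.Net.WebClient
# WebClient sends no User-Agent header unless you add one; this site appears to answer 404 without it
$request.Headers['User-Agent'] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.40 Safari/537.17"
# Pass a full path: .NET resolves relative paths against the process working directory, not the current PowerShell location
$request.DownloadFile('http://cspsp.gshi.org/', (Join-Path $PWD 'saved.html'))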
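If you need this in more than one script, the header setup can be wrapped in a small function. This is a minimal sketch: `New-BrowserWebClient` is a hypothetical helper name, and the default User-Agent string is simply the one used above. It assumes the site only checks that a browser-like User-Agent is present, so any realistic browser string should work.

```powershell
# Hypothetical helper: returns a System.Net.WebClient that sends a browser-like
# User-Agent header. WebClient sends no optional headers unless you add them.
function New-BrowserWebClient {
    param(
        # Assumed default: the Chrome string from the answer above; any realistic
        # browser User-Agent should satisfy a site that only checks for presence.
        [string]$UserAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.40 Safari/537.17'
    )
    $client = New-Object System.Net.WebClient
    # Set the header on this client instance before any request is made
    $client.Headers['User-Agent'] = $UserAgent
    return $client
}
```

Usage: `$client = New-BrowserWebClient; $client.DownloadFile('http://cspsp.gshi.org/', (Join-Path $PWD 'saved.html'))`. On PowerShell 3.0 and later, `Invoke-WebRequest` also accepts `-UserAgent` and `-OutFile` parameters, which avoids building the WebClient by hand.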

Hope this helps. :D

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow