Question

I have an experimental web crawler and I noticed that it cannot read some pages. For example, on some particular domains curl says it failed after following 50 redirects, but wget reads the same domain just fine:

curl 'netflix.com' -L -o 'output.txt'

Result:

curl: (47) Maximum (50) redirects followed

No data in output.txt file.

While this command works fine:

wget netflix.com

Any ideas what could cause this? I doubt the remote server treats requests differently based on the two different user agents.


Solution

This is probably because you didn't tell curl to use cookies; curl doesn't handle cookies unless you ask it to, while wget enables them by default.

Use the --cookie or --cookie-jar options to enable cookies.
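For example, a minimal sketch that switches on curl's cookie engine, reading from and writing to a local cookie file (the file name cookies.txt is just an illustration):

curl 'netflix.com' -L -b cookies.txt -c cookies.txt -o 'output.txt'

With cookies stored and sent back on subsequent requests, the redirect chain should terminate instead of looping.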

Other tips

--max-redirs is the option used to limit the number of redirects. The default, as stated, is 50.

The "47" you see there is the error code for hitting the limit of redirects.

The redirect limit for wget is 20 by default, so there is definitely something else going on, since curl's limit is higher yet curl is the one that fails.
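To confirm that the cap itself is not the issue, wget's limit can be lowered even further with --max-redirect (the value 5 here is arbitrary) and it should still succeed:

wget --max-redirect=5 netflix.com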

Running the same command on my system works fine and only follows about 3 to 5 redirects.

You could use the --verbose option to track what those redirects are and perhaps compare them to the default output from wget.
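A sketch of one way to do that: with --verbose, response headers go to stderr prefixed with "< ", so the redirect targets can be filtered out (the grep pattern assumes that prefix):

curl 'netflix.com' -L --verbose -o 'output.txt' 2>&1 | grep -i '^< location'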

Cookies are enabled by default in wget but not in curl, as @DanielStenberg pointed out, so hopefully he will answer and be accepted.

License: CC-BY-SA with attribution