“wget --domains” not helping.. what am I doing wrong? [closed]
21-08-2019
Question
I'm attempting to use wget to recursively grab only the .jpg files from a particular website, with a view to creating an amusing screensaver for myself. Not such a lofty goal really.
The problem is that the pictures are hosted elsewhere (mfrost.typepad.com), not on the main domain of the website (www.cuteoverload.com).
I have tried using "-D" to specify the allowed domains, but sadly no cute jpgs have been forthcoming. How could I alter the line below to make this work?
wget -r -l2 -np -w1 -D www.cuteoverload.com,mfrost.typepad.com -A.jpg -R.html,.php,.gif www.cuteoverload.com/
Thanks.
Solution
An examination of wget's man page[1] says this about -D:
Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.
This advisory about -H looks interesting:
Enable spanning across hosts when doing recursive retrieving.
So you merely need to add the -H flag to your invocation.
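Applied to the command from the question, the corrected invocation would look something like this (note that -H must be combined with -D, otherwise the recursion would wander onto every host it encounters; the reject list is also written in wget's comma-separated form):

```shell
# -H          enable host spanning during recursive retrieval
# -D ...      but restrict spanning to these two domains only
# -r -l2      recurse, two levels deep
# -np         never ascend to the parent directory
# -w1         wait 1 second between requests (be polite)
# -A .jpg     accept only .jpg files
# -R ...      reject html/php/gif (comma-separated list)
wget -r -l2 -np -w1 -H \
     -D www.cuteoverload.com,mfrost.typepad.com \
     -A .jpg -R .html,.php,.gif \
     www.cuteoverload.com/
```

This is a network-dependent command, so the exact results will vary with the site's current layout; the key point is simply the pairing of -H (allow spanning) with -D (limit which hosts may be spanned to).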
(Having done this, it looks like all the images are restricted to mfrost.typepad.com/cute_overload/images/2008/12/07 and mfrost.typepad.com/cute_overload/images/2008/12/08.)
-- [1] Although wget's primary reference manual is in info format.