Question

Is it possible to recursively download files from specific TLDs with wget?

Specifically, I'm trying to download the full text of the Code of Massachusetts Regulations. The text of the regulations is stored in multiple files across multiple domains, so I'd like to start the recursive download from the index page but only follow links to .gov and .us domains.


Solution

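With help from the wget documentation on spanning hosts, I was able to make this work with the -H and -D flags. -H lets the recursive download follow links to other hosts, and -D restricts it to hosts in the listed domains: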

wget -r -l5 -H -D.us,.gov http://www.lawlib.state.ma.us/source/mass/cmr/index.html
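For readability, the same command can be spelled with wget's long options. This is only a sketch: the function name fetch_cmr and the variables START_URL, DOMAINS, and MAX_DEPTH are hypothetical names introduced here for illustration, not part of wget or of the original answer.

```shell
# Long-option spelling of the command above (illustrative names, see note).
START_URL='http://www.lawlib.state.ma.us/source/mass/cmr/index.html'
DOMAINS='.us,.gov'   # comma-separated domain list passed to --domains (-D)
MAX_DEPTH=5          # recursion depth passed to --level (-l)

fetch_cmr() {
  # --recursive  = -r : follow links from the start page
  # --level      = -l : stop after MAX_DEPTH levels of links
  # --span-hosts = -H : allow links that lead to other hosts
  # --domains    = -D : only follow hosts in the listed domains
  wget --recursive --level="$MAX_DEPTH" --span-hosts \
       --domains="$DOMAINS" "$START_URL"
}

# Uncomment to start the download:
# fetch_cmr
```

Without -H, wget would stay on the starting host, so -D alone would not reach the other .gov and .us sites.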
Licensed under: CC-BY-SA with attribution