Question

I want to scrape all the pages and subpages of a web site (by scrape, I mean save all the online content to local HTML files).

What is the best utility to crawl all the pages? Ideally, I would like to specify how many layers deep to scrape.

Solution

You have two options:

You can use the wget command-line utility like so:

wget -r -l 10 https://example.com/

Replace the 10 with the number of levels you would like to recurse down into, and the URL with the site you want to save.
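If you want the saved copy to be browsable offline, a few more wget options help. A minimal sketch, assuming https://example.com/ stands in for the site you actually want to download:

# -r/--recursive with -l/--level controls how many link levels deep to crawl
# --page-requisites also fetches the images, CSS, and scripts each page needs
# --convert-links rewrites links so the local HTML files work offline
# --no-parent keeps the crawl from climbing above the starting directory
wget --recursive --level=10 --page-requisites --convert-links --no-parent https://example.com/

If the site objects to rapid crawling, adding --wait with a delay in seconds between requests is a polite option.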

Or, you can use SiteSucker, a GUI application that recursively downloads websites. SiteSucker also lets you specify how many levels deep to recurse.

Other Tips

DeepVacuum, a GUI for wget ($15).

cocoawget (free).
