Question

Looking for a Linux application (or Firefox extension) that will allow me to scrape an HTML mockup and keep the page's integrity. Firefox does an almost perfect job but doesn't grab images referenced in the CSS.

The ScrapBook extension for Firefox gets everything, but flattens the directory structure.

I wouldn't mind terribly if all the folders ended up as children of the index page.


Solution

See Website Mirroring With wget

wget --mirror -w 2 -p --html-extension --convert-links http://www.yourdomain.com
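
Here --mirror turns on recursion with time-stamping, -w 2 waits two seconds between requests so the server isn't hammered, -p downloads the page requisites (images, stylesheets) that each page needs, --html-extension saves files with an .html suffix, and --convert-links rewrites the links so the local copy browses correctly. On newer wget releases --html-extension is spelled --adjust-extension.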

OTHER TIPS

Have you tried wget?

wget -r does what you want, and if not, there are plenty of flags to configure it. See man wget.
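
For this question in particular, the flag that matters most is -p (page requisites), which also fetches the images and stylesheets each page references. A plausible starting point, using the same placeholder domain as above, would be:

wget -r -p -k -E http://www.yourdomain.com/

-k (--convert-links) rewrites the links for offline viewing, and -E (--adjust-extension) adds .html suffixes where the server didn't provide them.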

Another option is curl, which is even more powerful. See http://curl.haxx.se/.
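
Note that curl retrieves one URL at a time rather than walking a site, so on its own it won't rebuild a page together with its assets. A minimal single-page fetch would look something like:

curl -o index.html http://www.yourdomain.com/

Pulling in the CSS-referenced images as well would mean scripting additional curl calls, or falling back to wget with -p.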

Teleport Pro is great for this sort of thing. You can point it at a complete website and it will download a copy locally, maintaining the directory structure and replacing absolute links with relative ones as necessary. You can also specify whether you want content from third-party websites that the original site links to.

Licensed under: CC-BY-SA with attribution