Retrieving the entire html with external js/css/images through javascript

https://stackoverflow.com/questions/11415518

20-06-2021

Question

I already have a JavaScript file (performing some functions) that will be appended to a webpage. Now I want that script to collect the entire webpage, including its HTML tags, images, external JavaScript files, and external CSS files. I don't want to use jQuery or any other external library here.

My goal is to get the entire webpage, save it, and display it so that it looks as close as possible to the original.

Is this possible with Javascript?

Any help will be greatly appreciated.

Solution

Short Answer - No

No, it's not possible with JavaScript, especially the "saving" part, as JavaScript doesn't have file access rights in browser environments (which we assume here), except when developing browser extensions or when explicitly modifying your browser's security properties to allow this.

Long Answer - If You Really Must: The Long and Winding Road...

Loading the Right Content

First you need to decide whether you want the page in its static state (as sent by the server on the first page load) or in its currently rendered state (after the browser has rendered it and scripts have executed and possibly added content to the page).
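As a rough sketch of the two options, assuming a reasonably modern browser: the rendered state can be read from the live DOM, while the static state needs a second request to the server. The names `serializeRenderedPage` and `fetchStaticPage` are made up for illustration, and the second request is not guaranteed to return exactly what the first page load did.

```javascript
// Rendered state: serialize the live DOM, including nodes added by scripts.
// outerHTML omits the doctype, so it is rebuilt from doc.doctype when present.
function serializeRenderedPage(doc) {
  const doctype = doc.doctype ? `<!DOCTYPE ${doc.doctype.name}>\n` : "";
  return doctype + doc.documentElement.outerHTML;
}

// Static state: ask the server for the page again and read the raw markup.
// fetchImpl is a hypothetical injection point so this can run outside a browser.
async function fetchStaticPage(url, fetchImpl = (u) => fetch(u)) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  return response.text();
}
```

In a page, `serializeRenderedPage(document)` would give the current markup and `fetchStaticPage(location.href)` the markup as served.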

Loading Resources

Then you'll need to iterate over all the elements of the DOM, and fetch all external resources (including the ones referenced in CSS files).
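A minimal sketch of that collection step, assuming only images, scripts, and stylesheets matter and that a regular expression is good enough for the CSS (it will miss some edge cases such as escapes and comments). All function names here are illustrative.

```javascript
// Resolve a possibly relative reference; return null if it is not a valid URL.
function resolveUrl(reference, base) {
  try {
    return new URL(reference, base).href;
  } catch {
    return null;
  }
}

// Selector/attribute pairs that commonly point at external resources.
// This list is deliberately short and not exhaustive.
const RESOURCE_SELECTORS = [
  ["img[src]", "src"],
  ["script[src]", "src"],
  ['link[rel~="stylesheet"][href]', "href"],
];

// Walk the DOM and return absolute URLs of referenced resources, deduplicated.
function collectDomResourceUrls(doc, baseUrl) {
  const found = new Set();
  for (const [selector, attribute] of RESOURCE_SELECTORS) {
    for (const element of doc.querySelectorAll(selector)) {
      const absolute = resolveUrl(element.getAttribute(attribute), baseUrl);
      if (absolute !== null) found.add(absolute);
    }
  }
  return [...found];
}

// Find url(...) and @import "..." references inside a stylesheet's text.
// Relative references are resolved against the stylesheet's own URL.
function extractCssUrls(cssText, cssUrl) {
  const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)|@import\s+(['"])(.*?)\3/g;
  const found = new Set();
  for (const match of cssText.matchAll(pattern)) {
    const reference = match[2] ?? match[4];
    // data: URIs are already inline, so there is nothing to download.
    if (!reference || reference.startsWith("data:")) continue;
    const absolute = resolveUrl(reference, cssUrl);
    if (absolute !== null) found.add(absolute);
  }
  return [...found];
}
```

In a page this would be called as `collectDomResourceUrls(document, document.baseURI)`, and then `extractCssUrls` once per downloaded stylesheet.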

You'll probably want to request all resources with HTML or plain-text MIME types, as otherwise your browser might trigger visible downloads with end-user popups instead of fetching them transparently.
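For illustration only, here is how the download step might look with `fetch`, which is a different technique from the MIME-type approach above: responses are read into memory as raw bytes by the script. The name `downloadResources` and the injectable `fetchImpl` parameter are assumptions of this sketch. Cross-origin resources remain subject to CORS, so some requests are expected to fail.

```javascript
// Download every URL and return a Map from URL to its raw bytes (ArrayBuffer).
// Failures are stored as null so one bad resource does not abort the rest.
async function downloadResources(urls, fetchImpl = (u) => fetch(u)) {
  const results = new Map();
  await Promise.all(
    urls.map(async (url) => {
      try {
        const response = await fetchImpl(url);
        results.set(url, response.ok ? await response.arrayBuffer() : null);
      } catch (error) {
        // Network errors and CORS rejections both surface here.
        results.set(url, null);
      }
    })
  );
  return results;
}
```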

Updating all references

Next you need to figure out how you'd want to organize your "downloaded" content, and where to put the resources and how to name them to avoid conflicts.
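One possible naming scheme, sketched under the assumption that grouping files by extension and appending a counter on collisions is acceptable. The function name `assignLocalPaths` and the `misc` folder are invented for this example.

```javascript
// Map each absolute URL to a unique relative path, grouped by file extension.
function assignLocalPaths(urls) {
  const used = new Set();
  const mapping = new Map();
  for (const url of urls) {
    // Only the path is used, so URLs differing just by query string collide
    // here and are told apart by the counter below.
    const { pathname } = new URL(url);
    const fileName = pathname.split("/").pop() || "index";
    const dot = fileName.lastIndexOf(".");
    const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
    const extension = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";
    const folder = extension || "misc";
    // Append a counter until the path no longer collides with an earlier one.
    let candidate = `${folder}/${fileName}`;
    for (let n = 1; used.has(candidate); n += 1) {
      candidate = `${folder}/${stem}-${n}${extension ? "." + extension : ""}`;
    }
    used.add(candidate);
    mapping.set(url, candidate);
  }
  return mapping;
}
```

With this scheme, two different `logo.png` files from different hosts would end up as `png/logo.png` and `png/logo-1.png`.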

Once done, you need to iterate over all the DOM elements again and update the references to use the paths of your local resources instead of the original remote URLs.
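A minimal sketch of the attribute rewrite, assuming you already hold a `Map` from absolute URL to local path (called `urlToLocal` here, a name chosen for this example). It only touches `src` attributes and `href` on `link` elements; references inside stylesheets would need similar treatment on the CSS text.

```javascript
// Point src/href attributes at local copies wherever a mapping exists.
// Returns the number of attributes that were rewritten.
function rewriteReferences(doc, baseUrl, urlToLocal) {
  let rewritten = 0;
  for (const element of doc.querySelectorAll("[src], link[href]")) {
    const attribute = element.hasAttribute("src") ? "src" : "href";
    let absolute;
    try {
      absolute = new URL(element.getAttribute(attribute), baseUrl).href;
    } catch {
      continue; // Leave malformed references untouched.
    }
    const local = urlToLocal.get(absolute);
    if (local !== undefined) {
      element.setAttribute(attribute, local);
      rewritten += 1;
    }
  }
  return rewritten;
}
```

Running this on a copy such as `document.cloneNode(true)` rather than on `document` itself avoids breaking the page the user is currently looking at.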

Writing Content to Disk

Now the last bit is to save all these resources to disk, using either your browser's custom APIs or the HTML5 File System APIs.
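As one hedged example of the saving step, the sketch below uses `showSaveFilePicker` from the newer File System Access API rather than the older File System API. As far as I know it is only available in Chromium-based browsers and must be called in response to a user gesture such as a click. The function name `saveTextFile` and the injectable `pickFile` parameter are assumptions of this example.

```javascript
// Ask the user where to save, then write the given text to that file.
// pickFile defaults to the browser's picker and can be replaced for testing.
async function saveTextFile(
  text,
  suggestedName,
  pickFile = (options) => globalThis.showSaveFilePicker(options)
) {
  const handle = await pickFile({ suggestedName });
  const writable = await handle.createWritable();
  await writable.write(text);
  // Closing the stream is what commits the written data to the file.
  await writable.close();
  return handle.name;
}
```

Something like `button.addEventListener("click", () => saveTextFile(html, "page.html"))` would satisfy the user-gesture requirement. Each resource saved this way prompts the user separately, which is one more reason this approach is awkward for a whole page.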

Here Be Dragons

None of this guarantees that you'll achieve what you want, as some pages could still contain code that won't behave nicely once downloaded like this. There may be code that requests content from remote URLs, assumes particular directory structures and endpoints, or relies on resource names that you have modified. That would be strange, but it is not that uncommon.

Licensed under: CC-BY-SA with attribution
