How to get the entire HTML document as a string?

https://stackoverflow.com/questions/817218

03-07-2019
Question

Is there a way in JS to get the entire HTML within the html tags, as a string?

document.documentElement.??

Solution

MS added the outerHTML and innerHTML properties some time ago.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is in the DOM Parsing and Serialization specification.

See quirksmode for browser compatibility to find out what will work for you. All support innerHTML.

var markup = document.documentElement.innerHTML;
alert(markup);

Other suggestions

You can do

new XMLSerializer().serializeToString(document)

in browsers newer than IE 9.

See https://caniuse.com/#feat=xml-serializer
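A rough sketch of how this could be wrapped for older environments: the hypothetical helper `serializeWithFallback` below uses XMLSerializer when it is available and falls back to outerHTML otherwise. The second parameter is an assumption added here only so the function can be exercised without a browser. Keep in mind that XMLSerializer writes XML-style markup, so for an HTML page the result may differ from outerHTML (for example, void elements can come out as `<br />`).

```javascript
// Hypothetical helper, not part of any library.
// `doc` is assumed to be a Document-like object (e.g. the global `document`).
// `SerializerCtor` is optional and exists only to allow injecting a serializer.
function serializeWithFallback(doc, SerializerCtor) {
  var Ctor = SerializerCtor ||
    (typeof XMLSerializer !== 'undefined' ? XMLSerializer : null);
  if (Ctor) {
    // Serializes the whole document, including the doctype if there is one.
    return new Ctor().serializeToString(doc);
  }
  // Fallback: the <html> element only, without the doctype.
  return doc.documentElement.outerHTML;
}

// Browser usage (guarded so the snippet also loads outside a browser).
if (typeof document !== 'undefined') {
  console.log(serializeWithFallback(document));
}
```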

I believe document.documentElement.outerHTML should return that for you.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is in the DOM Parsing and Serialization specification.

The MSDN page on the outerHTML property notes that it is supported in IE 5+. Colin's answer links to the W3C quirksmode page, which offers a good comparison of cross-browser compatibility (for other DOM features too).

I tried the various answers to see what is returned. I'm using the latest version of Chrome.

The suggestion document.documentElement.innerHTML; returned <head> ... </body>

Gaby's suggestion document.getElementsByTagName('html')[0].innerHTML; returned the same.

The suggestion document.documentElement.outerHTML; returned <html><head> ... </body></html>, which is everything apart from the 'doctype'.

You can retrieve the doctype object with document.doctype; this returns an object, not a string, so if you need to extract the details as strings for all doctypes up to and including HTML5, that is described here: Get DocType of an HTML as string with Javascript

I only wanted HTML5, so the following was enough for me to create the whole document:

alert('<!DOCTYPE HTML>' + '\n' + document.documentElement.outerHTML);
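The one-liner above always prepends the HTML5 doctype. As a minimal sketch, assuming only HTML5 pages matter, the same idea can be wrapped so the doctype is added only when the page actually declares one; `html5DocumentToString` is a hypothetical name, and `doc` is assumed to be a document-like object exposing `doctype` and `documentElement.outerHTML`.

```javascript
// Hypothetical helper: serialize an HTML5 document, including its doctype.
// `doc` is assumed to be a Document-like object (e.g. the global `document`).
function html5DocumentToString(doc) {
  // document.doctype is null when the page has no doctype declaration.
  var prefix = doc.doctype ? '<!DOCTYPE html>\n' : '';
  return prefix + doc.documentElement.outerHTML;
}

// Browser usage (guarded so the snippet also loads outside a browser).
if (typeof document !== 'undefined') {
  console.log(html5DocumentToString(document));
}
```

For legacy doctypes with public or system identifiers this is not enough, since the identifiers would be lost.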

You can also do:

document.getElementsByTagName('html')[0].innerHTML

You won't get the doctype or the html tag, but everything else...

document.documentElement.outerHTML

PROBABLY ONLY IE:

webBrowser1.DocumentText

For FF from 1.0 on:

//serialize current DOM-Tree incl. changes/edits to ss-variable
var ns = new XMLSerializer();
var ss= ns.serializeToString(document);
alert(ss.substr(0,300));

may work in FF. (It shows the very first 300 characters from the very beginning of the source text, mostly doctype definitions.)

BUT be aware that the normal "Save As" dialog of FF MIGHT NOT save the current state of the page, but rather the originally loaded (X)HTML source text! (POSTing ss to some temp file and redirecting to that might deliver a saveable source text WITH the changes/edits previously made to it.)

That said, FF surprises with good recovery on "back" and a NICE inclusion of states/values on "Save (as) ..." for input-like fields, textarea etc., though not for elements in contenteditable/designMode...

If it is NOT an XHTML or XML file (by MIME type, NOT just filename extension!), one may use document.open/write/close to set the appropriate content at the source level, which will then be saved by the user's save dialog from the File/Save menu of FF. See http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite and

https://developer.mozilla.org/en-US/docs/Web/API/document.write

Regardless of the X(HT)ML question, try a "view-source:http://..." as the value of the src attribute of a (script-created!?) iframe, to access an iframe's document in FF:

<iframe-elementnode>.contentDocument; google "mdn contentDocument" for the appropriate members, like 'textContent' for instance. I worked that out years ago and don't feel like digging for it again. If there is still an urgent need, mention it and I'll dive in...

document.documentElement.innerHTML

I always use

document.getElementsByTagName('html')[0].innerHTML

It is probably not the right way, but I can understand it when I see it.

Use document.documentElement.

The same question is answered here: https://stackoverflow.com/a/7289396/2164160

To also get things outside the <html>...</html>, most importantly the <!DOCTYPE ...> declaration, you can walk through document.childNodes, turning each node into a string:

const html = [...document.childNodes]
    .map(node => nodeToString(node))
    .join('\n') // could use '' instead, but whitespace should not matter.

function nodeToString(node) {
    switch (node.nodeType) {
        case node.ELEMENT_NODE:
            return node.outerHTML
        case node.TEXT_NODE:
            // Text nodes should probably never be encountered, but handling them anyway.
            return node.textContent
        case node.COMMENT_NODE:
            return `<!--${node.textContent}-->`
        case node.DOCUMENT_TYPE_NODE:
            return doctypeToString(node)
        default:
            throw new TypeError(`Unexpected node type: ${node.nodeType}`)
    }
}

I published this code as document-outerhtml on npm.

Edit: note that the code above depends on a function doctypeToString; its implementation could be as follows (the code below is published on npm as doctype-to-string):

function doctypeToString(doctype) {
    if (doctype === null) {
        return ''
    }
    // Checking with instanceof DocumentType might be neater, but how to get a
    // reference to DocumentType without assuming it to be available globally?
    // To play nice with custom DOM implementations, we resort to duck-typing.
    if (!doctype
        || doctype.nodeType !== doctype.DOCUMENT_TYPE_NODE
        || typeof doctype.name !== 'string'
        || typeof doctype.publicId !== 'string'
        || typeof doctype.systemId !== 'string'
    ) {
        throw new TypeError('Expected a DocumentType')
    }
    const doctypeString = `<!DOCTYPE ${doctype.name}`
        + (doctype.publicId ? ` PUBLIC "${doctype.publicId}"` : '')
        + (doctype.systemId
            ? (doctype.publicId ? `` : ` SYSTEM`) + ` "${doctype.systemId}"`
            : ``)
        + `>`
    return doctypeString
}

I just need the html doctype, and it should work fine in IE11, Edge and Chrome. I used the code below and it works fine.

function downloadPage(element, event) {
    var isChrome = /Chrome/.test(navigator.userAgent) && /Google Inc/.test(navigator.vendor);

    if ((navigator.userAgent.indexOf("MSIE") != -1) || (!!document.documentMode == true)) {
        document.execCommand('SaveAs', '1', 'page.html');
        event.preventDefault();
    } else {
        if(isChrome) {
            element.setAttribute('href','data:text/html;charset=UTF-8,'+encodeURIComponent('<!doctype html>' + document.documentElement.outerHTML));
        }
        element.setAttribute('download', 'page.html');
    }
}

and use it in your anchor tag like this.

<a href="#" onclick="downloadPage(this,event);" download>Download entire page.</a>
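The data: URI construction in downloadPage can also be pulled out into its own function, which makes it easier to check on its own. The following is only a sketch: `buildHtmlDataUri` and `prepareDownloadLink` are hypothetical names, the link is assumed to be used in a browser that honours the download attribute, and the IE-specific execCommand branch is deliberately left out.

```javascript
// Hypothetical helper: build a data: URI for an HTML string,
// prefixed with the HTML5 doctype (same format as in downloadPage).
function buildHtmlDataUri(html) {
  return 'data:text/html;charset=UTF-8,' +
    encodeURIComponent('<!doctype html>' + html);
}

// Hypothetical helper: point a link at the serialized page.
// `element` only needs setAttribute; `doc` only needs documentElement.outerHTML.
function prepareDownloadLink(element, doc, filename) {
  element.setAttribute('href', buildHtmlDataUri(doc.documentElement.outerHTML));
  element.setAttribute('download', filename || 'page.html');
}
```

In a page it could be wired up as onclick="prepareDownloadLink(this, document);" on the anchor.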

Example

I just need doctype html and should work fine in IE11, Edge and Chrome. 

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

<p>
<a href="#" onclick="downloadPage(this,event);"  download><h2>Download entire page.</h2></a></p>

<p>Some image here</p>

<p><img src="https://placeimg.com/250/150/animals"/></p>

You need to iterate over the document's childNodes and get the outerHTML of each one.

In VBA it looks like this:

For Each e In document.ChildNodes
    Put ff, , e.outerHTML & vbCrLf
Next e

Using this, you can get all elements of the web page, including the <!DOCTYPE> node if it exists.

The correct way is actually:

webBrowser1.DocumentText

Licensed under: CC-BY-SA with attribution