How to get the whole document's HTML as a string?

https://stackoverflow.com/questions/817218

03-07-2019

Question

Is there a way in JS to get the entire HTML within the html tags, as a string?

document.documentElement.??

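Solution

MS added the outerHTML and innerHTML properties some time ago.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is part of the DOM Parsing and Serialization specification.

See quirksmode for browser compatibility details. All of them support innerHTML.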

var markup = document.documentElement.innerHTML;
alert(markup);

Other tips

You can do

new XMLSerializer().serializeToString(document)

in browsers newer than IE 9.

See https://caniuse.com/#feat=xml-serializer
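Since XMLSerializer is missing in older IE, one option is to feature-detect it and fall back to outerHTML. The following is only a minimal sketch: the helper name serializeDocument and its second parameter are assumptions made for illustration, not part of the answer above. The serializer constructor is passed in explicitly so the function does not depend on a global.

```javascript
// Hypothetical helper (not from the original answer): serialize a document,
// preferring XMLSerializer and falling back to outerHTML when it is missing.
function serializeDocument(doc, SerializerCtor) {
    // In a browser you would pass window.XMLSerializer here;
    // in old IE that value is undefined, so the fallback is used.
    if (typeof SerializerCtor === 'function') {
        return new SerializerCtor().serializeToString(doc);
    }
    // Fallback: markup of the root element only (the doctype is not included).
    return doc.documentElement.outerHTML;
}

// Browser usage (assumed environment):
// var markup = serializeDocument(document, window.XMLSerializer);
```

Note that XMLSerializer produces XML-style output, so the result may not be character-for-character identical to what outerHTML returns.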

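I believe document.documentElement.outerHTML should return that for you.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is part of the DOM Parsing and Serialization specification.

The MSDN page on the outerHTML property notes that it is supported in IE 5+. Colin's answer links to the W3C quirksmode page, which offers a good comparison of cross-browser compatibility (for other DOM features too).

I tried the various answers to see what is returned. I'm using the latest version of Chrome.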

The suggestion document.documentElement.innerHTML; returned <head> ... </body>

Gaby's suggestion document.getElementsByTagName('html')[0].innerHTML; returned the same.

The suggestion document.documentElement.outerHTML; returned <html><head> ... </body></html>, which is everything apart from the doctype.

You can retrieve the doctype object with document.doctype; This returns an object, not a string, so if you need to extract the details as strings for all doctypes up to and including HTML5, that is described here: Get DocType of an HTML as string with Javascript

I only wanted HTML5, so the following was enough for me to create the whole document:

alert('<!DOCTYPE HTML>' + '\n' + document.documentElement.outerHTML);
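The one-liner above hard-codes the doctype. As a rough sketch, the same idea can be wrapped in a function that only emits a doctype when the page actually has one. The name documentToString is an assumption for illustration, and the sketch deliberately ignores publicId and systemId, so it is only suitable for HTML5-style doctypes.

```javascript
// Hypothetical helper: doctype (if any) plus the root element's markup.
// Assumption: only doctype.name is used, so legacy doctypes with
// PUBLIC/SYSTEM identifiers are NOT reproduced faithfully.
function documentToString(doc) {
    // doc.doctype is null when the page has no doctype declaration.
    var prefix = doc.doctype ? '<!DOCTYPE ' + doc.doctype.name + '>\n' : '';
    return prefix + doc.documentElement.outerHTML;
}

// Browser usage (assumed environment):
// alert(documentToString(document));
```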

You can also do:

document.getElementsByTagName('html')[0].innerHTML

You will not get the doctype or the html tag, but everything else...

document.documentElement.outerHTML

Probably IE only:

webBrowser1.DocumentText

For FF from 1.0 onward:

//serialize current DOM-Tree incl. changes/edits to ss-variable
var ns = new XMLSerializer();
var ss= ns.serializeToString(document);
alert(ss.substr(0,300));

This may work in FF. (It shows the very first 300 characters from the very beginning of the source text, mostly doctype definitions.)

But be aware that the normal "Save As" dialog of FF might not save the current state of the page, but rather the originally loaded X/h/tml source text! (POSTing ss to some temp file and redirecting to it might deliver a saveable source text that includes the changes/edits made beforehand.)

That said, FF surprises with good recovery on "back" and a nice inclusion of states/values on "Save (as) ..." for input-like fields, textarea, etc., though not for elements in contenteditable/designMode...

If the page is not an xhtml or xml file (by MIME type, not just filename extension!), one may use document.open/write/close to set the appropriate content on the source layer, which is what gets saved through the user's save dialog from the File/Save menu of FF. See: http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite

https://developer.mozilla.org/en-US/docs/Web/API/document.write

Regardless of the X(ht)ML question, try a "view-source:http://..." as the value of the src attribute of a (script-made!?) iframe. To access an iframe's document in FF:

<iframe-elementnode>.contentDocument; search for "mdn contentDocument" to find the appropriate members, such as 'textContent'. I got that working years ago and would rather not dig for it again. If there is still an urgent need, mention it and I will dive in...

document.documentElement.innerHTML

I always use

document.getElementsByTagName('html')[0].innerHTML

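Probably not the right way, but I can understand it when I see it.

Use document.documentElement.

The same question is answered here: https://stackoverflow.com/a/7289396/2164160

To also get things outside the <html>...</html>, most importantly the <!DOCTYPE ...> declaration, you could walk through document.childNodes, turning each into a string: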

const html = [...document.childNodes]
    .map(node => nodeToString(node))
    .join('\n') // could use '' instead, but whitespace should not matter.

function nodeToString(node) {
    switch (node.nodeType) {
        case node.ELEMENT_NODE:
            return node.outerHTML
        case node.TEXT_NODE:
            // Text nodes should probably never be encountered, but handling them anyway.
            return node.textContent
        case node.COMMENT_NODE:
            return `<!--${node.textContent}-->`
        case node.DOCUMENT_TYPE_NODE:
            return doctypeToString(node)
        default:
            throw new TypeError(`Unexpected node type: ${node.nodeType}`)
    }
}

I published this code as document-outerhtml on npm.

Edit: note that the code above depends on a function doctypeToString. Its implementation could be as follows (the code below is published on npm as doctype-to-string):

function doctypeToString(doctype) {
    if (doctype === null) {
        return ''
    }
    // Checking with instanceof DocumentType might be neater, but how to get a
    // reference to DocumentType without assuming it to be available globally?
    // To play nice with custom DOM implementations, we resort to duck-typing.
    if (!doctype
        || doctype.nodeType !== doctype.DOCUMENT_TYPE_NODE
        || typeof doctype.name !== 'string'
        || typeof doctype.publicId !== 'string'
        || typeof doctype.systemId !== 'string'
    ) {
        throw new TypeError('Expected a DocumentType')
    }
    const doctypeString = `<!DOCTYPE ${doctype.name}`
        + (doctype.publicId ? ` PUBLIC "${doctype.publicId}"` : '')
        + (doctype.systemId
            ? (doctype.publicId ? `` : ` SYSTEM`) + ` "${doctype.systemId}"`
            : ``)
        + `>`
    return doctypeString
}

I just needed doctype html, and it should work fine in IE11, Edge, and Chrome. I used the code below, and it works fine.

function downloadPage(element, event) {
    var isChrome = /Chrome/.test(navigator.userAgent) && /Google Inc/.test(navigator.vendor);

    if ((navigator.userAgent.indexOf("MSIE") != -1) || (!!document.documentMode == true)) {
        document.execCommand('SaveAs', '1', 'page.html');
        event.preventDefault();
    } else {
        if(isChrome) {
            element.setAttribute('href','data:text/html;charset=UTF-8,'+encodeURIComponent('<!doctype html>' + document.documentElement.outerHTML));
        }
        element.setAttribute('download', 'page.html');
    }
}

And in your anchor tag, use it like this:

<a href="#" onclick="downloadPage(this,event);" download>Download entire page.</a>

Example

<p>
<a href="#" onclick="downloadPage(this,event);"  download><h2>Download entire page.</h2></a></p>

<p>Some image here</p>

<p><img src="https://placeimg.com/250/150/animals"/></p>
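The data-URI step inside downloadPage can be isolated into small functions, which makes it easier to check what ends up in the link. This is only a sketch: htmlToDataUri and prepareDownloadLink are hypothetical names introduced for illustration, and the browser sniffing from the answer is intentionally left out.

```javascript
// Hypothetical helper: wrap an HTML string in a data: URI.
function htmlToDataUri(html) {
    // encodeURIComponent escapes characters such as <, >, spaces and slashes.
    return 'data:text/html;charset=UTF-8,' + encodeURIComponent(html);
}

// Hypothetical helper: point an <a> element at the serialized document.
// Assumption: `anchor` exposes setAttribute, as a DOM element does.
function prepareDownloadLink(anchor, doc, fileName) {
    var html = '<!doctype html>' + doc.documentElement.outerHTML;
    anchor.setAttribute('href', htmlToDataUri(html));
    anchor.setAttribute('download', fileName);
}

// Browser usage (assumed environment), e.g. from an onclick handler:
// prepareDownloadLink(this, document, 'page.html');
```

Very large pages may exceed the URL length a browser accepts for data: URIs, so treat this as suitable for modest documents only.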

You have to iterate through the document's childNodes and get the outerHTML content.

In VBA it looks like this:

For Each e In document.ChildNodes
    Put ff, , e.outerHTML & vbCrLf
Next e

Using this, you can get all elements of the web page, including the <!DOCTYPE> node if it exists.

The correct way is actually:

webBrowser1.DocumentText

License: CC-BY-SA with attribution

Not affiliated with StackOverflow