Escaping HTML strings with jQuery

https://stackoverflow.com/questions/24816

09-06-2019

Question

Does anyone know of an easy way to escape HTML from strings in jQuery? I need to be able to pass an arbitrary string and have it properly escaped for display in an HTML page (preventing JavaScript/HTML injection attacks). I'm sure it's possible to extend jQuery to do this, but I don't know enough about the framework at the moment to accomplish this.

Solution

Since you're using jQuery, you can just set the element's text property:

// before:
// <div class="someClass">text</div>
var someHtmlString = "<script>alert('hi!');</script>";

// set a DIV's text:
$("div.someClass").text(someHtmlString);
// after: 
// <div class="someClass">&lt;script&gt;alert('hi!');&lt;/script&gt;</div>

// get the text in a string:
var escaped = $("<div>").text(someHtmlString).html();
// value: 
// &lt;script&gt;alert('hi!');&lt;/script&gt;
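This trick works because the browser's HTML serializer escapes text-node content when it produces `innerHTML`. As far as I understand the HTML serialization rules, text content gets `&`, `<`, `>` and the non-breaking space (U+00A0) replaced, while quotes are left alone, which matches the output above. Below is a DOM-free sketch of that behavior, e.g. for Node where there is no `document`; the function name `serializeTextLikeInnerHTML` is made up for illustration.

```javascript
// Hypothetical helper: mimics how browsers serialize a text node's data
// for innerHTML (the mechanism behind $("<div>").text(s).html()).
function serializeTextLikeInnerHTML(text) {
  return String(text)
    .replace(/&/g, "&amp;")       // must run first so later entities are not re-escaped
    .replace(/\u00A0/g, "&nbsp;") // non-breaking space
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Quotes are not touched, just like in text content serialized by the browser.
}

console.log(serializeTextLikeInnerHTML("<script>alert('hi!');</script>"));
// &lt;script&gt;alert('hi!');&lt;/script&gt;
```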

Other answers

There is also the solution from mustache.js:

var entityMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;',
  '`': '&#x60;',
  '=': '&#x3D;'
};

function escapeHtml (string) {
  return String(string).replace(/[&<>"'`=\/]/g, function (s) {
    return entityMap[s];
  });
}

$('<div/>').text('This is fun & stuff').html(); // "This is fun &amp; stuff"

Source: http://debuggable.com/posts/encode-html-entities-with-jquery:480f4dd6-13cc-4ce9-8071-4710cbdd56cb

If you're escaping for HTML, there are only three characters I can think of that would really be necessary:

html.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

Depending on your use case, you might also need to do things like `"` to `&quot;`. If the list got big enough, I'd just use an array:

var escaped = html;
var findReplace = [[/&/g, "&amp;"], [/</g, "&lt;"], [/>/g, "&gt;"], [/"/g, "&quot;"]]
for(var item in findReplace)
    escaped = escaped.replace(findReplace[item][0], findReplace[item][1]);
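One detail of the chained approach worth spelling out: `&` has to be replaced before the other characters, otherwise the `&` that `&lt;` introduces is escaped a second time. A minimal sketch comparing both orders (both function names are illustrative):

```javascript
// Correct order: "&" first, so entities added later are left alone.
function escapeAmpFirst(html) {
  return html.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrong order: "<" becomes "&lt;", then its "&" is escaped again.
function escapeAmpLast(html) {
  return html.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/&/g, "&amp;");
}

console.log(escapeAmpFirst("a < b")); // a &lt; b
console.log(escapeAmpLast("a < b"));  // a &amp;lt; b
```

The array version above already lists `&` first, so it is safe as long as that entry stays at the front.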

encodeURIComponent() only escapes for URLs, not for HTML.

I wrote a tiny little function that does this. It only escapes `"`, `&`, `<` and `>` (but usually that's all you need anyway). It is slightly more elegant than the solutions proposed earlier in that it uses only one .replace() to do all the conversion. (EDIT 2: Reduced the code complexity, making the function even smaller and neater. If you're curious about the original code, see the end of this answer.)

function escapeHtml(text) {
    'use strict';
    return text.replace(/[\"&<>]/g, function (a) {
        return { '"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;' }[a];
    });
}

This is plain JavaScript; no jQuery is used.

Escaping `/` and `'` too

Edit in response to mklement's comment.

The above function can easily be extended to include any character. To specify more characters to escape, simply insert them both in the character class of the regular expression (i.e. inside the /[...]/g) and as an entry in the lookup object (chr in the original version). (EDIT 2: Shortened this function too, in the same way.)

function escapeHtml(text) {
    'use strict';
    return text.replace(/[\"&'\/<>]/g, function (a) {
        return {
            '"': '&quot;', '&': '&amp;', "'": '&#39;',
            '/': '&#47;',  '<': '&lt;',  '>': '&gt;'
        }[a];
    });
}

Note the use of `&#39;` for the apostrophe above (the named entity `&apos;` could have been used instead; it is defined in XML, but was not originally included in the HTML specification and might therefore not be supported by all browsers. See the Wikipedia article on HTML character encodings: http://en.wikipedia.org/wiki/Character_encodings_in_HTML#HTML_character_references). I also recall reading somewhere that decimal entities are more widely supported than hexadecimal ones, but I can't seem to find the source for that now. (And there can't be many browsers out there that don't support hexadecimal entities.)

Note: Adding / and ' to the list of escaped characters isn't all that useful, since they have no special meaning in HTML element content and don't need to be escaped there. (The exception is ', which does need escaping inside single-quoted attribute values.)
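Keeping the character class and the lookup object in sync by hand is easy to get wrong. One option is to derive the regular expression from the object's keys. A sketch, assuming every key is a single character (`makeEscaper` is a made-up name, not part of the answer above):

```javascript
// Hypothetical factory: builds the character class from the lookup
// object's keys, so the regex and the map cannot get out of sync.
// Assumes every key is exactly one character.
function makeEscaper(map) {
  var chars = Object.keys(map)
    .map(function (ch) {
      // Backslash-escape characters that are special inside [...]
      return ch.replace(/[\\\]\[^-]/g, "\\$&");
    })
    .join("");
  var re = new RegExp("[" + chars + "]", "g");
  return function (text) {
    return String(text).replace(re, function (ch) { return map[ch]; });
  };
}

var escapeHtml = makeEscaper({
  '"': '&quot;', '&': '&amp;', "'": '&#39;',
  '/': '&#47;',  '<': '&lt;',  '>': '&gt;'
});

console.log(escapeHtml("<a href='/x'>&</a>"));
// &lt;a href=&#39;&#47;x&#39;&gt;&amp;&lt;&#47;a&gt;
```

Adding a character then means adding one entry to the object and nothing else.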

Original `escapeHtml` function

EDIT 2: The original function used a variable (chr) to store the object needed by the .replace() callback. This variable also needed an extra anonymous function to scope it, making the function (needlessly) a little bigger and more complex.

var escapeHtml = (function () {
    'use strict';
    var chr = { '"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;' };
    return function (text) {
        return text.replace(/[\"&<>]/g, function (a) { return chr[a]; });
    };
}());

I haven't tested which of the two versions is faster. If you do, feel free to add info and links about it here.

Easy enough to use Underscore:

_.escape(string)

Underscore is a utility library that provides a lot of functions that native JS doesn't. There's also lodash, which has the same API as Underscore but was rewritten to be more performant.

I realize how late I am to this party, but I have a very simple solution that doesn't require jQuery.

escaped = new Option(unescaped).innerHTML;

Edit: This doesn't escape quotes. The only case where quotes would need to be escaped is if the content is going to be pasted inline into an attribute within an HTML string. It's hard for me to imagine a case where doing this would be good design.

Edit 2: If performance is crucial, the highest-performing solution (by about 50%) is still a series of regex replacements. Modern browsers will recognize that the regular expressions contain no operators, just a string, and collapse them all into a single operation.
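To make the quote case concrete: if text is escaped only for element content and then pasted into a double-quoted attribute inside an HTML string, a `"` in the input closes the attribute early. A minimal sketch of a separate attribute escaper; the names `escapeText` and `escapeAttr` are made up for illustration:

```javascript
// Escaping suitable for element content only (no quotes), like the
// new Option(...).innerHTML approach.
function escapeText(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical attribute escaper: also escapes both quote characters,
// so the value cannot close a "..." or '...' attribute.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

var value = 'x" onmouseover="alert(1)';
console.log('<a title="' + escapeText(value) + '">');
// <a title="x" onmouseover="alert(1)">   (the attribute is broken out of)
console.log('<a title="' + escapeAttr(value) + '">');
// <a title="x&quot; onmouseover=&quot;alert(1)">
```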

Here is a clean, clear JavaScript function. It will escape text such as "a few < many" into "a few &lt; many".

function escapeHtmlEntities (str) {
  if (typeof jQuery !== 'undefined') {
    // Create an empty div to use as a container,
    // then put the raw text in and get the HTML
    // equivalent out.
    return jQuery('<div/>').text(str).html();
  }

  // No jQuery, so use string replace.
  return str
    .replace(/&/g, '&amp;')
    .replace(/>/g, '&gt;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

After my latest tests, I can recommend the fastest and completely cross-browser-compatible native JavaScript (DOM) solution:

function HTMLescape(html){
    return document.createElement('div')
        .appendChild(document.createTextNode(html))
        .parentNode
        .innerHTML
}

If you call it many times, you can reuse variables that are prepared once:

//prepare variables
var DOMtext = document.createTextNode("test");
var DOMnative = document.createElement("span");
DOMnative.appendChild(DOMtext);

//main work for each case
function HTMLescape(html){
  DOMtext.nodeValue = html;
  return DOMnative.innerHTML
}

Look at my final performance comparison (stack question).

Try the Underscore.string library; it works with jQuery.

_.str.escapeHTML('<div>Blah blah blah</div>')

output:

'&lt;div&gt;Blah blah blah&lt;/div&gt;'

I've enhanced the mustache.js example by adding an escapeHTML() method to String.prototype.

var __entityMap = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': '&quot;',
    "'": '&#39;',
    "/": '&#x2F;'
};

String.prototype.escapeHTML = function() {
    return String(this).replace(/[&<>"'\/]/g, function (s) {
        return __entityMap[s];
    });
}

That way it is quite easy to use: "Some <text>, more Text&Text".escapeHTML()

escape() and unescape() are intended to encode / decode strings for URLs, not HTML.

Actually, I use the following snippet to do the trick; it doesn't require any framework:

var escapedHtml = html.replace(/&/g, '&amp;')
                      .replace(/>/g, '&gt;')
                      .replace(/</g, '&lt;')
                      .replace(/"/g, '&quot;')
                      .replace(/'/g, '&apos;');

If you have underscore.js, use _.escape (more efficient than the jQuery method posted above):

_.escape('Curly, Larry & Moe'); // returns: Curly, Larry &amp; Moe

If you're going the regex route, there's an error in tghw's example above.

// WON'T WORK: item is an index key such as "0", so item[0] is a character, not the pattern

var escaped = html; 
var findReplace = [[/&/g, "&amp;"], [/</g, "&lt;"], [/>/g,"&gt;"], [/"/g,
"&quot;"]]

for(var item in findReplace) {
     escaped = escaped.replace(item[0], item[1]);   
}


// WORKS: findReplace[item] is the [pattern, replacement] pair

var escaped = html;
var findReplace = [[/&/g, "&amp;"], [/</g, "&lt;"], [/>/g, "&gt;"], [/"/g, "&quot;"]];

for(var item in findReplace) {
     escaped = escaped.replace(findReplace[item][0], findReplace[item][1]);
}
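With ES2015 or later, the index confusion goes away if you iterate with `for...of` instead of `for...in` (which enumerates string keys and is generally discouraged for arrays). A sketch, assuming an ES2015+ environment; `escapeWithTable` is an illustrative name:

```javascript
var findReplace = [[/&/g, "&amp;"], [/</g, "&lt;"], [/>/g, "&gt;"], [/"/g, "&quot;"]];

function escapeWithTable(html) {
  var escaped = html;
  // for...of yields each [pattern, replacement] pair directly, in array
  // order, so "&" is still handled first.
  for (const [pattern, replacement] of findReplace) {
    escaped = escaped.replace(pattern, replacement);
  }
  return escaped;
}

console.log(escapeWithTable('<p class="x">A & B</p>'));
// &lt;p class=&quot;x&quot;&gt;A &amp; B&lt;/p&gt;
```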

This is a nice safe example...

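function escapeHtml(str) {
    // Non-strings are returned unchanged.
    if (typeof str !== "string") {
        return str;
    }
    var newStr = "";
    for (var i = 0; i < str.length; i++) {
        var nextCode = str.charCodeAt(i);
        // Every ASCII character (except NUL) becomes a numeric reference
        // such as "&#60;"; anything else is replaced by "?", so non-ASCII
        // text is lost.
        if (nextCode > 0 && nextCode < 128) {
            newStr += "&#" + nextCode + ";";
        } else {
            newStr += "?";
        }
    }
    return newStr;
}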

You can easily do it with vanilla JS.

Simply add a text node to the document; its content will be escaped by the browser.

var escaped = document.createTextNode("<HTML TO/ESCAPE/>")
document.getElementById("[PARENT_NODE]").appendChild(escaped)

(function(undefined){
    var charsToReplace = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;'
    };

    var replaceReg = new RegExp("[" + Object.keys(charsToReplace).join("") + "]", "g");
    var replaceFn = function(tag){ return charsToReplace[tag] || tag; };

    var replaceRegF = function(replaceMap) {
        return (new RegExp("[" + Object.keys(charsToReplace).concat(Object.keys(replaceMap)).join("") + "]", "gi"));
    };
    var replaceFnF = function(replaceMap) {
        return function(tag){ return replaceMap[tag] || charsToReplace[tag] || tag; };
    };

    String.prototype.htmlEscape = function(replaceMap) {
        if (replaceMap === undefined) return this.replace(replaceReg, replaceFn);
        return this.replace(replaceRegF(replaceMap), replaceFnF(replaceMap));
    };
})();

No global variables, some memory optimization. Usage:

"some<tag>and&symbol©".htmlEscape({'©': '&copy;'})

result is:

"some&lt;tag&gt;and&amp;symbol&copy;"

function htmlEscape(str) {
    // Escape the HTML special characters ("&" first, so the entities
    // added afterwards are not escaped again).
    return String(str)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        // Not part of HTML escaping: also turn spaces, "?", ":", "|"
        // and "." into hyphens, as the original version did.
        .replace(/[ ?:|.]/g, '-');
}

Two simple methods that require no jQuery:

You can encode all characters in your string like this:

function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}

Or just target the main characters to worry about (&, line breaks, <, >, " and '), like this:

function encode(r){
return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"})
}

var myString='Encode HTML entities!\n"Safe" escape <script></'+'script> & other tags!';

document.getElementById("test").value = encode(myString);

document.getElementById("testing").innerHTML = encode(myString);

/*************
* \x26 is the ampersand,
* \x0A is the newline.
*************/

<p><b>What JavaScript Generated:</b></p>

<textarea id=test rows="3" cols="55"></textarea>

<p><b>What It Renders To In HTML:</b></p>

<div id="testing">www.WHAK.com</div>
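One caveat with the "encode all characters" version: `charCodeAt` works on UTF-16 code units, so a character outside the Basic Multilingual Plane, such as 😀, is split into two surrogate halves (`&#55357;&#56832;`), and references to lone surrogates do not render as the original character. A sketch that iterates by code point instead, assuming ES2015+ for `for...of` and `codePointAt` (`encodeAll` is a made-up name):

```javascript
// Encode every character as a decimal numeric character reference,
// iterating by code point so surrogate pairs stay together.
function encodeAll(str) {
  var out = "";
  for (const ch of String(str)) {
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

console.log(encodeAll("<😀"));
// &#60;&#128512;
```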

Plain JavaScript escaping example:

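function escapeHtml(text) {
    var div = document.createElement('div');
    // textContent stores the string as-is; innerText would also turn
    // line breaks into <br> elements.
    div.textContent = text;
    return div.innerHTML;
}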

escapeHtml("<script>alert('hi!');</script>")
// "&lt;script&gt;alert('hi!');&lt;/script&gt;"

function htmlDecode(t){
   if (t) return $('<div />').html(t).text();
}

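This works like a charm for decoding (the reverse of escaping). Be careful with untrusted input, though: jQuery's .html() parses the string as HTML, so something like <img src=x onerror=...> can run its handler even though the element is never added to the page.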

This answer provides the jQuery and normal JS methods, but this is shortest without using the DOM:

unescape(escape("It's > 20% less complicated this way."))

Escaped string: It%27s%20%3E%2020%25%20less%20complicated%20this%20way.

If the escaped spaces bother you, try:

unescape(escape("It's > 20% less complicated this way.").replace(/%20/g, " "))

Escaped string: It%27s %3E 20%25 less complicated this way.

Unfortunately, the escape() function was deprecated in JavaScript version 1.5. encodeURI() or encodeURIComponent() are alternatives, but they ignore ', so the last line of code would turn into this:

decodeURI(encodeURI("It's > 20% less complicated this way.").replace(/%20/g, " ").replace("'", '%27'))

All major browsers still support the short code, and given the number of old websites, I doubt that will change soon.

If you are saving this information in a database, it's wrong to escape HTML using a client-side script; this should be done on the server. Otherwise it's easy to bypass your XSS protection.

To make my point clear, here is an example using one of the answers:

Let's say you are using the function escapeHtml to escape the HTML from a comment on your blog and then posting it to your server.

var entityMap = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': '&quot;',
    "'": '&#39;',
    "/": '&#x2F;'
  };

  function escapeHtml(string) {
    return String(string).replace(/[&<>"'\/]/g, function (s) {
      return entityMap[s];
    });
  }

The user could:

Edit the POST request parameters and replace the comment with javascript code.
Overwrite the escapeHtml function using the browser console.

If the user pastes this snippet into the console, it would bypass the XSS validation:

function escapeHtml(string){
   return string
}
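Escaping at the point where the server writes stored data into HTML avoids both bypasses, because the client never gets a say. A minimal sketch of that idea for a Node.js server; `renderComment` and the `{ author, text }` comment shape are hypothetical:

```javascript
// Runs on the server, so the user cannot replace or skip it from the
// browser console or by editing the POST body.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical renderer: the raw comment is stored as-is and escaped
// only when it is written into the HTML response.
function renderComment(comment) {
  return '<div class="comment"><b>' + escapeHtml(comment.author) +
         "</b>: " + escapeHtml(comment.text) + "</div>";
}

console.log(renderComment({ author: "eve", text: "<script>alert(1)</script>" }));
// <div class="comment"><b>eve</b>: &lt;script&gt;alert(1)&lt;/script&gt;</div>
```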

All solutions are useless if you don't prevent re-escaping, e.g. most solutions would keep escaping `&amp;` to `&amp;amp;`.

escapeHtml = function (s) {
    return s ? s.replace(
        /[&<>'"]/g,
        function (c, offset, str) {
            if (c === "&") {
                var substr = str.substring(offset, offset + 6);
                if (/^&(amp|lt|gt|apos|quot);/.test(substr)) {
                    // already escaped, do not re-escape
                    return c;
                }
            }
            return "&" + {
                "&": "amp",
                "<": "lt",
                ">": "gt",
                "'": "apos",
                '"': "quot"
            }[c] + ";";
        }
    ) : "";
};
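A different way to get the same "escape only once" behavior, swapped in here rather than taken from the answer above, is to decode the five basic entities first and then escape everything in one pass. A sketch with illustrative names; note the trade-off that a user who literally typed `&amp;` will see `&` afterwards:

```javascript
var decodeMap = { amp: "&", lt: "<", gt: ">", apos: "'", quot: '"' };
var encodeMap = { "&": "amp", "<": "lt", ">": "gt", "'": "apos", '"': "quot" };

// Turn the five named entities back into characters.
function decodeBasic(s) {
  return s.replace(/&(amp|lt|gt|apos|quot);/g, function (m, name) {
    return decodeMap[name];
  });
}

// Decode, then escape: applying this twice gives the same result as once.
function escapeOnce(s) {
  return decodeBasic(String(s)).replace(/[&<>'"]/g, function (c) {
    return "&" + encodeMap[c] + ";";
  });
}

var once = escapeOnce("Tom &amp; Jerry <3");
console.log(once);             // Tom &amp; Jerry &lt;3
console.log(escapeOnce(once)); // Tom &amp; Jerry &lt;3
```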

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow
