HTML encoding lost when attribute is read from input field

https://stackoverflow.com/questions/1219860

10-07-2019
Question

I'm using JavaScript to pull a value out of a hidden field and display it in a textbox. The value in the hidden field is encoded.

For example,

<input id='hiddenId' type='hidden' value='chalk &amp; cheese' />

gets pulled into

<input type='text' value='chalk &amp; cheese' />

via some jQuery to get the value from the hidden field (it's at this point that I lose the encoding):

$('#hiddenId').attr('value')

The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I don't want the value to be chalk & cheese. I want the literal amp; to be retained.

Is there a JavaScript library or a jQuery method that will HTML-encode a string?

Solution

EDIT: This answer was posted a long time ago, and the htmlDecode function introduced an XSS vulnerability. It has been modified by changing the temporary element from a div to a textarea, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API as suggested in another answer.

I use these functions:

function htmlEncode(value){
  // Create an in-memory element, set its inner text (which is automatically encoded),
  // then grab the encoded contents back out. The element never exists in the DOM.
  return $('<textarea/>').text(value).html();
}

function htmlDecode(value){
  return $('<textarea/>').html(value).text();
}

Basically, a textarea element is created in memory, but it is never appended to the document.

In the htmlEncode function I set the innerText of the element and retrieve the encoded innerHTML; in the htmlDecode function I set the innerHTML value of the element and the innerText is retrieved.

Check a running example here.

Other suggestions

The jQuery trick doesn't encode quote marks, and in IE it will strip your whitespace.

Based on the escape templatetag in Django, which I guess is heavily used and tested already, I made this function which does what's needed.

It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue, and it encodes quote marks, which is essential if you're going to use the result inside an attribute value, for example.

function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');
}
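
To illustrate why encoding quote marks matters inside an attribute value, here is a minimal sketch. The helper names (escapeTextOnly, escapeForAttribute) and the sample input are hypothetical and only serve the illustration; escapeTextOnly roughly mimics an encoder that leaves quotes alone.

```javascript
// Hypothetical helpers for illustration only; not part of any library.
function escapeTextOnly(str) {
    // Escapes &, < and > but leaves quote characters untouched
    return String(str)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

function escapeForAttribute(str) {
    // Same as above, plus both quote characters
    return escapeTextOnly(str)
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Assumed example of untrusted input that tries to break out of the attribute
var userInput = '" onmouseover="alert(1)';

// The quote closes the value early, so a new attribute is injected
var unsafeMarkup = '<input value="' + escapeTextOnly(userInput) + '">';

// The quote is encoded, so the whole input stays inside the value
var saferMarkup = '<input value="' + escapeForAttribute(userInput) + '">';

console.log(unsafeMarkup);
console.log(saferMarkup);
```

In the first string the injected onmouseover ends up as a separate attribute; in the second it remains plain text inside value.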

Update 2013-06-17:
In the search for the fastest escaping I have found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25

It gives an identical result string to the built-in replace chains above. I'd be very happy if someone could explain why it's faster!?

Update 2015-03-04:
I just noticed that AngularJS is using exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435

They add a couple of refinements: they appear to handle an obscure Unicode issue as well as converting all non-alphanumeric characters to entities. I was under the impression that the latter was not necessary as long as you have a UTF8 charset specified for your document.

I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44

Update 2016-04-06:
You may also wish to escape the forward slash /. This is not required for correct HTML encoding, however it is

Here's a non-jQuery version that is considerably faster than both the jQuery .html() version and the .replace() version. It preserves all whitespace but, like the jQuery version, doesn't handle quotes.

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
};

Speed: http://jsperf.com/htmlencoderegex/17

Demo:

Script:

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
};

function htmlDecode( html ) {
    var a = document.createElement( 'a' ); a.innerHTML = html;
    return a.textContent;
};

document.getElementById( 'text' ).value = htmlEncode( document.getElementById( 'hidden' ).value );

//sanity check
var html = '<div>   &amp; hello</div>';
document.getElementById( 'same' ).textContent = 
      'html === htmlDecode( htmlEncode( html ) ): ' 
    + ( html === htmlDecode( htmlEncode( html ) ) );

HTML:

<input id="hidden" type="hidden" value="chalk    &amp; cheese" />
<input id="text" value="" />
<div id="same"></div>

I know this is an old one, but I wanted to post a variation of the accepted answer that will work in IE without removing line breaks:

function multiLineHtmlEncode(value) {
    var lines = value.split(/\r\n|\r|\n/);
    for (var i = 0; i < lines.length; i++) {
        lines[i] = htmlEncode(lines[i]);
    }
    return lines.join('\r\n');
}

function htmlEncode(value) {
    return $('<div/>').text(value).html();
}

Underscore provides the _.escape() and _.unescape() methods that do this.

> _.unescape( "chalk &amp; cheese" );
  "chalk & cheese"

> _.escape( "chalk & cheese" );
  "chalk &amp; cheese"

Good answer. Note that if the value to encode is undefined or null, with jQuery 1.4.2 you might get errors such as:

jQuery("<div/>").text(value).html is not a function

Uncaught TypeError: Object has no method 'html'

The solution is to modify the function to check for an actual value:

function htmlEncode(value){ 
    if (value) {
        return jQuery('<div/>').text(value).html(); 
    } else {
        return '';
    }
}
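
One side effect of the `if (value)` check is that other falsy inputs, such as 0 or false, also come back as an empty string. If that matters, a stricter check could be applied before encoding. The sketch below uses a hypothetical helper name (normalizeForEncoding) and treats only null and undefined as empty.

```javascript
// Hypothetical helper: converts a value to a string before encoding.
function normalizeForEncoding(value) {
    // Only null and undefined become the empty string; 0 and false are kept
    return (value === null || value === undefined) ? '' : String(value);
}

console.log(normalizeForEncoding(null));      // empty string
console.log(normalizeForEncoding(0));         // the string "0"
console.log(normalizeForEncoding('a & b'));   // unchanged; encoding happens afterwards
```

The normalized string can then be passed to whichever encoding function is in use.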

For those who prefer plain JavaScript, here is the method I have used successfully:

function escapeHTML (str)
{
    var div = document.createElement('div');
    var text = document.createTextNode(str);
    div.appendChild(text);
    return div.innerHTML;
}

FWIW, the encoding is not being lost. The encoding is used by the markup parser (the browser) during page load. Once the source is read and parsed and the browser has the DOM loaded into memory, the encoding has been parsed into what it represents. So by the time your JS executes and reads anything in memory, the character it gets is what the encoding represented.

I may be operating strictly on semantics here, but I wanted you to understand the purpose of encoding. The word "lost" makes it sound as if something isn't working the way it should.

Prototype has this built into the String class. So if you are using, or plan to use, Prototype, it does something like:

'<div class="article">This is an article</div>'.escapeHTML();
// -> "&lt;div class="article"&gt;This is an article&lt;/div&gt;"

Faster without jQuery. You can encode every character in your string:

function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}

Or just target the main characters to worry about (&, line breaks, <, >, " and ') like:

function encode(r){
return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"})
}

test.value=encode('Encode HTML entities!\n\n"Safe" escape <script id=\'\'> & useful in <pre> tags!');

testing.innerHTML=test.value;

/*************
* \x26 is &ampersand (it has to be first),
* \x0A is newline,
*************/

<textarea id=test rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>

Here is a simple JavaScript solution. It extends the String object with an "HTMLEncode" method that can be called on a string without a parameter, or with a parameter.

String.prototype.HTMLEncode = function(str) {
  var result = "";
  var str = (arguments.length===1) ? str : this;
  for(var i=0; i<str.length; i++) {
     var chrcode = str.charCodeAt(i);
     result+=(chrcode>128) ? "&#"+chrcode+";" : str.substr(i,1)
   }
   return result;
}
// TEST
console.log("stetaewteaw æø".HTMLEncode());
console.log("stetaewteaw æø".HTMLEncode("æåøåæå"))

I have created a gist, "HTMLEncode method for javascript".

Based on Angular's sanitize... (ES6 module syntax)

// ref: https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js
const SURROGATE_PAIR_REGEXP = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
const NON_ALPHANUMERIC_REGEXP = /([^\#-~| |!])/g;

const decodeElem = document.createElement('pre');


/**
 * Decodes html encoded text, so that the actual string may
 * be used.
 * @param value
 * @returns {string} decoded text
 */
export function decode(value) {
  if (!value) return '';
  decodeElem.innerHTML = value.replace(/</g, '&lt;');
  return decodeElem.textContent;
}


/**
 * Encodes all potentially dangerous characters, so that the
 * resulting string can be safely inserted into attribute or
 * element text.
 * @param value
 * @returns {string} encoded text
 */
export function encode(value) {
  if (value === null || value === undefined) return '';
  return String(value).
    replace(/&/g, '&amp;').
    replace(SURROGATE_PAIR_REGEXP, value => {
      var hi = value.charCodeAt(0);
      var low = value.charCodeAt(1);
      return '&#' + (((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000) + ';';
    }).
    replace(NON_ALPHANUMERIC_REGEXP, value => {
      return '&#' + value.charCodeAt(0) + ';';
    }).
    replace(/</g, '&lt;').
    replace(/>/g, '&gt;');
}

export default {encode,decode};
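
As a worked example of the surrogate-pair arithmetic in encode() above, the sketch below runs the same formula on one character. The helper name surrogatePairToCodePoint and the choice of U+1F600 as sample input are assumptions made for illustration.

```javascript
// Same formula as in encode() above, pulled out into a hypothetical helper.
function surrogatePairToCodePoint(hi, low) {
    return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
}

var emoji = '\uD83D\uDE00';    // U+1F600, stored as two UTF-16 code units
var hi = emoji.charCodeAt(0);  // 0xD83D (high surrogate)
var low = emoji.charCodeAt(1); // 0xDE00 (low surrogate)

// (0xD83D - 0xD800) * 0x400 = 0xF400
// (0xDE00 - 0xDC00)         = 0x0200
// 0xF400 + 0x0200 + 0x10000 = 0x1F600 = 128512
var codePoint = surrogatePairToCodePoint(hi, low);
var entity = '&#' + codePoint + ';';

console.log(codePoint.toString(16)); // 1f600
console.log(entity);                 // &#128512;
```

The result matches what the built-in String.prototype.codePointAt returns for the same string, so a single numeric entity is emitted for the whole character rather than one per surrogate.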

AFAIK there isn't any straightforward HTML encode/decode method in JavaScript.

However, what you can do is use JS to create an arbitrary element, set its inner text, then read it back using innerHTML.

Say, with jQuery this should work:

var helper = $('<div/>').text('chalk & cheese').hide().appendTo('body');
var htmled = helper.html();
helper.remove();

or something along those lines.

You shouldn't have to escape/encode values in order to shuttle them from one input field to another.

<form>
 <input id="button" type="button" value="Click me">
 <input type="hidden" id="hiddenId" name="hiddenId" value="I like cheese">
 <input type="text" id="output" name="output">
</form>
<script>
    $(document).ready(function(e) {
        $('#button').click(function(e) {
            $('#output').val($('#hiddenId').val());
        });
    });
</script>

JS doesn't go inserting raw HTML or anything; it just tells the DOM to set the value property (or attribute; not sure). Either way, the DOM handles any encoding issues for you. Unless you are doing something odd like using document.write or eval, HTML encoding will be effectively transparent.

If you're talking about generating a new textbox to hold the result... it's still just as easy. Just pass the static part of the HTML to jQuery, and then set the rest of the properties/attributes on the object it returns to you.

$box = $('<input type="text" name="whatever">').val($('#hiddenId').val());

I had a similar problem and solved it using the JavaScript function encodeURIComponent (documentation)

For example, in your case, if you use:

<input id='hiddenId' type='hidden' value='chalk & cheese' />

encodeURIComponent($('#hiddenId').attr('value'))

you will get chalk%20%26%20cheese. Even the spaces are kept.

In my case, I had to encode a forward slash, and this code works perfectly:

encodeURIComponent('name/surname')

and I got name%2Fsurname
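
Note that encodeURIComponent produces percent-encoding meant for URL components, which is a different scheme from HTML entity encoding. The short sketch below contrasts the two on the same input; the htmlEscapeBasic helper is a hypothetical stand-in for any of the entity-escaping functions shown elsewhere in this thread.

```javascript
// Hypothetical minimal entity escaper, used only for the comparison below.
function htmlEscapeBasic(str) {
    return String(str)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

var raw = 'chalk & cheese';

var percentEncoded = encodeURIComponent(raw);      // for use inside a URL
var roundTripped = decodeURIComponent(percentEncoded);
var htmlEncoded = htmlEscapeBasic(raw);            // for use inside HTML markup

console.log(percentEncoded); // chalk%20%26%20cheese
console.log(roundTripped);   // chalk & cheese
console.log(htmlEncoded);    // chalk &amp; cheese
```

A percent-encoded value has to be passed through decodeURIComponent before it is shown to a user, whereas an entity-encoded value is decoded by the HTML parser itself.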

My pure-JS function:

/**
 * HTML entities encode
 *
 * @param {string} str Input text
 * @return {string} Filtered text
 */
function htmlencode (str){

  var div = document.createElement('div');
  div.appendChild(document.createTextNode(str));
  return div.innerHTML;
}

JavaScript HTML Entities Encode & Decode

If you want to use jQuery, I found this:

http://www.jquerysdk.com/api/jQuery.htmlspecialchars

(part of the jquery.string plugin offered by jQuery SDK)

I believe the problem with Prototype is that it extends base objects in JavaScript and will be incompatible with any jQuery you may have used. Of course, if you are already using Prototype and not jQuery, it won't be a problem.

EDIT: There is also this, which is a port of Prototype's string utilities for jQuery:

http://stilldesigning.com/dotstring/

var htmlEnDeCode = (function() {
    var charToEntityRegex,
        entityToCharRegex,
        charToEntity,
        entityToChar;

    function resetCharacterEntities() {
        charToEntity = {};
        entityToChar = {};
        // add the default set
        addCharacterEntities({
            '&amp;'     :   '&',
            '&gt;'      :   '>',
            '&lt;'      :   '<',
            '&quot;'    :   '"',
            '&#39;'     :   "'"
        });
    }

    function addCharacterEntities(newEntities) {
        var charKeys = [],
            entityKeys = [],
            key, echar;
        for (key in newEntities) {
            echar = newEntities[key];
            entityToChar[key] = echar;
            charToEntity[echar] = key;
            charKeys.push(echar);
            entityKeys.push(key);
        }
        charToEntityRegex = new RegExp('(' + charKeys.join('|') + ')', 'g');
        entityToCharRegex = new RegExp('(' + entityKeys.join('|') + '|&#[0-9]{1,5};' + ')', 'g');
    }

    function htmlEncode(value){
        var htmlEncodeReplaceFn = function(match, capture) {
            return charToEntity[capture];
        };

        return (!value) ? value : String(value).replace(charToEntityRegex, htmlEncodeReplaceFn);
    }

    function htmlDecode(value) {
        var htmlDecodeReplaceFn = function(match, capture) {
            return (capture in entityToChar) ? entityToChar[capture] : String.fromCharCode(parseInt(capture.substr(2), 10));
        };

        return (!value) ? value : String(value).replace(entityToCharRegex, htmlDecodeReplaceFn);
    }

    resetCharacterEntities();

    return {
        htmlEncode: htmlEncode,
        htmlDecode: htmlDecode
    };
})();

This is from the ExtJS source code.

<script>
String.prototype.htmlEncode = function () {
    return String(this)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');

}

var aString = '<script>alert("I hack your site")<\/script>'; // closing tag escaped so it does not end the inline script block
console.log(aString.htmlEncode());
</script>

It will output: &lt;script&gt;alert(&quot;I hack your site&quot;)&lt;/script&gt;

.htmlEncode() will be accessible on all strings once defined.

HTML-encodes the given value

  var htmlEncodeContainer = $('<div />');
  function htmlEncode(value) {
    if (value) {
      return htmlEncodeContainer.text(value).html();
    } else {
      return '';
    }
  }

I ran into some issues with the backslash in my Domain\User string.

I added this to the other escapes from Anentropic's answer:

.replace(/\\/g, '&#92;')

Which I found here: How to escape backslash in JavaScript?

Here's a bit of code that emulates the Server.HTMLEncode function from Microsoft's ASP, written in pure JavaScript:

function htmlEncode(s) {
  var ntable = {
    "&": "amp",
    "<": "lt",
    ">": "gt",
    "\"": "quot"
  };
  s = s.replace(/[&<>"]/g, function(ch) {
    return "&" + ntable[ch] + ";";
  })
  s = s.replace(/[^ -\x7e]/g, function(ch) {
    return "&#" + ch.charCodeAt(0).toString() + ";";
  });
  return s;
}

The result does not encode apostrophes, but it encodes the other HTML special characters and any character outside the 0x20-0x7e range.

Picking out what escapeHTML() does in prototype.js

Adding this script helps you escape HTML:

String.prototype.escapeHTML = function() { 
    return this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;')
}

Now you can call the escapeHTML method on strings in your script, like:

var escapedString = "<h1>this is HTML</h1>".escapeHTML();
// gives: "&lt;h1&gt;this is HTML&lt;/h1&gt;"

Hope it helps anyone looking for a simple solution without having to include the entire prototype.js

Using some of the other answers here, I made a version that replaces all the pertinent characters in one pass irrespective of the number of distinct encoded characters (only one call to replace()), so it will be faster for larger strings.

It doesn't rely on the DOM API or on other libraries.

window.encodeHTML = (function() {
    function escapeRegex(s) {
        return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    }
    var encodings = {
        '&'  : '&amp;',
        '"'  : '&quot;',
        '\'' : '&#39;',
        '<'  : '&lt;',
        '>'  : '&gt;',
        '/'  : '&#x2F;'
    };
    function encode(what) { return encodings[what]; };
    var specialChars = new RegExp('[' +
        escapeRegex(Object.keys(encodings).join('')) +
    ']', 'g');

    return function(text) { return text.replace(specialChars, encode); };
})();

Having run that once, you can now call

encodeHTML('<>&"\'')

to get &lt;&gt;&amp;&quot;&#39;
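
Besides speed, a single-pass replacement has another property worth noting: the result does not depend on the order of the replacements. With chained replace() calls, the ampersand has to be handled first, otherwise the entities produced by earlier steps get encoded again. The sketch below is a DOM-free illustration; ENTITY_MAP, escapeSinglePass and escapeWrongOrder are hypothetical names.

```javascript
// Hypothetical lookup table for the single-pass variant
var ENTITY_MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

function escapeSinglePass(text) {
    // Every special character is looked at exactly once
    return String(text).replace(/[&<>"']/g, function (ch) {
        return ENTITY_MAP[ch];
    });
}

function escapeWrongOrder(text) {
    // Deliberately wrong: & is replaced last, so the & of &lt; and &gt;
    // produced by the earlier steps is encoded a second time
    return String(text)
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/&/g, '&amp;');
}

console.log(escapeSinglePass('<b>')); // &lt;b&gt;
console.log(escapeWrongOrder('<b>')); // &amp;lt;b&amp;gt;
```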

function encodeHTML(str) {
    return document.createElement("a").appendChild( 
        document.createTextNode(str)).parentNode.innerHTML;
};

function decodeHTML(str) {
    var element = document.createElement("a"); 
    element.innerHTML = str;
    return element.textContent;
};
var str = "<"
var enc = encodeHTML(str);
var dec = decodeHTML(enc);
console.log("str: " + str, "\nenc: " + enc, "\ndec: " + dec);

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow