Question

I have JavaScript code that extracts JSON strings from other pages of my blog (Blogger), but many special characters in those strings appear as &#?????;, where ????? is a number of up to 5 digits, or as something like \74br /\76, which should be <br />.

Both forms come mixed in the same string, and both seem to be character codes: the first a decimal HTML reference and the second an octal escape.

How can I decode these into their respective characters using JavaScript? Is there an existing function or a proper solution for this?

Solution

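These should get you started. The first function decodes decimal and hexadecimal numeric HTML references; the second decodes backslash-octal escapes.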

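function decodeHtmlNumeric( str ) {
    // Decimal (&#255;) and hex (&#xFF;) references are decoded in one pass,
    // so text produced by one replacement is never decoded a second time.
    return str.replace( /&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));/g, function( g, hex, dec ) {
        var code = hex !== undefined ? parseInt( hex, 16 ) : parseInt( dec, 10 );
        // fromCodePoint also handles characters above U+FFFF;
        // values beyond the Unicode range are left untouched.
        return code <= 0x10FFFF ? String.fromCodePoint( code ) : g;
    });
}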

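function decodeOctal( str ) {
    // An octal escape is at most three digits (\0 to \377), so a digit that
    // follows the escape, as in "\741", is kept as ordinary text.
    return str.replace( /\\([0-3][0-7]{2}|[0-7]{1,2})/g, function( g, m1 ) {
        return String.fromCharCode( parseInt( m1, 8 ) );
    });
}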
// In a string literal, a double backslash (\\) stands for one literal backslash.
decodeOctal("\\74br /\\76"); // "<br />"
decodeHtmlNumeric("&#255;"); // "ÿ"
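
Since the question says both encodings appear mixed in the same string, it may be convenient to handle them together. The following is only a sketch of one way to do that: decodeMixed is a hypothetical name, not part of the answer above, and it assumes an environment with String.fromCodePoint (ES2015 or later). Matching both forms in a single pass means text produced by one replacement is not decoded again by the other.

```javascript
// Hypothetical helper (not from the original answer): decodes numeric HTML
// references and backslash-octal escapes in a single pass over the string.
function decodeMixed(str) {
    var pattern = /&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));|\\([0-3][0-7]{2}|[0-7]{1,2})/g;
    return str.replace(pattern, function (match, hex, dec, oct) {
        var code;
        if (hex !== undefined) {
            code = parseInt(hex, 16);   // &#xHHHH;
        } else if (dec !== undefined) {
            code = parseInt(dec, 10);   // &#DDDDD;
        } else {
            code = parseInt(oct, 8);    // \OOO
        }
        // Values beyond the Unicode range are left as they were.
        return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    });
}

// "\\" in a literal is one backslash, so this input contains \74 and \76.
var sample = "Caf&#233; \\74br /\\76 &#x263A;";
console.log(decodeMixed(sample)); // Café <br /> ☺
```

Named entities such as &amp; are not covered by this sketch; it only handles the two numeric forms described in the question.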
Licensed under: CC-BY-SA with attribution