Question

I have unicode characters in MySQL tables. I will print the data in the web pages. While printing it in the pages I am generating the 'Share This' buttons dynamically to share each record in that table (which is in Punjabi).

So the output in the page looks fine. But while sharing the same content in 'Share This' the destination page shows some unknown characters. Later I found that the data sent across the websites should be in Unicode Entity format (like ਆ this will print this '').

Now my tables have the values like ਜ ਝ ਞ ਟ ਠ ਡ ਢ.

I want to convert these data like ਜ ਝ ਞ ਟ ਠ ਡ ਢ in PHP.

Please help me in this.


The above issue fixed. But the Share This is still making issues while displaying the Unicode characters. Below is the output in the browser.

<script type="text/javascript" src="http://w.sharethis.com/button/buttons.js"></script>`
<script type="text/javascript">
stLight.options({
    publisher:'12345'
});
</script>
<span class="st_facebook" st_title="ੴਸਤਿ ਨਾਮੁ ਕਰਤਾ ਪੁਰਖੁ ਨਿਰਭਉ ਨਿਰਵੈਰੁ ਅਕਾਲ ਮੂਰਤਿ ਅਜੂਨੀ ਸੈਭੰ ਗੁਰ ਪ੍ਰਸਾਦਿ ॥" st_url="http://sitelink/"></span>

Thanks in advance, Abu Sithik

Was it helpful?

Solution

You should probably use url_encode the text when outputting the href of your <a> tag.

OTHER TIPS

A while ago, I wrote a polyfill for missing multibyte versions of ord and chr with the following in mind:

  • It defines functions mb_ord and mb_chr only if they don't already exist. If they do exist in your framework or some future version of PHP, the polyfill will be ignored.

  • It uses the widely used mbstring extension to do the conversion. If the mbstring extension is not loaded, it will use the iconv extension instead.

I also added functions for HTMLentities encoding / decoding and encoding / decoding to JSON format as well as some demo code for how to use these functions


Code :

if (!function_exists('codepoint_encode')) {

    function codepoint_encode($str) {
        return substr(json_encode($str), 1, -1);
    }

}

if (!function_exists('codepoint_decode')) {

    function codepoint_decode($str) {
        return json_decode(sprintf('"%s"', $str));
    }

}

if (!function_exists('mb_internal_encoding')) {

    function mb_internal_encoding($encoding = NULL) {
        return ($from_encoding === NULL) ? iconv_get_encoding() : iconv_set_encoding($encoding);
    }

}

if (!function_exists('mb_convert_encoding')) {

    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }

}

if (!function_exists('mb_chr')) {

    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }

}

if (!function_exists('mb_ord')) {

    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }

}

if (!function_exists('mb_htmlentities')) {

    function mb_htmlentities($string, $hex = true, $encoding = 'UTF-8') {
        return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) use ($hex) {
            return sprintf($hex ? '&#x%X;' : '&#%d;', mb_ord($match[0]));
        }, $string);
    }

}

if (!function_exists('mb_html_entity_decode')) {

    function mb_html_entity_decode($string, $flags = null, $encoding = 'UTF-8') {
        return html_entity_decode($string, ($flags === NULL) ? ENT_COMPAT | ENT_HTML401 : $flags, $encoding);
    }

}

How to use :

echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));

echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));

echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));

echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));

echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('我好', false));
var_dump(mb_html_entity_decode('&#25105;&#22909;'));

echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('我好'));
var_dump(mb_html_entity_decode('&#x6211;&#x597D;'));

echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));

Output :

Get string from numeric DEC value
string(3) "我"
string(3) "好"

Get string from numeric HEX value
string(3) "我"
string(3) "好"

Get numeric value of character as DEC string
int(25105)
int(22909)

Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"

Encode / decode to DEC based HTML entities
string(16) "&#25105;&#22909;"
string(6) "我好"

Encode / decode to HEX based HTML entities
string(16) "&#x6211;&#x597D;"
string(6) "我好"

Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top