
Can you please explain how this line of code is equivalent to the code that follows?

$string = chr( ( $number >> 6 ) + 192 ).chr( ( $number & 63 ) + 128 );

It's equivalent to:

if ( $number >= 128 && $number <= 2047 ) {

   $byte1 = 192 + (int)($number / 64); // same as 192 + ( $number >> 6 )
   $byte2 = 128 + ($number % 64);      // same as 128 + ( $number & 63 )
   $utf = chr($byte1).chr($byte2);
}

For example, entering the number 1989 (U+07C5), both produce ߅.
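One way to convince yourself of the equivalence is to run both formulations from the question over the whole 2-byte range (128 to 2047) and compare the bytes they produce. This is a minimal sketch; `encode_bitwise()` and `encode_arith()` are hypothetical helper names, not part of the original code.

```php
<?php
// Hypothetical helpers wrapping the two versions from the question.
function encode_bitwise(int $number): string
{
    return chr(($number >> 6) + 192) . chr(($number & 63) + 128);
}

function encode_arith(int $number): string
{
    $byte1 = 192 + (int)($number / 64);
    $byte2 = 128 + ($number % 64);
    return chr($byte1) . chr($byte2);
}

// Compare both versions for every code point in the 2-byte range.
$mismatches = 0;
for ($n = 128; $n <= 2047; $n++) {
    if (encode_bitwise($n) !== encode_arith($n)) {
        $mismatches++;
    }
}

echo "mismatches: $mismatches\n";          // mismatches: 0
echo bin2hex(encode_bitwise(1989)), "\n";  // df85
```

The two bytes for 1989 are 0xDF 0x85, which is the UTF-8 encoding of ߅.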

This code is used to convert numeric Unicode entities (such as &#1989;) back to the original UTF-8 characters.
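As a sketch of that use case, assuming the input only contains decimal entities of the form `&#NNNN;`, the same shift-and-mask idea extends to 1, 3 and 4 byte sequences. `utf8_from_code_point()` and `decode_numeric_entities()` are hypothetical names, and the sketch does not reject surrogate code points (0xD800 to 0xDFFF).

```php
<?php
// Hypothetical helper: encode one code point as UTF-8 using shifts and masks.
function utf8_from_code_point(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException("Not a Unicode code point: $cp");
    }
    if ($cp < 0x80) {                      // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                     // 2 bytes: 110xxxxx 10xxxxxx
        return chr(($cp >> 6) + 192) . chr(($cp & 63) + 128);
    }
    if ($cp < 0x10000) {                   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(($cp >> 12) + 224)
             . chr((($cp >> 6) & 63) + 128)
             . chr(($cp & 63) + 128);
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(($cp >> 18) + 240)
         . chr((($cp >> 12) & 63) + 128)
         . chr((($cp >> 6) & 63) + 128)
         . chr(($cp & 63) + 128);
}

// Hypothetical helper: replace every decimal entity "&#NNNN;" with its UTF-8 bytes.
function decode_numeric_entities(string $s): string
{
    return preg_replace_callback(
        '/&#(\d+);/',
        fn (array $m): string => utf8_from_code_point((int) $m[1]),
        $s
    );
}

echo bin2hex(decode_numeric_entities('&#1989;')), "\n"; // df85
```

In practice, PHP's built-in `html_entity_decode()` with the `'UTF-8'` charset also decodes numeric entities, so a hand-written version like this is mainly useful for seeing how the bytes are built.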


The code on top uses bitwise operators. >> is the right shift operator: it shifts the bits of the number to the right (towards the less significant bits).

So 11110000 >> 2 = 00111100

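It's equivalent to integer division by a power of 2: for a non-negative $number, $number >> $n gives the same result as (int)($number / pow(2,$n)), i.e. the quotient with the fractional part dropped.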
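A small sketch of that claim, using the question's value 1989. Plain `/` returns a float, so the comparison uses `intdiv()` (PHP 7+). The check is limited to non-negative values on purpose: for negative numbers PHP's `>>` rounds towards negative infinity while `intdiv()` truncates towards zero, so the two differ there.

```php
<?php
// Right shift vs. division for non-negative integers.
$number = 1989;

var_dump($number >> 6);                      // int(31)
var_dump($number / pow(2, 6));               // float(31.078125)
var_dump(intdiv($number, 2 ** 6));           // int(31)
var_dump((0b11110000 >> 2) === 0b00111100);  // bool(true)

// Check the claim over a range of non-negative values and shift amounts.
$allMatch = true;
for ($i = 0; $i < 4096; $i++) {
    for ($k = 0; $k <= 10; $k++) {
        if (($i >> $k) !== intdiv($i, 2 ** $k)) {
            $allMatch = false;
        }
    }
}
var_dump($allMatch);                         // bool(true)
```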

The & is the "bitwise AND" operator. It compares the corresponding bits of both numbers and sets a bit in the result only where that bit is 1 in both numbers.

11110000 & 01010101 = 01010000

By AND-ing $number with 63 (00111111) you keep only its lowest six bits, which for a non-negative number is the remainder of dividing $number by 64 (the modulus), written $number % 64.
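A short sketch showing the mask and the modulus side by side for 1989, plus a check over a range of non-negative values. As with the shift, negatives behave differently in PHP: `-1 & 63` is 63, while `-1 % 64` is -1.

```php
<?php
// Masking with 63 (0b111111) keeps the low 6 bits, i.e. the remainder mod 64.
$number = 1989;

printf("%b\n", $number);        // 11111000101
printf("%b\n", $number & 63);   // 101
var_dump($number & 63);         // int(5)
var_dump($number % 64);         // int(5)

// Check the claim over a range of non-negative values.
$allMatch = true;
for ($i = 0; $i < 4096; $i++) {
    if (($i & 63) !== $i % 64) {
        $allMatch = false;
    }
}
var_dump($allMatch);            // bool(true)
```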


$number >> 6 is a binary shift-right operation, i.e. 11000000 >> 6 == 00000011, equivalent to (int)($number / pow(2,6)), a.k.a. the integer part of $number / 64.

$number & 63 is a binary AND with 00111111, which keeps the low six bits (the same as $number % 64 for non-negative numbers).

Both can be done as bit operations because 64 is a power of two, and bit operations are generally faster than division and modulus.
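To see both operations together on the question's example value, this sketch follows 1989 bit by bit. The 11-bit value is split into a 5-bit high part and a 6-bit low part, and the added 192 and 128 supply the `110` and `10` prefixes. Addition gives the same result as bitwise OR here because, for values up to 2047, the payload bits never overlap the prefix bits.

```php
<?php
// Follow 1989 (U+07C5) through the shift, the mask and the two additions.
$number = 1989;

$high = $number >> 6;   // top 5 bits of the 11-bit value
$low  = $number & 63;   // bottom 6 bits

$byte1 = $high + 192;   // 192 = 0b11000000, adds the "110" lead-byte prefix
$byte2 = $low + 128;    // 128 = 0b10000000, adds the "10" continuation prefix

printf("%011b\n", $number);             // 11111000101
printf("%05b %06b\n", $high, $low);     // 11111 000101
printf("%08b %08b\n", $byte1, $byte2);  // 11011111 10000101
```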

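Adding to @Mchl's answer: the reason for adding 192 in a UTF-8 sequence is to set the bits that mark the first (lead) byte of a 2-byte sequence. Likewise, adding 128 sets the 10 prefix that marks every continuation byte.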

192 - 11000000 - Start of 2-byte sequence (128 + 64)

224 - 11100000 - Start of 3-byte sequence (128 + 64 + 32)

240 - 11110000 - Start of 4-byte sequence (128 + 64 + 32 + 16)

248 - 11111000 - Start of 5-byte sequence (restricted) (... + 8)

252 - 11111100 - Start of 6-byte sequence (restricted) (... + 4)

254 - 11111110 - Invalid
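Going the other way, the lead-byte prefixes in the list above are enough to tell how long a sequence is from its first byte. This is a minimal sketch; `utf8_sequence_length()` is a hypothetical helper. It classifies by bit prefix only, so it does not reject 0xC0, 0xC1 or 0xF5 to 0xF7, which RFC 3629 also rules out as lead bytes.

```php
<?php
// Hypothetical helper: sequence length implied by a lead byte,
// or 0 for continuation bytes (10xxxxxx) and 248 and above.
function utf8_sequence_length(int $byte): int
{
    if ($byte < 0x80) return 1;            // 0xxxxxxx: ASCII
    if (($byte & 0xC0) === 0x80) return 0; // 10xxxxxx: continuation byte
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx: from 192
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx: from 224
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx: from 240
    return 0;                              // 248 and up: 5/6-byte forms or invalid
}

// The two bytes of U+07C5 from the question.
foreach (str_split("\xDF\x85") as $ch) {
    printf("%3d %08b -> %d\n", ord($ch), ord($ch), utf8_sequence_length(ord($ch)));
}
// 223 11011111 -> 2
// 133 10000101 -> 0
```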

Table reference: UTF-8 byte range table

Licensed under: CC-BY-SA with attribution