Question
Let's say (for simplicity's sake) that I have a multibyte, UTF-8 encoded string variable with 3 letters (consisting of 4 bytes):
$original = 'Fön';
Since it's UTF-8, the bytes' hex values are (excluding the BOM):
46 C3 B6 6E
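A quick way to confirm these byte values is bin2hex(). The sketch below is only an illustration: it writes the ö as explicit byte escapes, so the result does not depend on the encoding the script file itself was saved in.

```php
<?php
// "F", then the two UTF-8 bytes of "ö" (C3 B6), then "n".
$original = "F\xC3\xB6n";

// strlen() counts bytes, not characters.
echo strlen($original), "\n";   // 4

// bin2hex() dumps every byte as two lowercase hex digits.
echo bin2hex($original), "\n";  // 46c3b66e
```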
As the $original variable is user-defined, I will need to handle two things:
1. Determine how many bytes the string contains in total.
2. Access each individual byte.
I would tend to use strlen() to handle "1.", and access the $original variable's bytes with a simple $original[$byteposition], like this:
<?php
header('Content-Type: text/html; charset=UTF-8');
$original = 'Fön';
$totalbytes = strlen($original);
for ($byteposition = 0; $byteposition < $totalbytes; $byteposition++)
{
    $currentbyte = $original[$byteposition];
    /*
    Doesn't work since var_dump shows 3 bytes.
    */
    var_dump($currentbyte);
    /*
    Fails too since "ord" only works on ASCII chars.
    It returns "46 F6 6E"
    */
    printf("%02X", ord($currentbyte));
    echo('<br>');
}
exit();
?>
This proves my initial idea is not working.
How can I get the single bytes from a multibyte PHP string variable in a binary-safe way?
What I am looking for is a binary-safe way to convert UTF-8 string(s) into byte-array(s).
Solution
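You can get a byte array by unpacking the UTF-8 encoded string $a:
$a = utf8_encode('Fön');
$b = unpack('C*', $a);
var_dump($b);
The format C* means "unsigned char", repeated for every byte. Note that utf8_encode() converts from ISO-8859-1 to UTF-8, so call it only if the string is not already UTF-8 (the "46 F6 6E" output in the question indicates an ISO-8859-1 source). On a string that is already UTF-8 it would encode the ö a second time. The array returned by unpack() is indexed from 1, not 0.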
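As a hedged sketch of the same approach that covers both cases (input already in UTF-8, or input in ISO-8859-1), the helper below converts only when a source encoding is passed. The name toByteArray() is hypothetical, and the code assumes the mbstring extension is available; mb_convert_encoding() is used here because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not part of PHP): returns the bytes of $string
// as a 0-indexed array of integers, converting to UTF-8 first if a
// source encoding is given. Assumes the mbstring extension is loaded.
function toByteArray(string $string, ?string $sourceEncoding = null): array
{
    if ($sourceEncoding !== null) {
        $string = mb_convert_encoding($string, 'UTF-8', $sourceEncoding);
    }
    if ($string === '') {
        return [];
    }
    // unpack() returns a 1-indexed array; array_values() reindexes from 0.
    return array_values(unpack('C*', $string));
}

// Already UTF-8: no conversion needed.
$utf8Bytes = toByteArray("F\xC3\xB6n");

// ISO-8859-1 input ("ö" is the single byte F6): convert first.
$latin1Bytes = toByteArray("F\xF6n", 'ISO-8859-1');

var_dump($utf8Bytes === [70, 195, 182, 110]);  // bool(true)
var_dump($latin1Bytes === $utf8Bytes);         // bool(true)
```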
OTHER TIPS
I actually wrote my own class for this problem. I was trying to reproduce JavaScript's new TextEncoder("utf-8").encode(...) in PHP.
So this is what I came up with. It uses PHP's ord() function to get the bytes and the chr() function to build the UTF-8 string back:
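class Uint8Array {
    public $val = array(); // byte values (0-255)
    public $length = 0;    // number of bytes in $val

    // Fill the array from a raw string ("utf8") or from hex digits ("hex").
    function from($string, $mode = "utf8") {
        if ($mode == "utf8") {
            $arr = [];
            // str_split() splits into single bytes, not characters.
            foreach (str_split($string) as $chr) {
                $arr[] = ord($chr);
            }
            $this->val = $arr;
            $this->length = count($arr);
            return $arr;
        }
        elseif ($mode == "hex") {
            $arr = [];
            // Read two hex digits per byte; a trailing odd digit is ignored.
            for ($i = 0; $i + 1 < strlen($string); $i += 2) {
                $arr[] = hexdec($string[$i] . $string[$i + 1]);
            }
            $this->val = $arr;
            $this->length = count($arr);
            return $arr;
        }
    }

    // Turn the stored bytes back into a raw string ("utf8") or hex digits ("hex").
    function toString($enc = "utf8") {
        if ($enc == "utf8") {
            $str = "";
            foreach ($this->val as $byte) {
                $str .= chr($byte);
            }
            return $str;
        }
        elseif ($enc == "hex") {
            $str = "";
            foreach ($this->val as $byte) {
                // Pad to two digits so that e.g. 10 becomes "0a".
                $str .= str_pad(dechex($byte), 2, "0", STR_PAD_LEFT);
            }
            return $str;
        }
    }
}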
Use it like this.
Create an instance:
$handle = new Uint8Array;
Input with ->from(string, encoding), where the encoding is either "utf8" or "hex" (hex bytes without spaces):
$handle->from("Fön","utf8");
//or with hex bytes
$handle->from("46c3b66e","hex");
Output with ->toString(encoding), as utf8 or hex:
$to_utf8 = $handle->toString("utf8");
//Fön
$to_hex = $handle->toString("hex");
//46c3b66e
The byte array itself can be found in ->val, as you can see here:
$bytearray = $handle->val;
//[70, 195, 182, 110]
$arrayleng = $handle->length;
//4
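For comparison, the same round trips can be sketched with PHP's built-in functions alone. This is an alternative illustration, not the class above; the string is written with byte escapes so the example does not depend on the file's encoding.

```php
<?php
$string = "F\xC3\xB6n";                       // "Fön" as UTF-8 bytes

// String -> byte array (reindexed from 0).
$bytes = array_values(unpack('C*', $string)); // [70, 195, 182, 110]

// Byte array -> string.
$roundTrip = pack('C*', ...$bytes);

// String <-> hex digits.
$hex = bin2hex($string);                      // "46c3b66e"
$fromHex = hex2bin($hex);

var_dump($roundTrip === $string);  // bool(true)
var_dump($fromHex === $string);    // bool(true)
```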
That is all, feel free to use this!