With PHP, I'm trying to determine the length (number of characters) of strings such as these:
1
1.1
1.1.1
1.1.2
1.1.3
1.1.3.1
1.1.3.2
1.1.4
1.1.5
1.1.6
1.1.7
etc.
When the length of these strings is measured with mb_strlen() or strlen(), the results are:
value   | mb_strlen() | strlen()
--------+-------------+---------
1       | 1           | 1
1.1     | 5           | 5
1.1.1   | 9           | 9
1.1.1.1 | 13          | 13
1.1.1.2 | 13          | 13
1.1.1.3 | 13          | 13
It appears that each "." is being counted as 3 characters. I'm considering writing a small function to compensate for the predictable "miscount", but I'd like to know why the "." is counted as 3 characters to begin with.
I have already looked in several places, including this SO article, read the article it mentions, and added the suggested conversions to the page:
mb_language('uni');
mb_internal_encoding('UTF-8');
$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
What gives?
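One way to see what strlen() is actually counting is to print the raw bytes of a value. The following is a minimal sketch, not part of the original code: dump_bytes() is a hypothetical helper name, and the NUL-padded sample string is made up purely for illustration (the real bytes in the CSV may well be something else).

```php
<?php
// Hypothetical helper: show the raw bytes behind a string, one hex pair per byte.
function dump_bytes(string $s): string
{
    // bin2hex() yields two hex digits per byte; chunk_split() inserts a space
    // after every pair, and trim() drops the trailing space.
    return trim(chunk_split(bin2hex($s), 2, ' '));
}

// A plain ASCII "1.1" is exactly three bytes.
echo dump_bytes("1.1"), "\n";          // prints "31 2e 31"

// Made-up sample: looks like "1.1" when echoed in a browser,
// but carries two extra invisible bytes.
$suspect = "1\x00.\x001";
echo strlen($suspect), "\n";           // prints "5"
echo dump_bytes($suspect), "\n";       // prints "31 00 2e 00 31"
```

If the dump of a real cell shows anything other than 31 and 2e style bytes around the dots, those extra bytes are what both functions are counting.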
EDIT:
The strings are imported as part of a csv import.
Here is the code:
<?php
$f = fopen("s2db.csv", "r");
while (($line = fgetcsv($f)) !== false) {
    $colcount = 0;
    foreach ($line as $cell) {
        // Let's get the cells into variables first.
        // There are only a few columns, so just count.
        switch ($colcount) {
            case 0:
                $item = $cell;
                break;
            case 1:
                $itemtitle = htmlspecialchars($cell);
                break;
            case 2:
                $itemsubject = htmlspecialchars($cell);
                break;
            case 3:
                $itemnumber = htmlspecialchars($cell);
                break;
            case 4:
                $itemqty = htmlspecialchars($cell);
                break;
            case 5:
                $itemfilename = htmlspecialchars($cell);
                break;
        }
        $colcount++;
    }
    // Measure the first column with both functions for comparison.
    $itemlen = strlen($item);
    echo "Value = " . $item . " | strlen() Length = " . $itemlen . "| mb_strlen() = " . mb_strlen($item) . "<br>";
}
fclose($f);
?>
Here are the results:
Value = 1 | strlen() Length = 3| mb_strlen() = 3
Value = 1.1 | strlen() Length = 7| mb_strlen() = 7
Value = 1.1.1 | strlen() Length = 11| mb_strlen() = 11
Value = 1.1.1.1 | strlen() Length = 15| mb_strlen() = 15
Value = 1.1.1.2 | strlen() Length = 15| mb_strlen() = 15
Value = 1.1.1.3 | strlen() Length = 15| mb_strlen() = 15
Value = 1.1.1.3.1 | strlen() Length = 19| mb_strlen() = 19
Value = 1.1.1.3.2 | strlen() Length = 19| mb_strlen() = 19
Value = 1.1.1.3.3 | strlen() Length = 19| mb_strlen() = 19
Value = 1.1.1.4 | strlen() Length = 15| mb_strlen() = 15
SOLUTION:
I gave @hek2mgl the vote because his hexdump helped me determine that I wasn't crazy and it really was counting the "." as 3, as shown here.
There's nothing I can do about the import format, so I'm just going to add code to compensate.
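The compensating code itself is not shown here. As a rough sketch of one possible approach (not necessarily what was actually used), the value could be stripped down to ASCII digits and dots before measuring. item_length() is a hypothetical name, and the sketch assumes a valid item number never contains anything other than 0-9 and ".".

```php
<?php
// Hypothetical helper: measure an item number after discarding stray bytes.
function item_length(string $raw): int
{
    // Assumption: only ASCII digits and "." belong in an item number,
    // so every other byte is treated as import noise and removed.
    $clean = preg_replace('/[^0-9.]/', '', $raw);

    // preg_replace() returns null on failure; fall back to an empty string.
    return strlen($clean ?? '');
}

echo item_length("1.1.3.2"), "\n";       // prints "7"
echo item_length("1\x00.\x001"), "\n";   // prints "3" (made-up noisy sample)
```

Cleaning the value rather than subtracting a fixed offset per dot keeps the count right even if the amount of noise differs between rows.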
Thanks everyone for the help!