Question

This code:

setlocale(LC_ALL, 'pl_PL', 'pl', 'Polish_Poland.28592');
$result = mb_stripos("ĘÓĄŚŁŻŹĆŃ",'ęóąśłżźćń');

returns false;

How to fix that?

P.S. The answer to "stripos returns false when special characters is used" does not solve this problem.


UPDATE: I made a test:

function test() {
    $search = "zawór"; $searchlen=strlen($search);
    $opentag="<valve>"; $opentaglen=strlen($opentag);
    $closetag="</valve>"; $closetaglen=strlen($closetag);
    $test[0]['input']="test ZAWÓR test"; //normal test
    $test[1]['input']="X\nX\nX ZAWÓR X\nX\nX"; //white char test
    $test[2]['input']="<br> ZAWÓR <br>"; //html newline test
    $test[3]['input']="ĄąĄą ZAWÓR ĄąĄą"; //polish diacritical test
    $test[4]['input']="テスト ZAWÓR テスト"; //japanese katakana test
    foreach ($test as $key => $val) {
        $position = mb_stripos($val['input'],$search,0,'UTF-8');
        if($position!=false) {
            $output = $val['input'];
            $output = substr_replace($output, $opentag, $position, 0);
            $output = substr_replace($output, $closetag, $position+$opentaglen+$searchlen, 0);
            $test[$key]['output'] = $output;
        }
        else {
            $test[$key]['output'] = null;
        }
    }
    return $test;
}

FIREFOX OUTPUT:

$test[0]['output'] == "test <valve>ZAWÓR</valve> test"        // ok
$test[1]['output'] == "X\nX\nX <valve>ZAWÓR</valve> X\nX\nX"  // ok
$test[2]['output'] == "<br> <valve>ZAWÓR</valve> <br>"        // ok
$test[3]['output'] == "Ąą�<valve>�ą ZA</valve>WÓR ĄąĄą"       // WTF??
$test[4]['output'] == "テ�<valve>��ト </valve>ZAWÓR テスト"    // WTF??

The solution from https://drupal.org/node/1107268 does not change anything.


Solution

The function works fine when told what encoding your strings are in:

var_dump(mb_stripos("ĘÓĄŚŁŻŹĆŃ",'ęóąśłżźćń', 0, 'UTF-8'));  // 0
                                                ^^^^^^^

Without the explicit encoding argument, the function falls back to the value of mb_internal_encoding(), which may not match the encoding of your strings. setlocale() has no effect on the mb_* functions.
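As a possible alternative to passing the encoding on every call, the default can be set once with mb_internal_encoding(). This is only a minimal sketch, and it assumes the source file itself is saved as UTF-8:

```php
<?php
// Assumption: this file is saved as UTF-8, so the literals below are UTF-8 bytes.
mb_internal_encoding('UTF-8');   // default for mb_* calls that omit the encoding

$position = mb_stripos("ĘÓĄŚŁŻŹĆŃ", 'ęóąśłżźćń');
var_dump($position);   // int(0)
```

Passing 'UTF-8' explicitly is still the safer habit in library code, since another part of the application may change the internal encoding.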


The problem with your test code is that you're mixing character-based offsets with byte-based offsets. mb_stripos returns offsets in characters, while strlen and substr_replace work in bytes, so the tags land in the middle of a multi-byte character as soon as any such characters precede the match. Read about the topic here: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
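To make the mismatch concrete, here is a rough sketch using one input from the question's test. It first shows the two offsets side by side, then stays in character units throughout. wrap_first_match() is a hypothetical helper name, not part of PHP, and the sketch assumes UTF-8 input and that the matched text has the same number of characters as the search term:

```php
<?php
// Assumption: source file and input strings are UTF-8.
$text   = "ĄąĄą ZAWÓR ĄąĄą";
$search = "zawór";

$charPos = mb_stripos($text, $search, 0, 'UTF-8');          // offset in characters
$bytePos = strlen(mb_substr($text, 0, $charPos, 'UTF-8'));  // same spot, in bytes

// Hypothetical helper: wraps the first case-insensitive match in tags,
// cutting the string with character offsets only.
function wrap_first_match($text, $search, $open, $close)
{
    $pos = mb_stripos($text, $search, 0, 'UTF-8');
    if ($pos === false) {          // strict check: a match at offset 0 is valid
        return $text;
    }
    $len = mb_strlen($search, 'UTF-8');
    return mb_substr($text, 0, $pos, 'UTF-8')
        . $open
        . mb_substr($text, $pos, $len, 'UTF-8')
        . $close
        . mb_substr($text, $pos + $len, null, 'UTF-8');
}

echo wrap_first_match($text, $search, '<valve>', '</valve>'), "\n";
```

Here $charPos is 5 while $bytePos is 9, because each of the four leading Polish letters takes two bytes in UTF-8. The strict === false comparison also avoids treating a match at position 0 as "not found".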

If you want to wrap a certain word in tags in a multi-byte string, I'd rather suggest this approach:

preg_replace('/zawór/iu', '<valve>$0</valve>', $text)

Note that $text must be UTF-8 encoded; /u regular expressions only work with UTF-8.
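If the search term comes from a variable rather than being hard-coded, it should probably be escaped before it goes into the pattern. A minimal sketch, where wrap_all_matches() is a hypothetical helper name and the <valve> tag is taken from the question:

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $search.
// Assumption: $text and $search are valid UTF-8.
function wrap_all_matches($text, $search)
{
    // preg_quote() escapes regex metacharacters in the search term,
    // including the "/" delimiter given as its second argument.
    $pattern = '/' . preg_quote($search, '/') . '/iu';
    $result  = preg_replace($pattern, '<valve>$0</valve>', $text);
    // preg_replace() returns null on error, e.g. when $text is not valid UTF-8.
    return $result === null ? $text : $result;
}

echo wrap_all_matches("テスト ZAWÓR テスト", "zawór"), "\n";
```

Unlike a single mb_stripos call, this wraps all occurrences, not just the first one.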

OTHER TIPS

I'm not sure why mb_stripos does not work here, but the following workaround does:

// Upper-case one string explicitly, then search with the encoding stated.
$str = mb_convert_case("ęóąśłżźćń", MB_CASE_UPPER, "UTF-8");
$result = mb_strrichr($str, "ĘÓĄŚŁŻŹĆŃ", false, "UTF-8");
var_dump($result);

DEMO.

Using your tip, dear Rikesh, I wrote this:

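function patched_mb_stripos($content,$search) {
    // Lower-case both strings as UTF-8 so the comparison is case-insensitive.
    $content=mb_convert_case($content, MB_CASE_LOWER, "UTF-8");
    $search=mb_convert_case($search, MB_CASE_LOWER, "UTF-8");
    // Pass the encoding here too, so the returned offset is in UTF-8 characters.
    return mb_strpos($content,$search,0,"UTF-8");
}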

and it seems to work :)

Solution from https://gist.github.com/stemar/8287074 :

function mb_substr_replace($string, $replacement, $start, $length=NULL) {
    if (is_array($string)) {
        $num = count($string);
        // $replacement
        $replacement = is_array($replacement) ? array_slice($replacement, 0, $num) : array_pad(array($replacement), $num, $replacement);
        // $start
        if (is_array($start)) {
            $start = array_slice($start, 0, $num);
            foreach ($start as $key => $value)
                $start[$key] = is_int($value) ? $value : 0;
        }
        else {
            $start = array_pad(array($start), $num, $start);
        }
        // $length
        if (!isset($length)) {
            $length = array_fill(0, $num, 0);
        }
        elseif (is_array($length)) {
            $length = array_slice($length, 0, $num);
            foreach ($length as $key => $value)
                $length[$key] = isset($value) ? (is_int($value) ? $value : $num) : 0;
        }
        else {
            $length = array_pad(array($length), $num, $length);
        }
        // Recursive call
        return array_map(__FUNCTION__, $string, $replacement, $start, $length);
    }
    preg_match_all('/./us', (string)$string, $smatches);
    preg_match_all('/./us', (string)$replacement, $rmatches);
    if ($length === NULL) $length = mb_strlen($string);
    array_splice($smatches[0], $start, $length, $rmatches[0]);
    return join("",$smatches[0]);
}

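Using it in place of substr_replace (and mb_strlen in place of strlen for $searchlen) solves the problem with function test(), because every offset is then counted in characters.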

Licensed under: CC-BY-SA with attribution