Question

How can I match similar words when counting differences with array_diff?

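The problem is that a single thing can have several names, such as TV/Television, Inches/Inch, Mobile/Mobile Phones and Mobile/Phones. Because array_diff only matches identical strings, these variants are counted as differences, which gives a wrong percentage.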

Example:

    $str1 = "Samsung Television 21 Inches LED BH005DE";
    $str2 = "Samsung 21 Inch LED TV";

    $arr1 = explode(' ', $str1);
    $arr2 = explode(' ', $str2);

    $differenceCount = count(array_diff($arr2, $arr1));

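Here str1 and str2 both describe the same product, but one uses Television and Inches while the other uses TV and Inch, so array_diff reports Inch and TV as differences even though they mean the same thing. How can I solve this?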
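A minimal sketch that reproduces the issue, printing the words array_diff reports and a mismatch percentage. The percentage formula (unmatched words as a share of $str2's word count) is an assumption; the question does not say how it is computed.

```php
<?php
$str1 = "Samsung Television 21 Inches LED BH005DE";
$str2 = "Samsung 21 Inch LED TV";

$arr1 = explode(' ', $str1);
$arr2 = explode(' ', $str2);

// Words of $str2 with no exact (string-identical) match in $str1.
$diff = array_values(array_diff($arr2, $arr1));
$differenceCount = count($diff);

// Assumed metric: percentage of $str2's words that were not matched.
$percentage = $differenceCount * 100 / count($arr2);

echo implode(', ', $diff), "\n"; // Inch, TV
echo $percentage, "%\n";         // 40%
```

Both reported words have a synonym in $str1, so the "real" difference here should be 0%.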


Solution

The most obvious way is to use a synonym list: replace every variant with one canonical word in both strings before splitting them.

    $str1 = "Samsung Television 21 Inches LED BH005DE";
    $str2 = "Samsung 21 Inch LED TV";

    // synonyms: canonical word => all forms that should be replaced by it
    $syns = [
       'TV'   => ['TV', 'Television'],
       'Inch' => ['Inch', 'Inches']
    ];

    // replace each form with its canonical key; the \b word boundaries keep "Inch" from matching inside "Inches"
    $str1 = array_reduce(array_keys($syns), function($c, $x) use ($syns)
    {
       return preg_replace('/\b'.join('\b|\b', $syns[$x]).'\b/', $x, $c);
    }, $str1);
    // now $str1 is "Samsung TV 21 Inch LED BH005DE"

    $str2 = array_reduce(array_keys($syns), function($c, $x) use ($syns)
    {
       return preg_replace('/\b'.join('\b|\b', $syns[$x]).'\b/', $x, $c);
    }, $str2);
    // now $str2 is "Samsung 21 Inch LED TV"

    $arr1 = explode(' ', $str1);
    $arr2 = explode(' ', $str2);

    //var_dump(array_diff($arr1, $arr2)); // one element: [5 => 'BH005DE']
    $differenceCount = count(array_diff($arr2, $arr1)); // 0
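The two array_reduce calls in the answer can be folded into one reusable function. This is a variation, not the answer's exact code: the function name normalizeSynonyms is hypothetical, and it adds preg_quote escaping, case-insensitive matching (/i), and longest-form-first ordering so that multi-word synonyms from the question, such as "Mobile Phones", are replaced before the shorter "Mobile". It assumes PHP 7.4+ for the fn arrow functions.

```php
<?php
/**
 * Hypothetical helper: replace every synonym in $str with its canonical key.
 * $syns maps canonical word => list of forms (single or multi-word).
 */
function normalizeSynonyms(string $str, array $syns): string
{
    foreach ($syns as $canonical => $forms) {
        // Longest forms first, so "Mobile Phones" wins over "Mobile".
        usort($forms, fn($a, $b) => strlen($b) - strlen($a));
        // Escape each form so characters like "." or "-" are matched literally.
        $quoted = array_map(fn($f) => preg_quote($f, '/'), $forms);
        // \b...\b keeps "Inch" from matching inside "Inches"; /i ignores case.
        $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
        $str = preg_replace($pattern, $canonical, $str);
    }
    return $str;
}

$syns = [
    'TV'     => ['TV', 'Television'],
    'Inch'   => ['Inch', 'Inches'],
    'Mobile' => ['Mobile', 'Mobile Phones', 'Phones'],
];

$arr1 = explode(' ', normalizeSynonyms("Samsung Television 21 Inches LED BH005DE", $syns));
$arr2 = explode(' ', normalizeSynonyms("Samsung 21 Inch LED TV", $syns));

echo implode(', ', array_diff($arr1, $arr2)), "\n";         // BH005DE
echo count(array_diff($arr2, $arr1)), "\n";                 // 0
echo normalizeSynonyms("Nokia Mobile Phones", $syns), "\n"; // Nokia Mobile
```

Because multi-word forms are collapsed before explode runs, "Mobile Phones" ends up as a single token. Replacements run key by key, so a canonical word should not also appear as a form under a different key.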

In your case you can't rely on word-form rules alone (for example, stripping a plural to turn Inches into Inch), because you also need to handle abbreviations like TV for Television, and those carry specific meanings that no word-form rule can derive. So an explicit synonym list is probably the only way to resolve this for all cases.
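If every synonym is a single word, a different technique avoids regexes entirely: invert the synonym list into a form => canonical lookup table and map each token after splitting. This is an alternative sketch, not the answer's method; canonicalTokens and $canonicalOf are hypothetical names, lookups are case-insensitive by assumption, and multi-word forms such as "Mobile Phones" are not handled (those still need the preg_replace approach). It assumes PHP 7.4+ for the fn arrow function.

```php
<?php
// Canonical word => all of its forms (single words only in this sketch).
$syns = [
    'TV'   => ['TV', 'Television'],
    'Inch' => ['Inch', 'Inches'],
];

// Invert into a lowercase form => canonical lookup table.
$canonicalOf = [];
foreach ($syns as $canonical => $forms) {
    foreach ($forms as $form) {
        $canonicalOf[strtolower($form)] = $canonical;
    }
}

// Hypothetical helper: split on whitespace, then swap each token for its canonical form.
function canonicalTokens(string $str, array $canonicalOf): array
{
    $tokens = preg_split('/\s+/', trim($str), -1, PREG_SPLIT_NO_EMPTY);
    return array_map(
        fn($t) => $canonicalOf[strtolower($t)] ?? $t, // unknown words pass through unchanged
        $tokens
    );
}

$arr1 = canonicalTokens("Samsung Television 21 Inches LED BH005DE", $canonicalOf);
$arr2 = canonicalTokens("Samsung 21 Inch LED TV", $canonicalOf);

echo count(array_diff($arr2, $arr1)), "\n"; // 0
```

Each token costs one array lookup, which may be cheaper than running one regex per synonym group over long product lists.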

Licensed under: CC-BY-SA with attribution