Question

I'm looking for a simple way to find matching portions of two strings in PHP (specifically in the context of a URI)

For example, consider the two strings:

http://2.2.2.2/~machinehost/deployment_folder/

and

/~machinehost/deployment_folder/users/bob/settings

What I need is to chop off the matching portion of these two strings from the second string, resulting in:

users/bob/settings

before appending the first string as a prefix, forming an absolute URI.

Is there some simple way (in PHP) to compare two arbitrary strings for matching substrings within them?

EDIT: as pointed out, I meant the longest matching substring common to both strings

Was it helpful?

Solution

This would be the answer. Ready-to-use PHP function.

OTHER TIPS

Assuming your strings are $a and $b, respectively, you can use this:

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';

$len_a = strlen($a);
$len_b = strlen($b);

for ($p = max(0, $len_a - $len_b); $p < $len_b; $p++)
    if (substr($a, $len_a - ($len_b - $p)) == substr($b, 0, $len_b - $p))
        break;

$result = $a.substr($b, $len_b - $p);

echo $result;

This result is http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings.

Finding the longest common match can also be done using regex.

The below function will take two strings, use one to create a regex, and execute it against the other.

/**
 * Determine the longest common match within two strings
 *
 * @param string $str1
 * @param string $str2 Two strings in any order.
 * @param boolean $case_sensitive Set to true to force
 * case sensitivity. Default: false (case insensitive).
 * @return string The longest string - first match.
 */
function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {
    // First check to see if one string is the same as the other.
    if ( $str1 === $str2 ) return $str1;
    if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;

    // We'll use '#' as our regex delimiter. Any character can be used as we'll quote the string anyway,
    $delimiter = '#';

    // We'll find the shortest string and use that to check substrings and create our regex.
    $l1 = strlen( $str1 );
    $l2 = strlen( $str2 );
    $str = $l1 <= $l2 ? $str1 : $str2;
    $str2 = $l1 <= $l2 ? $str2 : $str1;
    $l = min( $l1, $l2 );

    // Next check to see if one string is a substring of the other.
    if ( $case_sensitive ) {
        if ( strpos( $str2, $str ) !== false ) {
            return $str;
        }
    }
    else {
        if ( stripos( $str2, $str ) !== false ) {
            return $str;
        }
    }

    // Regex for each character will be of the format (?:a(?=b))?
    // We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?
    $reg = $delimiter;
    for ( $i = 0; $i < $l; $i++ ) {
        $a = preg_quote( $str[ $i ], $delimiter );
        $b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;
        $reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );
    }
    $reg .= $delimiter;
    if ( ! $case_sensitive ) {
        $reg .= 'i';
    }
    // Resulting example regex from a string 'abbc':
    // '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';

    // Perform our regex on the remaining string
    $str = $l1 <= $l2 ? $str2 : $str1;
    if ( preg_match_all( $reg, $str, $matches ) ) {
        // $matches is an array with a single array with all the matches.
        return array_reduce( $matches[0], function( $a, $b ) {
            $al = strlen( $a );
            $bl = strlen( $b );
            // Return the longest string, as long as it's not a single character.
            return $al >= $bl || $bl <= 1 ? $a : $b;
        }, '' );
    }

    // No match - Return an empty string.
    return '';
}

It'll generate a regex using the shorter of the two strings, although performance will most likely be the same either way. It may incorrectly match strings with recurring substrings, and we're limited to matching strings of two characters or more, unless they are equal or one is a substring of the other. For Instance:

// Works as intended.
get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';

// Returns incorrect substring based on string length and recurring substrings.
get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';

// Does not return any matches, as all recurring strings are only a single character long.
get_longest_common_subsequence( 'abc', 'ace' ) === '';

// One of the strings is a substring of the other.
get_longest_common_subsequence( 'abc', 'a' ) === 'a';

Regardless, it functions using an alternate method and the regex can be refined to tackle additional situations.

I'm not sure to understand your full request, but the idea is:

Let A be your URL and B your "/~machinehost/deployment_folder/users/bob/settings"

  • search B in A -> you get an index i (where i is the position of the first / of B in A)
  • let l = length(A)
  • You need to cut B from (l-i) to length(B) to grab the last part of B (/users/bob/settings)

I have not tested yet, but if you really need, I can help you make this brilliant (ironical) solution work.

Note that it may be possible with regular expressions like

$pattern = "$B(.*?)"
$res = array();
preg_match_all($pattern, $A, $res);

Edit: I think your last comment invalidates my response. But what you want is finding substrings. So you can first start with a heavy algorithm trying to find B[1:i] in A for i in {2, length(B)} and then use some dynamic programming stuffs.

it does not seem to be an out of the box code out there for your requirement. So lets look for a simple way.

For this exercise I utilized two methods, one for finding the longest match, and another one to chop off the matching portion.

The FindLongestMatch() method, takes apart a path, piece by piece seeks for a match in the other path, keeping just one match, the longest one (no arrays, no sorting). The RemoveLongestMatch() method takes the suffix or 'remainder' after the longest match found position.

Here the full source code:

<?php

function FindLongestMatch($relativePath, $absolutePath)
{
    static $_separator = '/';
    $splitted = array_reverse(explode($_separator, $absolutePath));

    foreach ($splitted as &$value)
    {
        $matchTest = $value.$_separator.$match;
        if(IsSubstring($relativePath, $matchTest))
            $match = $matchTest;

        if (!empty($value) && IsNewMatchLonger($match, $longestMatch))
            $longestMatch = $match;
    }

    return $longestMatch;
}

//Removes from the first string the longest match.
function RemoveLongestMatch($relativePath, $absolutePath)
{
    $match = findLongestMatch($relativePath, $absolutePath);
    $positionFound = strpos($relativePath, $match);     
    $suffix = substr($relativePath, $positionFound + strlen($match));

    return $suffix;
}

function IsNewMatchLonger($match, $longestMatch)
{
    return strlen($match) > strlen($longestMatch);
}

function IsSubstring($string, $subString)
{
    return strpos($string, $subString) > 0;
}

This is a representative subset of Test Cases:

//TEST CASES
echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://1.1.1.1/root/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/root/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/users/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/subDirectory/deployment_folderX/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

Running previous Test Cases provides the following output:

http://2.2.2.2/~machinehost/deployment_folder/
/~machinehost/deployment_folder/users/bob/settings
Longuest match: ~machinehost/deployment_folder/
Suffix: users/bob/settings

http://1.1.1.1/root/~machinehost/deployment_folder/
/root/~machinehost/deployment_folder/users/bob/settings
Longuest match: root/~machinehost/deployment_folder/
Suffix: users/bob/settings

http://2.2.2.2/~machinehost/deployment_folder/users/
/~machinehost/deployment_folder/users/bob/settings
Longuest match: ~machinehost/deployment_folder/users/
Suffix: bob/settings

http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/
/~machinehost/subDirectory/deployment_folderX/users/bob/settings
Longuest match: ~machinehost/subDirectory/
Suffix: deployment_folderX/users/bob/settings

Maybe you can take the idea of this piece of code and turn it into something that you find useful for your current project. Let me know if it worked for you too. By the way, Mr. oreX answer looks good too.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top