Question

Right now I have a function that searches all posts by a given user for keywords (specified by the user) and returns any posts that match all of the keywords.

public function fullTextSearch($text, $userId, $offset = 0, $limit = 0) {
    $tokens = explode(' ', trim($text,' '));
    $requiredMatches = count($tokens);
    $matchingId = array();
    $result = false;

    $sql = "SELECT posts.content "
            . "FROM  posts "
            . "WHERE posts.user_id = '" . $userId . "'";
    $primaryResults = $db->fetchAll($sql);

    foreach ($primaryResults as $primaryResult) { //results from query
        $postTokens = explode(' ', $primaryResult['ent_posts_content']);
        $foundMatches = 0;
        foreach ($tokens as $token) { //each of the required words

            foreach ($postTokens as $postToken) { //each of the words in the post


                $distance = levenshtein(strtolower($token), strtolower(rtrim($postToken)));

                if ($distance < 2) {
                    $foundMatches++;
                }
            }
            if ($foundMatches >= $requiredMatches) {
                $matchingId[] = $primaryResult['id'];
            }
        }
    }

The issue I am having is that one of my users likes to title his posts and then search for them by that makeshift 'title'. For example:

My Radio

It plays all the music

As you can see in the code, I rtrim each token from the post's content to try to avoid this issue. But when I search for Radio with the code above, that post is not returned. I thought the whitespace character after Radio was throwing off the levenshtein comparison, but that does not seem to be the case, since I am already rtrim-ing the post token.


Solution

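I ended up using a regular expression to replace every run of whitespace (including the line breaks after the title) with a single " " so the string tokenizes properly. The rtrim did not help because, once the content is exploded on spaces, the line breaks sit in the middle of a token (between Radio and It) rather than at its end, and rtrim only strips trailing whitespace.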

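        // Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
        $pregTokens = preg_replace('/\s+/', ' ', $primaryResult['ent_posts_content']);
        // A plain split on ' ' now yields one word per token.
        $postTokens = explode(' ', $pregTokens);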
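
To make the effect visible, here is a minimal sketch that tokenizes the example post from the question both ways. The sample string and the helper name hasFuzzyMatch are assumptions made up for illustration, and preg_split is used here as an alternative to the preg_replace plus explode pair above; it is not taken from the original code.

```php
<?php
// Hypothetical sample post: a makeshift "title", a blank line, then the body.
$content = "My Radio\n\nIt plays all the music";

// Splitting on a single space keeps the line breaks inside one token,
// so the second token is "Radio\n\nIt" and rtrim() cannot clean it up.
$naive = explode(' ', $content);

// Splitting on any run of whitespace gives one word per token.
$tokens = preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY);

// Hypothetical helper: true if any word is within $maxDistance edits of
// $needle (case-insensitive). A limit of 1 mirrors "$distance < 2".
function hasFuzzyMatch(string $needle, array $words, int $maxDistance = 1): bool
{
    foreach ($words as $word) {
        if (levenshtein(strtolower($needle), strtolower($word)) <= $maxDistance) {
            return true;
        }
    }
    return false;
}

var_dump(hasFuzzyMatch('Radio', $naive));  // bool(false)
var_dump(hasFuzzyMatch('Radio', $tokens)); // bool(true)
```

PREG_SPLIT_NO_EMPTY also drops the empty tokens that leading or trailing whitespace would otherwise produce, which the explode-based version does not do.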
Licensed under: CC-BY-SA with attribution