Question

I have written down a simple function that determines if str1 is a prefix of str2. It's a very simple function, that looks like this (in JS):

function isPrefix(str1, str2) // determine if str1 is a prefix of a candidate string
{
    if(str2.length < str1.length) // candidate string can't be smaller than prefix string 
        return false;

    var i = 0;
    while(str1.charAt(i) == str2.charAt(i) && i <= str1.length)
        i++;
   if(i < str1.length) // i terminated => str 1 is smaller than str 2
        return false;
    return true;
}

As you can see, it loops through the entire length of the prefix string to gauge if it is a prefix of the candidate string. This means it's complexity is O(N), which isn't bad but this becomes a problem when I have a huge data set to consider looping through to determine which strings have the prefix string as a part of the prefix. This makes the complexity multiple like O(M*N) where M is the total number of strings in a given data set. Not good.

I explored the Internet a bit to determine that the best answer would be a Patricia/Radix trie. Where strings are stored as prefixes. Even then, when I attempt to insert/look-up a string, there will be a considerable overhead in string matching if I use the aforementioned prefix gauging function.

Say I had a prefix string 'rom' and a set of candidate words

var dataset =["random","rapid","romance","romania","rome","rose"];

that would like this in a radix trie :

         r
       /    \
     a       o
    / \     / \
ndom pid  se  m
             / \
           an   e
          /  \
        ia   ce

This means, for every node, I will be using the prefix match function, to determine which node has a value that matches the prefix string at the index. Somehow, this solution still seems arduous and does not sit too well with me. Is there something better or anyway I can improve the core prefix matching function ?

Was it helpful?

Solution

Looks like you've got two different problems.

One is to determine if a string is contained as a prefix in another string. For this I would suggest using a function already implemented in the language's string library. In JavaScript you could do this

if (str2.indexOf(str1) === 0) {
    // string str1 is a prefix of str2
}

See documentation for String.indexOf here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf

For the other problem, in a bunch of strings, find out which ones have a given string as a prefix, building a data structure like a Trie or the one you mention seems like the way to go, if you want fast look-ups.

OTHER TIPS

Check out this thread on stackoverflow - How to check if a string "StartsWith" another string? . Mark Byers solution seems to be very efficient. Also for Java there are built in String functions "endsWith" and "startsWith" - http://docs.oracle.com/javase/tutorial/java/data/comparestrings.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top