I am trying to solve the following problem:

You are given two strings: A of size n and B of size m, where m is very small compared to n. Find out whether A contains a substring that is an anagram of B.

The approach I took is as follows:

public static boolean ana_check(String a, String b)
{
    int n = a.length();
    int m = b.length();
    boolean k;
    // Test every window of length m in a against b
    for (int i = 0; i <= (n - m); i++) {
        k = anagram(a.substring(i, i + m), b);
        if (k)
            return true;
    }
    return false;
}

As you can see, I extract each substring of length m, starting from the beginning of A, and check whether it is an anagram of B. To check for an anagram, I build a frequency map for each string and return true if the two maps are equal. The code is given below:

import java.util.HashMap;
import java.util.Map;

public static boolean anagram(String s, String t) {
    // Strings of unequal lengths can't be anagrams
    if (s.length() != t.length()) {
        return false;
    }

    // They're anagrams if both produce the same 'frequency map'
    return frequencyMap(s).equals(frequencyMap(t));
}

// Counts how often each (lowercased) character occurs in str
private static Map<Character, Integer> frequencyMap(String str) {
    Map<Character, Integer> map = new HashMap<Character, Integer>();
    for (char c : str.toLowerCase().toCharArray()) {
        Integer frequency = map.get(c);
        map.put(c, frequency == null ? 1 : frequency + 1);
    }
    return map;
}

I believe the anagram method runs in O(n) time. What is the time complexity of the method ana_check? Is the overall code linear or quadratic?


Solution

Well, let's see...

Assume that the length() method runs in constant time (i.e., it doesn't work like strlen()). Your method frequencyMap is O(m), and anagram calls it twice. anagram is called up to n-m+1 times, so the total work is on the order of 2*m*(n-m+1), which is O(m*n). With m << n, and m treated as a small constant, that is effectively O(n).

I can suggest a couple of optimizations. First you are re-generating the frequency map for string b at every call to anagram. Do it once at the beginning of ana_check. You can have an anagram method that takes a string and a frequency map instead of two strings.
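As a rough sketch of that first optimization: the frequency map for b is built once, and each window is compared against it. The class name PrecomputedAnagramCheck, the method name anaCheck, and the anagram(String, Map) overload are names made up for this illustration; the counting logic mirrors frequencyMap from the question.

```java
import java.util.HashMap;
import java.util.Map;

class PrecomputedAnagramCheck {
    // Same counting logic as frequencyMap in the question.
    static Map<Character, Integer> frequencyMap(String str) {
        Map<Character, Integer> map = new HashMap<>();
        for (char c : str.toLowerCase().toCharArray()) {
            map.merge(c, 1, Integer::sum);
        }
        return map;
    }

    // Hypothetical overload: compares one window against a prebuilt map for b.
    static boolean anagram(String s, Map<Character, Integer> targetMap) {
        return frequencyMap(s).equals(targetMap);
    }

    static boolean anaCheck(String a, String b) {
        int n = a.length();
        int m = b.length();
        // Built once, instead of once per window.
        Map<Character, Integer> bMap = frequencyMap(b);
        for (int i = 0; i <= n - m; i++) {
            if (anagram(a.substring(i, i + m), bMap)) {
                return true;
            }
        }
        return false;
    }
}
```

This roughly halves the map-building work per window, but each window still costs O(m), so the overall bound stays O(m*n).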

The other thing I would do is to remove the length checks from anagram. Yes, it's a safety feature, but you already know the strings you passed in are the same size. And anyway, if they are different lengths the frequency maps will still not match, so the function is correct.

A trickier optimization would be to modify string a's frequency map instead of re-doing it every time. For the first substring, you do it as usual. But then you move ahead one character, subtracting the first character from the map and adding the new one. Sure, if m is <= 3 it won't make a difference, but anything larger than that would be a win.
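A minimal sketch of the sliding-window idea, under a few assumptions: the class and method names are invented for this example, characters are lowercased one at a time with Character.toLowerCase (a simplification of the question's String.toLowerCase), and a count that drops to zero is removed from the map so that Map.equals keeps working.

```java
import java.util.HashMap;
import java.util.Map;

class SlidingMapAnagramCheck {
    static boolean anaCheck(String a, String b) {
        int n = a.length();
        int m = b.length();
        if (m > n) {
            return false;
        }

        // Frequency map of b, built once.
        Map<Character, Integer> target = new HashMap<>();
        for (int i = 0; i < m; i++) {
            target.merge(Character.toLowerCase(b.charAt(i)), 1, Integer::sum);
        }

        // Frequency map of the first window of a, built the usual way.
        Map<Character, Integer> window = new HashMap<>();
        for (int i = 0; i < m; i++) {
            window.merge(Character.toLowerCase(a.charAt(i)), 1, Integer::sum);
        }
        if (window.equals(target)) {
            return true;
        }

        // Slide by one: add the incoming character, subtract the outgoing one.
        for (int i = m; i < n; i++) {
            char incoming = Character.toLowerCase(a.charAt(i));
            char outgoing = Character.toLowerCase(a.charAt(i - m));
            window.merge(incoming, 1, Integer::sum);
            // Returning null removes the key, so no zero entries linger.
            window.computeIfPresent(outgoing, (k, v) -> v == 1 ? null : v - 1);
            if (window.equals(target)) {
                return true;
            }
        }
        return false;
    }
}
```

Updating the window is now constant work per step, although the equals call still compares up to as many entries as there are distinct characters in the window.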

Other tips

You don't need to compare the whole map for every position.

Start by creating a signed frequency map and subtracting every letter in B. Keep a counter c of how many non-zero entries are contained in the map.

Next add the first m (length of B) letters of A into the map. For each letter you add, if that count used to be zero then increment c, or if it became zero after you added the letter then decrement c.

If c is now zero then you've found an anagram (every negative count from B has been balanced by a positive count from A), otherwise carry on.

Add the next letter of A to the frequency map, and remove the letter m letters prior to that, adjusting c appropriately for both operations.

Repeat those last two steps until c becomes zero or you run out of letters in A.
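The steps above could be sketched roughly as follows. The class name, the adjust helper, and the field names are hypothetical; nonZero plays the role of the counter c, and lowercasing each character is an assumption carried over from the question's code.

```java
import java.util.HashMap;
import java.util.Map;

class BalanceCounterAnagramCheck {
    // Signed frequency map: negative means B still "owes" that letter.
    private final Map<Character, Integer> balance = new HashMap<>();
    // The counter c: how many entries are currently non-zero.
    private int nonZero = 0;

    // Applies delta (+1 or -1) to one letter and keeps nonZero in sync.
    private void adjust(char ch, int delta) {
        char key = Character.toLowerCase(ch);
        int before = balance.getOrDefault(key, 0);
        int after = before + delta;
        if (before == 0) {
            nonZero++;          // entry was balanced and no longer is
        } else if (after == 0) {
            nonZero--;          // entry has just become balanced
        }
        balance.put(key, after);
    }

    static boolean anaCheck(String a, String b) {
        int n = a.length();
        int m = b.length();
        if (m > n) {
            return false;
        }
        BalanceCounterAnagramCheck state = new BalanceCounterAnagramCheck();

        // Subtract every letter of B.
        for (int i = 0; i < m; i++) {
            state.adjust(b.charAt(i), -1);
        }
        // Add the first m letters of A.
        for (int i = 0; i < m; i++) {
            state.adjust(a.charAt(i), +1);
        }
        if (state.nonZero == 0) {
            return true;
        }
        // Add the next letter of A and remove the one m positions back.
        for (int i = m; i < n; i++) {
            state.adjust(a.charAt(i), +1);
            state.adjust(a.charAt(i - m), -1);
            if (state.nonZero == 0) {
                return true;
            }
        }
        return false;
    }
}
```

Each step here does a constant number of map operations and one integer comparison, so the whole scan is O(n + m) on average with a hash map, independent of how many distinct letters B contains.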

You might try to optimise this further by recognising that every time you add a character that doesn't appear in B, you're guaranteed a non-match for the next m characters (this is distinct from where the count simply goes positive, as other characters you've passed may cancel that before m). So you can restart the precondition from after that letter. The complexity of the operations this allows you to skip isn't very high, though, and this special-case code might not be any faster.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow