Is the Time complexity of this function o(n) ? Can it be optimized any further?

https://softwareengineering.stackexchange.com/questions/337271

02-01-2021
|

Вопрос

This is a very simple function to find if a string is composed of unique characters. I believe that this is a O(n) time complexity as it loops once and has a single if condition. Am I correct? Is there any way it can be optimized further?

public class Program
 {
    public static void Main(string[] args)
    {
        DuplicateChars(args[0].ToLower());
    }

    public static bool DuplicateChars(string str)
    {
        bool[] charfound = new bool[26];
        int index = 0;

        for(int i = 0; i <= str.Length ; ++i)
        {
            index = (byte)(str[i]);
            //subtract by ascii value of small a 97.
            index -= 97;
            if(charfound[index])
                return true;
            else
                charfound[index] = true;
        }  

        return false;
    }
}

Решение

Given your sample code I take it that the following assumption is true:

str only contains character values from 'a' to 'z'

Given that, we can immediately see an optimization opportunity: if str.Length is greater than charfound.Length, there will be a duplicated char, so we can include a check for that at the beginning of the function.

public class Program
{
    public static void Main(string[] args)
    {
        DuplicateChars(args[0].ToLower());
    }

    public static bool DuplicateChars(string str)
    {
        bool[] charfound = new bool[26];
        int index = 0;

        if(str.Length > charfound.Length) return true;

        for(int i = 0; i <= str.Length ; ++i)
        {
            index = (byte)(str[i]);
            //subtract by ascii value of small a 97.
            index -= 97;
            if(charfound[index])
                return true;
            else
                charfound[index] = true;
        }  

        return false;
    }
}

After this change, the worst case input would be a string consisting of a permutation of "abcdefghijklmnopqrstuvwxyz", which would mean that the function is O(1) in the worst case. (Proof of this is left as an exercise for the reader.)

EDIT: As pointed out by @Pieter B in the comments, you'd probably want to move the call to ToLower() from Main to right after the line if(str.Length > charfound.Length) return true; so you don't spend O(n) time in total.

Другие советы

You can improve it if you have extra information about the size of your alphabet.

Let's assume your string can ONLY have ASCII characters. That means that you can have at most 128 unique characters in your string. Any string with more than 128 characters will have duplicate characters.

What that means is that you only have to execute DuplicateChars if the string length is smaller or equal to 128, which places a constant upper bound on n and makes the algorithm O(1).

Still O(n) but a lot less code

public static bool StringHasDup(string s)
{
    return (s.Length != new HashSet<char>(s.ToCharArray()).Count);
}

Лицензировано под: CC-BY-SA с атрибуция

Не связан с softwareengineering.stackexchange