Create a given string from dictionary entries

https://stackoverflow.com/questions/4563228

14-10-2019

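Question

In a recent job interview, I was asked to solve the following problem:

Given a string s (with no spaces) and a dictionary, return the dictionary words that compose the string.

For example, s = peachpie, dic = {peach, pie}, result = {peach, pie}.

I will ask about the decision variant of this problem:

If s can be composed of words from the dictionary, return yes; otherwise return no.

My solution was backtracking (written in Java):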

public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;

    for (int i=0; i <= s.length(); i++)
    {
        String pre = prefix(s,i); // returns s[0..i-1]
        String suf = suffix(s,i); // returns s[i..s.len]
        if (dictionary.contains(pre) && words(suf, dictionary))
            return true;
    }
    return false;
}

public static void main(String[] args) {
    Set<String> dic = new HashSet<String>();
    dic.add("peach");
    dic.add("pie");
    dic.add("1");

    System.out.println(words("peachpie1", dic)); // true
    System.out.println(words("peachpie2", dic)); // false
}

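What is the time complexity of this solution? I make a recursive call inside the for loop, but only for prefixes that are in the dictionary.

Any ideas?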

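Solution

You can easily construct a case where the program takes at least exponential time to complete. Take the string aaa...aaab, where a is repeated n times. The dictionary contains only two words, a and aa.

The b at the end ensures that the function never finds a match and therefore never exits early.

Each execution of words spawns two recursive calls: one on suffix(s, 1) and one on suffix(s, 2). Execution time therefore grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify this by inserting a counter.) So the complexity is certainly not polynomial. (And this is not even the worst possible input.)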
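The counter experiment mentioned above can be sketched as follows. This is a minimal Python transcription of the backtracking idea, not the asker's Java code; `count_calls` and the `calls` counter are names introduced here for illustration, and the loop skips the empty prefix.

```python
def count_calls(s, dictionary):
    """Run the naive backtracking check and return (result, number of calls)."""
    calls = 0

    def words(rest):
        nonlocal calls
        calls += 1
        if rest == "":
            return True
        # try every non-empty prefix that is a dictionary word
        for i in range(1, len(rest) + 1):
            if rest[:i] in dictionary and words(rest[i:]):
                return True
        return False

    return words(s), calls


if __name__ == "__main__":
    dictionary = {"a", "aa"}
    for n in (5, 10, 15, 20):
        found, calls = count_calls("a" * n + "b", dictionary)
        print(n, found, calls)
```

On this input the call counts satisfy calls(n) = calls(n - 1) + calls(n - 2) + 1, so they grow at the same rate as the Fibonacci numbers.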

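But you can easily improve the solution with memoization. Note that the output of the function words depends on one thing only: the position in the original string at which we start. I.e., if we have the string abcdefg and words(5) is called, it does not matter how exactly abcde was composed (as ab+c+de, or a+b+c+d+e, or something else). Thus, we do not have to recompute words("fg") every time.
In a primitive version, this can be done like this: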

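// strings already found not to be composable (assumes one fixed dictionary)
private static final Set<String> processed = new HashSet<String>();

public static boolean words(String s, Set<String> dictionary) {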
    if (processed.contains(s)) {
        // we've already processed string 's' with no luck
        return false;
    }

    // your normal computations
    // ...

    // if no match found, add 's' to the list of checked inputs
    processed.add(s);
    return false;
}
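For comparison, here is a complete version of the same idea as a minimal Python sketch. The names `can_compose` and `failed` are hypothetical; `failed` plays the role of `processed` above and is created fresh for each top-level call, so it is never shared between different dictionaries.

```python
def can_compose(s, dictionary, failed=None):
    """Backtracking check that remembers suffixes known to be non-composable."""
    if failed is None:
        failed = set()
    if s == "":
        return True
    if s in failed:
        # this suffix was already explored without success
        return False
    for i in range(1, len(s) + 1):
        if s[:i] in dictionary and can_compose(s[i:], dictionary, failed):
            return True
    # no prefix led to a full decomposition: remember the failure
    failed.add(s)
    return False
```

With this change each distinct suffix is fully explored at most once, so the adversarial input aaa...aaab no longer causes exponential blow-up.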

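P.S. I encourage you to change words(String) to words(int). That way you can store the results in an array, and even turn the whole algorithm into DP (which would make it much simpler).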
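A minimal Python sketch of the index-based variant, assuming the string and dictionary are fixed for the duration of one call; `can_compose_from` and `memo` are illustrative names, not part of the original code.

```python
def can_compose_from(s, dictionary):
    """Top-down check keyed by start index instead of by suffix string."""
    n = len(s)
    # memo[i] is None (unknown), True or False for the suffix s[i:]
    memo = [None] * (n + 1)
    memo[n] = True  # the empty suffix is always composable

    def words(start):
        if memo[start] is None:
            memo[start] = any(
                s[start:end] in dictionary and words(end)
                for end in range(start + 1, n + 1)
            )
        return memo[start]

    return words(0)
```

Keying the cache by an integer avoids hashing whole suffix strings on every lookup, and the array it fills is the same table a bottom-up DP would compute.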

Edit 2
Since I have little to do besides work, here is the DP (dynamic programming) solution. Same idea as above.

    String s = "peachpie1";
    int n = s.length();
    boolean[] a = new boolean[n + 1];
    // a[i] tells whether s[i..n-1] can be composed from words in the dictionary
    a[n] = true; // always can compose empty string

    for (int start = n - 1; start >= 0; --start) {
        for (String word : dictionary) {
            if (start + word.length() <= n && a[start + word.length()]) {
                // check if 'word' is a prefix of s[start..n-1]
                String test = s.substring(start, start + word.length());
                if (test.equals(word)) {
                    a[start] = true;
                    break;
                }
            }
        }
    }

    System.out.println(a[0]);
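The same table can also answer the original question (which words compose the string) if it records the chosen word at each position. The following Python sketch is one way to do that under the same assumptions as the DP above; `split_into_words` and `next_len` are names introduced here, and when several decompositions exist it returns whichever one it happens to find.

```python
def split_into_words(s, dictionary):
    """Return one list of dictionary words that concatenate to s, or None."""
    n = len(s)
    # next_len[i] is the length of a word that starts a valid split of s[i:],
    # or None if s[i:] cannot be composed
    next_len = [None] * (n + 1)
    next_len[n] = 0  # the empty suffix needs no further word

    for start in range(n - 1, -1, -1):
        for word in dictionary:
            end = start + len(word)
            # skip empty words, words that do not fit, and dead-end positions
            if word and end <= n and next_len[end] is not None \
                    and s.startswith(word, start):
                next_len[start] = len(word)
                break

    if next_len[0] is None:
        return None
    # walk the recorded word lengths from the start of the string
    result, pos = [], 0
    while pos < n:
        result.append(s[pos:pos + next_len[pos]])
        pos += next_len[pos]
    return result


if __name__ == "__main__":
    print(split_into_words("peachpie", {"peach", "pie"}))  # ['peach', 'pie']
```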

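Other tips

Here is a dynamic programming solution that counts the total number of ways to decompose the string into words. It solves your original problem, because the string is decomposable exactly when the number of decompositions is positive.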

def count_decompositions(dictionary, word):
    n = len(word)
    results = [1] + [0] * n
    for i in range(1, n + 1):  # range replaces Python 2's xrange
        for j in range(i):
            if word[n - i:n - j] in dictionary:
                results[i] += results[j]
    return results[n]

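Storage is O(n) and the running time is O(n^2) dictionary lookups (each lookup also slices and hashes a substring of up to n characters).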

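The loop over the whole string takes n steps. Finding all suffixes and prefixes takes n + (n - 1) + (n - 2) + ... + 1 steps (n for the first call to words, n - 1 for the second, and so on), which gives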

n^2 - SUM(1..n) = n^2 - (n^2 + n)/2 = n^2 / 2 - n / 2

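which in complexity-theory terms is equivalent to n^2.

Checking for membership in a hash set is Theta(1) in the normal case, but O(n) in the worst case.

So the normal-case complexity of the algorithm is Theta(n^2), and the worst case is O(n^3).

Edit: I confused the order of recursion and iteration, so this answer is wrong. The actual time depends exponentially on n (compare, for example, with the computation of Fibonacci numbers).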

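A more interesting question is how to improve the algorithm. Suffix trees are traditionally used for string operations. You can build a suffix tree for the string and mark all nodes as "untracked" at the start of the algorithm. Then go through the strings in the set, and each time a node is used, mark it as "tracked". If all strings in the set are found in the tree, the original string contains every string from the set as a substring. If all nodes end up marked as tracked, the string consists only of substrings from the set.

The actual complexity of this approach depends on many factors, such as the tree-building algorithm, but at least it lets you divide the problem into several independent subtasks, so the final complexity is determined by the complexity of the most expensive subtask.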

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow