Algorithm to find words spelled out by a number

https://stackoverflow.com/questions/1628123

06-07-2019
|

Question

I'm trying to find a way to determine all possible words that can be spelled out by a given number, given a mapping of alphabets to values.

I eventually want to find a solution that works for any 1- or 2- digit value mapping for a letter, but for illustration, assume A=1, B=2, ... Z=26.

Example: 12322 can be equal to abcbb (1,2,3,2,2), lcbb (12,3,2,2), awbb (1,23,2,2), abcv (1,2,3,22), awv (1,23,22), or lcv (12,3,22).

Here's what I have thought of so far:

I will build a tree of all possible words using the number.

To do this, I will start out with a tree with one root node with dummy data.

I will parse then the number digit-by-digit starting from the least significant digit.

At each step, I will take the last digit of the remaining part of the number and insert it into the left subtree of the current node, and remove that digit from the number for that node's left subtree. For the same node, I will then check if the previous TWO digits together form a valid alphabet, and if so, I will put them into the right subtree (and remove the 2 digits from the number for that node's right subtree).

I will then repeat the above steps recursively for each node, using the part of the number that's left, until there are no more digits left.

To illustrate, for 12322 my tree will look something like this:

                *
             /     \
            /       \
           2         22
         /         /   \
        2         3     23
      /   \      / \    /
    3      23   2   12 1
   / \    /    /
  2   12 1    1
 /
1

To get the words then, I will traverse all possible paths from the leaves to the nodes.

This seems to be an overly complex solution for what I thought would be a fairly simple problem, and I'm trying to find if there's a simpler way to solve this.

Solution

Suppose you aleady have all the possible combination of [2, 3, 2, 2] ,what would be the combination of [1, 2, 3, 2, 2] (add [1] to the head)? It is not difficult the deduce it should be:

  A1: put [1] to the head of all_the_combinations_of[1,2,3,2,2] and 
  A2: put [1*10 + 2] to the head of all_the_combinations_of[2,3,2,2] if [1*10 + 2 <=26]

Once we got this , the following should be easy. I implemented an Ruby version with the recusion trace for your reference.

def comb a
    c = []
    puts a.inspect
    return [a] if a.length <= 1

    c =  comb(a[1..-1]).map {|e| [a[0]] + e}
    if a[0] * 10 + a[1] <= 26
            c += comb(a[2..-1]).map { |f| [a[0] * 10 + a[1]] + f }
    end
    c
end

h = Hash[*(1..26).to_a.zip(('A'..'Z').to_a).flatten]
#h.keys.sort.each {|k| puts "#{k}=>#{h[k]}"}

comb([1,2,3,2,2]).each do |comb|
    puts comb.map {|k| h[k]}.join
end

[1, 2, 3, 2, 2]
  A1 [2, 3, 2, 2]
      [3, 2, 2]
         [2, 2]
            [2]
             []
      [2, 2]
         [2]
          []
  A2 [3, 2, 2]
      [2, 2]
         [2]
          []
ABCBB
ABCV
AWBB
AWV
LCBB
LCV

OTHER TIPS

You need not actually construct a tree - just recurse:

Take a single digit. See if we can form a word considering it as a letter in itself, and recurse.
When we return from the recursion, try adding another digit (if we were 1 or 2 previously), and re-recursing.

A brute-force solution would be to dynamically fill the array from 1 to N, where a[i] element contains a set of strings that form a[1]a[2]a[3]...a[i] after expansion. You can fill a[1] from stratch, then fill a[2], based on a[1] set and second character in the string. Then you fill a[3], etc. At each sted you only have to look back to a[i-1] and a[i-2] (and to s[i-1] and s[i], where s is your number sequence).

Finally, after you fill a[n], it will contain the answer.

For the example '12322', the sequence becomes:

a[1] = { "a" }
a[2] = { a + 'b' | a in a[1] } union { "l" } = { "ab", "l" }
a[3] = { a + 'c' | a in a[2] } union { a + 'w' | a in a[1] } = { "abc", "lc", "aw" }
a[4] = { a + 'b' | a in a[3] } union { } = { "abcb", "lcb", "awb" }
a[5] = { a + 'b' | a in a[4] } union { a + 'v' | a in a[3] } = { "abcbb", "lcbb", "awbb", "abcv", "lcv", "awv" }

This is essentially the dynamic programming version of the recursive solution above.

An alternative way to do this would be to reverse the problem:

Given a dictionary of words, calculate the numeric strings that would be generated, and store this data into a map/dictionary structure, i.e. table['85'] = 'hi'
For each of the first x digits of the number you are looking up, see if it's in the table, i.e. table.ContainsKey('1'), table.ContainsKey('12'), ...
If you're trying to find the word sequences, generate the words that start at each location in the numeric string, and then do a recursive lookup to find all phrases from that.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow