Question

I know how to generate n-grams from a sentence. Example: unigrams and bigrams (using a number sequence):

1 2 3 4 5 (original sentence)
=>
1,2,3,4,5 (unigram)
12,23,34,45 (bigram)

How can I combine unigrams and bigrams (or higher-order n-grams) to make all possible sentences with the same length as the original?

1,2,3,4,5 (unigram)
12,23,34,45 (bigram)
=> 
1 2 3 4 5
1 2 3 45
1 2 34 5
1 23 4 5
1 23 45
12 3 4 5
12 3 45
12 34 5

I want to find an algorithm to solve this problem. Thank you!


Solution

Here is a tip:

  • Assuming you have 5 numbers [1 2 3 4 5]
  • There are 4 places to insert space [1-2, 2-3, 3-4, 4-5]
  • A 4-digit binary number represents one combination (0 - no space, 1 - space)
  • For example: code 0110 matches [1 (0) 2 (1) 3 (1) 4 (0) 5] == [12 3 45]
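  • Iterate through all 2^(n-1) binary codes of n-1 digits, where n is the number of items. To allow only unigrams and bigrams, skip any code with two adjacent 0s, since it would join three or more numbers.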
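
The tip above can be sketched in Python roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name `segmentations` and the `max_n` parameter are made up for this example, and `max_n` is an assumed extra that drops any split containing a group longer than the given n-gram size.

```python
from itertools import product

def segmentations(tokens, max_n=None):
    """Yield every way to split tokens into contiguous groups.

    Each code has len(tokens) - 1 bits, one per gap: 1 = space, 0 = no space.
    max_n is an assumed extra parameter: splits with a group longer than
    max_n tokens are skipped (max_n=2 keeps only unigrams and bigrams).
    """
    if not tokens:
        return
    for code in product((0, 1), repeat=len(tokens) - 1):
        groups = [[tokens[0]]]
        for bit, tok in zip(code, tokens[1:]):
            if bit:
                groups.append([tok])    # space: start a new group
            else:
                groups[-1].append(tok)  # no space: join to the previous group
        if max_n is None or all(len(g) <= max_n for g in groups):
            yield ["".join(g) for g in groups]

for sentence in segmentations(["1", "2", "3", "4", "5"], max_n=2):
    print(" ".join(sentence))
```

With `max_n=2` this prints the 8 sentences listed in the question (in a different order). Without `max_n` it yields all 16 splits, including ones with trigrams and longer groups.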
Licensed under: CC-BY-SA with attribution