Question

I have a list of words and I need to find the shortest sub-segment of a text that contains all of them. Special characters and digits should be ignored, the words may appear in any order, and the search must be case-insensitive. This has to be done in Java.

For example, given the text

aaaa aaaa cccc cccc bbbb bbbb bbbb bbbb Bbbb Aaaa Cccc

and the words

aaaa
bbbb
cccc

the output should be

Bbbb Aaaa Cccc

I am aware of regular expressions (regex in Java), but I am new to them, so any help would be appreciated.


Solution

What you could do is to construct a regex like this:

(?i)\b(aaaa|bbbb|cccc)(?=\W+(\w+)\W+(\w+)\b)
\__/  \_____________/    \______/         \__ makes sure it's a complete word
 |           |               \____ repeat N-1 times (N = number of words)
 |           \___ all words alternated to match the first word
 \__ case insensitive matching

Then, in Java, check that the capturing groups together contain all the words. If they do, you have found a match; if not, search for the next match and repeat.
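A minimal sketch of that loop, assuming the target words are distinct and that the segment consists of exactly N consecutive words. The class name WindowSearch and the method findSegment are made up for illustration. Note that \w also matches digits and underscores, so input containing those would need extra cleanup first.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WindowSearch {
    // Hypothetical helper: returns the first run of N consecutive words
    // that contains every target word, or null if there is none.
    static String findSegment(String text, List<String> words) {
        StringJoiner alternation = new StringJoiner("|");
        Set<String> wanted = new HashSet<>();
        for (String w : words) {
            alternation.add(Pattern.quote(w)); // treat each word literally
            wanted.add(w.toLowerCase());
        }

        // \b(w1|w2|...)(?=\W+(\w+)\W+(\w+)...\b), lookahead part repeated N-1 times
        StringBuilder regex = new StringBuilder("\\b(" + alternation + ")(?=");
        for (int i = 1; i < words.size(); i++) {
            regex.append("\\W+(\\w+)");
        }
        regex.append("\\b)");

        Pattern p = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        while (m.find()) {
            // Collect the first word plus the words captured inside the lookahead.
            Set<String> seen = new HashSet<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                seen.add(m.group(g).toLowerCase());
            }
            if (seen.equals(wanted)) {
                // The segment runs from the first word to the end of the last group.
                return text.substring(m.start(), m.end(m.groupCount()));
            }
        }
        return null;
    }
}
```

For the sample text in the question and the words aaaa, bbbb, cccc, findSegment returns "Bbbb Aaaa Cccc". Because the lookahead does not consume input, the next search starts right after the first word of the previous attempt, so overlapping windows are still checked.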


You could also solve this with a regex alone, but you have to construct the proper expression:

(?i)\b(words)\W+(?!\1\b)(words)\W+(?!(?:\1|\2)\b)(words)\b
       \___/ \________________/   \_____________/
         |           |                  |
  list of all the    |         lookahead has to include
  words alternated   |         all previous capturing groups
                     |
             repeat N-1 times but you have to 
             change the lookahead each time

This becomes a fairly large expression for many words, although "words" can be any expression that matches all the allowed words (it doesn't have to be an alternation).
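Since the expression grows with every word, it is easier to generate it than to write it by hand. The following is a rough sketch of such a generator, again assuming distinct target words. The names PermutationRegex, build and find are hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PermutationRegex {
    // Builds \b(words)\W+(?!(?:\1)\b)(words)\W+(?!(?:\1|\2)\b)(words)...\b
    static Pattern build(List<String> words) {
        StringJoiner alternation = new StringJoiner("|");
        for (String w : words) {
            alternation.add(Pattern.quote(w)); // treat each word literally
        }
        String group = "(" + alternation + ")";

        StringBuilder regex = new StringBuilder("\\b").append(group);
        for (int i = 1; i < words.size(); i++) {
            // The lookahead must exclude every previously captured word.
            StringJoiner backrefs = new StringJoiner("|");
            for (int g = 1; g <= i; g++) {
                backrefs.add("\\" + g);
            }
            regex.append("\\W+(?!(?:").append(backrefs).append(")\\b)").append(group);
        }
        regex.append("\\b");

        // CASE_INSENSITIVE also makes the backreferences compare case-insensitively.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Returns the first matching segment, or null if the text has none.
    static String find(String text, List<String> words) {
        Matcher m = build(words).matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

With the sample text from the question, find returns "Bbbb Aaaa Cccc". Like the expression it generates, this only matches segments in which the N words directly follow one another, separated by non-word characters.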

Licensed under: CC-BY-SA with attribution