Question

Basically, I want to iterate over every part of a sentence (words, spaces, and punctuation). For example:

string sentence = "How was your day - Andrew, Jane?";
string[] separated = SeparateSentence(sentence);

The separated array should contain the following:

[1] = "How"

[2] = " "

[3] = "was"

[4] = " "

[5] = "your"

[6] = " "

[7] = "day"

[8] = " "

[9] = "-"

[10] = " "

[11] = "Andrew"

[12] = ","

[13] = " "

[14] = "Jane"

[15] = "?"

So far I can only grab the words, using the regex "\w(?<!\d)[\w'-]*". How can I separate the sentence into smaller parts, as in the example output above?

Edit: The string doesn't have any of the following:

  • i.e.

  • solid-form

  • 8th, 1st, 2nd


Solution

Check this out:

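        // Requires: using System.Collections.Generic;
        //           using System.Text.RegularExpressions;

        // One repeated group: a whitespace run, a digit run, a word run,
        // or any single other character (punctuation).
        string pattern = @"^(\s+|\d+|\w+|[^\d\s\w])+$";
        string input = "How was your 7 day - Andrew, Jane?";

        List<string> words = new List<string>();

        // Match once and check Success instead of calling IsMatch and Match.
        Match match = Regex.Match(input, pattern);

        if (match.Success)
        {
            // Each repetition of group 1 is kept as a separate Capture.
            foreach (Capture capture in match.Groups[1].Captures)
                words.Add(capture.Value);
        }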
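
To connect this to the call in the question, here is a minimal sketch that wraps the same pattern in a method. The class name SentenceSplitter is made up for illustration, and SeparateSentence is only the name the question uses, not an existing API. Returning an empty array when nothing matches (for example, for an empty string) is an assumption about the desired behavior.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Same pattern as in the answer: whitespace runs, digit runs, word runs,
    // or any single other character, repeated over the whole input.
    private static readonly Regex TokenPattern =
        new Regex(@"^(\s+|\d+|\w+|[^\d\s\w])+$");

    // Hypothetical helper named after the call in the question.
    public static string[] SeparateSentence(string sentence)
    {
        Match match = TokenPattern.Match(sentence);
        if (!match.Success)
            return new string[0]; // assumption: no match yields no parts

        // Group 1 is repeated, so each repetition is stored as a Capture.
        return match.Groups[1].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string[] separated = SeparateSentence("How was your day - Andrew, Jane?");

        // Print each part in quotes so the single-space entries are visible.
        foreach (string part in separated)
            Console.WriteLine("\"" + part + "\"");
    }
}
```

For the sentence from the question this should produce 15 parts, with the spaces and punctuation marks as separate entries. C# arrays are zero-based, so "How" ends up at index 0 rather than 1.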

OTHER TIPS

I suggest you implement a simple lexer (if such a thing exists) that reads the sentence one character at a time and generates the output you are looking for. Although it is not the simplest solution, it has the advantage of being extensible if your use cases get more complicated, as @AndreCalil suggested.
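
As a rough illustration of the lexer idea, here is a minimal sketch. The names SimpleLexer, Kind, Classify and Tokenize are invented for this example. It assumes three character classes: word characters (letters, digits, underscore), whitespace, and everything else. Runs of word characters or whitespace are merged into one token, while every other character becomes its own token.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleLexer
{
    // Coarse character classes (an assumption made for this sketch).
    private enum Kind { Word, Space, Other }

    private static Kind Classify(char c)
    {
        if (char.IsLetterOrDigit(c) || c == '_') return Kind.Word;
        if (char.IsWhiteSpace(c)) return Kind.Space;
        return Kind.Other;
    }

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int i = 0;

        while (i < input.Length)
        {
            Kind kind = Classify(input[i]);
            int start = i++;

            // Punctuation is emitted one character at a time;
            // words and whitespace are extended as far as they go.
            if (kind != Kind.Other)
            {
                while (i < input.Length && Classify(input[i]) == kind)
                    i++;
            }

            tokens.Add(input.Substring(start, i - start));
        }

        return tokens;
    }

    public static void Main()
    {
        // Quotes make the whitespace tokens visible in the output.
        foreach (string token in Tokenize("How was your day - Andrew, Jane?"))
            Console.WriteLine("\"" + token + "\"");
    }
}
```

New token rules, such as keeping "8th" or "i.e." together, would go into Classify or the inner loop without touching the rest. That is the extensibility argument made above.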

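Why not something like this? It's tailored to your test case, but if you add more punctuation to the character class, this might be what you're looking for.

(\w+|[,?-])

Note that the hyphen goes last in the character class. Written as [,-?], it defines a range from ',' to '?', which also matches digits and characters such as '.', '/', ':' and ';'. This pattern skips whitespace; add \s+ as another alternative if you need the spaces as well.

EDIT: Ah, to steal from Andre's response, this is what I was envisioning:

        // Requires: using System.Collections.Generic;
        //           using System.Text.RegularExpressions;

        string pattern = @"(\w+|[,?-])";
        string input = "How was your 7 day - Andrew, Jane?";

        List<string> words = new List<string>();

        Regex regex = new Regex(pattern);

        // Matches returns an empty collection when nothing matches,
        // so no separate IsMatch check is needed.
        foreach (Match m in regex.Matches(input))
            words.Add(m.Groups[1].Value);
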
Licensed under: CC-BY-SA with attribution