Question

I want to check the quality of sentence formation. Specifically, I want to see whether the end user types a space after each punctuation mark. I am fine with an NLP library or a simple Java regex solution.

For example:

  1. "Hi, my name is Tom Cruise. I like movies"
  2. "Hi,my name is Tom Cruise. I like movies"
  3. "Hi,my name is Tom Cruise.I like movies"

Sentence 1 is perfect, sentence 2 is bad because it has one punctuation mark without a space after it, and sentence 3 is the worst because none of its punctuation marks is followed by a space.

Can you please suggest a Java approach to this? I tried the LanguageTool API, but it didn't work.

Solution

Why don't you try java.util.regex.Pattern with Unicode categories?

For instance:

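import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \p{P} matches any Unicode punctuation character; the trailing space is literal.
Pattern pattern = Pattern.compile("\\p{P} ");
Matcher matcher = pattern.matcher("Hi, my name is Tom Cruise. I like movies");
while (matcher.find()) {
    System.out.println(matcher.group());
}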

The Pattern here searches for any punctuation character followed by a space. The output will be:

, 
. 

(Note the space after the comma and the period.)

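You could refine the Pattern by listing exactly which punctuation characters must be followed by a space, for example with a character class such as [,.;:!?]. This matters because \p{P} also matches apostrophes, hyphens, quotation marks and brackets.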
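
As a rough illustration of that refinement, the sketch below compares \p{P} with a narrower character class on the same text. The class [,.;:!?] and the names RestrictedPunctuationDemo, countMatches, ANY_PUNCTUATION and SENTENCE_PUNCTUATION are assumptions made for this example; adjust the set of marks to your own rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RestrictedPunctuationDemo {
    // Any Unicode punctuation character followed by a space.
    static final Pattern ANY_PUNCTUATION = Pattern.compile("\\p{P} ");
    // Assumed set of marks that should be followed by a space.
    static final Pattern SENTENCE_PUNCTUATION = Pattern.compile("[,.;:!?] ");

    // Counts how many times the pattern matches in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Well - yes, I agree. Thanks";
        // The hyphen counts as punctuation for \p{P}: "- ", ", " and ". "
        System.out.println(countMatches(ANY_PUNCTUATION, text));      // prints 3
        // The narrower class only sees ", " and ". "
        System.out.println(countMatches(SENTENCE_PUNCTUATION, text)); // prints 2
    }
}
```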

Finally, in order to check for the opposite (a punctuation character not followed by whitespace):

Pattern otherPattern = Pattern.compile("\\p{P}\\S");
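
To see where the violations are, and not only whether any exist, something like the following could work. It is a sketch only: MissingSpaceFinder and findViolations are hypothetical names, and it reuses the \p{P}\S pattern unchanged. Because \S needs a character to match, a punctuation mark at the very end of the text is not reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MissingSpaceFinder {
    // A punctuation character followed directly by a non-whitespace character.
    static final Pattern MISSING_SPACE = Pattern.compile("\\p{P}\\S");

    // Returns the index of the punctuation character that starts each match.
    static List<Integer> findViolations(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher matcher = MISSING_SPACE.matcher(text);
        while (matcher.find()) {
            positions.add(matcher.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // The comma is at index 2 and the period at index 24.
        System.out.println(findViolations("Hi,my name is Tom Cruise.I like movies")); // prints [2, 24]
    }
}
```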

Other tips

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A punctuation character followed directly by a non-whitespace character.
Pattern pattern = Pattern.compile("\\p{P}\\S");

String[] tests = new String[] {
    "Hi, my name is Tom Cruise. I like movies",
    "Hi,my name is Tom Cruise. I like movies",
    "Hi,my name is Tom Cruise.I like movies"
};

// One violation counter per test sentence; int arrays are zero-initialized.
int[] results = new int[tests.length];

for (int i = 0; i < tests.length; i++) {
    Matcher matcher = pattern.matcher(tests[i]);
    while (matcher.find()) {
        results[i] += 1;
    }
    // Thresholds follow the grading used in the question.
    if (results[i] == 0) {
        System.out.println("Sentence " + (i + 1) + " is perfect");
    } else if (results[i] == 1) {
        System.out.println("Sentence " + (i + 1) + " is bad");
    } else {
        System.out.println("Sentence " + (i + 1) + " is the worst");
    }
}
// now you know how many violations there were on every line.
// do whatever you want with them.
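
If a single number is more convenient than a label, one option is to turn the counts into a score. This is a minimal sketch, assuming the score is defined as the fraction of punctuation characters that do not start a \p{P}\S match. SpacingScore, count and score are hypothetical names, and this definition is only one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpacingScore {
    static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");
    static final Pattern MISSING_SPACE = Pattern.compile("\\p{P}\\S");

    // Counts how many times the pattern matches in the text.
    static int count(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    // Returns a value from 0.0 (every mark violates) to 1.0 (no violations).
    // Text without any punctuation is treated as having nothing wrong.
    static double score(String text) {
        int total = count(PUNCTUATION, text);
        if (total == 0) {
            return 1.0;
        }
        int violations = count(MISSING_SPACE, text);
        return (double) (total - violations) / total;
    }

    public static void main(String[] args) {
        System.out.println(score("Hi, my name is Tom Cruise. I like movies")); // prints 1.0
        System.out.println(score("Hi,my name is Tom Cruise. I like movies"));  // prints 0.5
        System.out.println(score("Hi,my name is Tom Cruise.I like movies"));   // prints 0.0
    }
}
```
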

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow