Question

A long time ago I wrote a method called detectBadChars(String) that inspects the String argument for instances of so-called "bad" characters.

The original list of bad characters was:

  • '~'
  • '#'
  • '@'
  • '*'
  • '+'
  • '%'

My method, which works great, is:

// Detects the existence of bad chars in a string and returns the
// bad chars that were found.
protected String detectBadChars(String text) {
    Pattern pattern = Pattern.compile("[~#@*+%]");
    Matcher matcher = pattern.matcher(text);

    StringBuilder violatorsBuilder = new StringBuilder();

    while (matcher.find()) {
        String group = matcher.group();
        if (!violatorsBuilder.toString().contains(group))
            violatorsBuilder.append(group);
    }

    return violatorsBuilder.toString();
}

The business logic has now changed, and the following are now also considered to be bad:

  • Carriage returns (\r)
  • New lines (\n)
  • Tabs (\t)
  • Any consecutive whitespaces ("  ", "   ", etc.)

So I am trying to modify the regex to accommodate the new bad characters. Changing the regex to:

    Pattern pattern = Pattern.compile("[~#@*+%\n\t\r[ ]+]");

...throws exceptions. My thinking was that adding "\n\t\r" to the regex would allow for newlines, tabs and CRs respectively. And then adding "[ ]+" adds a new "class/group" consisting of whitespaces, and then quantifies that group as allowing 1+ of those whitespaces, effectively taking care of consecutive whitespaces.

Where am I going awry and what should my regex be (and why)? Thanks in advance!


Solution

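Using \\s covers all of them, since it matches any whitespace character, including \r, \n and \t. Be aware that it matches a single space too. Add the + quantifier to the entire character class to match one or more repetitions: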

Pattern.compile("[~#@*+%\\s]+");

Note that in a Java string literal you need to escape the backslash, so it's \\s and not \s.
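To see what this pattern actually matches, here is a minimal sketch. The class name WhitespaceClassDemo and the helper findBadRuns are hypothetical names made up for illustration; they are not part of the original method. It collects every run of bad characters, and also shows that a lone space counts as a match with this regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceClassDemo {
    // One character class; the '+' after it matches a run of one or more bad characters.
    private static final Pattern BAD_RUN = Pattern.compile("[~#@*+%\\s]+");

    // Hypothetical helper: returns every run of bad characters, in order of appearance.
    public static List<String> findBadRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = BAD_RUN.matcher(text);
        while (matcher.find()) {        // keep searching until no match is left
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // "~#" is reported as one run and the tab as another.
        System.out.println(findBadRuns("a~#b\tc").size());
        // A single space is whitespace, so \s flags it as well.
        System.out.println(findBadRuns("a b").size());
    }
}
```

If single spaces must stay legal, this pattern is probably too broad, and the whitespace part would need its own quantifier outside the character class.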

Other tips

I think this should work.

Pattern.compile("[~#@*+%\n\t\r\\s{2,}]");

You need \\s{2,} to match any consecutive whitespaces.

Edit: I made a mistake above: inside a character class, {2,} is treated as literal characters rather than as a quantifier. Thanks to Alan Moore for pointing it out. Here is the new solution.

Pattern.compile("[~#@*+%\n\t\r]|\\s{2,}")
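As a rough sketch of how this alternation could be plugged into the method from the question: the version below assumes the intent is to collect every distinct violation, so it loops with while instead of checking once with if. The class name BadCharDetector is a hypothetical wrapper added only to make the example self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BadCharDetector {
    // Either one listed bad character, or a run of two or more whitespace characters.
    private static final Pattern BAD = Pattern.compile("[~#@*+%\n\t\r]|\\s{2,}");

    // Returns each distinct violation once, in order of first appearance.
    public static String detectBadChars(String text) {
        Matcher matcher = BAD.matcher(text);
        StringBuilder violators = new StringBuilder();
        while (matcher.find()) {
            String group = matcher.group();
            // Skip anything already recorded (a substring check, as in the question).
            if (violators.indexOf(group) < 0) {
                violators.append(group);
            }
        }
        return violators.toString();
    }

    public static void main(String[] args) {
        // A single space is allowed, so nothing is reported here.
        System.out.println(detectBadChars("a b").isEmpty());
        // Two spaces and '~' are both reported; the second '~' is skipped.
        System.out.println(detectBadChars("a  b~c~").length());
    }
}
```

Because the character class is tried first, a run such as two tabs is reported as a single tab character rather than as a whitespace run; either way the input is flagged.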
Licensed under: CC-BY-SA with attribution