Question

I am putting together a basic profanity filter in Java to detect profanity in user input. I am not trying to handle every possible scenario, which I know is probably impossible for a computer alone. However, I do want to handle a few basic scenarios that a computer should be well suited to. In this particular case, I am trying to detect a user attempting to evade the filter by putting spaces between letters, for example: "hello, I am using a s m u r f word here" ("smurf" being the "bad" word here).

In my current implementation I keep a list of words that I check the input text against:

public boolean containsBadWords (String text) {

    for (String word : badWords) {
        if (text.matches (".*\\b" + word  +"\\b.*")) {
            return (true);
        }
    }

    return (false);
}

But this would not handle the spaced letters issue I described above.

Does anybody know how to collapse these words in Java so I can process them with a basic text-matching algorithm?

Solution

Prepare a list of forbidden words, then go over the list and convert each word into a regex that allows optional spaces after every letter, e.g. "smurf" -> " s *m *u *r *f * ":

String regex = " " + word.replaceAll("(.)", "$1 *") + " ";

Then try to find it in the text:

boolean found = Pattern.compile(regex).matcher(text).find();
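
Putting the two snippets together, here is a minimal sketch of how the whole check might look. It is not the answerer's exact code: the class name SpacedWordFilter and both method names are hypothetical, and a few things are changed on purpose. The regex above requires a literal space before and after the word, so it would miss a bad word at the very start or end of the text or next to punctuation, and it does not escape regex metacharacters in the word list. The sketch assumes you want to cover those cases, so it uses lookarounds instead of literal spaces, `\s*` instead of ` *`, `Pattern.quote` on each letter, and case-insensitive matching.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method signatures are illustrative only.
final class SpacedWordFilter {

    // Builds a pattern matching the word with optional whitespace between its letters.
    // Assumption: the word must not be glued to other word characters on either side.
    static Pattern spacedPattern(String word) {
        StringBuilder regex = new StringBuilder("(?<!\\w)");
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                regex.append("\\s*"); // allow any amount of whitespace between letters
            }
            // Quote each character so regex metacharacters in the word list stay literal.
            regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        regex.append("(?!\\w)");
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Returns true if any bad word occurs in the text, spaced out or not.
    static boolean containsBadWord(String text, List<String> badWords) {
        for (String word : badWords) {
            if (spacedPattern(word).matcher(text).find()) {
                return true;
            }
        }
        return false;
    }
}
```

With a word list containing only "smurf", calling containsBadWord on "hello, I am using a s m u r f word here" returns true, while "I went smurfing" returns false because the match would end in the middle of a longer word. In a real filter you would probably compile the patterns once and reuse them rather than rebuilding them on every call.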
Licensed under: CC-BY-SA with attribution