Question

I am building a very basic profanity filter that I only want to apply on some fields on my application (fullName, userDescription) on the serverside.

Does anyone have experience with a profanity filter in production? I only want it to:

'ass hello' <- match
'asster' <- NOT match

Below is my current code, but for some reason it returns true and false in succession.

var badWords = [ 'ass', 'whore', 'slut' ]
  , check = new RegExp(badWords.join('|'), 'gi');

function filterString(string) {
  return check.test(string);
}

filterString('ass'); // Returns true / false in succession.

How can I fix this "in succession" bug?


Solution

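Because the regex has the g (global) flag, the test method sets its lastIndex property to the position just after the match, so that further invocations look for further occurrences (if there are any). When no further match is found, test returns false and lastIndex is reset to 0.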

check.lastIndex // 0 (init)
filterString('ass'); // true
check.lastIndex // 3
filterString('ass'); // false
check.lastIndex // now 0 again

So, you will need to reset it manually in your filterString function if you don't recreate the RegExp each time:

function filterString(string) {
    check.lastIndex = 0;
    return check.test(string);
}
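
An alternative to resetting lastIndex, sketched here under the assumption that you only need a yes/no answer per string, is to drop the g flag altogether. The names checkNoGlobal and containsBadWord are made up for this illustration.

```javascript
var badWords = ['ass', 'whore', 'slut'];

// Assumption: only a boolean result is needed, so the 'g' flag is left out.
var checkNoGlobal = new RegExp(badWords.join('|'), 'i');

function containsBadWord(string) {
  // Without 'g' (or 'y'), test() always searches from index 0
  // and does not update lastIndex.
  return checkNoGlobal.test(string);
}

var first = containsBadWord('ass');  // true
var second = containsBadWord('ass'); // true again, no alternation
```

This still matches substrings; it only addresses the alternating result.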

Btw, to match only full words (like "ass", but not "asster"), you should wrap your matches in word boundaries like WTK suggested, i.e.

var check = new RegExp("\\b(?:"+badWords.join('|')+")\\b", 'gi');
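
Putting the pieces together, a minimal sketch of a whole-word filter might look like the following. The helpers escapeRegExp and buildFilter are hypothetical names, not part of any library; escaping is added on the assumption that the word list may someday contain regex metacharacters.

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so every list entry is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a case-insensitive whole-word matcher.
// No 'g' flag, so repeated test() calls do not depend on lastIndex.
function buildFilter(words) {
  var pattern = '\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b';
  return new RegExp(pattern, 'i');
}

var filter = buildFilter(['ass', 'whore', 'slut']);

var matchesWord = filter.test('ass hello'); // true
var matchesPrefix = filter.test('asster');  // false
```

Note that \b is defined in terms of ASCII word characters (letters, digits, underscore), so entries that start or end with other characters would need different handling.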

OTHER TIPS

You are matching via a substring comparison. Your regex needs to be modified to match whole words instead.

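How about a fixed regexp:

// The alternation is grouped so the boundaries apply to every word, and \b is
// written as \\b because the pattern is built from a string.
// No 'g' flag, so test() does not carry lastIndex over between calls.
check = new RegExp('(^|\\b)(' + badWords.join('|') + ')($|\\b)', 'i');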

check.test('ass') // true
check.test('suckass') // false
check.test('mass of whore') // true
check.test('massive') // false
check.test('slut is massive') // true

I'm using \b here to match a word boundary (which also covers the start and end of the whole string).
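
One detail worth checking when the pattern is built from a string: inside a JavaScript string literal, '\b' is the backspace character, so the backslash has to be doubled for the regex engine to see a word boundary. A small sketch (the variable names are only for illustration):

```javascript
var backspace = '\b';  // one character: U+0008 (backspace)
var boundary = '\\b';  // two characters: a backslash followed by 'b'

// Looks for a literal backspace on each side of the word.
var wrong = new RegExp(backspace + 'ass' + backspace, 'i');
// Looks for a word boundary on each side of the word.
var right = new RegExp(boundary + 'ass' + boundary, 'i');

var wrongResult = wrong.test('ass'); // false
var rightResult = right.test('ass'); // true
```

A regex literal such as /\bass\b/i does not have this problem, because no string escaping is involved.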

Licensed under: CC-BY-SA with attribution