Question

I wanted to write a JavaScript function to sanitize user input and remove any unwanted or dangerous characters.

It must allow only the following characters:

  • Alphanumeric characters (case-insensitive): [a-z][0-9].
  • Inner whitespace, like "word1 word2".
  • Spanish characters (case-insensitive): [áéíóúñü].
  • Underscore and hyphen [_-].
  • Dot and comma [.,].
  • Finally, the string must be trimmed with trim().

My first attempt was:

function sanitizeString(str){
    str = str.replace(/[^a-z0-9áéíóúñü_-\s\.,]/gim,"");
    return str.trim();
}

But if I call:

sanitizeString("word1\nword2")

it returns:

"word1
word2"

So I had to rewrite the function to explicitly remove \t\n\f\r\v\0:

function sanitizeString(str){
    str = str.replace(/([^a-z0-9áéíóúñü_-\s\.,]|[\t\n\f\r\v\0])/gim,"");
    return str.trim();
}
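A quick way to compare the two versions is to run both on the same input and print the results with JSON.stringify, which renders an otherwise invisible newline or tab as \n or \t. This is a minimal sketch assuming a modern engine such as Node.js; sanitizeV1 and sanitizeV2 are hypothetical names for the first and second versions above.

```javascript
// First version: \s inside the negated class whitelists every whitespace
// character, so \n and \t are kept.
function sanitizeV1(str) {
  str = str.replace(/[^a-z0-9áéíóúñü_-\s\.,]/gim, "");
  return str.trim();
}

// Second version: the extra alternative removes \t\n\f\r\v\0 explicitly.
function sanitizeV2(str) {
  str = str.replace(/([^a-z0-9áéíóúñü_-\s\.,]|[\t\n\f\r\v\0])/gim, "");
  return str.trim();
}

const input = "word1\nword2\tword3";
console.log(JSON.stringify(sanitizeV1(input))); // "word1\nword2\tword3"
console.log(JSON.stringify(sanitizeV2(input))); // "word1word2word3"
```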

I'd like to know:

  1. Is there a better way to sanitize input with JavaScript?
  2. Why don't \n and \t match the RegExp in the first version?

Solution

The new version of the sanitizeString function:

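function sanitizeString(str){
    // Remove every character outside the whitelist: letters, digits, Spanish
    // letters, a literal space, dot, comma, underscore and hyphen (last, so literal).
    str = str.replace(/[^a-z0-9áéíóúñü \.,_-]/gim,"");
    return str.trim();
}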
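A short usage check of the new version follows as a sketch. The normalize("NFC") call is an addition that is not in the original function: it assumes an engine with String.prototype.normalize (ES2015), which older server-side engines may lack. It matters because "ó" can also arrive as "o" followed by a combining accent (U+0301), and the character class would strip that accent and leave a bare "o".

```javascript
function sanitizeString(str) {
  // Assumption (not in the original): compose "o" + U+0301 into "ó" first,
  // so decomposed Spanish letters survive the whitelist below.
  str = str.normalize("NFC");
  // Same whitelist as above; the "m" flag is omitted because it only
  // affects ^ and $, which this pattern does not use.
  str = str.replace(/[^a-z0-9áéíóúñü \.,_-]/gi, "");
  return str.trim();
}

console.log(sanitizeString("  word1\nword2 ¡hola!  "));  // logs: word1word2 hola
console.log(sanitizeString("versio\u0301n_2-b, Ñandú.")); // logs: versión_2-b, Ñandú.
```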

The main problem was pointed out by @RobG and @Derek (@RobG, write your comment as an answer and I will accept it): \s doesn't mean what I understood from the w3Schools description:

Find a whitespace character

It means what MDN says

Matches a single white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

I trusted w3Schools when I wrote the function.
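The difference between \s and a literal space can be checked directly. This sketch tests a few characters against both patterns; it assumes a modern engine such as Node.js (it uses const, for...of and Object.entries).

```javascript
// A plain space plus characters the question expected to be removed.
const samples = { space: " ", tab: "\t", newline: "\n", nbsp: "\u00a0" };

for (const [name, ch] of Object.entries(samples)) {
  // /\s/ matches any whitespace character; / / matches only U+0020.
  console.log(name, /\s/.test(ch), / /.test(ch));
}
// space true true
// tab true false
// newline true false
// nbsp true false
```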

A second change was to move the hyphen (-) to the end of the character class, so that it is treated as a literal hyphen rather than as a range separator.
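A sketch of why the hyphen's position matters. Between two single characters, - forms a range: [+-.] covers U+002B to U+002E and so also matches the comma. At the end of the class it is literal. In the first version, _-\s only worked because the legacy (non-Unicode) regex syntax implemented by browsers and Node.js reads a dash next to a class escape such as \s as a literal; with the u flag the same class is a SyntaxError.

```javascript
// "-" between two characters is a range: "+" (U+002B) through "." (U+002E),
// which includes "," (U+002C) and "-" (U+002D).
console.log(/^[+-.]$/.test(",")); // true

// "-" as the last character of the class is a literal hyphen.
console.log(/^[+.-]$/.test(",")); // false
console.log(/^[+.-]$/.test("-")); // true

// Legacy mode: "_-\s" is read as "_", "-" and \s, so "-" is allowed.
console.log(/^[_-\s]$/.test("-")); // true

// Unicode mode rejects a range that has a class escape as an endpoint.
try {
  new RegExp("[_-\\s]", "u");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```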

  • Note 1: This is server-side validation using JavaScript.
  • Note 2 (for IBM Notes XPagers): I love JavaScript in XPages SSJS. It is simpler for me than the Java way.
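Since Note 1 describes server-side validation, one alternative to silently stripping characters, offered as a sketch rather than as the accepted answer, is to reject input that contains anything outside the whitelist. It anchors the same character class with ^ and $; isValidInput and ALLOWED_INPUT are hypothetical names, and the code assumes a modern engine (const), which XPages SSJS may not support.

```javascript
// Anchored whitelist: the whole string must consist of allowed characters.
// A literal space (not \s) rejects tabs and newlines; the hyphen is last.
// No "g" flag, so test() keeps no lastIndex state between calls.
const ALLOWED_INPUT = /^[a-z0-9áéíóúñü .,_-]*$/i;

// Hypothetical helper: true only if nothing would need to be stripped.
function isValidInput(str) {
  return ALLOWED_INPUT.test(str.trim());
}

console.log(isValidInput("  Canción número 1, versión_2-b. ")); // true
console.log(isValidInput("word1\nword2"));                      // false
console.log(isValidInput("<script>"));                          // false
```

Rejecting lets the caller tell the user what was wrong, whereas stripping silently changes what they typed.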
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow