Pregunta

I am working on validating my commenting script, and I need to strip down all non-alphanumeric chars except those used in Western Europe.

My plan is to regex out all non-alphanumeric characters with:

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

But that so far strips out all European characters and a £ sign, so "Café Rouge" becomes "Caf Rouge".

How can I add an array of Euro chars to the above regex.

The array is:

£, €, 
á, à, â, ä, æ, ã, å,
è, é, ê, ë,
î, ï, í, ì,
ô, ö, ò, ó, ø, õ,
û, ü, ù, ú,
ÿ,
ñ,
ß

I use UTF-8

SOLUTION:

$comment = preg_replace('/[^\p{Latin}\d\s\p{P}]/u', '', $comment);

and

$name = preg_replace('/[^\p{Latin}]/u', '', $name);

$name aslo removes punctuation marks and spaces

Thanks for quick replies

¿Fue útil?

Solución

preg_replace('/[^\p{Latin}\d ]/u', '', $str);

Otros consejos

echo preg_replace('/[^A-Z0-9 £€áàâä...]/ui', '', $string);

The important part is the /u flag. Make sure your source code and $string are UTF-8 encoded.

I still think it's the wrong approach, because it severely limits what your users can enter and it will annoy some, but whatever floats your boat... BTW, your list contains no punctuation characters.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top