Question

On my registration page I need to validate that usernames are alphanumeric only, optionally with underscores. I've come up with this:

function validate_alphanumeric_underscore($str) 
{
    return preg_match('/^\w+$/',$str);
}

This seems to work okay, but I'm not a regex expert! Does anyone spot any problems?


Solution

The actual matched characters of \w depend on the locale that is being used:

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

So it is better to explicitly specify which characters you want to allow:

/^[A-Za-z0-9_]+$/

This allows just alphanumeric characters and the underscore.
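As a minimal sketch of how the explicit character class could be wrapped into a validator: the function name is_valid_username is hypothetical, and the D modifier is an addition not present in the answer above. It is included because in PCRE a plain $ also matches just before a trailing newline, so without it "user_01\n" would pass.

```php
<?php

// Hypothetical helper name; not part of the original post.
function is_valid_username(string $str): bool
{
    // The D modifier (PCRE_DOLLAR_ENDONLY) makes $ match only at the very
    // end of the string, not before a trailing newline.
    // preg_match returns 1 on a match, 0 on no match, false on error,
    // so compare against 1 to get a strict boolean.
    return preg_match('/^[A-Za-z0-9_]+$/D', $str) === 1;
}

var_dump(is_valid_username('user_01'));   // bool(true)
var_dump(is_valid_username("user_01\n")); // bool(false)
var_dump(is_valid_username('user name')); // bool(false)
var_dump(is_valid_username(''));          // bool(false)
```

Using \z instead of $ would be another way to anchor at the absolute end of the string.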

And if you want to allow the underscore only as a separator between alphanumeric runs, and require that the username start with a letter:

/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/
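To see what this stricter pattern accepts and rejects, here is a small sketch that runs it against a few made-up usernames. The sample names and the $results variable are illustrative assumptions; only the pattern comes from the answer.

```php
<?php

// Pattern copied from the answer above.
$pattern = '/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/';

// Made-up sample usernames for illustration.
$samples = ['john_doe', 'a1_b2_c3', '_john', 'john_', 'john__doe', '1john'];

$results = [];
foreach ($samples as $name) {
    $results[$name] = preg_match($pattern, $name) === 1;
}

foreach ($results as $name => $ok) {
    printf("%-10s %s\n", $name, $ok ? 'accepted' : 'rejected');
}
```

Only john_doe and a1_b2_c3 are accepted. The others fail because of a leading underscore, a trailing underscore, a doubled underscore, or a leading digit.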

OTHER TIPS

Here's a custom function to validate the string by using the PHP ctype_alnum in conjunction with an array of allowed chars:

<?php

function validate_username($str) {

  // Each array entry is a special character allowed
  // in addition to the ones accepted by ctype_alnum.
  $allowed = array(".", "-", "_");

  // Strip the allowed extras; whatever remains must be purely alphanumeric.
  if ( ctype_alnum( str_replace($allowed, '', $str ) ) ) {
    return $str;
  }

  return "Invalid Username";
}

?>

Try:

function validate_alphanumeric_underscore($str) 
{
    return preg_match('/^[a-zA-Z0-9_]+$/',$str);
}

Looks fine to me. Note that you make no requirement for the placement of the underscore, so "username_" and "___username" would both pass.

I would take Gumbo's second regex, which only allows the underscore as a separator, but add a + after the _ so that a username like "special__username" is also allowed. Just a minor tweak:

/^[A-Za-z][A-Za-z0-9]*(?:_+[A-Za-z0-9]+)*$/
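A quick sketch comparing the two variants side by side. The variable names $single and $multi are made up for this illustration; the two patterns are the ones quoted in the answers.

```php
<?php

// Original variant: exactly one underscore between alphanumeric runs.
$single = '/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/';
// Tweaked variant: one or more underscores between alphanumeric runs.
$multi  = '/^[A-Za-z][A-Za-z0-9]*(?:_+[A-Za-z0-9]+)*$/';

var_dump(preg_match($single, 'special__username') === 1); // bool(false)
var_dump(preg_match($multi,  'special__username') === 1); // bool(true)

// A trailing underscore is still rejected by both.
var_dump(preg_match($single, 'special_') === 1); // bool(false)
var_dump(preg_match($multi,  'special_') === 1); // bool(false)
```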

Your own solution is perfectly fine.

preg_match uses Perl-compatible regular expressions, in which the character class \w is defined to match exactly what you need:

\w - Match a "word" character (alphanumeric plus "_")

(source)

Licensed under: CC-BY-SA with attribution