Question

With PHP, I'd like to use a preg_replace() filter for passwords such that the only characters available for passwords are US ASCII typable, minus control codes and NULL.

What's the regex to achieve that, which I can plug into preg_replace()?

EDIT:

I've been advised to edit this question since I "get it" now. I won't be using this terribly unpopular technique, and I will permit any typable character, even ones I might not have on my keyboard, as long as they aren't control codes.

Solution

As others have said, don't restrict the set of characters that are allowed in passwords. Just because your keyboard doesn't have ä, å, or ö on it is no reason to stop those of us who do have them (or know how to type them anyhow) from using those letters. You're going to be storing the password as a cryptographic hash anyhow (or at least as an encrypted string), aren't you? If so, then it doesn't matter whether your database can successfully/safely store the actual characters in the password anyhow, only the characters output by your crypto algorithm. (And if not, then storing passwords in plaintext is a far bigger problem than what characters the passwords may or may not contain - don't do that!)
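To make the hashing point concrete, here is a minimal sketch using PHP's built-in password_hash() and password_verify(). The sample password is only an illustration; the stored value is printable ASCII no matter which characters the user typed.

```php
<?php
// Hypothetical example password containing non-ASCII letters (UTF-8 encoded).
$password = "f\u{E4}\u{EE}ry"; // "fäîry"

// password_hash() salts and hashes the input; the result is a printable ASCII string.
$hash = password_hash($password, PASSWORD_DEFAULT);

// True when every byte of the stored hash is printable ASCII.
$hashIsAscii = preg_match('/^[ -~]+\z/', $hash) === 1;

// At login, compare the submitted password against the stored hash.
$matches = password_verify($password, $hash);
$wrongMatches = password_verify('fry', $hash);

var_dump($hashIsAscii, $matches, $wrongMatches);
```

Only the hash ever reaches the database, so the database's handling of ä, å, or ö never comes into play.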

Your apparent intent is to enforce your character set restrictions by silently stripping the characters you dislike, rather than by telling the user "Try again and, this time, only use these characters: a, e, i, o, u." That makes your proposed method truly atrocious. If I attempt to use, say, the password fäîry (not incredibly secure, but it should hold up against lightweight dictionary attacks), my actual password, unknown to me, will be fry. If your password is a three-letter word, straight out of the dictionary and in common use, you may as well not even bother. Ouch!
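The effect is easy to reproduce. The sketch below assumes the filter would be a preg_replace() call that deletes everything outside the printable ASCII range; stripNonAscii is a hypothetical name for it.

```php
<?php
// The filter the question asks about: delete every byte outside printable ASCII (space through ~).
function stripNonAscii(string $input): string
{
    return preg_replace('/[^ -~]/', '', $input);
}

$typed = "f\u{E4}\u{EE}ry";      // what the user typed: "fäîry"
$stored = stripNonAscii($typed); // what would silently be kept

echo $stored, "\n"; // prints "fry"
```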

OTHER TIPS

Personally, I've always found it highly disturbing when a web site or service tried to force me to use passwords that follow a certain (usually downright stupid) limitation.

Isn't it the whole point of passwords that they are not too easily guessable? Why would you want them to be less complex than your users want them to be? I can't imagine a technical limitation that would require the use of "ASCII only" for passwords.

Let your users use any password they like, hash them and store them as Base64 strings. These are ASCII only.
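As a rough sketch of that idea, the following derives a key with hash_pbkdf2() and stores it Base64-encoded. The helper name hashToBase64, the iteration count, and the 16-byte salt are assumptions made for illustration; PHP's password_hash() is usually the simpler choice and already returns ASCII.

```php
<?php
// Hypothetical helper: derive a key from the password and return it Base64-encoded.
// The iteration count and salt size used here are illustrative assumptions.
function hashToBase64(string $password, string $salt, int $iterations = 100000): string
{
    // Final argument true = raw binary output, which is then Base64-encoded.
    $raw = hash_pbkdf2('sha256', $password, $salt, $iterations, 0, true);
    return base64_encode($raw);
}

$salt = random_bytes(16);        // the salt must be stored alongside the hash
$password = "f\u{E4}\u{EE}ry";   // "fäîry", any characters are fine
$stored = hashToBase64($password, $salt);

// The stored value consists of Base64 characters only.
$storedIsAscii = preg_match('/^[A-Za-z0-9+\/]+=*\z/', $stored) === 1;

// At login, recompute with the same salt and compare in constant time.
$loginOk = hash_equals($stored, hashToBase64($password, $salt));

var_dump($storedIsAscii, $loginOk);
```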

Here you go:

^[ -~]+$

assuming you don't want empty passwords; otherwise it's:

^[ -~]*$

to allow empty ones.

I'm not sure why you're asking about preg_replace - I'd be wary of manipulating the passwords that people type. Better to enforce the rule that you only accept printable ASCII, and tell the user if they break that rule (or, as others have said, to not have any rules, but I assume you have reasons for them).

If you're thinking of quietly removing the characters that don't match, and someone comes along with a password of Úéåæ, then you'll be storing an empty password for them without their knowledge.
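If you do keep the printable-ASCII rule, a check with preg_match() might look like the sketch below. isPrintableAscii is a hypothetical helper name. The pattern ends in \z rather than $ because, in PCRE, a bare $ also matches just before a trailing newline.

```php
<?php
// Returns true only if the password is non-empty and entirely printable ASCII.
// \z anchors at the very end of the string; a bare $ would also accept a trailing newline.
function isPrintableAscii(string $password): bool
{
    return preg_match('/^[ -~]+\z/', $password) === 1;
}

$candidates = [
    'correct horse',               // accepted
    "\u{DA}\u{E9}\u{E5}\u{E6}",    // "Úéåæ": rejected, not silently emptied
    "tab\there",                   // rejected: contains a control code
    '',                            // rejected: empty
];

foreach ($candidates as $candidate) {
    echo isPrintableAscii($candidate) ? "accept\n" : "reject: printable ASCII only\n";
}
```

The password is either accepted as typed or rejected with a message; it is never modified.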

Please don't filter your user passwords. That defeats a whole lot of the point. I wrote more about this here: http://www.evanfosmark.com/2009/06/why-do-so-many-websites-fail-with-password-restrictions/

I disagree that there is no reason to reject non-ASCII characters, although it's up to you to decide whether the pros outweigh the cons.

If you allow non-ASCII characters, then you are in fact committing to properly internationalize that portion of your web application. For many applications, internationalization is an afterthought. For web applications, it's far from trivial.

If you don't explicitly control the character encoding when you go between characters and bytes, then you are basically relying on whatever the defaults happen to be for your deployment. If your configuration ever changes (e.g. migrating from Windows to Linux, or switching to another web server), then your defaults have a good chance of changing from under you, and then the non-ASCII characters will serialize to a different byte sequence. So, all of a sudden, the hashes of people using them in their passwords will not match what's in the database, and they'll get locked out of their accounts.
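A small sketch of that failure mode: the same visible password, written out as the bytes it would have under ISO-8859-1 and under UTF-8, produces two different digests. SHA-256 is used here only to make the difference visible; it is not a recommendation for password storage.

```php
<?php
// The same visible password, "fäîry", serialized under two different encodings.
$latin1 = "f\xE4\xEEry";         // ISO-8859-1: one byte per accented letter
$utf8   = "f\xC3\xA4\xC3\xAEry"; // UTF-8: two bytes per accented letter

// Different byte sequences give different digests.
$hashLatin1 = hash('sha256', $latin1);
$hashUtf8   = hash('sha256', $utf8);

var_dump(strlen($latin1), strlen($utf8));       // int(5) and int(7)
var_dump(hash_equals($hashLatin1, $hashUtf8));  // bool(false)
```

Pinning the encoding explicitly (for example, UTF-8 on the form, in the application, and on the connection to the database) keeps the byte sequence stable across deployments.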

I do, of course, agree that it's completely unacceptable to just filter out those characters; you have to either accept or reject the password.

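/[\p{Cc}]/ to match control characters. This covers 0-31, plus 127 (DEL) and the C1 range 128-159; add the u modifier so that UTF-8 input is read as characters rather than as individual bytes.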
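Put together as a check, this might look like the following sketch. hasNoControlChars is a hypothetical helper name, and the input is assumed to arrive as UTF-8; with the u modifier, preg_match() returns false on invalid UTF-8, which is treated as a rejection here.

```php
<?php
// Returns true when the password is valid UTF-8 and contains no control characters.
// preg_match() yields 1 on a match and false on invalid UTF-8; both cases are rejected.
function hasNoControlChars(string $password): bool
{
    return preg_match('/\p{Cc}/u', $password) === 0;
}

var_dump(hasNoControlChars("f\u{E4}\u{EE}ry")); // bool(true): "fäîry" is fine
var_dump(hasNoControlChars("bell\x07"));        // bool(false): BEL control code
var_dump(hasNoControlChars("nul\0byte"));       // bool(false): NULL byte
var_dump(hasNoControlChars("del\x7F"));         // bool(false): DEL
```

Note that an empty string passes this check, so a minimum length has to be enforced separately.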

I agree with Richie. Use preg_match instead of preg_replace.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow