Regex for keyboard mashing

https://stackoverflow.com/questions/1159690

18-09-2019
|

Question

When signing up for new accounts, web apps often ask for the answer to a 'security question', i.e. Dog's name, etc.

I'd like to go through our database and look for instances where users just mashed the keyboard instead of providing a legitimate answer - this is a high indicator of an abusive/fraudulent account.

"Mother's maiden name?" lakdsjflkaj

Any suggestions as to how I should go about doing this?

Note: I'm not ONLY using regular expressions on these 'security question answers'

The 'answers' can be:

Selected from a db using a few basic sql regexes
Analyzed as many times as necessary using python regexes
Compared/pruned/scored as needed

This is a technical question, not a philosophical one ;-)

Thanks!

Solution

You're probably better off analyzing n-gram distribution, similar to language detection.

This code is an example of language detection using trigrams. My guess is the keyboard smashing trigrams are pretty unique and don't appear in normal language.

OTHER TIPS

I would not do this - in my opinion these questions weaken the security, so as a user I always try to provide another semi-password as an answer - for you it would like mashed. Well, it is mashed, but that is exactly what I want to do.

Btw. I am not sure about the fact, that you can query the answers. Since they overcome your password protection they should be handled like passwords = stored as a hash!

Edit:
When I read this article I instantly remembered this questions ;-)

The whole approach of security questions is quite flawed.

I have always found people put security answers weaker than the passwords they use.
Security questions are just one more link in a security chain -- the weaker link!

IMO, a better way to go would be to allow the user to request a new-password sent to their registered e-mail id. This has two advantages.

the brute-force attempt has to locate and break the e-mail service first (and, you will never help them there -- keep the registration e-mail id very protected)
- the user of your service will always get an indication when someone tries a brute-force (they get a mail saying they tried to regenerate their password)

If you MUST have secret questions, let them trigger a re-generated (never send the user's password, regenerate a temporary, preferably one-time forced) password dispatch to the e-mail id they registered with -- and, do not show that at all.

Another trick is to make the secret question ITSELF their registered e-mail id.
If they put it right, you send a re-generated temporary password to that e-mail id.

There's no way to do this with a regex. Actually, I can't think of a reasonable way to do this at all -- where would you draw the line between suspicious and unsuspicious? I, for once, often answer the security questions with an obfuscated answer. After all, my mother's maiden name isn't the hardest thing to find out.

If you can find a list of letter-pair probabilities in English, you could construct an approximate probability for the word not being a "real" English word, using the least possible pairs and pairs that are not in the list. Unfortunately, if you have names or other "non-words" then you can't force them to be English words.

Maybe you could check for an abundance of consonants. So for example, in your example lakdsjflkaj there are 2 vowels ( a ) and 9 consonants. Usually the probability of hitting a vowel when randomly pressing keys is much lower than the one of hitting a consonant.

Dejunk is a Ruby library from which you can draw inspiration. It implements a few of the suggestions in other answers. It considers input to be keyboard mashing if the input:

Contains character bigrams that are unlikely to appear in real text, but that are close together on a keyboard. (The library includes a list of such bigrams.)
Starts with an unexpected punctuation mark.
Has too many very short words.
Has no vowels.
Has characters that are repeated an unreasonable number of times.

You could check for a capital letter at the start.... that will get you some false positives for sure.

A quick google gave me this, you could compare each against a name in that list.

Obviously only works for the security question you stated.

Have you also seen this:

Anatomy of the twitter attack

I'm going to think hard next time i implement a security question.

If your question is ever something related to a real, human name, this is impossible. Consider Asian names typed with roman characters; they may very well trip whatever filter you come up with, but are still perfectly legitimate.

You could look for patterns that don't make sense phonetically. Such as:

'q' not followed by a 'u'.

asdf

qwer

zxcv

asdlasd

Basically, try mashing on your own keyboard, see what you get, and plug that in your filter. Also plug in various grammatical rules. However, since it's names you're dealing with, you'll always get 'that guy' with the weird name who will cause a false positive.

Instead of regular expressions, why not just compare with a list of known good values? For example, compare Mother's maiden name with census data, or pet name with any of the pet name lists you can find online. For a much simpler version of this, just do a Google search for whatever is entered. Legitimate names should have plenty of results, while keyboard mashing should result in very few if any.

As with any other method, you will still need to handle false positives.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow