Question

Users on my site have a publicly-visible profile where they accept subscriptions via a simple HTML form. These subscriptions are merged into this user's email list.

Someone could write a script that constantly registers email addresses in order to flood or destroy a user's list. This could be mitigated with IP-based rate limiting, but that doesn't help if the script runs in a distributed environment.

The only strategy I can think of is using a CAPTCHA, but I'd really like to avoid doing this. What else can I try?


Solution

Your question essentially boils down to "How can I tell humans and computers apart without using a CAPTCHA?"

This is indeed quite a complex question with a lot of different answers and approaches. In the following I'll try to name a few. Some of the ideas were taken from this article (German).

Personally I think some kind of CAPTCHA would be a perfect solution. It doesn't necessarily have to be warped text in an image; you could also use logic puzzles or simple calculations. With the following methods you can try to avoid CAPTCHAs, but keep in mind that they will always be easier to bypass than a CAPTCHA that requires user interaction.

  1. Use a hidden field as a honeypot in your form (either type=hidden or hidden via CSS). If this field is filled out (or has a value other than you'd expect), you have detected a bot: spam bots usually don't perform semantic analysis, so they fill out every field they find. However, this won't work if the bot specifically targets you, or if it simply learns the name of the field and avoids it.

  2. Use JavaScript to check how fast the form is submitted. Humans need some time (at least a few seconds) to fill in a form, whereas bots are much faster. You should also check whether the form is submitted more than once in a short time; this can be done via JavaScript if you use AJAX forms, and/or server-side. The drawback, as you mentioned yourself, is that this won't work against distributed systems.

  3. Use JavaScript to detect focus events, clicks or other mouse events that indicate you're dealing with a human. This method is described in this blog article (including some source code examples).

  4. Check whether the user works with a standard web browser; spammers sometimes use self-written programs. You could check the user-agent string, but this can be manipulated easily. Feature detection would be another possibility.

Of course, methods 2-4 won't work if a user has JavaScript disabled. In that case you could, for example, display a regular CAPTCHA inside <noscript> tags. In any case, you should always combine several methods to get an effective and user-friendly test.

What finally comes to my mind (in your specific case) is checking the validity of the email addresses entered - not only syntactically, but also whether the addresses really exist. This can be done in several ways (see this question on SO), though none of them is really reliable. So, again, you will have to combine different methods in order to reliably tell humans and bots apart.

OTHER TIPS

Assuming that whoever spams your website is targeting it specifically (not a random spam bot) and will actively try to work around all countermeasures, the only option is some kind of CAPTCHA, as anything else can be bypassed automatically.

All non-CAPTCHA methods of preventing fake/spam submissions work either by exploiting flaws in the script doing the automated submissions or by analyzing the submitted content. With this type of submission (a single email address), content analysis isn't really an option. So what is left is the wide variety of automated-submission prevention used to fight, for example, spam comments:

  • CSS-based solutions (such as this one: http://wordpress.org/extend/plugins/spam-honeypot/ )
  • JS-based solutions: a hidden field is filled with data computed by JavaScript; if the form is submitted by something as simple as a spam script that doesn't execute JavaScript, this is easily detectable
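The JS-based idea above can be sketched as follows: the server embeds a small challenge in the page, client-side JavaScript computes the answer into a hidden field, and the server verifies it on submission. A script that never executes JavaScript never fills the field. All names and the trivial addition challenge are illustrative; a real challenge should at least vary per request and be remembered server-side (e.g. in the session):

```javascript
// Server side: issue a per-request challenge and verify the answer.
function issueChallenge() {
  const a = Math.floor(Math.random() * 100);
  const b = Math.floor(Math.random() * 100);
  // Rendered into the page, e.g. as data attributes:
  // <form data-a="17" data-b="42"> ... <input type="hidden" name="answer">
  return { a, b };
}

function verifyAnswer(challenge, fields) {
  // A bot that doesn't run JavaScript submits an empty/missing answer.
  return Number(fields.answer) === challenge.a + challenge.b;
}

// Client side (runs in the browser and fills the hidden field):
// document.addEventListener('DOMContentLoaded', () => {
//   const form = document.querySelector('form');
//   form.elements.answer.value =
//     Number(form.dataset.a) + Number(form.dataset.b);
// });
```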

It's possible to work around both of these if the attacker knows they are there - for example, when your website is a selected rather than a random target.

To summarize: there are plenty of solutions that will quite successfully stop random spam submissions, but if someone specifically targets your website, the only thing that will really work is something computers are bad at - a CAPTCHA.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow