Managing Unregistered User Posts by Screening

https://stackoverflow.com/questions/670370

21-08-2019

Question

I am considering allowing users to post to my site without having them register or provide any identifying information. If each post is sent to a db queue and I then manually screen these posts, what sort of issues might I run into? How might I handle those issues?

Solution

Screening every post by hand would be tedious and tiresome, and the queue itself is prone to spam whose only effect is to annoy the admin. My suggestion would be to automate as much of the screening as possible. Besides, requiring identifying information does nothing to prevent spam (a bot will just generate it).

A lot of projects implement a recognition system: the user first has to submit one or two posts that get approved; after that they are identified by IP and (maybe) a cookie as a trusted poster, so their posts appear automatically (and can still be marked as spam later).
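A minimal sketch of that recognition scheme, assuming an in-memory counter keyed by IP and cookie. The names `TrustTracker`, `route_post` and the threshold of 2 are hypothetical; a real site would keep the counts in its database.

```python
from dataclasses import dataclass, field

# Assumed threshold, taken from the "1-2 approved posts" idea in the text.
APPROVALS_NEEDED = 2


@dataclass
class TrustTracker:
    """Counts approved posts per (ip, cookie) pair. Hypothetical helper."""
    approved: dict = field(default_factory=dict)

    def record_approval(self, ip: str, cookie: str) -> None:
        # Called when a moderator approves a queued post.
        key = (ip, cookie)
        self.approved[key] = self.approved.get(key, 0) + 1

    def is_trusted(self, ip: str, cookie: str) -> bool:
        return self.approved.get((ip, cookie), 0) >= APPROVALS_NEEDED

    def revoke(self, ip: str, cookie: str) -> None:
        # Called when an auto-published post is later marked as spam.
        self.approved.pop((ip, cookie), None)


def route_post(tracker: TrustTracker, ip: str, cookie: str) -> str:
    """Return "publish" for trusted posters, "queue" for everyone else."""
    return "publish" if tracker.is_trusted(ip, cookie) else "queue"
```

Revoking trust on a spam report keeps the "can be marked as spam later" escape hatch: the poster simply drops back into the manual queue.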

Some heuristics on the content of the post (such as the number of links it contains) could also be used to automatically discard likely spam.
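As a rough illustration of the link-count heuristic, here is a small sketch. The pattern, the limit of two links, and the function names are assumptions for the example, not tuned values.

```python
import re

# Matches a whole URL-like token so "http://www.example.com" counts once.
LINK_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

# Assumed limit; pick whatever suits the kind of posts your site gets.
MAX_LINKS = 2


def count_links(text: str) -> int:
    """Count URL-like tokens in the post body."""
    return len(LINK_PATTERN.findall(text))


def looks_like_spam(text: str, max_links: int = MAX_LINKS) -> bool:
    """Flag a post when it contains more links than the allowed maximum."""
    return count_links(text) > max_links
```

A post flagged this way could be dropped outright or just pushed to the bottom of the review queue, depending on how much you trust the heuristic.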

OTHER TIPS

The most obvious issue is that you'll get overwhelmed by the number of submissions to screen, if your site is sufficiently popular.

I would make sure to add some admin tools, so you can automatically kill all posts from a particular IP address, or that match a particular regex. That should help get rid of obvious spam faster, but again, you'd have to be behind the wheel for all of that.
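Such an admin tool might be sketched like this, assuming the queue is a list of simple records. `QueuedPost` and `purge` are hypothetical names; with a real database the same filter would become a DELETE query.

```python
import re
from dataclasses import dataclass


@dataclass
class QueuedPost:
    post_id: int
    ip: str
    body: str


def purge(queue, ip=None, pattern=None):
    """Split the queue into (kept, removed).

    A post is removed when it came from `ip` or its body matches the
    regular expression `pattern` (case-insensitive search).
    """
    regex = re.compile(pattern, re.IGNORECASE) if pattern else None
    kept, removed = [], []
    for post in queue:
        from_ip = ip is not None and post.ip == ip
        matches = regex is not None and regex.search(post.body) is not None
        (removed if from_ip or matches else kept).append(post)
    return kept, removed
```

Returning the removed posts as well makes it easy to show the admin what is about to be deleted before committing to it.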

Tedium seems to be the greatest concern – screening posts manually is effective against spam (I'm assuming this is what you want to weed out) but very boring.

It could be best fixed with a cup of coffee and nice music to listen to while weeding?

I've found that asking for the answer to a simple question sent to the browser as an image (like "2 + 3 - 4 =", a variant of a CAPTCHA but not so annoying), with a wee bit of JavaScript, does quite well.

Send your form with the image and answer field, and a hidden field with a "challenge" (some randomly generated string). When the user submits the form, hash the challenge and the answer, and send the result back to the server. The server can check for a valid answer before adding it to the database for review.

It seems like a lot of work up front, but it will save hours of review time. Using jQuery:

<!-- hex_md5() is not part of jQuery; load it from a separate MD5 script (for example Paul Johnston's md5.js) -->
<script type="text/javascript">
//   Hash function to mask the answer: the server receives
//   md5(md5(answer) + challenge) instead of the plain answer
function answerMask()
{
  var a = $('#a').val();   // answer typed by the user
  var c = $('#c').val();   // challenge string generated by the server
  var h = hex_md5(hex_md5(a) + c);
  $('#a').val(h);
}
</script>
  <form onsubmit="answerMask()" action="/cgi-bin/comment.py" method="POST">
    <table>
      <tr><td>Comment</td><td><input type="text" name="comment" /></td></tr>
      <tr><td># put image here #</td><td><input id="a" type="text" name="a" size="30" /></td></tr>
      <tr><td><input id="c" type="hidden" value="ddd8c315d759a74c75421055a16f6c52" name="c" /></td><td><input type="submit" value=" Go " /></td></tr>
    </table>
  </form>
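On the server side, the check could look roughly like the sketch below. It assumes the server remembers which answer belongs to each challenge it issued (here in a dictionary; a session or database table would do the same job) and recomputes the hash the same way the form does. `ChallengeStore` and `expected_mask` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def md5_hex(text: str) -> str:
    """Lowercase hex MD5 of a string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def expected_mask(answer: str, challenge: str) -> str:
    """Same formula as the form script: md5(md5(answer) + challenge)."""
    return md5_hex(md5_hex(answer) + challenge)


class ChallengeStore:
    """Remembers the correct answer for each issued challenge (assumption)."""

    def __init__(self):
        self._answers = {}

    def issue(self, answer: str) -> str:
        # 16 random bytes -> 32 hex characters for the hidden "c" field.
        challenge = secrets.token_hex(16)
        self._answers[challenge] = answer
        return challenge

    def verify(self, challenge: str, submitted: str) -> bool:
        # pop() makes each challenge single-use, so a captured form
        # submission cannot simply be replayed.
        answer = self._answers.pop(challenge, None)
        if answer is None:
            return False
        expected = expected_mask(answer, challenge)
        return hmac.compare_digest(
            submitted.strip().lower().encode("utf-8"),
            expected.encode("utf-8"),
        )
```

For the "2 + 3 - 4 =" image, the server would call `issue("1")` when rendering the form and `verify(c, a)` with the posted `c` and `a` fields before queueing the comment for review.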

Edit update...

I saw this technique on a web site (I'm not sure which one), so the idea isn't mine, but you might find it useful.

Provide a form with a challenge field and a comment field. Prefix the challenge with "Pick the third word from: glark snerm hork morf" so the words, and which one to pick, are easy to generate on the server and easy to validate when the form contents come back.
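Generating and checking such a word challenge on the server might be sketched as follows. The syllable lists, `make_challenge` and `check_answer` are made up for the example; the server is assumed to keep the expected word alongside the form (for instance in a session).

```python
import random

ORDINALS = ["first", "second", "third", "fourth"]

# Hypothetical syllables used to build nonsense words like "glark".
STARTS = ["gl", "sn", "h", "m", "fr", "pl", "tr", "k"]
ENDS = ["ark", "erm", "ork", "orf", "ib", "unt", "azz", "ell"]


def make_challenge(rng=None):
    """Return (prompt, expected_word) for a "pick the Nth word" challenge."""
    rng = rng or random.Random()
    words = []
    while len(words) < len(ORDINALS):
        word = rng.choice(STARTS) + rng.choice(ENDS)
        if word not in words:  # keep the four words distinct
            words.append(word)
    index = rng.randrange(len(words))
    prompt = f"Pick the {ORDINALS[index]} word from: {' '.join(words)}"
    return prompt, words[index]


def check_answer(expected: str, submitted: str) -> bool:
    """Compare the submitted word, ignoring case and surrounding spaces."""
    return submitted.strip().lower() == expected
```

Because the words are nonsense, a bot cannot look them up, and the server only has to remember one short string per form.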

The point is to make the user do something that takes a few brain cells and is more work than it's worth for a script kiddie.

The issues that I see on my blog are posts that attempt to look legit but aren't, and the sheer volume.

Licensed under: CC-BY-SA with attribution
