October 28, 2002
Comment Spammers redux
Seems to be a technology day today.
Phil caught a comment spammer who was trying to dump spam comments in all of his posts. The same approach would work against any weblog tool that numbers its posts sequentially (e.g., Movable Type).
I'm going to try to tweak my mt-comments.cgi to stop POSTs from pages outside of my root URL. This is my way of warning you all that the comments, web pages, and weblog may be a tad more behaviorally challenged than normal.
Update: I added checks on referers, and this will prevent posts from locations other than my own weblog server. Unfortunately, as Phil pointed out, HTTP referers are fairly easy to fake. I also wrote a test script that did so, and my checks failed to catch the 'fake' referer.
Still, it's a start...
If you attempt to post a comment and fail, please send me an email and I'll check to see what the problem is. Unless, of course, you're the spammer. In which case: Eat dirt and die scuzzbucket!
Ahem. Thank you.
Posted by Bb at October 28, 2002 09:52 AM
I've thought about that (as a way to prevent hijacking of a remotely hosted comment service, before this), but I've never been willing to prohibit comments from people running "privacy" programs that stop the browser from sending referrers. One thing you might try, until comment spammers get a little smarter, is to allow POSTs with no referrer, and only block ones with a referrer that's not you: my moron spammer was sending "http://www.google.com" as a referrer on all his POSTs.
D'oh. What am I saying? Referrers are easy to fake: it'll take less time for a spammer to change to using the URL he's POSTing to as the referrer than it will take you to change to blocking by referrer.
Phil, I did add in blank referer as well as test for my server, and it's about the best I can do until someone comes up with something better.
As you said, this person isn't sophisticated, and faking ENV values requires _some_ level of sophistication.
I'll test it for a bit, and wait and see if someone comes up with something else (or I think of something else). If not, I'll post my mods and let people at least use them for now.
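For readers who want to see the shape of the check discussed above (accept a blank referer, accept the weblog's own server, reject everything else), here is a minimal sketch in Python. It is not the actual mt-comments.cgi change, which has not been posted; the host name `weblog.example.com` and the function name `referer_allowed` are made up for illustration. As already noted, a spammer who sends the right referer will pass this check.

```python
from urllib.parse import urlsplit

# Hypothetical host name; substitute the host your comment form is served from.
ALLOWED_HOST = "weblog.example.com"


def referer_allowed(referer, allowed_host=ALLOWED_HOST):
    """Return True if a comment POST with this Referer header should be accepted.

    A missing or empty referer is accepted so that readers running privacy
    tools that strip the header can still comment. Any other referer must
    point at the weblog's own host.
    """
    if not referer:
        return True
    host = urlsplit(referer).hostname  # None when the value is not a usable URL
    return host is not None and host.lower() == allowed_host.lower()
```

With this rule, a POST carrying "http://www.google.com" as its referer is rejected, while one with no referer at all goes through.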
1) Put some data into a hidden field in the comment form that is easily obtainable only on your server, like the last modified time (slightly obscured perhaps) of your parent blog entry, and verify it.
2) Add a hidden timestamp field (again, possibly slightly obscured), and only allow comments that are issued between 10 seconds and 20 minutes after retrieval of the form. Would slow down automated spam in any case.
3) Require the data retrieved from the URL to contain the string "burningbird" someplace in the contents (possibly visibly, possibly not). This acts as a distributed registration, and has other social consequences that are hard to predict.
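Suggestion 1) could be sketched along these lines. This is only one possible reading: instead of "slightly obscuring" the last modified time, the sketch swaps in a keyed hash (HMAC) of the entry ID and its last modified time, so the hidden value cannot be produced without a secret kept on the server. The key, the function names, and the choice of HMAC-SHA256 are all assumptions, not anything taken from Movable Type.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never appear in the page source.
SECRET_KEY = b"change-me"


def entry_token(entry_id, last_modified, key=SECRET_KEY):
    """Build the hidden-field value for one entry.

    last_modified is the entry's modification time as an integer
    (for example, seconds since the epoch).
    """
    message = f"{entry_id}:{last_modified}".encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def token_is_valid(submitted, entry_id, last_modified, key=SECRET_KEY):
    """Check the hidden field posted back with a comment."""
    expected = entry_token(entry_id, last_modified, key)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))
```

The template would write `entry_token(...)` into the hidden field when the page is built, and the comment script would call `token_is_valid(...)` before saving anything. A script that first fetches the form can still read the token, so this raises the effort required rather than closing the door.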
2) worries me a bit, as I sometimes spend several minutes on a comment.
I like 1), though. Hard to spoof.
1) is easy to automate, but probably beyond the abilities of Phil's nemesis.
2) adjusting the upper bound to 180 minutes probably wouldn't cause any undue harm. Removing the bound entirely means that a sufficiently early timestamp is a universal key.
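The time-window rule in 2), with the 180 minute upper bound, comes down to a single comparison. A rough sketch, assuming the hidden timestamp has already been recovered and checked for tampering by some other means; the constant and function names are invented here:

```python
MIN_AGE_SECONDS = 10        # faster than this is almost certainly a script
MAX_AGE_SECONDS = 180 * 60  # the relaxed upper bound of 180 minutes


def within_window(form_issued_at, now,
                  min_age=MIN_AGE_SECONDS, max_age=MAX_AGE_SECONDS):
    """Return True if the comment arrived inside the allowed window.

    Both arguments are times in seconds since the epoch. A timestamp from
    the future gives a negative age and is rejected along with everything
    else below the minimum.
    """
    age = now - form_issued_at
    return min_age <= age <= max_age
```

Keeping `max_age` finite is what stops an old timestamp from working forever: without it, one sufficiently early value would pass the check on every later POST.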
Sam, thanks for your suggestions! I like the idea of hidden data that I validate in addition to the other checks. And this should be doable in templates and within mt-comments.cgi, without having to crack into the 'guts' of Ben and Mena's code.
Looking for the hidden data would require deliberation, and spammers aren't deliberate -- they favor shotgun approaches and hitting whatever they can get.
And I'd hate to frustrate my readers who want to take time on comments. They might stop commenting, and then I'd be left, alone, crying into my double shots n creme.
So, is there an ETD (Estimated Time of Delivery) for code that those of us in the non-programming contingent can paste into their MT templates? I'm already thinking this might call for a Tim Tam or Dishmatique reward challenge...
Tomorrow morning when you rise, we'll have an interim solution. I would post now, but I want to test the mt-comments.cgi code just a bit more. And I wouldn't mind Sam and Phil's vetting on the solution -- though I suppose they'd get a share of the goodies, wouldn't they?
I got it! I got it I got it I got it.
I think I've got something that will be extremely difficult to crack. I hope.
More in a few hours...
Just do me a favor. Give away the source. All of it. And if you feel so inclined, give me a little credit.
Same spammer hit my site, same methods. I eagerly await your solution!