October 29, 2002
Comment Spam Problem Continued
In regards to the comment spam problem mentioned earlier, one idea kicked around was checking the http_referer to make sure that the comment post came from the same server as the form.
We talked about the possibility of empty http_referers -- not all browsers send a referer, and proxy servers can strip it out. The solution would be to allow empty referers in addition to referers from the server. Unfortunately, though, allowing empty http_referers will also let in the comment spammer.
Allowing empty referers opens the door to the spammer because the comment spamming code would invoke my comment code directly, not through a link from an HTML page. In that case http_referer would be empty.
I could become more restrictive, remove the permission for empty referer, but if I do, I won't be letting some of you through (as you've been kind enough to let me know via email tonight).
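To make the trade-off concrete, here is a minimal sketch of the referer test in Python (not the Perl that mt-comments.cgi is written in). The function name `referer_allowed`, the `allow_empty` switch, and the host `weblog.example` are all made up for illustration; nothing here is taken from MT itself.

```python
from urllib.parse import urlparse

def referer_allowed(referer, own_host, allow_empty=True):
    """Decide whether a comment post with this Referer header is accepted.

    referer     -- the raw header value, or None/"" if it was not sent
    own_host    -- the hostname that serves the comment form (assumed known)
    allow_empty -- the lenient setting discussed above
    """
    if not referer:
        # No header at all: some browsers and proxies omit it, and so does
        # a script that posts straight to the comment code.
        return allow_empty
    # Accept only posts whose referring page lives on our own host.
    return urlparse(referer).hostname == own_host.lower()

# Lenient mode lets real readers behind proxies through, and the spammer too.
print(referer_allowed(None, "weblog.example"))
# Strict mode shuts out both.
print(referer_allowed(None, "weblog.example", allow_empty=False))
```

A script can also send any Referer value it likes, so even the strict setting only stops the laziest direct posts.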
Sam Ruby had some good ideas such as putting hidden form fields into the comment forms and testing for these and this will be a next step. This means adding form fields to all templates related to comments, and then adding code to mt-comments.cgi. Doable, and many appreciations to Mr. Ruby for excellent ideas. (If you don't know Sam, he works on some weird sounding things such as "Comanche" and "SOUP" -- stuff like that).
A really nifty and difficult to crack approach (IMO) would be to take the person's login name and the comment id for each comment and use these to create an encrypted value. Stuff this into an HTML form field. When the form is processed, test to see if the encrypted value checks out. If the person's login name isn't exposed, which it should NEVER be, it becomes a 'key' for the encryption, easily accessible to the MT program and the MT user, NOT to the spammer. And the different comment identifiers would make sure that the encrypted values changed with each comment.
Only problem with this solution is it would require cracking into the MT internal code.
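As a rough sketch of that hidden-field idea, the snippet below uses a keyed hash (HMAC-SHA256) in place of encryption, since all the form processor has to do is recompute the value and compare. It is Python rather than MT's Perl, and the names `make_token`, `token_is_valid`, and the login `hypothetical_login` are invented for the example, not anything MT provides.

```python
import hashlib
import hmac

def make_token(login_name: str, comment_id: int) -> str:
    """Value to put in the hidden form field for one comment form."""
    key = login_name.encode("utf-8")        # the secret: never sent to the browser
    msg = str(comment_id).encode("utf-8")   # varies per comment, so the token does too
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def token_is_valid(login_name: str, comment_id: int, submitted: str) -> bool:
    """Recompute the token when the form is processed and compare."""
    expected = make_token(login_name, comment_id)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))

token = make_token("hypothetical_login", 630)
print(token_is_valid("hypothetical_login", 630, token))   # token from the real form
print(token_is_valid("hypothetical_login", 631, token))   # same token, wrong comment id
```

The check only proves that the poster saw the form for that comment id. A script that fetches the form first and copies the hidden field back would still pass.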
Question: what do you think of this as a solution, and is it worth the time to do it?
(However, by now, Phil or someone else of like caliber will have found and coded a solution and have it halfway distributed throughout the world. I should just leave these little challenges to others -- what do I know?)
Posted by Bb at October 29, 2002 12:08 AM
If you see some suspicious activity around entries 630 through 635 (or a badly typoed version of their URL, oops), don't worry, it's just me with my script kiddie/spammer hat on.
Well, I'm a bit slow since I always have to look up the syntax for preg_match, but a script to get five entries and parse out the entry_id (as a proxy for your hidden key) took me ten minutes to write, and takes one second to run. Couple of minutes to have an array of every comment form with every hidden input's name and value, and that's using unthreaded PHP. I know there's at least one open-source threaded spider available, since I see Larbin passing through every now and then, and it claims to be able to get 5 million pages a day on a fast enough network.
So, I'd say go ahead and do it if you want, but don't think of it as anything but a very temporary stopgap measure: any spammer who actually takes a look at what he's getting can defeat it in a few minutes, and someone targeting weblogs for spam isn't going to have much trouble finding your explanation of how to do it (the downside of being on the side of the angels is that you have to do everything out in the open: individuals can quietly hack fixes in to slow down a spammer enough that he won't bother to work around them, but anything you advertise or anything that MT distributes has to be able to work even though the spammers know every detail). Even my pet idea of comparing each new comment to the last n comments, looking for similarities in email, link, and text when there are too many comments coming in in too short a time, will end up being as much a recipe for how to get around it as it will be a deterrent.
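For what it's worth, the "compare each new comment to the last n" idea might look something like the sketch below. It is Python, it uses the standard library's difflib for the similarity score, and the class name `RecentComments`, the window of 20, and the 0.8 threshold are arbitrary choices for illustration. It also leaves out the "too many comments in too short a time" trigger and simply checks every comment.

```python
from collections import deque
from difflib import SequenceMatcher

class RecentComments:
    """Remember the last n comments and flag near-duplicates of any of them."""

    def __init__(self, n=20, threshold=0.8):
        self.recent = deque(maxlen=n)   # oldest entries fall off automatically
        self.threshold = threshold      # 1.0 means identical, 0.0 means nothing shared

    def looks_like_repeat(self, email, link, text):
        # Fold the three fields into one string so a repeated email or link
        # counts toward the similarity as well as repeated text.
        candidate = " ".join((email, link, text)).lower()
        hit = any(
            SequenceMatcher(None, candidate, old).ratio() >= self.threshold
            for old in self.recent
        )
        self.recent.append(candidate)
        return hit

seen = RecentComments()
seen.looks_like_repeat("a@spam.example", "http://spam.example", "Buy cheap widgets now")
print(seen.looks_like_repeat("a@spam.example", "http://spam.example", "Buy cheap widgets now!"))
```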
I really wish there was some more accessible, less annoying alternative to those blurry gifs of numbers that you have to type in to sign up for free email accounts: I'd like to return the form with one of those when the "I might be being spammed" switch has been flipped, but not at the expense of being completely inaccessible to the visually impaired. Maybe "I think I'm being spammed: please either enter this blurry number, or wait 30 seconds before resubmitting"? That would be pretty easy to script, though. Um, "someone from your IP address already commented in the last 30 seconds, and I don't think you can say something useful that quickly; try again in less than another minute and your IP address will be banned for 24 hours"? I know people say that it's easy to fake your IP address, but is it? Especially if the URL you post to returns a 302, so that you have to get the reply and act on it to post? Dunno.
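The 30-second rule with a 24-hour ban could be sketched roughly like this in Python. The class name `CommentThrottle` and the in-memory dictionaries are assumptions for the example; a real CGI script would have to keep this state in a file or database between requests, and the numbers are just the ones floated above.

```python
import time

MIN_INTERVAL = 30            # seconds that must pass between comments from one IP
RETRY_WINDOW = 60            # retrying this soon after a warning earns a ban
BAN_SECONDS = 24 * 60 * 60   # length of the ban

class CommentThrottle:
    def __init__(self):
        self.last_ok = {}        # ip -> time of the last accepted comment
        self.warned_at = {}      # ip -> time of the last "slow down" warning
        self.banned_until = {}   # ip -> time the ban expires

    def check(self, ip, now=None):
        """Return "ok", "wait", or "banned" for a comment arriving from ip."""
        now = time.time() if now is None else now
        if self.banned_until.get(ip, 0) > now:
            return "banned"
        warned = self.warned_at.get(ip)
        if warned is not None and now - warned < RETRY_WINDOW:
            # Already warned and back again within a minute: ban for a day.
            self.banned_until[ip] = now + BAN_SECONDS
            return "banned"
        last = self.last_ok.get(ip)
        if last is not None and now - last < MIN_INTERVAL:
            # Second comment too soon after the first: warn, don't accept.
            self.warned_at[ip] = now
            return "wait"
        self.last_ok[ip] = now
        return "ok"

throttle = CommentThrottle()
for t in (0, 10, 50):
    print(t, throttle.check("192.0.2.1", now=t))
```

The `now` parameter is only there so the rule can be tried out without actually waiting.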
But whatever else you do, don't stop thinking about it and leave it up to someone else. I've done that too many times myself, but even if someone else knows more and writes better code than you, they don't care like you do, so they're way too likely to just drop it.
I think what is appropriate here is more akin to the glorified padlock that you probably have on your front door and not the full blown vault that your bank uses. And you will have to find a way to sleep nights knowing that somebody with the modest scripting skills of Phil or myself can break in if we really want to. This is definitely an area where there are people who are both smarter and have studied this particular problem domain longer than either Phil or I.
View source on my comment entry page. Search for secretToken.
I'd ask the user to use a magic word in her comment. "This post's magic word is 'banana', so your comment has to include 'banana' in it" - it promotes creative writing, too :-)