The Phantom Menace: Is Comment Spam Killing Your Site?
Comment spam is a thing of the past. Or so I thought. Then, one Friday night, an alert popped up saying one of our web servers was down. Whilst hunting for the cause, I discovered that comment spam might still be a major problem for websites, and had a go at some code to tackle it.
Why bother? Over the last week, 1,774 IP addresses posted 16,149 comments. Or at least they tried. Luckily, they were mostly blocked by anti-spam tools and CAPTCHAs. At that volume, comment spam is at best slowing down the server and wasting bandwidth; at worst it’s equivalent to a DDoS attack, effectively taking a website down.
Searching back several months through the logs showed it had been getting steadily worse. The same thing could be affecting your site, too. Have you checked? It might be worth a look: speed is critical, with one site suggesting each second of page load time costs 7% of sales.
Although the rise of Twitter, Facebook and LinkedIn has created many ways for writers and publishers to connect with their audience, good old-fashioned comments still have their place, providing a stronger direct link.
Most popular content management systems (CMS) and blogging tools, like Drupal or WordPress, have commenting built in. Unfortunately, it’s also been a way for spammers to advertise their wares, from erectile dysfunction drugs to knock-off designer goods.
Fortunately, those same blogging tools have also developed anti-spam tools like Akismet and Mollom that keep a beady eye on the bad guys, blocking or filtering those messages. And, as with email spam, the arms race continues: the spammers find ways round the tools, the tools get better, and so on.
What Can You Do?
Let’s assume you’ve already got anti-spam tools enabled on your site. If not, do this first; otherwise, as soon as the spammers discover the site, you’ll start seeing plenty of rubbish in your comments.
To stop the spammers wasting your time and, more importantly, your users’ time, there are two options:
- Block the spammers’ IP addresses at the firewall so they don’t trouble the web server
- Use a third-party commenting tool like Disqus, Livefyre or Facebook (see References)
The second option is a great way to delegate the hassle, but it does mean that your site’s comments live on another server. Potentially this method is also less effective for SEO.
Blocking spammers’ IP addresses stops them accessing your website at all, so they can’t post comments or even scan the site for pages where they might be able to submit them. But, and it’s a big but, there’s more work involved.
There may be plenty of solutions available, but I couldn’t find one, so I decided to experiment and put something together.
An Attempt at Blocking the Spammers
So, what to do?
- Identify the spammers’ IP addresses
- Check to see if they’re legit or bad guys/gals
- Add them to a block list on the firewall
The first part is easy. Web servers keep logs. Those logs record the IP address behind each request, and it’s easy enough to look for the string that indicates someone tried to submit a comment. For Drupal, it’s POST /comment/reply.
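As a rough illustration of that first step, here is a minimal Python sketch that counts comment-submission attempts per IP address. It is not the script mentioned later in this post. It assumes the access log uses the common/combined format, where the client IP is the first whitespace-separated field on each line. The function name and the sample log lines are made up for the example.

```python
from collections import Counter

# The string that marks a Drupal comment submission (from the text above).
COMMENT_MARKER = "POST /comment/reply"


def count_comment_posts(log_lines, marker=COMMENT_MARKER):
    """Count comment-submission attempts per client IP.

    Assumes the client IP is the first whitespace-separated field,
    as in the common/combined access log format.
    """
    counts = Counter()
    for line in log_lines:
        if marker not in line:
            continue
        fields = line.split(maxsplit=1)
        if fields:
            counts[fields[0]] += 1
    return counts


# Made-up sample lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.5 - - [10/Oct/2014:13:55:36 +0000] "POST /comment/reply/12 HTTP/1.1" 403 512',
    '203.0.113.5 - - [10/Oct/2014:13:55:40 +0000] "POST /comment/reply/12 HTTP/1.1" 403 512',
    '198.51.100.7 - - [10/Oct/2014:13:56:01 +0000] "GET /node/12 HTTP/1.1" 200 2048',
    '192.0.2.44 - - [10/Oct/2014:13:57:12 +0000] "POST /comment/reply/3 HTTP/1.1" 200 1024',
]

counts = count_comment_posts(sample_log)
for ip, attempts in counts.most_common():
    print(ip, attempts)
```

On a real server you would pass in the open log file instead of the sample list. The file's location and exact format depend on how your web server is configured.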
Turns out, 1,000 IP addresses had tried to post 12,000+ comments. Hmm, for a normal site that’s an awful lot of traffic, and I certainly didn’t remember that number of comments on the site. Over the same period, only 4 legitimate comments were posted.
OK, so there’s lots of bad guys (or gals) posting to the site, but how do we know who’s legit?
The last part is trickier and varies depending on the server. Linux, which is used by a huge chunk of the web, has a handy tool called ipset, which makes it easy and efficient to block thousands of IP addresses.
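To give a feel for what that involves, here is a small Python sketch that turns a list of IP addresses into ipset and iptables commands. It is only an illustration under a few assumptions, not necessarily how any particular script does it: the set name comment_spammers is made up, the set is IPv4-only (hash:ip with the default family), and the commands are printed rather than run so you can review them before touching the firewall.

```python
import ipaddress


def build_ipset_commands(ips, set_name="comment_spammers"):
    """Return shell commands that block the given IPv4 addresses via ipset.

    set_name is a hypothetical name chosen for this example. Invalid
    entries, duplicates and IPv6 addresses are skipped.
    """
    # -exist stops ipset complaining if the set or an entry already exists.
    commands = [f"ipset create {set_name} hash:ip -exist"]
    seen = set()
    for raw in ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # not a valid IP address, so ignore it
        if addr.version != 4 or addr in seen:
            continue
        seen.add(addr)
        commands.append(f"ipset add {set_name} {addr} -exist")
    # A single iptables rule drops traffic from every address in the set.
    commands.append(
        f"iptables -I INPUT -m set --match-set {set_name} src -j DROP"
    )
    return commands


# Made-up input: one valid address twice, one junk entry, one IPv6 address.
commands = build_ipset_commands(
    ["203.0.113.5", "not-an-ip", "203.0.113.5", "2001:db8::1"]
)
for command in commands:
    print(command)
```

The commands need root to run. The iptables rule only needs adding once, because addresses added to the set later are covered by the same rule.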
If you’re running Drupal on Linux, I wrote a script that might be useful, although if you’re going to use it, please be careful, as it changes your firewall settings. It might not be terribly elegant (go easy, it’s been a long time since I wrote code and I’m new to Python), but it does work (at least on our server):
It could easily be adapted to work for WordPress and most likely other blogging tools, too. There should be enough comments in the code to make it easy to adapt. Comments and suggestions are more than welcome, but no spammers, eh?!
- Facebook Comments