I get a fair amount of spam on my blog, so whenever a decent batch of it has accumulated, I look through the Apache logs for ways to block it. Here are the methods I've used, with a rough guess at how much each one cuts out:
- Block individual IPs from all communication with my machine at the iptables level. I'd guess this cuts out about 10% of my spam, but it's hard to know, because I don't get logs of any sort from these attempts.
- Reject comments when the IP address that fetched the comment form differs from the one that posts it. This cuts out the majority of my spam (probably about 75%). It's not unusual for me to see the form retrieved from Thailand and posted from China (or from other similarly distant places). This has a cost: anyone behind a proxy pool won't be able to comment on my blog (which rules out AOL users, at least). That's fine by me.
- Keyword-based rejections. Blocking mentions of viagra, cialis, and a number of other products that spammers like to push cuts out about 10% of what is left after the above.
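The IP-match check and the keyword filter above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the code this blog actually runs: it assumes the comment form carries a hidden token that is an HMAC of the IP that fetched it, so the server can recompute it from the posting IP and compare. `SECRET`, `BLOCKED_WORDS`, `issue_form_token` and `accept_comment` are hypothetical names.

```python
import hashlib
import hmac
import re

# Hypothetical server-side secret; a real setup would load this from config.
SECRET = b"change-me"

# Illustrative keyword list (the post names viagra and cialis among others).
BLOCKED_WORDS = ["viagra", "cialis"]
# Word boundaries matter: "cialis" is a substring of "specialist".
BLOCKED_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, BLOCKED_WORDS)) + r")\b", re.IGNORECASE
)


def issue_form_token(fetch_ip: str) -> str:
    """Token placed in a hidden form field when the comment form is served."""
    return hmac.new(SECRET, fetch_ip.encode(), hashlib.sha256).hexdigest()


def accept_comment(post_ip: str, token: str, body: str) -> bool:
    """Reject if the posting IP differs from the fetching IP, or on a blocked word."""
    # Recomputing the token from the posting IP only matches if the IPs are equal.
    expected = issue_form_token(post_ip)
    if not hmac.compare_digest(expected, token):
        return False
    # Keyword rejection runs only on comments that passed the IP check.
    return BLOCKED_RE.search(body) is None


tok = issue_form_token("203.0.113.5")
print(accept_comment("203.0.113.5", tok, "Nice post!"))         # same IP, clean text
print(accept_comment("198.51.100.7", tok, "Nice post!"))        # different IP
print(accept_comment("203.0.113.5", tok, "Cheap VIAGRA here"))  # blocked keyword
```

The token here never expires, so a captured form could be replayed from the same IP indefinitely; a fuller version would probably fold a timestamp into the HMAC and reject stale forms.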
Things I am considering doing:
- Block Tor. Lately I've come to the conclusion that Tor is a bad thing (it diminishes responsibility for one's computer), and I found a nice set of scripts that list the IPs likely to reach my servers as Tor exit nodes. I'd probably just block Tor from posting, since I don't like the idea of automating things that talk to iptables.
- Block all of China, Korea, India, and other Asian countries from commenting. I've only once had a genuine comment from outside the Americas and Europe, and the majority of my spam comes from IP addresses in those countries. This applies strictly to comments; I know I have readers elsewhere in the world.
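Blocking Tor and whole countries from posting, rather than through iptables, comes down to an address lookup at comment time. Below is a minimal sketch using Python's standard `ipaddress` module. It assumes the Tor exit list has already been fetched to one IP per line, and that per-country ranges come from some GeoIP-derived CIDR list. The function names are hypothetical, and the addresses in the example are RFC 5737 documentation ranges, not real Tor exits or country allocations.

```python
import ipaddress
from typing import Iterable


def load_ip_set(lines: Iterable[str]) -> set:
    """Parse one IP per line (e.g. a saved Tor exit list), skipping blanks and '#' comments."""
    ips = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ips.add(ipaddress.ip_address(line))
    return ips


def load_networks(cidrs: Iterable[str]) -> list:
    """Parse CIDR blocks; assumed to come from a GeoIP-derived country list."""
    return [ipaddress.ip_network(c.strip()) for c in cidrs if c.strip()]


def may_comment(ip: str, tor_exits: set, blocked_nets: list) -> bool:
    """Allow posting unless the IP is a known Tor exit or falls in a blocked range."""
    addr = ipaddress.ip_address(ip)
    if addr in tor_exits:
        return False
    # An IPv4 address is simply never "in" an IPv6 network, so mixed lists are safe.
    return not any(addr in net for net in blocked_nets)


tor = load_ip_set(["# placeholder exit list", "192.0.2.10", ""])
blocked = load_networks(["198.51.100.0/24"])
print(may_comment("192.0.2.10", tor, blocked))     # Tor exit
print(may_comment("198.51.100.42", tor, blocked))  # inside a blocked range
print(may_comment("203.0.113.9", tor, blocked))    # allowed
```

A linear scan over networks is fine for a few hundred ranges; a country list with tens of thousands of prefixes would likely want a sorted or trie-based lookup instead.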
I'm not currently looking at CAPTCHAs - it'd take some effort to implement, and would not stop the poor people who are paid to spam from doing so.
Slightly orthogonally, I found this on the topic of email spam. I've never seriously thought about such a system, but I've been on an anti-spam kick for the last few weeks...