Blog Spam

Blog spam differs fundamentally from e-mail spam. The e-mail spammer wants you to read the message and respond quickly. The weblog spammer wants the opposite: their comments need to remain undetected (or at least undeleted) to boost and maintain the PageRank of the site they are spamming for. This type of spam is not limited to blogging systems and extends easily to other collaborative portals (e.g., wikis, forums, etc.). With regard to blogs, the main ports of entry for spam are:

The last two of these do not require any human interaction at all and are more automated processes of communication. While Pingbacks require a link back to the system being commented on, they too can be spoofed. Many systems have anti-spam measures on the web interface to prevent automated comment spam (e.g., CAPTCHAs, arithmetic or logic questions, obfuscated JavaScript code, etc.), but the main concern of this article is processing spam that gets beyond the UI.
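The link-back requirement for Pingbacks can be illustrated with a short sketch. It assumes the source page's HTML has already been fetched by other means; the names `LinkCollector` and `source_links_to` are hypothetical and not taken from any blogging system. As noted above, passing this check does not prove a Pingback is legitimate, since the linking page itself can be fabricated.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def source_links_to(source_html, target_url):
    """Return True if the source page contains a link to target_url.

    This is an exact string comparison; a real check would likely
    normalize URLs first (an assumption left out of this sketch).
    """
    collector = LinkCollector()
    collector.feed(source_html)
    return target_url in collector.links
```

A Pingback whose source page fails this check could be rejected outright, while one that passes would still need content filtering.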

SpamAssassin Integration

While there are differences between e-mail spam and blog spam, SpamAssassin is a strong candidate as a basis for preventing blog spam. There have already been several attempts to integrate SpamAssassin with a blog (WordPress and Movable Type):

These plugins basically take the content from a blog, test it with SpamAssassin, and flag it as needing moderation if it is deemed unsafe.
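The flow these plugins follow might be sketched as below. This is an illustration under stated assumptions, not the code of any of the plugins: the comment is wrapped in a mail message so that a mail filter can read it, then handed to a running spamd through `spamc -c`, which sets exit status 1 when the message is judged to be spam. The function names, the dictionary keys, and the "moderation"/"approved" labels are all hypothetical.

```python
import subprocess
from email.message import EmailMessage
from email.utils import formataddr


def comment_to_message(author, email_addr, body, subject="Blog comment"):
    """Wrap a blog comment in a mail message so a mail filter can read it."""
    msg = EmailMessage()
    msg["From"] = formataddr((author, email_addr))
    msg["Subject"] = subject  # hypothetical placeholder subject
    msg.set_content(body)
    return msg


def spamc_is_spam(raw_message):
    """Ask a local spamd via `spamc -c`; exit status 1 means spam.

    Assumes spamc is installed and spamd is running.
    """
    result = subprocess.run(
        ["spamc", "-c"], input=raw_message, stdout=subprocess.DEVNULL
    )
    return result.returncode == 1


def moderate(comment, is_spam=spamc_is_spam):
    """Return 'moderation' if the comment is flagged, else 'approved'."""
    msg = comment_to_message(comment["author"], comment["email"], comment["body"])
    return "moderation" if is_spam(msg.as_bytes()) else "approved"
```

The classifier is passed in as a parameter so the flow can be exercised without a spamd instance, for example with a stand-in function during testing.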

Miscellaneous Notes

Other Resources

PHSDL

Sharing its methodology with BlogSpamAssassin under PHSDL GNU.

Project Framework Constraints

BlogSpamAssassin Directives

Avoid Problematic Honeypots

Minimize Resource Consumption

Adopt Standardized Filtering Techniques

BlogSpamAssassin Sub Algorithm PHSDL Filter Test

Mailing List

A mailing list has been created to begin work on this project.

blogspam@spamassassin.apache.org

You can subscribe via:

blogspam-subscribe@spamassassin.apache.org