PHP Filters
Drupal
http://www.kerneltrap.org/jeremy/drupal/spam/
Spam Karma, SK (WordPress)
http://unknowngenius.com/blog/wordpress/spam-karma/
PHP profanity filter (basic)
http://www.experts-exchange.com/Web/Web_Languages/PHP/Q_21084772.html
http://us3.php.net/manual/en/function.str-replace.php#58317
Profanity Filters
http://dev.wp-plugins.org/file/text-filter-suite/tfs-core.php
http://freepss.googlecode.com/svn/Source_files/Release_0_3b/swear_filter.php
http://www.php.net/manual/en/function.preg-replace.php#63063
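A basic filter of the kind these links point to can be sketched with preg_replace_callback. This is only an illustration, assuming PHP 7.4+ and ASCII input; the word list and the function name mask_profanity are hypothetical and not taken from any of the linked pages.

```php
<?php
// Hypothetical word list; a real site would load its own.
$banned = ['darn', 'heck'];

/**
 * Replace each banned word with asterisks of the same length.
 * Matches whole words only, case-insensitively.
 */
function mask_profanity(string $text, array $banned): string
{
    if ($banned === []) {
        return $text;
    }
    // Escape each word so it is matched literally inside the pattern.
    $escaped = array_map(fn($w) => preg_quote($w, '/'), $banned);
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/i';

    return preg_replace_callback(
        $pattern,
        fn($m) => str_repeat('*', strlen($m[0])),
        $text
    );
}

echo mask_profanity('Well, DARN it to heck.', $banned), "\n";
```

The word-boundary anchors keep the filter from masking innocent words that merely contain a banned one (for example "darning"), which is a common weakness of plain str_replace approaches.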
Recommended Reading
Proving Which Spam Filters Work Best (slashdot.org comment)
POPFile recommendation (slashdot.org comment)
Brightmail recommendation (slashdot.org comment)
SK Algorithm
1. is_authenticated_commentor
2. check_excessive_links (SK basic)
3. check_html_entities (SK basic)
4. check blacklist - ip, commentor url, links (SK blacklist)
5. check whitelist - ip (SK blacklist)
6. check javascript (SK javascript)
7. check watermark (SK payload)
8. check_post_age (SK basic)
9. check rbl (SK rbl)
10. check_commentor_history (SK snowball)
11. check referrer (SK referrer)
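The list above reads as a chain of checks applied in order. One way to picture that is sketched below: each check returns a score adjustment and the adjustments are summed. This is not Spam Karma's code. The function bodies, thresholds, point values, and the comment array layout are all assumptions made for illustration, and only three of the eleven checks are shown.

```php
<?php
// Hypothetical sketch: run checks in order, each returning a karma
// adjustment (negative = more spam-like). Not Spam Karma's actual code.

function check_excessive_links(array $comment): int
{
    // Count URLs in the body; the threshold of 2 is an assumption.
    $links = preg_match_all('~https?://~i', $comment['body']);
    return $links > 2 ? -2 * ($links - 2) : 0;
}

function check_html_entities(array $comment): int
{
    // Numeric entities such as &#104; can be used to obfuscate text.
    $entities = preg_match_all('/&#x?[0-9a-f]+;/i', $comment['body']);
    return $entities > 5 ? -3 : 0;
}

function check_post_age(array $comment): int
{
    // Assumed rule: comments on posts older than 30 days lose a point.
    return $comment['post_age_days'] > 30 ? -1 : 0;
}

function karma(array $comment, array $checks): int
{
    // Assumed: authenticated commentors skip the remaining checks.
    if (!empty($comment['authenticated'])) {
        return 0;
    }
    $score = 0;
    foreach ($checks as $check) {
        $score += $check($comment);
    }
    return $score;
}

$checks = ['check_excessive_links', 'check_html_entities', 'check_post_age'];
$comment = [
    'authenticated' => false,
    'post_age_days' => 45,
    'body' => 'http://a.example http://b.example http://c.example http://d.example',
];
echo karma($comment, $checks), "\n";
```

Keeping each check as a separate function that returns a number makes it easy to add, remove, or reorder checks without touching the others.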
Drupal
Algorithm
1. get hash of title . body (to detect repeated postings)
2. sanity check
3. filter 1: check for duplicates
4. filter 2: custom filter
5. filter 3: check for spam urls
6. filter 4: too many urls
7. external filters?
8. Bayesian filter
9. get final score
10. log
11. filter IP
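A few of the steps above (the title . body hash, the duplicate filter, the too-many-URLs filter, and the final score) can be sketched as follows. This is not the Drupal module's code: the function names, the point values, the URL threshold, and the in-memory $seen array (standing in for a database table) are assumptions.

```php
<?php
// Hypothetical sketch, not the Drupal spam module's code.

function content_hash(string $title, string $body): string
{
    // Step 1: hash title . body so reposts can be recognised cheaply.
    return md5($title . $body);
}

function score_content(string $title, string $body, array &$seen, int $maxUrls = 3): int
{
    $score = 0;
    $hash = content_hash($title, $body);

    // Filter 1: the same hash seen before means a repeated posting.
    $seen[$hash] = ($seen[$hash] ?? 0) + 1;
    if ($seen[$hash] > 1) {
        $score += 50; // assumed weight
    }

    // Filter 4: too many URLs (threshold is an assumption).
    $urls = preg_match_all('~https?://\S+~i', $body);
    if ($urls > $maxUrls) {
        $score += 30; // assumed weight
    }

    // Final score, capped at 100.
    return min(100, $score);
}

$seen = [];
$body = 'Visit http://x.example now';
echo score_content('Hello', $body, $seen), "\n"; // first posting
echo score_content('Hello', $body, $seen), "\n"; // identical repost
```

Hashing the concatenated title and body means the duplicate check needs only one short string comparison per post instead of comparing full texts.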
Features
* Written in PHP specifically for Drupal.
* Highly configurable.
* Automatically detects and unpublishes spam comments and other spam
content.
* Automatically learns to detect spam in any language using Bayesian
logic.
* Automatically learns and blocks spammer URLs.
* Automatically blacklists IPs of learned spammers, preventing them
from posting additional spam and wasting database resources.
* Detects repeated postings of identical content.
* Detects content containing too many links, or the same link over
and over.
* Supports the creation of custom filters using powerful regular
expressions.
* Can notify users that their content was determined to be spam,
preventing confusion over why it does not show up.
* Can notify the site administrator in an email when spam is
detected.
* Provides simple administrative interfaces for reviewing spam
content.
* Provides comprehensive logging that shows how and why content was
or was not classified as spam.
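The "learns to detect spam in any language" feature rests on token statistics rather than a fixed word list. The sketch below is a simplified naive Bayes scorer to illustrate the idea, assuming PHP 7.4+. It is not the module's implementation: the class name TinyBayes, the tokenizer, and the add-one smoothing are all assumptions, and the training data is made up.

```php
<?php
// Hypothetical sketch of token-based Bayesian scoring; not the module's code.

final class TinyBayes
{
    private array $counts = ['spam' => [], 'ham' => []];
    private array $docs   = ['spam' => 0, 'ham' => 0];

    private function tokens(string $text): array
    {
        // Split on anything that is not a letter or digit, in any script.
        // strtolower only folds ASCII letters, which is enough for a sketch.
        $parts = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        return array_unique($parts ?: []);
    }

    public function train(string $text, string $label): void
    {
        $this->docs[$label]++;
        foreach ($this->tokens($text) as $t) {
            $this->counts[$label][$t] = ($this->counts[$label][$t] ?? 0) + 1;
        }
    }

    public function spamProbability(string $text): float
    {
        // Work in log space; add-one smoothing avoids zero probabilities.
        $logOdds = log(($this->docs['spam'] + 1) / ($this->docs['ham'] + 1));
        foreach ($this->tokens($text) as $t) {
            $pSpam = (($this->counts['spam'][$t] ?? 0) + 1) / ($this->docs['spam'] + 2);
            $pHam  = (($this->counts['ham'][$t] ?? 0) + 1) / ($this->docs['ham'] + 2);
            $logOdds += log($pSpam / $pHam);
        }
        return 1 / (1 + exp(-$logOdds));
    }
}

$b = new TinyBayes();
$b->train('cheap pills online', 'spam');
$b->train('buy cheap watches', 'spam');
$b->train('great post, thanks for sharing', 'ham');
$b->train('I disagree with the second point', 'ham');

printf("%.2f\n", $b->spamProbability('cheap pills'));
printf("%.2f\n", $b->spamProbability('thanks for the post'));
```

Because the scorer only counts tokens it has been shown, it adapts to whatever language the training content is written in, which matches the feature as described.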