Add other spam scanners to amavisd-new

Posted: Feb 8, 2006

by Felix Schwarz

This document describes how to write plugins for spam scanners other than SpamAssassin. If you are looking for information on configuring or using SpamAssassin, or on combining SpamAssassin with DSPAM, this text is not for you.

Although SpamAssassin does its job quite well, there are other spam scanners around which may give you better accuracy or simply use less RAM. Another disadvantage of the current SpamAssassin integration in amavisd-new is that there is only one global Bayes database, whereas you may want to use multiple profiles for different users/customers. And last but not least, I think it is good to have multiple spam scanners around, so that spammers cannot focus on just one scanner and instead have to spend time creating a mail that slips through all of them.

Since January 2006 there has been a plugin interface in amavisd-new which allows you to add more scanners without fiddling around with amavisd's internals too much. This interface will be included from amavisd-new 2.4.0 on. The implementation in 2.4.0 has its limitations, so you have to modify the Amavis::SpamControl package in order to load a different scanner. I have created some patches, but these are not (yet) publicly released. There is no (easy) way to get other spam scanners into amavisd-new versions before 2.4.0 (such as 2.3.3).

Warning: I created this page while writing the DSPAM plugin. This documentation should not be considered "official" in any way. I am no amavisd-new expert by any means!

API overview

This is the API which every spam scanner module has to implement:

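package Amavis::SpamControl::YourSpamScanner;
use strict;
use warnings;

sub new { my($class) = @_; return bless {}, $class }  # plain blessing only
sub init_pre_chroot { }  # load modules here
sub init_pre_fork { }    # initialize modules here
sub check {
    my($self,$msginfo) = @_;
    my($spam_level,$spam_status,$spam_report,$autolearn_status);
    # do your spam check here and fill in the four variables above
    $msginfo->spam_level($spam_level);
    $msginfo->spam_status($spam_status);
    $msginfo->spam_report($spam_report);
    $msginfo->autolearn_status($autolearn_status);
    return ($spam_level);
}

1;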

API explanation

new performs the usual blessing and nothing more. Bigger initializations such as loading modules or establishing database connections should be done in init_pre_chroot or init_pre_fork.

init_pre_chroot: Load all necessary modules here. Bigger initialization should go into init_pre_fork, because init_pre_chroot is (probably) run with root privileges.

init_pre_fork: Initialize all modules. When this method is called, the chroot has already taken place and amavisd-new runs under the user id it will use for its normal work.

check($msginfo): This routine does the actual spam scanning. You get a msginfo reference in which the original message and its metadata are stored. The method is expected to return a spam level, that is, a float value which defines the "spaminess" of the message (see the "spaminess" section below). Further information can be stored in the msginfo structure for later reference.
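To make the four methods a bit more concrete, here is a minimal sketch of a toy plugin. Please read it as an illustration only: the package name Amavis::SpamControl::KeywordDemo, the MockMsgInfo class, its demo_text accessor and the format of the status and report strings are all made up for this example. The real msginfo object comes from amavisd-new and gives you the message in its own way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the msginfo object that amavisd-new passes to
# check(). Only the setters from the API overview are modelled, plus a
# made-up demo_text accessor that holds the message text for this sketch.
package MockMsgInfo;
sub new { return bless {}, shift }
for my $field (qw(spam_level spam_status spam_report autolearn_status demo_text)) {
    no strict 'refs';
    *{"MockMsgInfo::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;   # act as setter when given a value
        return $self->{$field};
    };
}

package Amavis::SpamControl::KeywordDemo;   # hypothetical plugin name

# Plain blessing only, no expensive work here.
sub new {
    my ($class) = @_;
    return bless { keywords => undef }, $class;
}

# A real plugin would load its modules here (whole file system visible).
sub init_pre_chroot { return 1 }

# Set up internal state; runs after the chroot, under the normal user id.
sub init_pre_fork {
    my ($self) = @_;
    $self->{keywords} = [qw(viagra lottery)];
    return 1;
}

# Toy check: every keyword found in the text adds 3.0 to the spam level.
sub check {
    my ($self, $msginfo) = @_;
    my $text = lc($msginfo->demo_text // '');
    my @hits = grep { index($text, $_) >= 0 } @{ $self->{keywords} };
    my $spam_level = 3.0 * scalar(@hits);
    $msginfo->spam_level($spam_level);
    $msginfo->spam_status('keyword_hits=' . scalar(@hits));   # made-up format
    $msginfo->spam_report(join(', ', @hits));                  # made-up format
    $msginfo->autolearn_status('no');
    return $spam_level;
}

package main;

my $scanner = Amavis::SpamControl::KeywordDemo->new;
$scanner->init_pre_chroot;
$scanner->init_pre_fork;

my $msginfo = MockMsgInfo->new;
$msginfo->demo_text('You won the LOTTERY');
printf "spam level: %.1f\n", $scanner->check($msginfo);
```

The order of the calls at the end mirrors the description above: blessing first, then the two init methods, and check only once a message has arrived.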

additional notes

work in progress

The spam scanner API is very new. Currently there is no other spam scanner plugin besides SpamAssassin (I am working on a DSPAM plugin for amavisd-new). That means that you will likely have to modify the amavisd-new sources in order to remove implicit assumptions. Please use the established refactoring methods so that no functionality is lost.

Some areas that are known to need work are:

caching of spam levels (statistical results may depend on all previous mails)
modification of mail bodies for multiple recipients
grouping of multiple recipients which share the same spam profile

forking

After the initial loading, amavisd-new forks several children which do the real work. You should load all modules before the fork, as this saves RAM thanks to the copy-on-write mechanism of your operating system. This is why SpamAssassin is only loaded once even if you are running 10 amavisd-new children.

However, please be careful about sharing one database connection which was opened before the fork among many children, as your db module may not cope with several processes using the same connection at once.
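One way to avoid a shared connection is to open it lazily and remember which process opened it. The following is only a sketch of that idea: LazyConnection is a hypothetical helper, not part of amavisd-new, and the "connection" is just a hash so that the example runs without a database.

```perl
use strict;
use warnings;

package LazyConnection;   # hypothetical helper, not part of amavisd-new

# $connect is a code ref that opens and returns a fresh connection.
sub new {
    my ($class, $connect) = @_;
    return bless { connect => $connect, handle => undef, owner_pid => 0 }, $class;
}

# Return a connection that was opened by the current process. If the stored
# handle was opened by another process (e.g. the parent before the fork),
# it is replaced by a fresh one.
sub handle {
    my ($self) = @_;
    if (!defined $self->{handle} || $self->{owner_pid} != $$) {
        $self->{handle}    = $self->{connect}->();
        $self->{owner_pid} = $$;
    }
    return $self->{handle};
}

package main;

my $opened = 0;
# Stand-in for a real database connect call.
my $db = LazyConnection->new(sub { $opened++; return { opened_by => $$ } });

# Creating the wrapper before the fork opens nothing yet.
print "connections opened so far: $opened\n";

# The first use opens the connection, later uses in the same process reuse it.
$db->handle;
$db->handle;
print "connections opened after two uses: $opened\n";
```

Creating such a wrapper in init_pre_fork and only calling handle inside check would give each child its own connection. With a real db module you should also check what it does when a child discards a handle it inherited from the parent.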

chroot

amavisd-new can be run chrooted for maximum security. This means that not all parts of the file system may be accessible after the chroot. You should take care to load all modules before the chroot! Furthermore, you should be aware that not all files you need (e.g. additional configuration files) may be accessible afterwards. Please take this into consideration and test with amavisd-new running chrooted (or change your documentation accordingly).
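As a small sketch of how a plugin might deal with this, the example below loads a module and reads a tiny configuration file in init_pre_chroot and afterwards only works with the values kept in memory. Everything specific here is an assumption made for the example: the package name, the config_file argument, the key = value file format and the setting accessor. Digest::MD5 only stands in for whatever modules your scanner needs.

```perl
use strict;
use warnings;

package Amavis::SpamControl::ChrootDemo;   # hypothetical plugin name

sub new {
    my ($class, %args) = @_;
    return bless { config_file => $args{config_file}, config => {} }, $class;
}

# Runs before the chroot: the whole file system is still visible, so load
# modules and read small files here and keep the result in memory.
sub init_pre_chroot {
    my ($self) = @_;
    require Digest::MD5;   # placeholder for the modules your scanner needs
    open my $fh, '<', $self->{config_file}
        or die "cannot read $self->{config_file}: $!\n";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;            # skip comments and blank lines
        if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            $self->{config}{$1} = $2;
        }
    }
    close $fh;
    return 1;
}

# Runs after the chroot: only use what was loaded above.
sub init_pre_fork { return 1 }

sub setting {
    my ($self, $key) = @_;
    return $self->{config}{$key};
}

package main;
use File::Temp qw(tempfile);

# Write a throwaway configuration file for the demonstration.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "# demo settings\nthreshold = 5.0\nprofile = default\n";
close $fh;

my $scanner = Amavis::SpamControl::ChrootDemo->new(config_file => $path);
$scanner->init_pre_chroot;
$scanner->init_pre_fork;
print "threshold: ", $scanner->setting('threshold'), "\n";
```

Since init_pre_chroot is (probably) run with root privileges, it seems wise to keep the work done there as small as in this sketch.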

spaminess

You may wonder why all scanners are expected to return a single numerical value when purely statistical scanners in particular will give you a likelihood and maybe a confidence value. The spam level comes from SpamAssassin, which works that way, and the amavisd-new author Mark Martinec decided that an artificial three-level scale (ham, spam, definitely spam) does not give users enough control over the different actions such as tag_level, tag2_level, quarantine_cutoff etc. Additionally, amavisd-new works internally with this numerical value in many places, so it would require much work to change that.

Therefore you should return a numerical value. You should think about the range your values will be in and document it. If you are using a special distribution of values, please mention that too (e.g. 10.0 means that there is only a likelihood of 10^-3 that this message is ham).
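If your scanner produces a probability, you need some documented mapping to a spam level. The sketch below shows one possible mapping and is not something amavisd-new prescribes: the function name, the logarithmic scale and the default cap of 20 are my own assumptions. The scale factor is chosen so that a ham likelihood of 10^-3 gives a level of 10.0, as in the example above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical mapping from P(ham) to a spam level:
#   level = (10/3) * log10(1 / P(ham)), clamped to [0, $max_level]
# so P(ham) = 1 gives 0 and P(ham) = 10^-3 gives (about) 10.0.
sub spam_level_from_ham_probability {
    my ($p_ham, $max_level) = @_;
    $max_level //= 20;                       # assumed upper bound of the scale
    die "probability must be within [0, 1]\n" if $p_ham < 0 || $p_ham > 1;
    return $max_level if $p_ham == 0;        # avoid dividing by zero
    my $level = (10 / 3) * log(1 / $p_ham) / log(10);
    return min($max_level, max(0, $level));
}

for my $p_ham (1, 0.5, 1e-3, 1e-9) {
    printf "P(ham) = %-6g -> spam level %.2f\n",
        $p_ham, spam_level_from_ham_probability($p_ham);
}
```

Whatever mapping you pick, document the range and its meaning, because thresholds such as tag_level only make sense to users who know the scale.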

In my opinion the perpetuation of the numerical scale means that spam levels from different plugins can't be combined or compared in any way. Just treat them as arbitrary values where higher numbers mean "more likely spam". Zero is a good value for "no spam", five is the default SpamAssassin threshold.

other integration methods

I should note that not everyone is happy with integrating more spam scanners into amavisd-new when other ways exist, such as daemonized versions of the spam scanner. At the very least you should ask yourself whether you really want to integrate your code into amavisd-new or whether there is a better place for it in your mail setup.

Examples

Amavis::SpamControl::SpamAssassin