SpamAssassin 3.1.0 installation guide


#1

I finally completed the installation guide for version 3.1.0. My goal was to be complete but not verbose. I hope I succeeded. Also, if you’ve got SA installed and have any tips, I’m sure everyone would be curious to hear them. We all hate SPAM.

The link: http://www.unsaturated.com/projects.spamassassinMySQL.html

Regards,
Matt


#2

What does your MySQL usage report look like? I thought of using MySQL for the Bayes db, but I thought this would require a database connection for each message, and DH considers connections to be “expensive”.


#3

I’m surprised the usage is so low. Here’s a summary report generated by the DH Panel.

-Matt

Day         Disk Usage  Connects  Queries  Conueries    Ratio
2005-09-23          MB         7       21  0.000 MCn    0.120
2005-09-24    4.410 MB        98   264138  0.267 MCn  107.811
2005-09-25    4.461 MB        49      781  0.002 MCn    0.638
2005-09-26    4.480 MB        75     7419  0.009 MCn    3.957
2005-09-27    4.566 MB        71   265056  0.267 MCn  149.327
2005-09-28    7.789 MB        36      580  0.001 MCn    0.644
Totals:       5.141 MB       336   537995  0.546 MCn   64.047
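
The Conueries and Ratio figures in this report are consistent with counting each connection as 25 queries, so that conueries = queries + 25 × connects and ratio = queries / (25 × connects). A minimal Python sketch that reproduces the 2005-09-24 row follows; the function names are hypothetical, and the weight of 25 is inferred from the numbers in the report rather than quoted from the DH Panel.

```python
# Assumption: one connection counts as 25 queries. The weight is inferred
# from the figures in the report, not taken from DreamHost documentation.
CONNECT_WEIGHT = 25


def conueries(connects: int, queries: int) -> int:
    """Combined usage figure: queries plus weighted connections."""
    return queries + CONNECT_WEIGHT * connects


def ratio(connects: int, queries: int) -> float:
    """Queries relative to the weighted connection count."""
    return queries / (CONNECT_WEIGHT * connects)


# 2005-09-24 row: 98 connects, 264138 queries
print(conueries(98, 264138) / 1e6)   # usage in millions of conueries (MCn)
print(round(ratio(98, 264138), 3))   # value to compare with the Ratio column
```

If that weighting is right, a low ratio marks a day where connection overhead dominated, and a high ratio marks a day where a few connections carried many queries.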


#4

Great guide, Matthew. Couple questions. Do you know how to feed the database from the flat files I’ve been using to store the bayes info for a couple years now?

Also, why do you suggest that SA not autolearn the bayes and AWL info? I’ve found that to be quite nice.


#5

Thanks. I tried your instructions for setting up SA to use MySQL and it worked perfectly. (I already had SA installed.)


#6

I don’t use AWL or autolearning for Bayes because there’s a gray area. I prefer to know exactly who is considered to be on my whitelist or blacklist. Bayes autolearning can mean your message is learned as spam or ham. If it works well for you then stay with it. Spam is a little different for everyone.

For old database files I think you need to use the --backup option when running sa-learn, which writes the Bayes data to standard output. That would be in your old configuration. Then use --restore with the backup file name in the new configuration. I’ve never tried it because I keep mail on the server. With the amount of space DH provides, it’s just easier to re-scan your inbox, wait for your Trash folder to fill up, and scan that for spam. Spam is always changing so those really old tokens have probably lost some effectiveness.

-Matt


#7

Sent Matt a personal message, but figured I’d throw it out to the group. I installed SA by the excellent directions and have it working for the shell account under which it was installed – million dollar question – is it possible to use this same install for other mailboxes/shell users? Do I have to do an install for every user that I wish to configure spamassassin for?

Ultimately, I’m just trying to have a SA settings file I can edit for an entire domain or for a few individual users – since DreamHost stores the settings files in their DB, I can’t use their SA install…


#8

It should work in many / most cases, since your users should be on the same cluster of mail machines most of the time[1] – but it’s not totally safe. If you don’t mind that, referencing /home/USER/sausr/local/bin/spamassassin (where USER is the user spamassassin is installed under) should work. This will only work for shell / ftp users, not for mail-only users (since you have no way of editing their .procmailrcs).

If another user is on the same group of mail machines but a different user machine, they won’t be able to use “sa-learn” and other training commands.

I think with the junk filter providing a lot of the same functionality (albeit in a slightly less effective way, and with less control), DH is probably not wild about the idea of encouraging people to set up SA, except on an individual “per-account” basis. Hopefully, they’ll eventually add functionality for adaptive filtering (with the bayes-sql module) into the junk filter itself.

Side note… once the Sarge upgrades happen on the mail machines as well, or if you don’t mind not being able to train the filter, using .db files should work Ok.

[1] Unless things have changed recently, users / domains on the same account will generally be provisioned on the same machine or group of machines, however users / domains across accounts may be on totally different “clusters”.


#9

Thanks for the advice, Will! I was contemplating a path-based solution as I washed dishes and was thrilled to return and find confirmation and directions!

I’m actually using SA literally to produce formatted SA reports and forward them to other addresses; otherwise I would have been fine with the defaults available from DH. I was expecting to simply be able to edit the SA settings for the DH install, but since that is not possible, this solution gives me even better options than I expected when I signed up!


#10

Thanks for posting these instructions…

I now have bayes and AWL hooked up and working again… (I just noticed that I too was having mismatched db versions between the mail server and my crontab’d sa-learn client).

'Byron


#11

I ran into some problems using the system provided spamassassin so decided to give the install of version 3.1.0 a try - nice job on that complete set of instructions posted.

My scheme with procmail pipes data through spamassassin and then through a home-brew perl filter that does some specific mail processing. The procmail entry looks like:

:0fw
* < 524288
| $HOME/sausr/bin/spamassassin

:0fw
* < 1048576
| $HOME/p4

:0
$HOME/Maildir/

The plan is that up to half a meg we use spamassassin as a filter, then up to a meg my perl script “p4” filters, and then the mail is delivered.

When I built spamassassin v3.1.0 I found that my procmail log file contained messages:

warn: dns: Net::DNS version is 0.19, but need 0.34 at /home/acmail/sausr/share/perl/5.8.4/Mail/SpamAssassin/Dns.pm line 589.

I only see these complaints about the Dns.pm module until I add my whitelist to the user_prefs file. When I pull the whitelist entries (687 of them) from my old user_prefs file and insert them in the new user_prefs I see my procmail log file full of errors about problems opening the database file. I can duplicate this with the --lint option to spamassassin. Example:

$HOME/sausr/bin/spamassassin --lint
[22235] warn: bayes: cannot open bayes databases /home/acmail/.spamassassin/bayes_* R/O: tie failed:
[22235] warn: bayes: cannot open bayes databases /home/acmail/.spamassassin/bayes_* R/O: tie failed:
[22235] warn: bayes: cannot open bayes databases /home/acmail/.spamassassin/bayes_* R/O: tie failed: Bad file descriptor
[22235] warn: auto-whitelist: open of auto-whitelist file failed: auto-whitelist: cannot open auto_whitelist_path /home/acmail/.spamassassin/auto-whitelist: Inappropriate ioctl for device

Some digging with Google gives a db version mismatch as a possible explanation for the perl “cannot tie” message. But these db files were created with version 3.1.0 of spamassassin (unless perhaps they are template files copied somewhere on first use).

My whole reason for running a local copy of spamassassin was that I was getting some errors trying to use the system version. The typical error in my procmail log file was:

Could not lock /home/acmail/Maildir/: File exists at /usr/share/perl5/Mail/SpamAssassin/NoMailAudit.pm line 381.
procmail: Program failure (70) of "spamassassin"
procmail: Rescue of unfiltered data succeeded

I found that the file “NoMailAudit.pm” did not exist under this path on my login machine. I wondered if perhaps this was a file on the system processing my incoming mail, cross mounting my home dir, but not found on my login system. I don’t understand why spamassassin would be trying to lock any files, as I expect it to be purely a filter.

Any thoughts as to the failures here?


#12

Matthew:

These instructions worked fine. The only problem I have is that when I run spamassassin manually against a message the bayes tests run fine, but when it runs via procmail on an incoming message, the bayes tests don’t seem to run at all.

Any ideas?
Dave


#13

Check the number of spam and ham messages SA has learned. If either is less than 200, the Bayes tests won’t run automatically. I’m not sure how or why the manual test works that way. I would run in debug mode and see SA testing a message against the Bayes db, yet I had trained it with zero messages.

Run this to check your nspam and nham levels:

%/> ~/sausr/bin/sa-learn --dump magic
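
By default SpamAssassin only applies Bayes scoring once it has learned at least 200 spam and 200 ham messages. The sketch below, assuming the usual column layout of “sa-learn --dump magic” output (the count sits in the third column), pulls out nspam and nham and checks them against that threshold. The sample text is illustrative, not output from this thread, and the function names are hypothetical.

```python
import re

MIN_MESSAGES = 200  # default minimum for both spam and ham

# Illustrative sample only; the column layout is an assumption based on
# typical "sa-learn --dump magic" output, not copied from this thread.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
"""

# Four whitespace-separated columns, then the label; capture column three.
MAGIC_LINE = re.compile(
    r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
)


def learned_counts(dump: str) -> dict:
    """Return {'nspam': n, 'nham': m} for the lines found in the dump."""
    counts = {}
    for line in dump.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_active(dump: str) -> bool:
    """True only if both counts have reached the minimum."""
    counts = learned_counts(dump)
    return all(counts.get(key, 0) >= MIN_MESSAGES for key in ("nspam", "nham"))


print(learned_counts(SAMPLE))
print(bayes_active(SAMPLE))
```

To use it on a real account, save the command's output to a file and pass the file's contents to bayes_active.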

-Matt


#14

Should we use the “trusted network” in the local.cf file, and if so, how do I find out what value to set it to?


#15

An excellent guide! Thank you Matthew!!
I followed the instructions. Formatting suggestion: make the bits that are individual bold? (MySpamDB, spam.sa.com, spam_user, itsasecret)

Question: In step 15, when I run the “sa-learn” command, I get an error. After repeating the command with the -D switch (debug) I found out that it’s trying to log into MySQL with my shell login name rather than the DB username. I doublechecked that the config files I made really state my DB username and it appears OK. What did I miss?

[27472] dbg: bayes: using username: torbengb
[27472] dbg: bayes: unable to connect to database: Can't connect to MySQL server on 'satest.subdomain.torbengbs-domain.com' (111)
/TorbenGB
(edit: added debug output)


#16

Never mind the above post; I repeated the entire installation on a new user account where I made sure that account username = database username. Interestingly, there is still the same error: “dbg: bayes: unable to connect to database: Can’t connect to MySQL server on ‘torbengb.domain.com’ (111)” This error is given regardless of what I want to do with sa-learn, whether it is --ham, --spam, or --dump magic. It must be something about my DB settings.
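
In a “Can’t connect to MySQL server” message, the number in parentheses is the operating-system error code, and on Linux 111 is ECONNREFUSED: nothing accepted the TCP connection on that host and port. That happens before any username or password is checked, so the hostname in the DSN is worth verifying first. Below is a minimal sketch for testing reachability from the shell account, assuming MySQL’s default port 3306; can_connect and the example hostname are hypothetical.

```python
import socket


def can_connect(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers refused connections, timeouts, and DNS lookup failures.
        print(f"connect to {host}:{port} failed: {exc}")
        return False


# Example call (hypothetical hostname; use the one from your own DSN):
#     can_connect("mysql.example.com")
```

If this returns False for the hostname in local.cf but True for the MySQL hostname shown in the Panel, the DSN points at the wrong machine.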

I can provide the debug output as a text file, plus my account settings, in case you want to take a look.


#17

If you know of IP addresses that will not be sending SPAM, then you could add that to your configuration file (user_prefs or local.cf). Personally, I think it’s a lot of work to manually determine a trusted host. Sometimes an e-mail gets bounced around through relays. To make certain you receive an e-mail from a specific address, I would stick with white- and blacklisting good or bad e-mail addresses.

-Matt


#18

I can’t recall having that problem. However, I noticed the DH username is inserted into the “bayes_vars” table–not the MySQL username. Here’s my best guess: although the DH user is running a local copy of SA, the program doesn’t know that. It could be a site-wide installation, so SA associates Bayes activity with the current user account. I think this is behavior associated with SpamAssassin and not MySQL.

Torben, perhaps you could reset the MySQL password for that MySQL account and make certain the DB user has all privileges (insert, select, update, etc.). This is all done through the Panel. Then make certain the MySQL username and password are the same in your local.cf file. If it turns out it was just a typo, let us know. Otherwise, this might be another DreamHost quirk with SpamAssassin.

-Matt


#19

False positives: 0% (0 in 800 total hams)
False negatives: 1.15% (15 in 1300 total spams)

These numbers are what survives the DH mandatory spam filter, and the DH junk mail filter that I have set to conservative limits. I think it’s truly impressive to have only 15 false negatives with so little training.

The 0% false-positive figure excludes DH’s Junk Mail Reports, which are very much rated as junk since each report includes the topics of all quarantined spam. This is obviously acceptable and not included in the training. I’ll now remove the Junk Filter entirely in order to eliminate this quirk – and to get more spam for training my SA.

I’m sending a copy of all my mail to this new SA-enabled mail account and I am monitoring its performance (but still using my regular (spammy) mail account). I have four folders (ham, spam, false-pos, and false-negatives) so I can track performance. Once mail has been taught to SA, it is archived so as not to be taught again (but kept in case I ever need to start over).



#20

I used to use FastMail for all my e-mail, and their SpamAssassin+ClamAV system is simply excellent at filtering spam. The only false positives I ever get are from the occasional mailing list post that resembles spam, or legitimate promotional e-mail for shopping sites that also resemble spam. FastMail uses just about every test SA has, including DNS blacklists and Bayes tests.

I recently decided that, for a few more dollars a month, I could get an account with DreamHost, and have a practically identical (from my view) e-mail system, plus a lot more. So I signed up and migrated all of my IMAP mail. Then I discovered that the DH mail servers are using an old version of SpamAssassin: not 3.0.3 but 2.20. Version 3.0.3 is installed on the other servers, but it is not what runs if you execute “spamassassin” from your ~/.procmailrc file. SA 2.20 does a horrible job of catching spam compared to 3.x, at least the way DH has it configured. So I decided to install SA on my DH account.

Well, I did, and all is well, except for a few things. With the way DH has things clustered, and the fact that the mail servers use a different version of SA, it results in this error, for one:

warn: dns: Net::DNS version is 0.19, but need 0.34 at /home/username/share/perl/5.8.4/Mail/SpamAssassin/Dns.pm line 589.

I spent quite some time installing CPAN modules to my home folder, but it seems like SA is running the version installed on the mail server, which I think is running Woody, instead of using the version from my home folder, even though I’m executing /home/username/bin/spamassassin. I’ve tried exporting…

PERLLIB=/home/username/lib/perl/5.8.4:/home/username/share/perl/5.8.4

and

PERLLIB=/home/.enoch/username/lib/perl/5.8.4:/home/.enoch/username/share/perl/5.8.4

…and tried putting those in ~/.procmailrc, but it keeps looking for those Perl modules on the mail server (I think that’s what it’s doing).

The result is that none of the tests that use DNS blacklists work, including the URIBL tests and SpamCop tests, which catch a lot of spams that would not otherwise be caught; I’m talking increasing a score from 0.5 to 17.5.

But the weird part is, if I SSH in to my account and run “~/bin/spamassassin -td ~/Maildir/…” on the same message, all of the tests work, including the DNS tests, and the e-mail is correctly scored as spam. I don’t understand it; if I run a command in a shell and it works, shouldn’t it do exactly the same thing if run by procmail from the same executable file in my account?

This is really frustrating; I’m so close, but it’s like I just can’t get the last six inches. As a result, SA is useless for me on DH; completely useless. It catches about 5% of spam, the rest is scored around 0.5-1. Maybe the Bayes tests will work once I’ve trained enough spam and ham, but the DNS tests work great, and I’d really like to use them.

Any help? DH said they plan to upgrade their mail servers to Sarge in a few months, but that’s a few months from now.