Why won’t DreamHost just implement SpamAssassin on its servers instead of stuff like Vipul’s Razor? What’s the problem with offering a much more complete anti-spam solution?
This thread is now fairly old, but might give you some insight:
From what I remember, the main points are:
- It’s too resource-hungry. It’d eat up processing power like nothing else.
- Although the underlying goal is preventing spam, you are technically screening customers’ emails. When no system is 100% guaranteed to work correctly, this could be a problematic gray area.
I’m sure there are another hundred reasons, but I think those two are big enough show stoppers in themselves.
There is nothing stopping you from installing it locally.
Hey Wil, thanks! I’ll bookmark the thread link for a later visit! Regarding point #1, if SA is so resource-intensive, then why does DH allow it to run locally? If lots of users were to install SA on their plans, wouldn’t that turn out to be a worse resource hog than installing it once as an optional free anti-spam service like Razor?
Just a few notes about why this problem is so much more complicated than it might sound.
It’s unlikely that lots of users will start installing SA themselves. It requires a degree of technical sophistication that many users don’t have, and it won’t work on the mxxxxxx type accounts, only on mail users that have corresponding shell / ftp users. Even so, I’ve already seen some resource consumption problems with the small number of users that currently have it set up. Having a single installation doesn’t help by itself, although running spamc / spamd, or using some sort of inline content filter, would be a little more efficient.
Setting it up globally is also a lot trickier than setting it up in an officially non-supported way. You really want to use a content-filter type mechanism rather than invoking the filter from procmail; you have to deal with support headaches when users “don’t receive” important messages (i.e., when users get false matches and don’t check their spam folder); you need some sort of mechanism to let users train the Bayesian filters, and some sort of interface for changing user preferences… the list goes on and on. And once we set something up so that users can do it from the panel, it becomes yet another thing we have to support: support staff have to understand how it works and how to troubleshoot it, and if there are any problems, it’s up to us to fix them. This is a big part of the “cost” of implementing something like this.
SpamAssassin is mostly effective because it takes a “kitchen sink” approach. It does a whole lot of DNS lookups, checks headers and body against regular expressions, and so on, all of which is pretty resource intensive. Adaptive filters (like bogofilter, spamprobe, etc.) tend to be effective and consume far fewer resources, but are much trickier to deal with when you have a lot of users. We have almost 100 thousand users across our 4 clusters of mail machines, so we’re talking about a lot of work here, and a high likelihood of slowdowns and other problems when there’s an extra heavy mail load, or even if there’s some weird sort of mail loop.
You might consider a client-side (or server-side, if you use a console based mailer to read your email, or if you want to write a “training” script) adaptive / Bayesian filter. Mozilla and Apple’s “Mail.app” both have adaptive filters which are supposed to work pretty well.
We are working on some options for effective spam and virus filtering… this is actually one of the next projects I’ll probably be working on. If there were an easy, reasonably priced, and effective solution, we’d be in there.
Will, thank you very much for that thoughtful and very comprehensive explanation! I suppose that those sorts of considerations go for just about every service that DreamHost considers adding to its roster. Excellent insight. Thanks!
Incidentally, if you could list any other effective and efficient spam fighting server- or client-side scripts or programs to look into, that would be fantastic!
Thanks Will for that excellent explanation. In fact I’ve been thinking about installing SA on my domain (as per instructions posted in the forum) as a way to stop the heaps of spam I get, because I feel that my client-side tools aren’t good enough (including Mozilla, sadly).
But … if SA is so resource-intensive that it’s noticeable with even just the few users who currently run it, perhaps I shouldn’t? DH hasn’t officially asked users not to run it, but that day may come if more and more users install SA. Any comments on this?
Is there a comparable server-side alternative you’d wish people were using instead of SA?
I’ve used bogofilter, and some of my co-workers have checked out other server-side adaptive filters (like spamprobe). These are pretty effective, but require constant training to work well. The people who don’t read their mail directly on the server use some little Perl scripts (I could probably post them if anyone is interested) which run from cron and train from special IMAP folders (much the way our Razor setup works). I do think that for most people, SA will be easier to use and more effective.
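For anyone curious what a cron-driven training script along those lines might look like, here is a rough sketch in Python rather than Perl. It is not the script mentioned above, just an illustration under assumptions: the folder names, host and login are hypothetical, and it relies only on bogofilter’s -s (register as spam) and -n (register as non-spam) flags. It does not move or delete messages after training, so a real script would need to handle that to avoid training on the same mail twice.

```python
import subprocess

# bogofilter's -s flag registers a message as spam, -n as non-spam (ham).
TRAIN_FLAGS = {"spam": "-s", "ham": "-n"}


def fetch_folder(conn, folder):
    """Yield the raw bytes of every message in an IMAP folder.

    `conn` is an already logged-in imaplib.IMAP4 / IMAP4_SSL connection.
    """
    conn.select(folder)
    _, data = conn.search(None, "ALL")
    for num in data[0].split():
        _, msg_data = conn.fetch(num, "(RFC822)")
        # msg_data[0] is a (envelope, raw_message) tuple.
        yield msg_data[0][1]


def train(messages, kind, run=subprocess.run):
    """Pipe each raw message to bogofilter; return how many were processed."""
    flag = TRAIN_FLAGS[kind]
    count = 0
    for raw in messages:
        run(["bogofilter", flag], input=raw, check=True)
        count += 1
    return count


# Hypothetical usage from a cron job (host, login and folder names are made up):
#   import imaplib
#   conn = imaplib.IMAP4_SSL("mail.example.com")
#   conn.login("user", "password")
#   train(fetch_folder(conn, "train-spam"), "spam")
#   train(fetch_folder(conn, "train-ham"), "ham")
#   conn.logout()
```

Passing the `run` function in as a parameter is only there so the logic can be tried out without bogofilter installed.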
Now that SA has its own Bayesian module built in, I’ve found it to be pretty effective, and I have switched back to using it (using both SA and bogofilter turned out to be a training nightmare). I get very little spam to my inbox, and I have very conservative settings enabled (5.5 hits required, and 8 or more to send it directly to my spam folder).
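To make the two thresholds concrete, here is a small sketch of how such a two-level setup could be expressed, written in Python instead of the procmail rules one would normally use. It assumes the score can be read from the X-Spam-Status header, which shows it as hits=… in older SA releases and score=… in newer ones. What happens to mail scoring between 5.5 and 8 is my assumption (flagged but left in the inbox); the post above doesn’t say.

```python
import re

# Matches "score=7.2" (newer SpamAssassin) or "hits=7.2" (older 2.x releases).
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def route(status_header, tag_at=5.5, folder_at=8.0):
    """Decide where a message goes based on its X-Spam-Status header value.

    Returns "spam-folder", "tagged" or "inbox". A header with no
    recognisable score is treated as not spam.
    """
    match = SCORE_RE.search(status_header)
    if match is None:
        return "inbox"
    score = float(match.group(1))
    if score >= folder_at:
        return "spam-folder"
    if score >= tag_at:
        return "tagged"
    return "inbox"
```

For example, `route("Yes, hits=9.1 required=5.5")` returns "spam-folder", while a message scoring 6.0 would only be tagged.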
At this point, I wouldn’t worry too much about the resource consumption issue, based on the number of people who are installing it (though I have seen a lot more people who are installing it recently).
I am thinking about doing an upgrade of our crusty old SA install to one of the semi-official backports - I think this has been discussed in other threads here. Part of the problem with this is that we still have some users that are using the old 2.20 version, and there are some changes to the configuration directives and command line flags that might require some coordination when we switch over.
Believe me, though, I do understand people’s frustration with this whole issue. Between the abuse box and other role accounts (which I can’t filter), personal email accounts, and work box, I get ridiculous amounts of spam each day, and if I didn’t have a decent spam filter I would literally go insane. In general, we’ve always tried to filter and block as conservatively as possible, but I realize that we’re a little behind the curve at this point; users are demanding relief from the piles and piles of spam they’re getting.
Of course, I would also recommend that people follow these guidelines to help reduce the amount of spam they get in the first place. These are mostly common sense / common knowledge, but I’ll go over them anyway.
- Don’t use a catchall alias if you can possibly avoid it.
- Create unique addresses when signing up for anything online: enable a catchall alias on a “throwaway” subdomain (junk.example.com, or something similarly unlikely to be spoofed or discovered in a dictionary attack), or use a service like sneakemail.
- Use throwaway email addresses for domain registration information (valid addresses, but ones that don’t route to your main inbox).
- Don’t click remove links in spam or request removal via email.
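As a small illustration of the unique-address tip, the sketch below derives a per-site address on a catchall subdomain. The function name and the naming scheme are made up for the example; junk.example.com is the placeholder subdomain from the list above. If spam later arrives at one of these addresses, the local part shows which signup leaked it.

```python
import re


def signup_address(site, domain="junk.example.com"):
    """Build a per-site address such as some-shop-com@junk.example.com."""
    # Lowercase the site name and collapse anything that is not a letter
    # or digit into a single hyphen.
    local = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    if not local:
        raise ValueError("site name has no usable characters")
    return f"{local}@{domain}"
```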
Good comments, thanks. I think I will eventually go ahead and install SA on my domain after all.
Regarding good rules-of-thumb, your shortlist is good. Out of frustration over spam and various hoaxes, we’ve made this set of pages: http://g-b.dk/mail
I did some [more] searching for anti-spam solutions, this time using the links at the bottom of Paul Graham’s excellent “A Plan For Spam” page as a starting point, and came across an Open Source Bayesian Python script called SpamBayes. Its reported effectiveness, coupled with low-maintenance “training” needs, sounds rather fantastic, so I’m going to try running it client-side on a Win 98 system and see what happens.
Actually, it was intended to be a client-side script, but, being Python code, it can also be run server-side, and there are a couple of reports on the SpamBayes site from sysadmins who are happily running it on their company’s mail servers. They explain how they installed it on their servers and how they implemented it for their user base. Still, since one person’s Spam is often another person’s Ham, it is probably best left as a client-side app in a diverse shared hosting environment such as DH.
Oh, yeah, another nice thing about it is that it has an IMAP module, so one can classify email into folders on the server instead of having to flag it in the subject line or in the mail headers and [i]then[/i] download and process it locally through one’s email client.
Here’s another, slightly easier way that works pretty well: URL encoded format
Another tip for throwaway addresses:
- Use DreamHost’s own http://spam.la service.