Spam Stats

Executive Summary: On average, a user on this server sees less than 10% of the email sent to their address in their Inbox. The other 90% is marked as a virus or spam and is either rejected or put in their Spam folder. Counting attempts to nonexistent addresses as well, almost 90% of all delivery attempts are rejected outright as a bad address, blatant spam, or a virus.

| Classification | Action | Count | Average SpamAssassin Score | Threshold | % of total scanned | % of total email |
|---|---|---|---|---|---|---|
| Ham | Accepted | 1351 | 2.06 | < 5.0 | 9.04 | 4.41 |
| Spam | Accepted | 2075 | 10.7 | < 15.0 | 13.9 | 6.77 |
| Spam | Rejected | 9035 | 29.9 | >= 15.0 | 60.5 | 29.5 |
| Virus | Rejected | 2479 | | | 16.6 | 8.08 |
| Total Scanned | | 14940 | 23.7 | | 100 | 48.7 |
| Bad Address | Rejected | 15727 | | | 105 | 51.3 |
| Total | | 30667 | | | 205 | 100 |
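As a quick sanity check on the table above, the percentages can be recomputed from the raw counts. This is only a sketch: the counts are copied from the table, while the variable and function names are made up for illustration and are not part of any tool mentioned here.

```python
# Counts copied from the table above; all names here are illustrative.
counts = {
    ("Ham", "Accepted"): 1351,
    ("Spam", "Accepted"): 2075,
    ("Spam", "Rejected"): 9035,
    ("Virus", "Rejected"): 2479,
}
bad_address = 15727  # rejected before any scanning takes place

total_scanned = sum(counts.values())
total = total_scanned + bad_address


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Everything that never reaches a mailbox: bad addresses, viruses, rejected spam.
rejected_outright = (
    bad_address
    + counts[("Virus", "Rejected")]
    + counts[("Spam", "Rejected")]
)

# Everything that is accepted, whether it lands in the Inbox or the Spam folder.
accepted = counts[("Ham", "Accepted")] + counts[("Spam", "Accepted")]

print("Ham, % of scanned:      ", pct(counts[("Ham", "Accepted")], total_scanned))
print("Ham, % of all attempts: ", pct(counts[("Ham", "Accepted")], total))
print("Rejected outright, %:   ", pct(rejected_outright, total))
print("Spam share of accepted: ", pct(counts[("Spam", "Accepted")], accepted))
```

The two totals come out to the same 14940 and 30667 shown in the table, and the rejected share lands just under 90%.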

Above are the past week’s stats for the mail server I run. I use a combination of SpamAssassin, sa-exim, ClamAV, and sa-update to block most spam before it is even delivered. This is the first time I’ve really looked at these numbers (until now I relied on an intuitive “feel” for how well spam blocking was working). Following is a description of the numbers, reading the table from the bottom up.

Bad Addresses: Just over half of all delivery attempts (51.3%) have bad addresses. Most of these come from opportunistic spammers who just run through a dictionary of possible email addresses and attempt to send mail to each one. Some are viruses, too, I’m sure. Rejecting them without scanning keeps the server load reasonable.

Virus: Once a delivery attempt has given us a valid address and the body of the message, sa-exim jumps into action and scans the message. First it scans for viruses. ClamAV marked about 17% of the email it scanned as a virus. We reject these at delivery time in order to (hopefully) minimize delivery failure notifications. Usually, you are talking directly to the virus (since ISPs scan outgoing email for viruses), so a rejection at delivery time means no annoying NDN messages for mail you didn’t send.

Rejected Spam: Next, sa-exim hands the email to SpamAssassin (SA). Once SA has scanned the message and returned a score, sa-exim makes a delivery decision. If the SA score is 15 or higher, the mail is rejected. Again, rejecting spam at delivery time results in fewer bogus NDN messages, since spam often uses a forged envelope address. By this point, we’ve rejected almost 90% of delivery attempts because of bad addresses, viruses, or really obvious spam.

Spam in the Spam folder: Of the remaining 11% or so, SA has marked about 60% as spam (a score of at least 5 but below 15), and it is put in the user’s Spam folder.

Ham: The remaining email (~40% of accepted email, ~4% of all email) is classified as non-spam (or Ham). This is put in the user’s Inbox.
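The order of checks described above can be sketched as a single decision function. This is an illustration of the logic only, not sa-exim’s actual code or configuration: the function, enum, and constant names are hypothetical, and the 5.0 and 15.0 cutoffs are taken from the Threshold column of the table. On the real server the virus scan and SA scoring only run after the address check passes; here the results are simply passed in as arguments to keep the sketch short.

```python
from enum import Enum


class Action(Enum):
    REJECT = "reject at delivery time"
    SPAM_FOLDER = "accept and file in the Spam folder"
    INBOX = "accept and deliver to the Inbox"


# Cutoffs taken from the Threshold column of the stats table.
SPAM_FOLDER_THRESHOLD = 5.0
REJECT_THRESHOLD = 15.0


def decide(valid_address: bool, is_virus: bool, sa_score: float) -> Action:
    """Hypothetical sketch of the delivery decision for one message."""
    # 1. Bad addresses are rejected before any scanning happens.
    if not valid_address:
        return Action.REJECT
    # 2. Anything the virus scanner flags is rejected.
    if is_virus:
        return Action.REJECT
    # 3. Really obvious spam is rejected at delivery time.
    if sa_score >= REJECT_THRESHOLD:
        return Action.REJECT
    # 4. Less obvious spam is accepted but kept out of the Inbox.
    if sa_score >= SPAM_FOLDER_THRESHOLD:
        return Action.SPAM_FOLDER
    # 5. Everything else is ham.
    return Action.INBOX


# Example: the average scores from the table fall into the expected buckets.
for score in (2.06, 10.7, 29.9):
    print(score, "->", decide(True, False, score).value)
```

Putting the cheapest check first is the point of the ordering: a bad address costs almost nothing to reject, while every message that reaches the scanners costs CPU time.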
This is fairly effective, but for people like me who have had the same email account for the past 7 years and have posted the address everywhere, it still isn’t enough. The short answer is another layer of email filtering on the client side. I’ll try to write more on what I’ve done there later.

