The Anti-Spam Filter
Introduction
Stanford's spam filtering system scans incoming email for spam before it is delivered to your email Inbox. When the system finds email that matches verifiable spam message patterns, it adds a tag to the Subject line indicating how certain it is that the message qualifies as spam, and then delivers the email as usual. This lets you decide how to deal with spam.
You can automatically filter or delete incoming mail that has been tagged as spam at the server level (before it reaches your email Inbox) by using Webmail filtering. This can eliminate the need for your email client to do the filtering.
If you choose not to use Webmail filtering, you can use the built-in filtering functions of your email program. See the Filtering Spam with Your Email Program page to learn how to configure your email program so that it moves spam out of your Inbox.
The spam filter scans all email sent to @stanford.edu addresses from non-Stanford domains. However, email that a Stanford machine sends to the Stanford email gateways is not scanned. This means that outside mail sent first to a department server and then forwarded to @stanford.edu addresses will not be checked by the spam filter.
How Will This Affect My Email?
- By default, all email marked with five # symbols is discarded before it reaches your mailbox. Also, all mail marked with four # symbols is filed into a folder labeled "Junk" that can be accessed via Webmail or an email program configured for IMAP.
  To change the settings for email marked with four # symbols and below, see Creating a Spam Filter in Webmail.
  IMPORTANT: Messages that are more than 30 days old are automatically purged from your "Junk" folder.
- Email that might be spam will have a [SPAM:] tag added to its Subject line. For example:
  Before
  Subject: Get What You Want
  From: eDiets Motivation <motivation@EDIETS.COM>
  After
  Subject: [SPAM:####] Get What You Want
  From: eDiets Motivation <motivation@EDIETS.COM>
- The number of "#" signs (pound or number signs) after the word "SPAM:" indicates how sure the system is that your email qualifies as spam. To get one "#", the system must be 50% certain that it has found spam. Each "#" after that is another 5%-10% of certainty.
- Email that has been tagged as spam will carry a line of "X-SPAM" evaluation in the header. This tells you what spam patterns were discovered by the system when evaluating your email. Here is an example:
  Subject: [SPAM:####] Get What You Want
  From: eDiets Motivation <motivation@EDIETS.COM>
  X-Perlmx-Spam: Gauge=XXXXXXXXIIIIIII, Probability=87%, Report="BIG_FONT, CLICK_BELOW, CLICK_HERE_LINK, COPYRIGHT_CLAIMED, CTYPE_JUST_HTML, EXCUSE_6, MSG_ID_ADDED_BY_MTA_2, PORN_3, RCVD_IN_OSIRUSOFT_COM, REMOVE_PAGE, SUPER_LONG_LINE"
The "Probability" number represents how certain the system is that your email constitutes spam. If you're curious, more information about how to make sense of this evaluation code can be found in the Frequently Asked Questions.
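If you want to read these tags with a script rather than by eye, the following Python sketch shows one way to do it. It is only an illustration, not part of the Stanford system: the function names are made up for this example, the default actions simply restate the defaults described above, and the idea that each "X" in the Gauge stands for 10% and each "I" for 1% is an inference from the single example shown (eight X's and seven I's next to Probability=87%), not something the vendor documentation on this page confirms.

```python
import re

# Matches the tag the filter adds to the Subject line, e.g. "[SPAM:####]".
SUBJECT_TAG = re.compile(r"^\[SPAM:(#+)\]\s*")


def spam_level(subject: str) -> int:
    """Return the number of '#' signs in a [SPAM:...] tag, or 0 if untagged."""
    match = SUBJECT_TAG.match(subject)
    return len(match.group(1)) if match else 0


def default_action(level: int) -> str:
    """Restate the default handling described on this page."""
    if level >= 5:  # assumption: anything with five or more '#' is discarded
        return "discard"
    if level == 4:
        return "junk"
    return "inbox"  # delivered as usual, with the tag left in the Subject


def parse_evaluation(value: str) -> dict:
    """Pull Gauge, Probability and the rule names out of the header value."""
    gauge = re.search(r"Gauge=([XI]+)", value)
    prob = re.search(r"Probability=(\d+)%", value)
    report = re.search(r'Report="([^"]*)"', value)
    return {
        "gauge": gauge.group(1) if gauge else None,
        "probability": int(prob.group(1)) if prob else None,
        "rules": [r.strip() for r in report.group(1).split(",")] if report else [],
    }


def gauge_to_percent(gauge: str) -> int:
    """Assumed reading of the Gauge: each 'X' is 10%, each 'I' is 1%."""
    return 10 * gauge.count("X") + gauge.count("I")


if __name__ == "__main__":
    subject = "[SPAM:####] Get What You Want"
    print(spam_level(subject), default_action(spam_level(subject)))
    print(parse_evaluation('Gauge=XXXXXXXXIIIIIII, Probability=87%, Report="BIG_FONT, CLICK_BELOW"'))
```

The parser looks only at the header value, so it does not depend on the exact header name your mail program displays.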
Untagged Spam and Incorrectly Tagged Messages
Spammers continually vary their techniques to get their messages past systems like ours. Our vendor supplies continual updates to the spam detection definitions to keep the tagging as effective and accurate as possible. From time to time, as spammers develop new techniques, you may see a temporary increase in untagged spam arriving in your Inbox. You can simply delete these messages as they come in and expect the vendor to update the product before long to catch these new kinds of spam. We no longer collect samples of these messages because we found the vendor was supplying updates for them on its own.
If you get a false positive (email that gets marked as spam but that you want to continue receiving), configure your email program to make an exception for that particular kind of email so that it is not filtered out.
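As a rough illustration of what such an exception does, here is a short Python sketch of the logic a mail filter might apply. Your email program will have its own way of setting this up; the function name and the list of trusted senders below are hypothetical and exist only for this example.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical list of senders whose mail should never be filed as spam.
TRUSTED_SENDERS = {"motivation@ediets.com"}


def should_move_to_junk(message: EmailMessage, trusted=TRUSTED_SENDERS) -> bool:
    """Return True if a tagged message should be filtered out of the Inbox."""
    subject = str(message.get("Subject", ""))
    _, address = parseaddr(str(message.get("From", "")))
    if address.lower() in trusted:
        return False  # the exception: keep mail from trusted senders
    return subject.startswith("[SPAM:")


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "[SPAM:####] Get What You Want"
    msg["From"] = "eDiets Motivation <motivation@EDIETS.COM>"
    print(should_move_to_junk(msg))
```

The exception is checked before the spam tag, so a trusted sender's mail stays in the Inbox even when it has been tagged.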
How Does the System Work?
As the spam filter analyzes messages, it calculates a score based on pre-configured server-side rules. Each rule is applied to the message, and every rule that matches contributes its score. The spam filter also checks a Realtime Block List (RBL) and various other "black hole" lists. (These are sites that keep track of spam and the various places spam comes from.) A message that comes from a source on one of these lists receives an additional score. The message's final score is the sum of the scores from each rule that matches the message.
After the score is calculated, it is mapped against an exponential function that converts it into a percentage. The percentage score for the message is compared to spam level settings configured on the server. Based on that comparison, the message will be classified as "not spam" (spam content probability below 15%), "possible spam" (between 15% and 55%) or "definite spam" (spam content probability above 55%).
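The steps above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the rule scores are invented numbers, and the exact exponential function the product uses is not given on this page, so the curve below (100 × (1 − e^(−k × score)), with a made-up steepness k) is a stand-in chosen to illustrate the idea. Only the 15% and 55% thresholds come from the description above.

```python
import math

# Thresholds taken from the description above.
POSSIBLE_SPAM_MIN = 15.0
DEFINITE_SPAM_MIN = 55.0


def total_score(rule_scores: dict, matched_rules: list) -> float:
    """Sum the scores of every rule (including block-list hits) that matched."""
    return sum(rule_scores[name] for name in matched_rules)


def score_to_percent(score: float, steepness: float = 0.2) -> float:
    """Hypothetical exponential mapping from a raw score to a percentage."""
    return 100.0 * (1.0 - math.exp(-steepness * max(score, 0.0)))


def classify(percent: float) -> str:
    """Classify a message using the 15% and 55% levels."""
    if percent < POSSIBLE_SPAM_MIN:
        return "not spam"
    if percent <= DEFINITE_SPAM_MIN:
        return "possible spam"
    return "definite spam"


if __name__ == "__main__":
    # Invented scores for three of the rule names shown in the example header.
    scores = {"BIG_FONT": 1.5, "CLICK_BELOW": 2.0, "RCVD_IN_OSIRUSOFT_COM": 4.0}
    matched = ["BIG_FONT", "RCVD_IN_OSIRUSOFT_COM"]
    percent = score_to_percent(total_score(scores, matched))
    print(round(percent, 1), classify(percent))
```

With a curve of this shape, each additional matching rule raises the percentage by less than the one before it, so the result approaches but never reaches 100%.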
The software doing all this work is Sophos PureMessage (http://www.sophos.com).
What if I Need Help?
If you have problems with or questions about these anti-spam procedures, send a help request to: http://helpsu.stanford.edu/


