PEP: Headers
Every e-mail message contains a set of "headers". A header has a name (like "From", "To", or "Subject") and a value.
This header is named From and its value is bob@hotmail.com:
From: bob@hotmail.com
This header is named Subject and its value is Have you read any good books lately?:
Subject: Have you read any good books lately?
PEP rules work by testing header values to see if they match values you specify. PEP can test any header value that appears in an e-mail message. There are also several values that you can test that aren't actual message headers, but they can be treated like headers anyway.
It would be impossible to list every possible e-mail header you may encounter, but here is a list of the more useful ones:
FROM This header is commonly misunderstood. Most people think that it contains the name and/or e-mail address of the person who sent the message. While this is usually the case, it doesn't have to be.
It is possible to send an e-mail message with just about anything you want in the FROM: header. In fact, some spammers will even put the recipient's e-mail address in here to confuse them.
Example:
From: spammer@hotmail.com (Joe Spammer)
TO Just like the FROM header, this header can actually contain just about anything. If you are the only recipient of the message, then it probably contains your e-mail address. Messages to a list of people may or may not include all the addresses here though.
Examples:
SUBJECT This header contains a brief title or description of the message. This is a good place to look for certain spam key words or phrases.
Examples:
Subject: ADV: Cable Descrambler
Subject: Make Money Fast!!
Special PEP Values
There are several special values that PEP can use that aren't actual message headers, but you can test them as if they were:
ORIGIN This value is a shortcut that is the same as typing from, message-id, reply-to, senderaddress, return-path, x-sender, ip, apparently-from.
DESTINATION This value is a shortcut that is the same as typing to,cc,bcc,envelope-to,apparently-to.
TOP This value refers to the first four kilobytes of the message body. It is not possible to test the entire message body if it is over 8K in size.
BOTTOM This value refers to the last four kilobytes of the message body. It is not possible to test the entire message body if it is over 8K in size.
PLAINTEXT If it is a MIME encoded message and it contains a text/plain or text/html section, then this value will contain the first four kilobytes of this section, stripped of any HTML. If no text section can be found then this is the same as the TOP value. This is useful for logging and paging.
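The 8K limit follows from the two four-kilobyte windows: TOP and BOTTOM together cover at most 8K, so anything in between is never seen. The Python sketch below only illustrates that arithmetic. It is not PEP's code, it assumes a kilobyte is 1024 bytes, and the function names are made up for the example.

```python
WINDOW = 4 * 1024  # assumption: "four kilobytes" means 4096 bytes


def top(body: bytes) -> bytes:
    """First four kilobytes of the body (the whole body if it is shorter)."""
    return body[:WINDOW]


def bottom(body: bytes) -> bytes:
    """Last four kilobytes of the body (the whole body if it is shorter)."""
    return body[-WINDOW:]


def untested_bytes(body: bytes) -> int:
    """Bytes in the middle that neither window reaches."""
    return max(0, len(body) - 2 * WINDOW)


body = b"x" * 10_000
print(len(top(body)), len(bottom(body)), untested_bytes(body))  # 4096 4096 1808
```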
SENDERADDRESS This contains the sender's address as provided to the SMTP server via the MAIL FROM: command. Also known as the "Envelope From" value. This will usually match the value in Return-Path: (minus any surrounding angle brackets) but not always.
SENDERLOCAL This contains the local part of SENDERADDRESS (the part to the left of the @ sign).
SENDERDOMAIN This contains the domain part of SENDERADDRESS (the part to the right of the @ sign).
FROMADDRESS Often the From: header contains more than just an e-mail address. It might include the sender's name, company, or other text that isn't part of the e-mail address. This header value contains only the e-mail address portion, if any.
So if the From: header contains the value "Bob Smith <bob@aol.com>", the FROMADDRESS will be "bob@aol.com".
RETURNADDRESS The Return-path: header usually contains the sender's address surrounded by angle brackets. RETURNADDRESS contains only the e-mail address portion, if any.
So if the Return-path: header contains the value "<bob@aol.com>", the RETURNADDRESS will be "bob@aol.com".
REPLYADDRESS Occasionally the Reply-to: header contains more than just an e-mail address. It might include the sender's name, company, or other text that isn't part of the e-mail address. This header value contains only the e-mail address portion, if any.
So if the Reply-to: header contains the value "Bob Smith <bob@aol.com>", the REPLYADDRESS will be "bob@aol.com".
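To make the address-only values concrete, here is a small Python sketch that pulls the bare address out of a header value and splits it at the @ sign, the way SENDERLOCAL and SENDERDOMAIN are described. It uses the standard library's email.utils.parseaddr. PEP's own parsing may differ in edge cases, and the helper names are hypothetical.

```python
from email.utils import parseaddr


def address_only(header_value: str) -> str:
    """Return just the e-mail address portion of a header value, or ''."""
    _display_name, address = parseaddr(header_value)
    return address


def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local part, domain part) at the last @ sign."""
    local, _at, domain = address.rpartition("@")
    return local, domain


print(address_only("Bob Smith <bob@aol.com>"))  # bob@aol.com
print(address_only("<bob@aol.com>"))            # bob@aol.com
print(split_address("bob@aol.com"))             # ('bob', 'aol.com')
```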
#header If you place a hash mark before a header name you'll get a numeric value that tells you how many occurrences of the header there are in the message. For example, #from will usually have a value of 1 because there's normally a single From: header. #received will usually be more than one.
header(n) If you test a header value and there happens to be more than one instance of that header in the message, only the first one is tested. For example, using received will test only the first Received: header. If you want to test the second one you'd use received(2) instead.
To refer to the last instance of a header, use a hash mark instead of a number. So received(#) refers to the very last Received: header. You can also follow that with a negative number to indicate the Nth from the last header: received(#-1) would be the second to last Received: header, for example.
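The counting and indexing rules above can be mimicked in Python with the standard email package. This is only an illustration of the semantics as described (1-based instances, # for the last one, #-1 for the one before it). The function names and the sample message are invented for the example.

```python
from email import message_from_string

RAW = (
    "Received: from c.example by d.example\n"
    "Received: from b.example by c.example\n"
    "Received: from a.example by b.example\n"
    "From: bob@hotmail.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = message_from_string(RAW)


def count(msg, name):
    """Like #name: how many instances of the header exist."""
    return len(msg.get_all(name, []))


def instance(msg, name, n=1):
    """Like name(n): the Nth instance, counting from 1; None if absent."""
    values = msg.get_all(name, [])
    return values[n - 1] if 1 <= n <= len(values) else None


def from_last(msg, name, offset=0):
    """Like name(#) for offset 0 and name(#-1) for offset 1, and so on."""
    values = msg.get_all(name, [])
    index = len(values) - 1 - offset
    return values[index] if 0 <= index < len(values) else None


print(count(msg, "received"))         # 3
print(instance(msg, "received", 2))   # from b.example by c.example
print(from_last(msg, "received"))     # from a.example by b.example
print(from_last(msg, "received", 1))  # from b.example by c.example
```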
SCORE PEP maintains an internal numeric score that starts out at zero. You can use the SCORE action to add or subtract from this value. The idea is to score a message based on a variety of tests and then if the score is high enough, delete it.
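The scoring idea can be sketched in Python as a running total that several independent tests add to or subtract from. The tests, point values, and threshold below are all hypothetical. They are not PEP rules or defaults, only an illustration of the approach.

```python
# Each entry is (test, points); a negative value lowers the score.
TESTS = [
    (lambda h: h.get("subject", "").startswith("ADV:"), 5),
    (lambda h: "money" in h.get("subject", "").lower(), 3),
    (lambda h: h.get("from", "").endswith("@example.com"), -4),
]
THRESHOLD = 5  # hypothetical cut-off for deleting a message


def score_message(headers, tests=TESTS):
    """Start at zero and apply the points of every test that matches."""
    score = 0
    for test, points in tests:
        if test(headers):
            score += points
    return score


headers = {"subject": "ADV: Make Money Fast!!", "from": "friend@example.com"}
score = score_message(headers)
verdict = "delete" if score >= THRESHOLD else "keep"
print(score, verdict)  # 4 keep
```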
MAILBOX This is a numeric value that indicates the current size of your mailbox in bytes, before delivering the current message.
MAILBOXNEW This is a numeric value that indicates how large your mailbox would be if the current message gets delivered to it.
% This value represents a random number from 1 to 100. It is different each time PEP handles a new message.
This value was created specifically for our tech support department where we needed to send 33% of the incoming support mail to one staff member, 33% to another person, and the rest to a third person.
Example:
forward if % < 33 to tech1@yourdomain.com
forward if % < 66 to tech2@yourdomain.com
forward if * matches * to tech3@yourdomain.com
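As a rough check on how the three rules above divide the mail, the Python sketch below mirrors them. It assumes that the first matching forward rule ends processing for that message, which the example implies but does not state, and the function names are illustrative. With values from 1 to 100, "% < 33" matches 32 values, "% < 66" matches the next 33, and the catch-all rule receives the remaining 35.

```python
import random
from collections import Counter


def pick_recipient(percent: int) -> str:
    """Mirror the three forward rules for one value of %."""
    if percent < 33:
        return "tech1@yourdomain.com"
    if percent < 66:
        return "tech2@yourdomain.com"
    return "tech3@yourdomain.com"


def route(rng=random) -> str:
    """Draw a fresh value from 1 to 100 for a new message and route it."""
    return pick_recipient(rng.randint(1, 100))


# Exact share of each recipient across all possible values of %.
shares = Counter(pick_recipient(p) for p in range(1, 101))
print(shares["tech1@yourdomain.com"],
      shares["tech2@yourdomain.com"],
      shares["tech3@yourdomain.com"])  # 32 33 35
```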
IP This value contains the IP address of the last machine to handle the message prior to reaching the local server. This is often, but not always, the IP address of the remote mail server that relayed the message to our server.
HOSTNAME This value contains the host name that you get if you do a reverse lookup on the IP value above. Note that this value is only available if you've previously used the "resolve" command.
LINES This value is numeric and indicates the number of lines there are in the message.
BYTES This value is numeric and indicates how large the message is in bytes.
TOCOUNT This value is numeric and indicates how many addresses there are in the To: header.
CCCOUNT This value is numeric and indicates how many addresses there are in the Cc: header.
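For a sense of what TOCOUNT and CCCOUNT measure, this Python sketch counts the addresses in a header value using the standard library's email.utils.getaddresses. How PEP itself counts malformed or grouped addresses is not documented here, so treat the helper (a hypothetical name) as an approximation.

```python
from email.utils import getaddresses


def address_count(header_values):
    """Count the addresses found in a list of To: or Cc: header values."""
    return sum(1 for _name, address in getaddresses(header_values) if address)


to_header = "alice@example.com, Bob Smith <bob@aol.com>"
print(address_count([to_header]))  # 2
print(address_count([]))           # 0
```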
CHALLENGEID This is a unique value that is meant to be used exclusively in reply files that are sent via the challenge action.
PEP_ID This is a value that is guaranteed to be unique for every message ever processed by PEP.
RXn If you don't know what a regular expression is then don't worry about these values. This is an advanced topic.
RX values refer to substrings that were matched with the last regex test. RX0 refers to the entire matching string, RX1 refers to the first substring, RX2 refers to the second, and so on.
So given a Subject: line of
[Llamas] Any good llama jokes?
and the rule
reply if subject regex "\\[(.+)\\]"
RX0 would contain "[Llamas]" and RX1 would contain "Llamas".
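The same match can be reproduced with Python's re module to see where RX0 and RX1 come from. This is an analogy, not PEP's regex engine. The doubled backslashes in the rule are assumed to be escaping inside the quoted rule string, so the Python raw string uses single ones.

```python
import re

subject = "[Llamas] Any good llama jokes?"
match = re.search(r"\[(.+)\]", subject)

# Index 0 is the entire matching string; 1 and up are the parenthesised substrings.
rx = [match.group(0), *match.groups()]
print(rx[0])  # [Llamas]
print(rx[1])  # Llamas
```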
SASCORE This used to trigger a SpamAssassin scan, but now it's just an alias for the "X-Spam-Score:" header that is added automatically by our mail server.
BFSCORE - IN TESTING -
RAZOR Vipul's Razor is a shared catalogue of known spam. When you test this value, it connects to the Razor database and returns either "yes" or "no" to indicate whether the message is listed.
Example:
DCCBODY, DCCFUZ1, and DCCFUZ2 - IN TESTING -
NUMPARTS If the message in question consists of one or more MIME attachments, this value will tell you how many there are.
ATTACHMENT When a message contains one or more attachments, each one that has a filename attribute will be assigned to a separate "attachment" value. If there are 5 attachments with filenames, then there will be 5 "attachment" values. You would normally test these by using a wildcard.
Example:
delete if attachment* matches "*.exe"
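The effect of the wildcard test above is "does any attachment filename match the pattern". A minimal Python sketch of that idea, using fnmatch from the standard library, follows. It is case-sensitive, whereas PEP's own matching rules for case are not stated here, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def any_attachment_matches(filenames, pattern):
    """True if at least one attachment filename matches the wildcard pattern."""
    return any(fnmatchcase(name, pattern) for name in filenames)


print(any_attachment_matches(["notes.txt", "setup.exe"], "*.exe"))  # True
print(any_attachment_matches(["notes.txt"], "*.exe"))               # False
```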
USERNAME This value contains the username of the account that is currently accessing your mailrule file. Normally it will be your username, but if you've allowed others to include your mailrule file then it will be set to their username when PEP is processing mail for them. You can use it to implement different rule sets depending on who's using your mailrule file.
CALLBACK This is a special value that causes PEP to perform a "callback", and report the result as either "OK" or "BAD". A callback is when PEP connects back to the mail server(s) for the sender's email address and goes through the motions of sending a bounce message, without actually sending it. If the sender's address is phony, most servers will let us know about it.
So if "callback" is "bad", then you know the mail message in question is bogus because it comes from a nonexistent address (or one that's been closed down by the ISP, etc). In the event that PEP is unable to connect to the sender's mail servers, or there is some other kind of error, the default is to assume that it's OK.
This is a very effective way to eliminate a lot of spam with little risk of false positives, since a message whose return address does not exist is very unlikely to be legitimate.
Example:
delete if callback is bad
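For readers curious what "going through the motions" looks like on the wire, here is a hedged Python sketch using the standard smtplib module. It is not PEP's implementation. The mail server host is passed in by the caller (looking it up in DNS is left out), the null sender is assumed because that is how bounce messages are normally sent, and treating only 5xx replies as "BAD" is an assumption. The function name and the smtp_factory parameter are hypothetical.

```python
import smtplib


def callback_check(address, mx_host, smtp_factory=smtplib.SMTP):
    """Return "BAD" if the server rejects the recipient, otherwise "OK".

    Nothing is actually sent: the session stops after RCPT TO.
    """
    try:
        with smtp_factory(mx_host, 25, timeout=10) as smtp:
            smtp.helo()
            smtp.mail("")  # empty sender becomes MAIL FROM:<>, as in a bounce
            code, _reply = smtp.rcpt(address)
    except (OSError, smtplib.SMTPException):
        # Could not connect, or some other error: assume the address is OK.
        return "OK"
    return "BAD" if 500 <= code < 600 else "OK"
```

Calling callback_check("bob@aol.com", "mx.example.net") would open a real SMTP connection, so the host name here is only a placeholder. The smtp_factory parameter exists so the function can be exercised with a stand-in object instead of a live server.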
Modifiers
Modifiers can be used to perform a variety of functions on a header value prior to testing it. Examples include converting a header to upper or lower case, stripping out punctuation marks or HTML tags, etc.
For example, to test a version of the Subject header that has had all punctuation symbols removed, use "pstrip:subject" instead of just "subject". To test a version of the message body that has all HTML tags removed, use "htstrip:top" instead of just "top".
Note: modifiers can only be applied to individual headers; they do not work with wildcards. Note also that they only modify a temporary copy of the header and do not modify the original message in any way.
A list of the available modifiers follows:
lower This modifier converts the value to lower case letters.
upper This modifier converts the value to upper case letters.
d3l33t It is not uncommon for spammers to use "l33t sp33k" (elite speak) to try to get past filters. This involves replacing certain letters with punctuation marks that resemble the original letter. For example, they might spell "Viagra" as "\/|@gr@". The d3l33t (de-leet) modifier converts the most common punctuation marks back into the most likely letters. So "\/|@gr@" would be converted back into "viagra", which could then be caught in a general test for that keyword.
pstrip Another common tactic used by spammers is to spell words with lots of extra punctuation marks in between the letters. So "viagra" might become "V.I.A.G.R.A" instead. This modifier strips out all punctuation marks (anything that is not a letter from A to Z) and squeezes all the remaining characters together without spaces.
csdecode Another common technique to avoid filters is to encode the subject line (or other values) in an alternate character set. The value will look normal when viewed with most mail programs, but the actual value in the message is unreadable to a human. This modifier converts a value that is encoded in this manner back into plain text.
htstrip Yet another technique to bypass filters is to insert HTML "comments" into the middle of words in the body of a spam message. This modifier will strip out anything between sets of angle brackets, effectively removing any HTML code.
length This modifier replaces the value with the length of the value instead. So "length:subject" when the Subject is "Hi!" will be 3.
ulratio This modifier replaces the value with a number between 0 and 100 that is the ratio of upper to lower case letters. The more upper case letters the value contains, the higher the number.
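The modifiers above are simple text transformations, and a Python sketch can make their effect concrete. Everything below is an approximation written from the descriptions, not PEP's code. In particular, the de-leet substitution table is hypothetical, ulratio is assumed to mean the percentage of letters that are upper case, and csdecode is assumed to refer to MIME encoded-word decoding.

```python
import re
from email.header import decode_header, make_header

# Hypothetical substitution table; PEP's actual list is not documented here.
LEET = [("\\/", "v"), ("|", "i"), ("!", "i"), ("@", "a"), ("$", "s")]


def d3l33t(value):
    """Replace look-alike punctuation with the most likely letters."""
    for symbols, letter in LEET:
        value = value.replace(symbols, letter)
    return value.lower()


def pstrip(value):
    """Drop everything that is not a letter from A to Z."""
    return re.sub(r"[^A-Za-z]", "", value)


def htstrip(value):
    """Drop anything between angle brackets, including HTML comments."""
    return re.sub(r"<[^>]*>", "", value)


def csdecode(value):
    """Decode MIME encoded-words such as =?utf-8?b?...?= into plain text."""
    return str(make_header(decode_header(value)))


def ulratio(value):
    """Assumed meaning: percentage (0 to 100) of letters that are upper case."""
    letters = [c for c in value if c.isalpha()]
    if not letters:
        return 0
    return round(100 * sum(c.isupper() for c in letters) / len(letters))


MODIFIERS = {
    "lower": str.lower, "upper": str.upper, "length": len,
    "d3l33t": d3l33t, "pstrip": pstrip, "htstrip": htstrip,
    "csdecode": csdecode, "ulratio": ulratio,
}


def evaluate(expression, headers):
    """Evaluate "modifier:header" (or a bare header name) on a copy of the value."""
    modifier, colon, name = expression.partition(":")
    if not colon:
        return headers[expression]
    return MODIFIERS[modifier](headers[name])


headers = {"subject": "V.I.A.G.R.A", "top": "vi<!-- x -->agra"}
print(evaluate("pstrip:subject", headers))  # VIAGRA
print(evaluate("htstrip:top", headers))     # viagra
print(d3l33t("\\/|@gr@"))                   # viagra
print(evaluate("length:subject", {"subject": "Hi!"}))  # 3
```

Because each function returns a new string or number, the original header value is left untouched, which matches the note that modifiers work on a temporary copy.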