Regular Expressions

Mailman gives you a powerful tool to block and accept mail, based on content in email addresses or in any header in the email. These are often called “filters” in Mailman, because they filter incoming mail.

Almost all of these filters use “regular expressions”, or, as they’re often termed on the Mailman configuration pages, “regexps”.

Regular expressions allow you to target specific kinds of messages without affecting everything else. But it is also easy to make a mistake in one, so that it matches too much mail or too little.

Whenever adding new filters, if you have the option to “discard” or “hold”, choose hold. Once you are sure you have the regular expression correct, then you can switch to discard if desired.

Spam Filter Rules

One of the most common uses of regular expressions is to filter out and discard some kinds of messages, often spam or abusive messages. In Privacy options...:Spam Filters you can add Spam Filter Rules to discard or hold incoming messages.

These rules apply to every header in the email. Where other filters apply just to email addresses, spam filter rules can match any text anywhere in the headers, so you’ll have to be careful.

Match a specific header

You will almost always want to match against a specific header, often the Subject line. Begin your regular expression with a caret and then the header name to limit your match to that header. For example:

^Subject: subject to block

The caret matches the beginning of a line, and then “Subject: ” matches that header name. So this regular expression will only match a line that begins with “Subject: ” followed by the text you want to block. In email headers, that line is the Subject of the message.
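One way to check a rule like this before adding it is to try it against sample headers in Python. This is only a sketch, not Mailman’s own code: the sample headers and the helper name rule_matches are invented for illustration, and it assumes the rule is searched across all header lines with the caret matching at the start of each line (re.MULTILINE) and without regard to case (re.IGNORECASE).

```python
import re

# Hypothetical sample headers, invented for this example.
SAMPLE_HEADERS = (
    "From: someone@example.com\n"
    "Subject: subject to block today\n"
)

def rule_matches(rule, headers):
    """Return True if the rule matches somewhere in the headers.

    Assumption: ^ matches at the start of every header line (MULTILINE)
    and letter case is ignored (IGNORECASE).
    """
    return re.search(rule, headers, re.MULTILINE | re.IGNORECASE) is not None

# The anchored rule finds the Subject line.
anchored = rule_matches(r"^Subject: subject to block", SAMPLE_HEADERS)

# A different header that merely contains the same text.
other_headers = "X-Note: Subject: subject to block\n"
unanchored_hit = rule_matches(r"Subject: subject to block", other_headers)
anchored_miss = rule_matches(r"^Subject: subject to block", other_headers)

print(anchored, unanchored_hit, anchored_miss)  # True True False
```

Without the caret, the rule also matches the made-up X-Note header, which is the kind of accidental match the caret prevents.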

Escape special characters

Regular expressions are filled with special characters that mean something other than what they are. Subject lines, for example, often have square brackets in them, but square brackets have a special meaning in regular expressions. They mean any one of the characters appearing between the brackets. So if you try to block subjects that begin with “[SPAM]” using this:

^Subject: [SPAM]

What you’re really doing is blocking any subject that begins with S, P, A, or M. (It won’t even block the [SPAM] subjects, since those begin with square brackets.)

To avoid the special meaning for the square brackets or any other special character, put a backslash in front of them:

^Subject: \[SPAM\]

Special characters you’re likely to run into are the period (.), asterisk (*), plus symbol (+), and question mark (?), as well as parentheses and either of the square brackets. Whenever you want one of these to match itself rather than carry its special meaning, escape it with a backslash.
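The difference between the two rules is easy to see by trying both against sample subjects in Python. This is a sketch for illustration only; the two sample subject lines are invented, and the rules are tested here case-sensitively.

```python
import re

# Invented sample subject lines.
spam_subject = "Subject: [SPAM] cheap watches"
other_subject = "Subject: Minutes of the meeting"

# Unescaped: [SPAM] means "one of the letters S, P, A, or M".
unescaped = r"^Subject: [SPAM]"
# Escaped: the brackets are matched literally.
escaped = r"^Subject: \[SPAM\]"

unescaped_on_spam = bool(re.search(unescaped, spam_subject))    # False
unescaped_on_other = bool(re.search(unescaped, other_subject))  # True
escaped_on_spam = bool(re.search(escaped, spam_subject))        # True
escaped_on_other = bool(re.search(escaped, other_subject))      # False

# re.escape adds the backslashes for you.
escaped_text = re.escape("[SPAM]")
print(escaped_text)  # \[SPAM\]
```

If you are unsure which characters in a piece of text need a backslash, re.escape is a convenient way to find out.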

Optional matches

Often, especially when dealing with subjects, there will be slight variations on the text you’re trying to match. For example, subjects sometimes begin with “Re: ” and sometimes don’t.

You can match optional text by surrounding that text with parentheses and putting a question mark after the parentheses. The parentheses mean “treat this text as one item” and the question mark means “the previous item is optional”.

^Subject: (Re: )?\[SPAM\]

This will match subjects that begin with “Re: [SPAM]” as well as subjects that begin with “[SPAM]”.
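As a quick check, the rule can be run against a few sample subjects in Python. The subjects below are invented for illustration, and this is a sketch rather than a description of how Mailman applies the rule.

```python
import re

rule = r"^Subject: (Re: )?\[SPAM\]"

# Invented sample subject lines.
samples = [
    "Subject: [SPAM] offer",
    "Subject: Re: [SPAM] offer",
    "Subject: Re: Re: [SPAM] offer",
    "Subject: Fwd: [SPAM] offer",
]

results = [bool(re.search(rule, s)) for s in samples]
for subject, matched in zip(samples, results):
    print(matched, subject)
```

The first two subjects match. The last two do not: the question mark allows at most one “Re: ”, and “Fwd: ” is not part of the rule at all.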

Any character

In regular expressions, the period means “any character”. Wherever a period occurs in your regular expression, any character will match.

^Subject: \[SPA.\]

This will match subjects that begin with “[SPA”, then have any one character, then have a closing bracket. [SPAM], [SPAR], and [SPAT] will all match.
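A short Python sketch shows what the period does and does not match. The sample subjects are invented for illustration.

```python
import re

rule = r"^Subject: \[SPA.\]"

# Invented sample subject lines.
samples = [
    "Subject: [SPAM] offer",
    "Subject: [SPAR] offer",
    "Subject: [SPA.] offer",
    "Subject: [SPA] offer",    # no character between A and ]
    "Subject: [SPAMS] offer",  # two characters between A and ]
]

results = [bool(re.search(rule, s)) for s in samples]
for subject, matched in zip(samples, results):
    print(matched, subject)
```

The period stands for exactly one character, so “[SPA]” and “[SPAMS]” do not match.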

Any of a series of known characters

Square brackets will match if any one of the characters between the brackets matches. For example, if the only things we want to match are SPAM and SPAT, we could use:

^Subject: \[SPA[MT]\]

You can also use “A-Z” to match any letter, and “0-9” to match any digit. You can also use subsets of those, such as “A-D” or “1-2”. Because dashes have a special meaning inside of square brackets (they mean a range of characters), if you want to actually match a dash inside square brackets, put the dash at the end of the list of valid characters: “[a-e3-4-]” will match either a, b, c, d, e, 3, 4, or a dash.
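Both points can be checked in Python. This is a sketch with invented sample subjects; re.fullmatch is used for the second test so that each single character is compared against the whole bracket expression.

```python
import re

# Only M or T is accepted in the fourth position.
rule = r"^Subject: \[SPA[MT]\]"
samples = [
    "Subject: [SPAM] offer",
    "Subject: [SPAT] offer",
    "Subject: [SPAR] offer",
]
results = [bool(re.search(rule, s)) for s in samples]
print(results)  # [True, True, False]

# A trailing dash inside the brackets is an ordinary dash.
char_class = r"[a-e3-4-]"
accepted = [c for c in "abcdef12345-" if re.fullmatch(char_class, c)]
print(accepted)  # ['a', 'b', 'c', 'd', 'e', '3', '4', '-']
```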

Any number of characters

More useful still are the asterisk and the plus symbol, which can follow a period or any other item. The asterisk means any number of the previous item, including zero, and the plus symbol means any number of the previous item, but at least one of them.

If you are receiving messages that contain more than one “Re:” at the beginning of the subject, for example, the question mark won’t do: it will match a single “Re: ” if it exists, but not two in a row or three in a row.

^Subject: (Re: )*\[SPAM\]

That will match any number of the text “Re: ” preceding the “[SPAM]” text.
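To compare the asterisk with the plus symbol, both versions of the rule can be tried in Python. The sample subjects are invented, and the plus version of the rule is shown only for contrast.

```python
import re

star_rule = r"^Subject: (Re: )*\[SPAM\]"  # zero or more "Re: "
plus_rule = r"^Subject: (Re: )+\[SPAM\]"  # one or more "Re: "

# Invented sample subject lines.
samples = [
    "Subject: [SPAM] offer",
    "Subject: Re: [SPAM] offer",
    "Subject: Re: Re: Re: [SPAM] offer",
    "Subject: Fwd: [SPAM] offer",
]

star_results = [bool(re.search(star_rule, s)) for s in samples]
plus_results = [bool(re.search(plus_rule, s)) for s in samples]
print(star_results)  # [True, True, True, False]
print(plus_results)  # [False, True, True, False]
```

The only difference is the first subject: with the plus symbol, a subject with no “Re: ” at all no longer matches.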

But you can also use the asterisk with the period to match any number of any character.

^Subject: .*\[SPAM\]

This will match any subject that contains “[SPAM]” anywhere within the subject, because the period will match any character in front of the text “[SPAM]”, and the asterisk will match zero or more occurrences of that.
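A Python sketch of this rule, using invented sample headers. It assumes the caret matches at the start of each header line (re.MULTILINE); note that the period does not match a line break, so the match cannot run on from the Subject line into the next header.

```python
import re

rule = r"^Subject: .*\[SPAM\]"

def rule_matches(headers):
    # Assumption: ^ matches at the start of every header line.
    return re.search(rule, headers, re.MULTILINE) is not None

# Invented sample headers.
at_start = rule_matches("Subject: [SPAM] offer\n")
in_middle = rule_matches("Subject: Fwd: your [SPAM] offer\n")
no_tag = rule_matches("Subject: hello\n")
# [SPAM] appears, but on a different header line than Subject.
other_line = rule_matches("Subject: hello\nX-Tag: [SPAM]\n")

print(at_start, in_middle, no_tag, other_line)  # True True False False
```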