General-Purpose Mail Filter

4 Mail Filtering Language

The mail filtering language, or MFL, is a special
language designed for writing filter scripts. It has a simple syntax,
similar to that of the Bourne shell. In contrast to most existing
programming languages, MFL does not have any special
terminating or separating characters (such as newlines and
semicolons in shell). All syntactical entities
are separated by any amount of white-space characters (i.e. spaces,
tabs or newlines).

The following sections describe MFL syntax in detail.

4.1 Comments

Two types of comments are allowed: C-style, enclosed between
‘/*’ and ‘*/’, and shell-style, starting with ‘#’
character and extending up to the end of line:

/* This is
a comment. */
# And this too.

There are, however, several special cases, where the characters
following ‘#’ are not ignored.

If the first line begins with ‘#!/’ or ‘#! /’, this is
treated as a start of a multi-line comment, which is closed by the
characters ‘!#’ on a line by themselves. This feature allows for
writing sophisticated scripts. See top-block, for a detailed
description.

If ‘#’ is followed by the word ‘include’ (with
optional whitespace between them), this statement requires inclusion
of the specified file, as in C. There are two forms of the
‘#include’ statement:

#include <file>

#include "file"

The quotes around file in the second form are optional.

Both forms are equivalent if file is an absolute file name.
Otherwise, the first form will look for file in the include
search path. The second one will look for it in the current working
directory first, and, if not found there, in the include search
path.

The default include search path is:

prefix/share/mailfromd/8.7/include

prefix/share/mailfromd/include

/usr/share/mailfromd/include

/usr/local/share/mailfromd/include

Where prefix is the installation prefix.

New directories can be added in front of it using the -I
(--include) command line option, or the include-path
configuration statement (see include-path).

For example, invoking

$ mailfromd -I/var/mailfromd -I/com/mailfromd

creates the following include search path

/var/mailfromd

/com/mailfromd

prefix/share/mailfromd/8.7/include

prefix/share/mailfromd/include

/usr/share/mailfromd/include

/usr/local/share/mailfromd/include

Along with #include, there is also a special form
#include_once, that has the same syntax:

#include_once <file>
#include_once "file"

This form works exactly as #include, except that, if the
file has already been included, it will not be included
again. As the name suggests, it will be included only once.

This form should be used to prevent re-inclusion of code, which
can cause problems due to function redefinitions, variable
reassignments etc.

A line in the form

#line number "identifier"

causes the MFL compiler to believe, for purposes of error
diagnostics, that the line number of the next source line is given by
number and the current input file is named by identifier.
If the identifier is absent, the remembered file name does not change.
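
For instance, a tool that generates MFL code might emit a directive like the following. This is only a sketch; the line number and the file name generated.mf are hypothetical:

```mfl
#line 120 "generated.mf"
```

Diagnostics for the source line that follows this directive would then be reported as occurring at line 120 of generated.mf.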

4.2 Pragmatic comments

If ‘#’ is followed by the word ‘pragma’ (with
optional whitespace between them), such a construct introduces a
pragmatic comment, i.e. an instruction that controls some
configuration setting.

The available pragma types are described in the following subsections.

4.2.1 Pragma prereq

The #pragma prereq statement ensures that the correct
mailfromd version is used to compile the source file it
appears in. It takes a version number as its argument and produces
a compilation error if the actual mailfromd version number
is earlier than that. For example, the following statement:

#pragma prereq 7.0.94

results in error if compiled with mailfromd version 7.0.93
or prior.

4.2.2 Pragma stacksize

The stacksize pragma sets the initial size of the run-time
stack and may also define its growth policy, in case it
becomes full. The default stack size is 4096 words. You may
need to increase this number if your configuration program uses
recursive functions or does a large amount of string manipulation.

Size suffixes are case-insensitive, so the following two pragmas are
equivalent and set the stack size to 7*1048576 = 7340032 words:

#pragma stacksize 7m
#pragma stacksize 7M

When the MFL engine notices that there is no more stack space
available, it attempts to expand the stack. If this attempt succeeds, the
operation continues. Otherwise, a runtime error is reported and the
execution of the filter stops.

The optional incr argument to #pragma stacksize defines the growth
policy for the stack. Two growth policies are implemented: the
fixed increment policy, which expands the stack in chunks of a
fixed size, and the exponential growth policy, which
doubles the stack size until it is able to accommodate the needed
number of words. The fixed increment policy is the default. The default
chunk size is 4096 words.

If incr is the word ‘twice’, the exponential growth policy is
selected. Otherwise incr must be a positive number optionally
suffixed with a size suffix (see above). This indicates the expansion
chunk size for the fixed increment policy.

The following example sets the initial stack size to 10240 words, and
the expansion chunk size to 2048 words:

#pragma stacksize 10K 2K

The pragma below enables exponential stack growth policy:

#pragma stacksize 10240 twice

In this case, when the run-time evaluator hits the stack size limit,
it expands the stack to twice the size it had before. So, in the
example above, the stack will be sequentially expanded to the
following sizes: 20480, 40960, 81920, 163840, etc.

The optional max argument defines the maximum size of
the stack. If stack grows beyond this limit, the execution of the
script will be aborted.

If you are concerned about the execution time of your script, you
may wish to avoid stack reallocations. To help you find out the
optimal stack size, each time the stack is expanded,
mailfromd issues a warning in its log file, which looks like
this:

warning: stack segment expanded, new size=8192

You can use these messages to adjust your stack size configuration
settings.

4.2.3 Pragma regex

The ‘#pragma regex’ directive controls
compilation of regular expressions. You can use any number
of such pragma directives in your mailfromd.mf. The scope of
‘#pragma regex’ extends to the next occurrence of this directive
or to the end of the script file, whichever occurs first.

pragma: regex [push|pop] flags

The optional push|pop parameter is one of the words ‘push’ or
‘pop’ and is discussed in detail below. The flags
parameter is a whitespace-separated list of regex flags. Each
regex-flag is a word specifying some regex feature. It can be
preceded by ‘+’ to enable this feature (this is the default), by
‘-’ to disable it or by ‘=’ to reset regex flags to this
value. Valid regex-flags are:

See greylisting types, for a detailed description of these
greylisting implementations.

Notice, that this pragma can be used only once. A second use of this
pragma would constitute an error, because you cannot use both greylisting
implementations in the same program.

4.2.6 Pragma miltermacros

pragma: miltermacros handler macro …

Declares that the Milter stage handler uses the MTA macros
listed as the rest of the arguments. The handler must be a valid
handler name (see Handlers).

The mailfromd parser collects the names of the macros
referred to by a ‘$name’ construct within a handler
(see Sendmail Macros) and declares them automatically for
the corresponding handlers. It is, however, unable to track macros
used in functions called from a handler, as well as those referred to
via the getmacro and macro_defined functions. Such
macros should be declared using ‘#pragma miltermacros’.

During initial negotiation with the MTA,
mailfromd will ask it to export the macro names declared
automatically or by using the ‘#pragma miltermacros’. The
MTA is free to honor or to ignore this request. In
particular, Sendmail versions prior to 8.14.0 and Postfix versions
prior to 2.5 do not support this feature. If you use one of these,
you will need to export the needed macros explicitly in the
MTA configuration. For more details, refer to the section
in MTA Configuration corresponding to your MTA type.

4.2.7 Pragma provide-callout

The #pragma provide-callout statement is used in the
callout module to inform mailfromd that the
module has been loaded.

Do not use this pragma.

4.3 Data Types

The mailfromd filter script language operates on entities
of two types: numeric and string.

The numeric type is represented internally as a signed long
integer. Depending on the machine architecture, its size can vary.
For example, on machines with 32-bit Intel-based CPUs it is 32 bits long.

A string is a sequence of characters of arbitrary length.
Strings can contain any characters except ASCII NUL.

There is also a generic pointer, which is designed to
facilitate certain operations. It appears only in the body
handler. See body handler, for more information about it.

4.4 Numbers

A decimal number is any sequence of decimal digits, not
beginning with ‘0’.

An octal number is ‘0’ followed by any number of octal
digits (‘0’ through ‘7’), for example: 0340.

A hex number is ‘0x’ or ‘0X’ followed by any number
of hex digits (‘0’ through ‘9’ and ‘a’ through ‘f’
or ‘A’ through ‘F’), for example: 0x3ef1.
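
The three notations can be compared side by side in variable declarations. This is a minimal sketch; the variable names are hypothetical and chosen only for illustration:

```mfl
number retry_limit 100     # decimal
number file_mode   0340    # octal, equal to decimal 224
number flag_bits   0x3ef1  # hex, equal to decimal 16113
```

Note that a leading ‘0’ changes the meaning of a number: 0340 is read as octal, not as decimal 340.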

4.5 Literals

A literal is any sequence of characters enclosed in single or
double quotes.

After tempfail and reject actions two special kinds of
literals are recognized: three-digit numeric values represent
RFC 2821 reply codes, and literals consisting of three digit
groups separated by dots represent an extended reply code as per
RFC 1893/2034. For example:
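
A minimal sketch of both kinds of literal, assuming a helo handler and an arbitrary condition and message text picked for illustration:

```mfl
prog helo
do
  if $1 = "localhost"
    # 550 is the reply code, 5.7.1 is the extended reply code
    reject 550 5.7.1 "Access denied"
  fi
done
```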

Double-quoted strings

Backslash interpretation is performed at compilation time. It
consists in replacing the following escape sequences with the
corresponding single characters:

Sequence   Replaced with
\a         Audible bell character (ASCII 7)
\b         Backspace character (ASCII 8)
\f         Form-feed character (ASCII 12)
\n         Newline character (ASCII 10)
\r         Carriage return character (ASCII 13)
\t         Horizontal tabulation character (ASCII 9)
\v         Vertical tabulation character (ASCII 11)

Table 4.2: Backslash escapes

In addition, the sequence ‘\newline’ has the same
effect as ‘\n’, for example:

"a string with\
embedded newline"
"a string with\n embedded newline"

Any escape sequence of the form ‘\xhh’, where h
denotes any hex digit is replaced with the character whose ASCII value
is hh. For example:

"\x61nother" ⇒ "another"

Similarly, an escape sequence of the form ‘\0ooo’, where
o is an octal digit, is replaced with the character whose ASCII
value is ooo.

Macro expansion and variable interpretation occur at run-time. During
these phases all Sendmail macros (see Sendmail Macros),
mailfromd variables (see Variables), and constants
(see Constants) referenced in the string are replaced by their
actual values. For example, if the Sendmail macro f has the
value ‘postmaster@gnu.org.ua’ and the variable last_ip
has the value ‘127.0.0.1’, then the string

"$f last connected from %last_ip;"

will be expanded to

"postmaster@gnu.org.ua last connected from 127.0.0.1;"

A back reference is a sequence ‘\d’, where d
is a decimal number. It refers to the dth parenthesized
subexpression in the last matches statement. Any back reference
occurring within a
double-quoted string is replaced by the value of the corresponding
subexpression. See Special comparisons, for a detailed
description of this process. Back reference interpretation is
performed at run time.

Single-quoted strings

Any characters enclosed in single quotation marks are read unmodified.

The following examples contain pairs of equivalent strings:

"a string"
'a string'
"\\(.*\\):"
'\(.*\):'

Notice the last example. Single quotes are particularly useful in writing
regular expressions (see Special comparisons).

4.6 Here Documents

A here-document is a special form of string literal that allows
specifying multiline strings without having to use backslash
escapes. The format of here-documents is:

<<[flags]word
…
word

The <<word construct instructs the parser to read all
the following lines up to the line containing only word, with
possible trailing blanks. The lines thus read are concatenated
together into a single string. For example:

set str <<EOT
A multiline
string
EOT

The body of a here-document is interpreted the same way as
double-quoted strings (see Double-quoted strings). For example,
if Sendmail macro f has the value jsmith@some.com and
the variable count is set to 10, then the following string:

set s <<EOT
<$f> has tried to send %count mails.
Please see docs for more info.
EOT

will be expanded to:

<jsmith@some.com> has tried to send 10 mails.
Please see docs for more info.

If the word is quoted, either by enclosing it in single quote
characters or by prepending it with a backslash, all interpretations
and expansions within the document body are suppressed. For
example:

set s <<'EOT'
The following line is read verbatim:
<$f> has tried to send %count mails.
Please see docs for more info.
EOT

Optional flags in the here-document construct control the way
leading white space is handled. If flags is - (a dash),
then all leading tab characters are stripped from input lines and the
line containing word. Furthermore, if - is followed by a
single space, all leading whitespace is stripped from them. This
allows here-documents within configuration scripts to be indented in a
natural fashion. Examples:

<<- TEXT
    <$f> has tried to send %count mails.
    Please see docs for more info.
    TEXT

4.7 Sendmail Macros

Sendmail macros are referenced exactly the same way they are in
sendmail.cf configuration file, i.e. ‘$name’,
where name represents the macro name. Notice, that the notation
is the same for both single-character and multi-character macro names.
For consistency with the Sendmail configuration the
‘${name}’ notation is also accepted.

Another way to reference Sendmail macros is by using function
getmacro (see Macro access).

Sendmail macros evaluate to string values.

Notice that to reference a macro, you must properly export it in
your MTA configuration. An attempt to reference a macro that is not
exported raises an e_macroundef exception at run time
(see uncaught exceptions).
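
The two notations can be sketched as follows. The example assumes that the MTA exports both the f and the client_addr macros to the envfrom handler; if it does not, the reference fails at run time as described above:

```mfl
prog envfrom
do
  # Single-character macro name
  echo "sender: " . $f
  # Multi-character macro name in the braced notation;
  # $client_addr would be equivalent
  echo "client: " . ${client_addr}
done
```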

4.8 Constants

A constant is a symbolic name for an MFL value.
Constants are defined using the const statement:

[qualifier] const name expr

where name is an identifier, and expr is any valid
MFL expression evaluating immediately to a constant literal
or numeric value. The optional qualifier defines the scope of
visibility for that constant (see scope of visibility): either
public or static.

After defining, any appearance of name in the program text is
replaced by its value. For example:

const x 10/5
const text "X is "

defines the numeric constant ‘x’ with the value ‘2’, and the
literal constant ‘text’ with the value ‘X is ’.

Constants can also be used in literals. To expand a constant within
a literal string, prepend a percent sign to its name, e.g.:

echo "New %text %x" ⇒ "New X is 2"

This way of expanding constants creates an ambiguity if there happens
to be a variable of the same name as the constant.
See variable--constant clashes, for more information on this case
and ways to handle it.

4.8.1 Built-in constants

Several constants are built into the MFL
compiler. To discern them from user-defined ones,
their names start and end with two underscores (‘__’).

The following constants are defined in mailfromd version
8.7:

Built-in constant: string __file__

Expands to the name of the current source file.

Built-in constant: string __function__

Expands to the name of the current lexical context,
i.e. the function or handler name.

Built-in constant: string __git__

This built-in constant is defined for alpha versions only.
Its value is the Git tag of the most recent commit corresponding to that
version of the package. If the release contains some uncommitted
changes, the value of the ‘__git__’ constant ends with
the suffix ‘-dirty’.

Built-in constant: number __line__

Expands to the current line number in the input source file.

Built-in constant: number __major__

Expands to the major version number.

The following example uses __major__ constant to determine
if some version-dependent feature can be used:
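
A sketch of such a check. The version threshold is arbitrary, and the begin handler is used here only to give the conditional a place to live:

```mfl
begin
do
  if __major__ >= 8
    # Version-dependent statements would go here
    echo "compiled with mailfromd 8 or later"
  fi
done
```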

Expands to the current external preprocessor command line, if the
preprocessor is used, or to an empty string if it is not. Notice,
that it equals __defpreproc__, unless the preprocessor was
redefined using --preprocessor command line option
(see –preprocessor).

Built-in constant: string __version__

Expands to the textual representation of the program version
(e.g. ‘3.0.90’).

Expands to the current value of the program state directory
(see statedir). Notice, that it is the same as
__defstatedir__ unless the state directory was redefined at run
time.

Built-in constants can be used like variables, which allows them to be
expanded within strings or here-documents. The following example illustrates
the common practice used for debugging configuration scripts:
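
A minimal sketch of this practice, assuming a hypothetical function foo that takes one numeric argument:

```mfl
func foo(number x)
do
  # __file__ and __line__ are expanded inside the string like variables
  echo "%__file__:%__line__: foo called with arg %x"
done
```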

If the function foo were called in line 28 of the
script file /etc/mailfromd.mf, like this:
foo(10), you will see the following string in your logs:

/etc/mailfromd.mf:28: foo called with arg 10

4.9 Variables

Variables represent regions of memory used to hold variable data.
These memory regions are identified by variable names. A
variable name must begin with a letter or underscore and must consist
of letters, digits and underscores.

Each variable is associated with its scope of visibility,
which defines the part of source code where it can be used
(see scope of visibility). Depending on the scope, we discern
three main classes of variables: public, static and automatic (or local).

Public variables have indefinite lexical scope, so they may
be referred to anywhere in the program. Static variables are
visible only within their module (see Modules). Automatic
or local variables are visible only within the given function or
handler.

Public and static variables are sometimes collectively called
global.

These variable classes occupy separate namespaces, so that an
automatic variable can have the same name as an existing public or
static one. In this case this variable is said to shadow its global
counterpart. All references to such a name will refer to the automatic
variable until the end of its scope is reached, where the global one
becomes visible again.

Likewise, a static variable may have the same name as a static
variable defined in another module. However, it may not have the
same name as a public variable.

A variable is declared using the following syntax:

[qualifiers] type name

where name is the variable name and type is the type of
the data it is supposed to hold. It is ‘string’ for string
variables and ‘number’ for numeric ones.

For example, this is a declaration of a string variable ‘var’:

string var

Optional qualifiers are allowed only in global declarations, i.e.
in the variable declarations that appear outside of functions. They
specify the scope of the variable. The public qualifier
declares the variable as public and the static qualifier
declares it as static. The default scope is ‘public’,
unless specified otherwise in the module declaration (see module structure).

Additionally, qualifiers may contain the word precious,
which instructs the compiler to mark this variable as precious
(see precious variables). The value of a precious variable
is not affected by the SMTP ‘RSET’ command. If both
a scope qualifier and precious are used, they may appear in any
order, e.g.:

static precious string rcpt_list

or

precious static string rcpt_list

The declaration can be followed by any valid MFL
expression, which supplies the initial value for the
variable, for example:

string var "test"

If a variable declaration occurs within a function
(see User-defined) or handler (see Handlers), it
declares an automatic variable, local to this function or handler.
Otherwise, it declares a global variable.

A variable is assigned a value using the set statement:

set name expr

where name is the variable name and expr is a
mailfromd expression (see Expressions). The effect of
this statement is that the expr is evaluated and the value it
yields is assigned to the variable name.

If the set statement is located outside a function or handler
definition, the expr must be a constant expression, i.e. the
compiler should be able to evaluate it immediately. See optimizer.

It is not an error to assign a value to a variable that is not
declared. In this case the assignment first declares a global or automatic
variable having the type of expr and then assigns a value to it.
An automatic variable is created if the assignment occurs within a
function or handler; a global variable is declared if it occurs at
the topmost lexical level. This is called implicit variable
declaration.

Variables are referenced using the notation ‘%name’. The
variable being referenced must have been declared earlier (either
explicitly or implicitly).
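
The declaration forms described in this section can be sketched together as follows. All variable names are hypothetical; the example assumes it is placed in a module where static declarations are permitted:

```mfl
# Global variable, visible only within this module
static string greeting "hello"

prog helo
do
  # Automatic variable; shadows the static one until the handler ends
  string greeting "local value"
  # Implicit declaration: creates an automatic numeric variable
  set seen 1
  echo "%greeting, seen=%seen"
done
```

Within the helo handler the name greeting refers to the automatic variable, so the echo statement prints the local value rather than the static one.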

4.9.1 Predefined Variables

Several variables are predefined. In mailfromd version
8.7 these are:

Predefined Variable: number cache_used

This variable is set by stdpoll and strictpoll built-ins
(and, consequently, by the on poll statement). Its value is
‘1’ if the function used the cached data instead of directly
polling the host, and ‘0’ if the polling took place.
See SMTP Callout functions.

You can use this variable to make your reject message more informative
for the remote party. The common paradigm is to define a function,
returning empty string if the result was obtained from polling, or
some notice if cached data were used, and to use the function in the
reject text, for example:
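
A sketch of this approach. The function name cachestr, the notice text and the reply codes are illustrative assumptions, and the status module is assumed to supply the symbolic names used in the when branches:

```mfl
require status

# Return a notice if the last poll result came from the cache
func cachestr() returns string
do
  if cache_used
    return "[CACHED] "
  else
    return ""
  fi
done

prog envfrom
do
  on poll $f do
  when success:
    pass
  when not_found or failure:
    reject 550 5.1.0 cachestr() . "Sender validity not confirmed"
  when temp_failure:
    tempfail 450 4.1.0 cachestr() . "Try again later"
  done
done
```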

Name of virus identified by ClamAV. Set by clamav
function (see ClamAV).

Predefined Variable: number greylist_seconds_left

Number of seconds left to the end of greylisting period. Set by
greylist and is_greylisted functions (see Special test functions).

Predefined Variable: string ehlo_domain

Name of the domain used by polling functions in the SMTP EHLO or HELO
command. Default value is the fully
qualified domain name of the host where mailfromd is run.
See Polling.

Predefined Variable: string last_poll_greeting

Callout functions (see SMTP Callout functions) set this variable before
returning. It contains the initial SMTP reply from the last polled
host.

Predefined Variable: string last_poll_helo

Callout functions (see SMTP Callout functions) set this variable before
returning. It contains the reply to the HELO (EHLO)
command, received from the last polled host.

Predefined Variable: string last_poll_host

Callout functions (see SMTP Callout functions) set this variable before
returning. It contains the host name or IP address of the
last polled host.

Predefined Variable: string last_poll_recv

Callout functions (see SMTP Callout functions) set this variable before
returning. It contains the last SMTP reply received from
the remote host. In case of multi-line replies, only the first line is
stored. If nothing was received the variable contains the string
‘nothing’.

Predefined Variable: string last_poll_sent

Callout functions (see SMTP Callout functions) set this variable before
returning. It contains the last SMTP command sent to the
polled host. If nothing was sent, last_poll_sent contains the string
‘nothing’.

Predefined Variable: string mailfrom_address

Email address used by polling functions in the SMTP MAIL
FROM command (see Polling). Default is ‘<>’. Here is an
example of how to change it:

set mailfrom_address "postmaster@my.domain.com"

You can set this value to a comma-separated list of email addresses,
in which case the probing will try each address until either the
remote party accepts it or the list of addresses is exhausted,
whichever happens first.

It is not necessary to enclose emails in angle brackets, as they
will be added automatically where appropriate. The only exception is
null return address, when used in a list of addresses. In this case,
it should always be written as ‘<>’. For example:
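
A sketch of such a list, using a hypothetical postmaster address followed by the null return address:

```mfl
set mailfrom_address "postmaster@my.domain.com, <>"
```

With this setting the probing would first try the postmaster address and fall back to the null return address if the remote party does not accept the first one.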

This variable controls the verbosity of the exception-safe database
functions. See safedb_verbose.

4.10 Back references

A back reference is a sequence ‘\d’, where d
is a decimal number. It refers to the dth parenthesized
subexpression in the last matches statement. Any back reference
occurring within a
double-quoted string is replaced with the value of the corresponding
subexpression. For example:

if $f matches '.*@\(.*\)\.gnu\.org\.ua'
set host \1
fi

If the value of f macro is ‘smith@unza.gnu.org.ua’, the
above code will assign the string ‘unza’ to the variable
host.

Notice that each occurrence of matches will reset the table
of back references, so try to use them as early as possible. The
following example illustrates a common error, when the back
reference is used after the reference table has been reset by another
match:
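
A sketch of the error and of one way to avoid it. The inner pattern and the use of the client_addr macro are assumptions made for illustration:

```mfl
# Wrong: the inner matches resets the back reference table
# before \1 is used.
if $f matches '.*@\(.*\)\.gnu\.org\.ua'
  if $client_addr matches '^192\.168\.'
    set host \1
  fi
fi

# Better: save the back reference right after the match that set it.
if $f matches '.*@\(.*\)\.gnu\.org\.ua'
  set host \1
  if $client_addr matches '^192\.168\.'
    echo "local client for host %host"
  fi
fi
```

In the first fragment the inner pattern has no parenthesized subexpression, so by the time the set statement runs, ‘\1’ no longer refers to the host part captured by the outer match.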

4.11 Handlers

A milter stage handler (or handler, for short) is a
subroutine responsible for processing a particular milter state.
There are eight handlers available. Their order of invocation and
arguments are described in Figure 3.1.

A handler is defined using the following construct:

prog handler-name
do
handler-body
done

where handler-name is the name of the handler (see handler names),
and handler-body is the list of filter statements composing
the handler body. Some handlers take arguments, which can be accessed
within the handler-body using the notation $n,
where n is the ordinal number of the argument. Here we describe
the available handlers and their arguments:

Handler: connect(string $1, number $2, number $3, string $4)

Invocation:

This handler is called once at the beginning of each SMTP connection.

Arguments:

1. string;
The host name of the message sender, as reported by MTA. Usually it
is determined by a reverse lookup on the host address. If the reverse
lookup fails, ‘$1’ will contain the message sender’s IP address
enclosed in square brackets (e.g. ‘[127.0.0.1]’).

2. number;
Socket address family. You need to require the ‘status’ module
to get symbolic definitions for the address families. Supported
families are:

Constant       Value   Meaning
FAMILY_STDIO   0       Standard input/output (the MTA is run with -bs option)
FAMILY_UNIX    1       UNIX socket
FAMILY_INET    2       IPv4 protocol
FAMILY_INET6   3       IPv6 protocol

Table 4.3: Supported socket families

3. number;
Port number if ‘$2’ is ‘FAMILY_INET’.

4. string;
Remote IP address if ‘$2’ is ‘FAMILY_INET’ or full file name
of the socket if ‘$2’ is ‘FAMILY_UNIX’. If ‘$2’ is
‘FAMILY_STDIO’, ‘$4’ is an empty string.
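
The arguments described above can be used as in the following sketch, which only logs where the connection comes from. It assumes that the status module provides the FAMILY_ constants, as stated in the description of ‘$2’:

```mfl
require status

prog connect
do
  if $2 = FAMILY_INET
    # $4 holds the remote IP address
    echo "connection from " . $4
  elif $2 = FAMILY_UNIX
    # $4 holds the full file name of the socket
    echo "connection via UNIX socket " . $4
  fi
done
```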

The actions (see Actions) appearing in this handler
are handled by Sendmail in a special way. First of all, any textual
message is ignored. Secondly, the only action that immediately closes
the connection is tempfail 421. Any other reply codes result in
Sendmail switching to nullserver mode, where it accepts any
commands, but answers with a failure to any of them, except for the
following: QUIT, HELO, NOOP, which are processed
as usual.

The following table summarizes the Sendmail behavior depending on
the action used:

tempfail 421 excode message

The caller is returned the following error message:

421 4.7.0 hostname closing connection

Both excode and message are ignored.

tempfail 4xx excode message

(where xx represents any digits, except ‘21’)
Both excode and message are ignored. Sendmail switches
to nullserver mode. Any subsequent command, excepting the ones listed above,
is answered with

An SMTP server must not intentionally close the connection except:
[…]
- After detecting the need to shut down the SMTP service and
returning a 421 response code. This response code can be issued
after the server receives any command or, if necessary,
asynchronously from command receipt (on the assumption that the
client will receive it after the next command is issued).

However, the RFC says nothing about textual messages and
extended error codes, therefore Sendmail’s ignoring of these is,
in my opinion, absurd. My practice shows that it is often reasonable,
and even necessary, to return a meaningful textual message if the
initial connection is declined. The opinion of mailfromd
users seems to support this view. Bearing this in mind,
mailfromd is shipped with a patch for Sendmail,
which makes it honor both extended return code and textual message given
with the action. Two versions are provided:
etc/sendmail-8.13.7.connect.diff, for
Sendmail versions 8.13.x, and
etc/sendmail-8.14.3.connect.diff, for Sendmail versions 8.14.3.

Handler: helo(string $1)

Invocation:

This handler is called whenever the SMTP client sends HELO or
EHLO command. Depending on the actual MTA configuration, it
can be called several times or even not at all.

Arguments:

1. string; Argument to the HELO (EHLO) command.

Notes:

According to RFC 2821, $1 must be the domain name of the
sending host, or, in case this is not available, its IP address
enclosed in square brackets. Be careful when making decisions based
on this value, because in practice many hosts send arbitrary strings.
We recommend using the heloarg_test function
(see heloarg_test) if you wish to analyze this value.

Handler: envfrom(string $1, string $2)

Invocation:

Called when the SMTP client sends MAIL FROM command, i.e. once
at the beginning of each message.

Arguments:

1. string; First argument to the MAIL FROM command,
i.e. the email address of the sender.

2. string; Rest of arguments to MAIL FROM separated
by space character. This argument can be ‘""’.

Notes:

$1 is not the same as the $f Sendmail macro, because
the latter contains the sender email after address rewriting and
normalization, while $1 contains exactly the value given by
the sending party.

When the array type is implemented, $2 will contain
an array of arguments.

Handler: envrcpt(string $1, string $2)

Invocation:

Called once for each RCPT TO command, i.e. once for each
recipient, immediately after envfrom.

Arguments:

1. string; First argument to the RCPT TO command,
i.e. the email address of the recipient.

2. string; Rest of arguments to RCPT TO separated
by space character. This argument can be ‘""’.

Notes:

When the array type is implemented, $2 will contain
an array of arguments.

Handler: data()

Invocation:

Called after the MTA receives SMTP ‘DATA’
command. Notice that this handler is not supported by Sendmail
versions prior to 8.14.0 and Postfix versions prior to 2.5.

Arguments:

None

Handler: header(string $1, string $2)

Invocation:

Called once for each header line received after the SMTP DATA command.

Arguments:

1. string; Header field name.

2. string; Header field value. The content of the header may
include folded white space, i.e., multiple lines with following white
space where lines are separated by LF (ASCII 10). The
trailing line terminator (CR/LF) is removed.

Handler: eoh

Invocation:

This handler is called once per message, after all headers have been
sent and processed.

Arguments:

None.

Handler: body(pointer $1, number $2)

Invocation:

This handler is called zero or more times, for each piece of the
message body obtained from the remote host.

Arguments:

1. pointer; Piece of body text. See ‘Notes’ below.

2. number; Length of data pointed to by $1, in bytes.

Notes:

The first argument points to the body chunk. Its size may be quite
considerable and passing it as a string may be costly both in terms of
memory and execution time. For this reason it is not passed as a
string, but rather as a generic pointer, i.e. an object having
the same size as number, which can be used to retrieve the
actual contents of the body chunk if the need arises.

A special function body_string is provided to convert this
object to a regular MFL string (see Mail body functions). Using it
you can collect the entire body text into a
single global variable, as illustrated by the following example:

string text
prog body
do
set text text . body_string($1,$2)
done

The text collected this way can then be used in the eom handler
(see below) to parse and analyze it.

If you wish to analyze both the headers and mail body, the following
code fragment will do that for you:
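A minimal sketch of such a fragment, assuming a global variable named message_text (a name chosen here purely for illustration): each header is appended as a ‘name: value’ line, the eoh handler adds the empty separator line, and the body chunks are appended through body_string.

```mfl
string message_text

prog header
do
  # Rebuild the header line from its name ($1) and value ($2).
  set message_text message_text . $1 . ": " . $2 . "\n"
done

prog eoh
do
  # An empty line separates the headers from the body.
  set message_text message_text . "\n"
done

prog body
do
  # Convert the body chunk to a string and append it.
  set message_text message_text . body_string($1, $2)
done
```

The collected text can then be examined in the eom handler.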

For your reference, the following table shows each handler with its arguments:

Handler   $1                        $2                               $3     $4
connect   Hostname                  Socket Family                    Port   Remote address
helo      HELO domain               N/A                              N/A    N/A
envfrom   Sender email address      Rest of arguments                N/A    N/A
envrcpt   Recipient email address   Rest of arguments                N/A    N/A
header    Header name               Header value                     N/A    N/A
eoh       N/A                       N/A                              N/A    N/A
body      Body segment (pointer)    Length of the segment (numeric)  N/A    N/A
eom       N/A                       N/A                              N/A    N/A

Table 4.4: State Handler Arguments

4.12 The ‘begin’ and ‘end’ special handlers

Apart from the milter handlers described in the previous section, MFL
defines two special handlers, called ‘begin’ and ‘end’,
which supply startup and cleanup instructions for the filter program.

The ‘begin’ special handler is executed once for each
SMTP session, after the connection has been established but
before the first milter handler has been called. Similarly, the
‘end’ handler is executed exactly once, after the connection has
been closed. Neither of them takes any arguments.

The two handlers are defined using the following syntax:

# Begin handler
begin
do
…
done
# End handler
end
do
…
done

where ‘…’ represents any MFL statements.

An MFL program may have multiple ‘begin’ and
‘end’ definitions. They can be intermixed with other
definitions. The compiler combines all ‘begin’
statements into a single one, in the order they appear in the
sources. Similarly, all ‘end’ blocks are concatenated together.
The resulting ‘begin’ is called once, at the beginning of each
SMTP session, and ‘end’ is called once at its
termination.

Multiple ‘begin’ and ‘end’ handlers are a useful feature
for writing modules (see Modules), because each module can thus
have its own initialization and cleanup blocks. Notice, however, that
in this case the order in which subsequent ‘begin’ and ‘end’
blocks are executed is not defined. It is only guaranteed that all
‘begin’ blocks are executed at startup and all ‘end’ blocks
are executed at shutdown. It is also guaranteed that all ‘begin’
and ‘end’ blocks defined within a compilation unit (i.e. a single
abstract source file, with all #include and
#include_once statements expanded in place) are executed in
order of their appearance in the unit.

Due to their special nature, the startup and cleanup blocks impose
certain restrictions on the statements that can be used within them:

return cannot be used in ‘begin’ and ‘end’
handlers.

The following Sendmail actions cannot be used in them:
accept, continue, discard, reject,
tempfail. They can, however, be used in catch
statements, declared in ‘begin’ blocks (see example below).

The ‘begin’ handlers are the usual place for global
initialization code. For example, if you do not want to use
DNS caching, you can disable it this way:

begin
do
db_set_active("dns", 0)
done

Additionally, you can set up global exception handling routines
there. For example, the following ‘begin’ statement disables
DNS cache and, for all exceptions not handled otherwise,
installs a handler that logs the exception along with the stack trace
and continues processing the message:
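A sketch of such a block follows. It assumes that, inside the catch body, $1 holds the exception code and $2 the accompanying message text, and that the library function stack_trace logs the current call stack; treat both as assumptions to check against the exception handling and library sections.

```mfl
begin
do
  # Disable the DNS cache.
  db_set_active("dns", 0)
  # Fallback handler for every exception type not handled elsewhere.
  catch *
  do
    echo "Caught exception " . $1 . ": " . $2
    stack_trace()
    continue
  done
done
```

The continue action is permitted here because it appears inside a catch statement rather than directly in the ‘begin’ block.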

4.13 Functions

A function is a named mailfromd subroutine, which
takes zero or more parameters and optionally returns a certain
value. Depending on the return value, functions can be
subdivided into string functions and number functions.
A function may have mandatory and optional parameters.
When invoked, the function must be supplied at least as many
actual arguments as it has mandatory parameters.

Functions are invoked using the following syntax:

name (args)

where name is the function name and args is a
comma-separated list of expressions. For example, the following are valid
function calls:

foo(10)
interval("1 hour")
greylist("/var/my.db", 180)

The number of parameters a function takes and their data types
compose the function signature. When actual arguments are
passed to the function, they are converted to types of the
corresponding formal parameters.

There are two major groups of functions: built-in functions,
that are implemented in the mailfromd binary, and
user-defined functions, that are written in MFL. The
invocation syntax is the same for both groups.

Mailfromd is shipped with a rich set of library
functions. These are described in Library. In addition to
these you can define your own functions.

Function definitions can appear anywhere between the handler
declarations in a filter program, the only requirement being that the
function definition occur before the place where the function is
invoked.
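Judging by the description that follows and by the examples in this section, the general shape of a function definition can be sketched as shown here, with square brackets marking optional parts:

```mfl
[qualifier] func name (param-decl) [returns data-type]
do
  function-body
done
```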

where name is the name of the function to define, param-decl is
a comma-separated list of parameter declarations. The syntax of the
latter is the same as that of variable declarations (see Variable declarations), i.e.:

type name

declares the parameter name having the type type. The
type is string or number.

Optional qualifier declares the scope of visibility for that
function (see scope of visibility). It is similar to that of
variables, except that functions cannot be local (i.e. you cannot
declare function within another function).

The public qualifier declares a function that may be referred
to from any module, whereas the static qualifier declares a
function that may be called only from the current module
(see Modules). The default scope is ‘public’,
unless specified otherwise in the module declaration (see module structure).

For example, the following declares a function ‘sum’, that takes
two numeric arguments and returns a numeric value:

func sum(number x, number y) returns number

Similarly, the following is a declaration of a static function:

static func sum(number x, number y) returns number

Parameters are referenced in the function-body by their name,
the same way as other variables. Similarly, the value of a parameter can be
altered using set statement.

A function can be declared to take a certain number of optional
arguments. In a function declaration, optional abstract arguments
must be placed after the mandatory ones, and must be separated from
them with a semicolon. The following example is a definition of
function foo, which takes two mandatory and two optional
arguments:

func foo(string msg, string email; number x, string pfx)

Mandatory parameters are: msg and email. Optional
parameters are: x and pfx. The actual number of
arguments supplied to the function is returned by a special construct
$#. In addition, the special construct @arg
evaluates to the ordinal number of variable arg in the list of
formal parameters (the first argument has number ‘0’). These two
constructs can be used to verify whether an argument is supplied to
the function.

When an actual argument for parameter n is supplied, the number
of actual arguments ($#) is greater than the ordinal number
of that parameter in the declaration list (@n). Thus,
the following construct can be used to check if an optional argument
arg is actually supplied:
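A minimal sketch of this check, using a hypothetical function greet with one mandatory parameter (name) and one optional parameter (greeting). Here @greeting is 1, so the test succeeds only when at least two actual arguments were passed:

```mfl
func greet(string name; string greeting)
do
  if $# > @greeting
    # The optional argument was supplied.
    echo greeting . ", " . name
  else
    # Fall back to a default greeting.
    echo "Hello, " . name
  fi
done
```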

Within a function body, optional arguments are referenced
exactly the same way as the mandatory ones. An attempt to dereference an
optional argument for which no actual argument was supplied results
in an undefined value, so be sure to check whether the argument was
passed before dereferencing it.

A function can also take variable number of arguments (such
functions are called variadic). This is
indicated by the use of ellipsis as the last abstract parameter. The
statement below defines a function foo taking one mandatory, one
optional and any number of additional arguments:

func foo (string a ; string b, ...)

All actual arguments passed in a list of variable arguments are
coerced to string data type. To refer to these arguments in the
function body, the following construct is used:

$(expr)

where expr is any valid MFL expression, evaluating to
a number n. This construct refers to the value of nth
actual parameter from the variable argument list. Parameters are
numbered from ‘1’, so the first variable parameter is $(1),
and the last one is $($# - Nm - No), where Nm
and No are numbers of mandatory and optional parameters to the
function.

For example, the function below prints all its arguments:

func pargs (string text, ...)
do
echo "text=%text"
loop for number i 1,
while i <= $# - 1,
set i i + 1
do
echo "arg %i=" . $(i)
done
done

Note the loop limits. The last variable argument has number $#
- 1, because the function takes one mandatory argument.

The function-body is any list of valid mailfromd
statements. In addition to the statements discussed below
(see Statements) it can also contain the return statement,
which is used to return a value from the function. The syntax of the
return statement is

return value

As an example of this, consider the following code snippet that
defines the function ‘sum’ to return a sum of its two arguments:

func sum(number x, number y) returns number
do
return x + y
done

The returns part in the function declaration is optional. A
declaration lacking it defines a procedure, or void
function, i.e. a function that is not supposed to return any value.
Such functions cannot be used in expressions, instead they are
used as statements (see Statements). The following example
shows a function that emits a customized temporary failure notice:

func stdtf()
do
tempfail 451 4.3.5 "Try again later"
done

A function may have several names. An alternative name (or
alias) can be assigned to a function by using alias
keyword, placed after param-decl part, for example:

func foo()
alias bar
returns string
do
…
done

After this declaration, both foo() and bar() will refer
to the same function.

The number of function aliases is unlimited. The following fragment
declares a function having three names:

func foo()
alias bar
alias baz
returns string
do
…
done

This feature is rarely needed, but it can occasionally be useful.

A variable declared within a function becomes a local variable to
this function. Its lexical scope ends with the terminating
done statement.

Parameters, local variables and global variables use
separate namespaces, so a parameter name can coincide with the name of
a global, in which case a parameter is said to shadow the
global. All references to its name will refer to the parameter,
until the end of its scope is reached, where the global one
becomes visible again. Consider the following example:
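A minimal sketch of such a case, with a hypothetical function show whose parameter has the same name as a global variable:

```mfl
string x "global"

func show(string x)
do
  # Inside the function, x is the parameter.
  echo x
done

prog envfrom
do
  show("parameter")
  # Outside the function, x is the global again.
  echo x
done
```

Under the rules above, the first echo should log the parameter value and the second one the value of the global.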

4.13.1 Some Useful Functions

To illustrate the concept of user-defined functions, this subsection
shows the definitions of some of the library functions shipped with
mailfromd.
These functions are contained in modules installed along with the
mailfromd binary. To use any of them in your code, require
the appropriate module as described in import, e.g. to use the
revip function, do require 'revip'.

Both operators have a special form, for ‘MX’ pattern matching.
The expression:

x mx matches y

is evaluated as follows: first, the expression x is analyzed and, if
it is an email address, its domain part is selected. If it is not,
its value is used verbatim. Then the list of ‘MX’s for this domain is
looked up. Each of ‘MX’ names is then compared with the regular
expression y. If any of the names matches, the expression
returns true. Otherwise, its result is false.

Similarly, the expression:

x mx fnmatches y

returns true only if any of the ‘MX’s for (domain or email) x
match the globbing pattern y.

Both mx matches and mx fnmatches can signal the
following exceptions: e_temp_failure, e_failure.

The value of any parenthesized subexpression occurring within the
right-hand side argument to matches or mx matches can be
referenced using the notation ‘\d’, where d is the
ordinal number of the subexpression (subexpressions are numbered from
left to right, starting at 1). This notation is allowed in the
program text as well as within double-quoted strings and
here-documents, for example:

if $f matches '.*@\(.*\)\.gnu\.org\.ua'
set message "Your host name is \1;"
fi

Remember that the grouping symbols are ‘\(’ and ‘\)’ for
basic regular expressions, and ‘(’ and ‘)’ for extended
regular expressions. Also make sure you properly escape all special
characters (backslashes in particular) in double-quoted strings, or
use single-quoted strings to avoid having to do so
(see singe-vs-double, for a comparison of the two forms).

4.14.8 Boolean Expressions

A boolean expression is a combination of relational or
matching expressions using the boolean operators and, or
and not and, optionally, parentheses to control nesting:

Expression   Result
x and y      True only if both x and y are true.
x or y       True if any of x or y is true.
not x        True if x is false.

Table 4.1: Boolean Operators

Binary boolean expressions are computed using shortcut evaluation:

x and y

If x ⇒ false, the result is false
and y is not evaluated.

x or y

If x ⇒ true, the result is true and
y is not evaluated.
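Shortcut evaluation makes it safe to guard an operation with a preceding test. In the sketch below (the function ratio_exceeds and its parameter names are hypothetical), the division is attempted only when count is non-zero, so the condition itself cannot trigger a division by zero:

```mfl
func ratio_exceeds(number total, number count, number limit) returns number
do
  # If count is 0, the left operand is false and the division is skipped.
  if count != 0 and total / count > limit
    return 1
  fi
  return 0
done
```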

4.14.9 Operator Precedence

Operator precedence is an abstract value associated with each
language operator, that determines the order in which operators are
executed when they appear together within a single expression.
Operators with higher precedence are executed first. For example,
‘*’ has a higher precedence than ‘+’, therefore the
expression a + b * c is evaluated in the following order: first
b is multiplied by c, then a is added to the
product.

When operators of equal precedence are used together they are
evaluated from left to right (i.e., they are left-associative),
except for comparison operators, which are non-associative (these are
explicitly marked as such in the table below). This means that you
cannot write:

if 5 <= x <= 10

Instead, you should write:

if 5 <= x and x <= 10

The precedences of the mailfromd operators were selected
so as to match those used in most programming languages.

The following table lists all operators in order of decreasing precedence:

Operators                 Description
(...)                     Grouping
$ %                       Sendmail macros and mailfromd variables
* /                       Multiplication, division
+ -                       Addition, subtraction
<< >>                     Bitwise shift left and right
< <= >= >                 Relational operators (non-associative)
= != matches fnmatches    Equality and special comparison (non-associative)
&                         Logical (bitwise) AND
^                         Logical (bitwise) XOR
|                         Logical (bitwise) OR
not                       Boolean negation
and                       Logical ‘and’
or                        Logical ‘or’
.                         String concatenation

4.14.10 Type Casting

When the two operands of a binary expression have
different types, the mailfromd evaluator coerces them to a
common type. This is known as implicit type casting. The rules
for implicit type casting are:

Both arguments to an arithmetical operation are cast to numeric
type.

Both arguments to the concatenation operation are cast to string.

Both arguments to the ‘matches’ or ‘fnmatches’ operator are cast to string.

The argument of the unary negation (arithmetical or boolean) is
cast to numeric.

Otherwise the right-hand side argument is cast to the type of the
left-hand side argument.

The construct for explicit type cast is:

type(expr)

where type is the name of the type to coerce expr to. For
example:

string(2 + 4*8) ⇒ "34"

4.15 Variable and Constant Shadowing

When any two named entities happen to have the same name we say that a
name clash occurs. The handling of name clashes depends on
types of the entities involved in it.

function – any

The name of a constant or variable can coincide with that of a function.
This does not produce any warnings or errors, because functions,
variables and constants use different namespaces. For example, the
following code is correct:

const a 4
func a()
do
echo a
done

When executed, it prints ‘4’.

function – function, handler – function, and function – handler

Redefinition of a function or using a predefined handler name
(see Handlers) as a function name results in a fatal error. For
example, compiling this code:
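A minimal sketch of such code, in which the hypothetical function a is defined twice:

```mfl
func a()
do
  echo "1"
done

# Second definition of the same name: this is the fatal error.
func a()
do
  echo "2"
done
```

Compilation is expected to stop at the second definition.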

handler – variable

A variable name can coincide with a handler name. For example, the
following code is perfectly OK:

string envfrom "M"
prog envfrom
do
echo envfrom
done

handler – handler

If two handlers with the same name are defined, the definition that
appears further in the source text replaces the previous one. A
warning message is issued, indicating locations of both definitions,
e.g.:

mailfromd: sample.mf:116: Warning: Redefinition of handler
`envfrom'
mailfromd: sample.mf:34: Warning: This is the location of the
previous definition

variable – variable

Defining a variable having the same name as an already defined one results
in a warning message being displayed. The compilation succeeds. The
second variable shadows the first, that is any subsequent
references to the variable name will refer to the second variable.
For example:

string x "Text"
number x 1
prog envfrom
do
echo x
done

Compiling this code results in the following diagnostics:

mailfromd: sample.mf:4: Redeclaring `x' as different data type
mailfromd: sample.mf:2: This is the location of the previous
definition

Executing it prints ‘1’, i.e. the value of the last definition of
x.

The scope of the shadowing depends on storage classes of the two
variables. If both of them have external storage class (i.e. are
global ones), the shadowing remains in effect until the end of input.
In other words, the previous definition of the variable is effectively
forgotten.

If the previous definition is a global, and the shadowing definition
is an automatic variable or a function parameter, the scope of this
shadowing ends with the scope of the second variable, after which the
previous definition (global) becomes visible again. Consider the
following code:
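A sketch of this situation, using an automatic variable declared inside a hypothetical function foo:

```mfl
string x "global"

func foo() returns string
do
  # This automatic variable shadows the global x until the closing done.
  string x
  set x "local"
  return x
done

prog envfrom
do
  echo foo()
  # The shadowing has ended, so this refers to the global.
  echo x
done
```

Following the rule above, the first echo should log the local value and the second one the global value. The compiler may also issue a shadowing warning for the declaration inside foo.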

variable – constant

If a constant is defined which has the same name as a previously
defined variable (the constant shadows the variable), the
compiler prints the following diagnostic message:

file:line: Warning: Constant name `name' clashes with a variable name
file:line: Warning: This is the location of the previous definition

A similar diagnostic is issued if a variable is defined whose name
coincides with a previously defined constant (the variable shadows
the constant).

In any case, any subsequent notation %name refers to the last
defined symbol, be it variable or constant.

Notice that shadowing occurs only when the %name notation is used.
Referring to the constant by its name without ‘%’ avoids
shadowing effects.

If a variable shadows a constant, the scope of the shadowing depends
on the storage class of the variable. For automatic variables and
function parameters, it ends with the final done closing the
function. For global variables, it lasts up to the end of input.
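For instance, a file sample.mf along the following lines declares a constant and then a function parameter with the same name. This is a sketch written to be consistent with the test run shown next; the function name foo is hypothetical.

```mfl
const a 4

func foo(number a)
do
  # Here a is the parameter, which shadows the constant.
  echo a
done

prog envfrom
do
  foo(10)
  # The shadowing ended with the function, so a is the constant again.
  echo a
done
```

Running such a file in test mode would be expected to produce output of this kind: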

$ mailfromd --test sample.mf
mailfromd: sample.mf:3: Warning: Variable name `a' clashes with a
constant name
mailfromd: sample.mf:1: Warning: This is the location of the previous
definition
10
4
State envfrom: continue

constant – constant

Redefining a constant produces a warning message. The latter
definition shadows the former. Shadowing remains in effect until
the end of input.

4.16 Statements

Statements are language constructs, that, unlike expressions, do not
return any value. Statements execute some actions, such as assigning
a value to a variable, or serve to control the execution flow in the
program.

4.16.1 Action Statements

An action statement instructs mailfromd to
perform a certain action over the message being processed. There are
two kinds of actions: reply actions and header manipulation actions.

Reply Actions

Reply actions tell Sendmail to return given response code
to the remote party. There are five such actions:

accept

Return an accept reply. The remote party will continue
transmitting its message.

reject code excode message-expr

reject (code-expr, excode-expr, message-expr)

Return a reject reply. The remote party will have to
cancel transmitting its message. The three arguments are optional,
their usage is described below.

tempfail code excode message

tempfail (code-expr, excode-expr, message-expr)

Return a ‘temporary failure’ reply. The remote party can retry
to send its message later. The three arguments are optional,
their usage is described below.

discard

Instructs Sendmail to accept the message and silently discard
it without delivering it to any recipient.

continue

Stops the current handler and instructs Sendmail to
continue processing of the message.

Two actions, reject and tempfail can take up to three
optional parameters. There are two forms of supplying these
parameters.

In the first form, called literal or traditional notation,
the arguments are supplied as additional words after the action name,
and are separated by whitespace. The first argument is a three-digit
RFC 2821 reply code. It must begin with ‘5’ for
reject and with ‘4’ for tempfail. If two arguments
are supplied, the second argument must be either an extended
reply code (RFC 1893/2034) or a textual string to be
returned along with the SMTP reply. Finally, if all three
arguments are supplied, then the second one must be an extended reply
code and the third one must give the textual string. The following
examples illustrate the possible ways of using the reject
statement:
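Following the rules above, the possible literal forms can be sketched as follows. The reply code 503, the extended code 5.0.0 and the message text are illustrative values only:

```mfl
# No arguments: default code, extended code and text.
reject
# Reply code only.
reject 503
# Reply code and extended reply code.
reject 503 5.0.0
# Reply code and textual string.
reject 503 "Need HELO command"
# All three arguments.
reject 503 5.0.0 "Need HELO command"
```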

The term textual string, as used above, means either a literal
string or an MFL expression that evaluates to string.
However, both code and extended code must always be literal.

The second form of supplying arguments is called functional
notation, because it resembles the function syntax. When used in this
form, the action word is followed by a parenthesized group of exactly
three arguments, separated by commas. Each argument is an
MFL expression. The meaning and ordering of the arguments are
the same as in literal form. Any or all of these three arguments may
be absent, in which case it will be replaced by the default value. To
illustrate this, here are the statements from the previous example,
written in functional notation:

The same as add, but if the header name already
exists, it will be removed first, for example:

replace "X-Last-Processor" "Mailfromd 8.7"

delete name

Delete the header named name:

delete "X-Envelope-Date"

These actions impose some restrictions. First of all, their first
argument must be a literal string (not a variable or expression).
Secondly, there is no way to select a particular header instance
to delete or replace, which may be necessary to properly handle
multiple headers (e.g. ‘Received’). For more elaborate ways of
header modifications, see Header modification functions.

4.16.2 Variable Assignments

An assignment is a special statement that assigns a value to
the variable. It has the following syntax:

set name value

where name is the variable name and value is the value to
be assigned to it.

Assignment statements can appear in any part of a filter program.
If an assignment occurs outside of function or handler definition,
the value must be a literal value (see Literals). If it
occurs within a function or handler definition, value can be any
valid mailfromd expression (see Expressions). In this
case, the expression will be evaluated and its value will be assigned
to the variable. For example:

set delay 150
prog envfrom
do
set delay delay * 2
…
done

4.16.3 The pass statement

The pass statement has no effect. It is used in places
where no statement is needed, but the language syntax requires one:

on poll $f do
when success:
pass
when not_found or failure:
reject 550
done

4.16.4 The echo statement

The echo statement concatenates all its arguments into a single
string and sends it to the syslog using the priority
‘info’. It is useful for debugging your script, in
conjunction with built-in constants (see Built-in constants), for
example:
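A minimal sketch, assuming the built-in constants __file__ and __line__ (see Built-in constants) and a hypothetical function foo:

```mfl
func foo(number x)
do
  # Log the source location together with the argument value.
  echo "%__file__:%__line__: foo called with arg %x"
done
```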

4.17 Conditional Statements

Conditional statements, or conditionals for short, test
a condition and alter the control flow depending on the
result. There are two kinds of conditional statements: if-else
branches and switch statements.

The syntax of an if-else branching construct is:

if condition then-body [else else-body] fi

Here, condition is an expression that governs control flow
within the statement. Both then-body and else-body are
lists of mailfromd statements. If condition is
true, then-body is executed, if it is false, else-body is
executed. The ‘else’ part of the statement is optional. The
condition is considered false if it evaluates to zero, otherwise it is
considered true. For example:

if $f = ""
accept
else
reject
fi

This will accept the message if the value of the Sendmail
macro $f is an empty string, and reject it otherwise. Both
then-body and else-body can be compound statements
including other if statements. Nesting level of
conditional statements is not limited.

To facilitate writing complex conditional statements, the elif
keyword can be used to introduce alternative conditions, for example:
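A minimal sketch, in which the sender name "root" and the log text are arbitrary illustrative choices:

```mfl
if $f = ""
  accept
elif $f = "root"
  # Checked only if the first condition was false.
  echo "Mail from root!"
else
  reject
fi
```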

The second kind of conditional, the switch statement, compares the
value of a controlling condition with the values listed in its
case branches. It is executed as follows: the condition
expression is evaluated and if its value equals x1 or x2
(or any other x from the first case), then
stmt1 is executed. Otherwise, if condition evaluates
to y1 or y2 (or any other y from the second
case), then stmt2 is executed. Other case
branches are tried in turn. If none of them matches, stmt
(called the default branch) is executed.

There can be as many case branches as you wish. The
default branch is optional. There can be at most one
default branch.
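As a sketch of how these rules combine, the fragment below switches on a numeric variable x (assumed to be declared elsewhere) and uses the add action to attach an ‘X-Branch’ header. The concrete case values are chosen for illustration:

```mfl
switch x
do
case 1 or 3:
  add "X-Branch" "1"
  accept
case 2 or 4 or 6:
  add "X-Branch" "2"
default:
  reject
done
```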

If the value of mailfromd variable x is 1 or 3,
it will accept the message immediately, and add a ‘X-Branch: 1’
header to it. If x equals 2 or 4 or 6, this code will add
‘X-Branch: 2’ header to the message and will continue processing
it. Otherwise, it will reject the message.

The controlling condition of a switch statement may evaluate
to numeric or string type. The type of the condition governs the
type of comparisons used in case branches: for numeric types,
numeric equality will be used, whereas for string types, string
equality is used.

4.18 Loop Statements

The loop statement allows for repeated execution of a block of code,
controlled by some conditional expression. It has the following form:
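Reconstructed from the control flow and the examples given later in this section, the general form can be sketched as follows, with square brackets marking optional parts:

```mfl
loop [label]
     [for stmt1] [, while expr1] [, stmt2]
do
  stmt3
done [while expr2]
```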

where stmt1, stmt2, and stmt3 are statement lists,
expr1 and expr2 are expressions.

The control flow is as follows:

1. If stmt1 is specified, execute it.

2. Evaluate expr1. If it is zero, go to 6. Otherwise, continue.

3. Execute stmt3.

4. If stmt2 is supplied, execute it.

5. If expr2 is given, evaluate it. If it is zero, go to 6.
Otherwise, go to 2.

6. End.

Thus, stmt3 is executed until either expr1 or
expr2 yield a zero value.

The loop body – stmt3 – can contain special
statements:

break [label]

Terminates the loop immediately. Control passes to ‘6’ (End)
in the formal definition above. If label is supplied, the
statement terminates the loop statement marked with that label. This
makes it possible to break out of nested loops.

It is similar to break statement in C or shell.

next [label]

Initiates next iteration of the loop. Control passes to ‘4’ in
the formal definition above. If label is supplied, the
statement starts the next iteration of the loop statement marked with that
label. This makes it possible to request the next iteration of an
upper-level loop from a nested loop statement.

The loop statement can be used to create iterative statements
of arbitrary complexity. Let’s illustrate it in comparison with C.

The statement:

loop
do
stmt-list
done

creates an infinite loop. The only way to exit from such a loop is to
call break (or return, if used within a function),
somewhere in stmt-list.

The following statement is equivalent to while (expr)
stmt-list in C:

loop while expr
do
stmt-list
done

The C construct for (stmt1; expr1; stmt2) stmt3
is written in MFL as follows:

loop for stmt1, while expr1, stmt2
do
stmt3
done

For example, to repeat stmt3 10 times:

loop for set i 0, while i < 10, set i i + 1
do
stmt3
done

Finally, the C ‘do’ loop is implemented as follows:

loop
do
stmt-list
done while expr

As a real-life example of a loop statement, let’s consider the
implementation of function ptr_validate, which takes a single
argument ipstr, and checks its validity using the following algorithm:

Perform a DNS reverse-mapping for ipstr, looking up the
corresponding PTR record in ‘in-addr.arpa’. For each record
returned, look up its IP addresses (A records). If ipstr is
among the returned IP addresses, return 1 (true), otherwise
return 0 (false).

4.19 Exceptional Conditions

When the running program encounters a condition it is not able
to handle, it signals an exception. To illustrate the concept,
let’s consider the execution of the following code fragment:

if primitive_hasmx(domainpart($f))
accept
fi

The function primitive_hasmx (see primitive_hasmx) tests whether the
domain name given as its argument has any ‘MX’ records. It should
return a boolean value. However, when querying the Domain Name
System, it may fail to get a definite result. For example, the DNS
server can be down or temporarily unavailable. In other words,
primitive_hasmx can be in a situation when, instead of returning
‘yes’ or ‘no’, it has to return ‘don't know’. It has
no way of doing so, therefore it signals an exception.

Each exception is identified by exception type, an integer
number associated with it.

4.19.1 Built-in Exceptions

The lowest 19 exception numbers are reserved for
built-in exceptions. These are declared in module status.mf.
The following table summarizes all built-in exception types implemented by
mailfromd version 8.7:

e_dbfailure

General database failure. For example, the database cannot be
opened. This exception can be signaled by any function that queries
any DBM database.

e_divzero

Division by zero.

e_exists

This exception is emitted by dbinsert built-in if the
requested key is already present in the database (see dbinsert).

e_eof

Function reached end of file while reading. See I/O functions,
for a description of functions that can signal this exception.

e_failure

failure

A general failure has occurred. In particular, this exception is
signaled by DNS lookup functions when any permanent failure occurs.
This exception can be signaled by any DNS-related function
(hasmx, poll, etc.) or operation (mx matches).

e_format

Invalid input format. This exception is signaled if input data to a
function are improperly formatted. In version 8.7 it is
signaled by message_burst function if its input message is not
formatted according to RFC 934. See Message digest functions.

e_invcidr

Invalid CIDR notation. This is signaled by match_cidr function
when its second argument is not a valid CIDR.

e_invip

Invalid IP address. This is signaled by match_cidr function
when its first argument is not a valid IP address.

e_invtime

Invalid time interval specification. It is signaled by
interval function if its argument is not a valid time interval
(see time interval specification).

e_io

An error occurred during the input-output operation. See I/O functions, for a description of functions that can signal this
exception.

e_macroundef

A Sendmail macro is undefined.

e_noresolve

The argument of a DNS-related function cannot be resolved to host
name or IP address. Currently only ismx (see ismx) raises
this exception.

e_range

The supplied argument is outside the allowed range. This is
signalled, for example, by substring function (see substring).

e_regcomp

Regular expression cannot be compiled. This can happen when a
regular expression (a right-hand argument of a matches
operator) is built at the runtime and the produced string is an
invalid regex.

e_ston_conv

String-to-number conversion failed. This can be signaled when a
string is used in numeric context which cannot be converted to the numeric
data type. For example:

set x "10a"
if x / 2
…

The if condition will signal e_ston_conv, since ‘10a’
cannot be converted to a number.

e_temp_failure

temp_failure

A temporary failure has occurred. This can be signaled by
DNS-related functions or operations.

In addition to these, two symbols are defined that are not exception
types in the strict sense of the word, but are provided to make
writing filter scripts more convenient. These are success,
meaning successful return from a function, and not_found,
meaning that the required entity (e.g. domain name or email address)
was not found. See Figure 4.1, for an illustration on
how these can be used. For consistency with other exception codes,
these can be spelled as e_success and e_not_found.

4.19.2 User-defined Exceptions

You can define your own exception types using the dclex
statement:

dclex type

In this statement, type must be a valid MFL
identifier, not used for another constant (see Constants).
The dclex statement defines a new exception identified by
the constant type and allocates a new exception number for it.

The type can subsequently be used in throw and
catch statements, for example:

dclex myrange
func fact(number val)
returns number
do
if val < 0
throw myrange "fact argument is out of range"
fi
…
done

4.19.3 Exception Handling

Normally, when an exception is signalled, program execution is
terminated and a tempfail status is returned to the MTA.
Additional information regarding the exception is then output
to the logging channel (see Logging and Debugging). However, the
user can intercept any exception by installing their own
exception-handling routines.

An exception-handling routine is introduced by a try–catch
statement, which has the following syntax:

try
do
stmtlist
done
catch exception-list
do
handler-body
done

where stmtlist and handler-body are sequences of
MFL statements and exception-list is the list of
exception types, separated by the word or. A special
exception-list ‘*’ is allowed and means all exceptions.

This construct works as follows. First, the statements from
stmtlist are executed. If the execution finishes
successfully, control is passed to the first statement after the
‘catch’ block. Otherwise, if an exception is signalled and this
exception is listed in exception-list, the execution is passed to the
handler-body. If the exception is not listed in
exception-list, it is handled as usual.

The following example shows a ‘try–catch’ construct used for
handling possible exceptions signalled by primitive_hasmx.
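
A minimal sketch of such a construct is given below. It assumes the code appears inside a handler where the Sendmail macro f holds the sender address; the use of domainpart and the logged text are illustrative assumptions.

```mfl
try
do
  /* primitive_hasmx may signal e_failure or e_temp_failure. */
  if primitive_hasmx(domainpart($f))
    accept
  fi
done
catch e_failure or e_temp_failure
do
  /* Log the problem and let message processing go on. */
  echo "primitive_hasmx failed"
  continue
done
```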

The ‘try–catch’ statement can appear anywhere inside a function or
a handler, but it cannot appear outside of them. It can also be nested
within another ‘try–catch’, in either of its parts. Upon exit from a
function or milter handler, all exceptions are restored to the state
they had when it was entered.

A catch block can also be used alone, without a preceding try
part. Such a construct is called a standalone catch. It is
mostly useful for setting global exception handlers in a begin
statement (see begin/end). When used within a usual function or
handler, the exception handlers set by a standalone catch
remain in force until either another standalone catch appears further
in the same function or handler, or the end of the function or handler
is encountered, whichever occurs first.

A standalone catch defined within a function must return from
it by executing a return statement. If it does not do that
explicitly, the default value of 1 is returned. A standalone catch
defined within a milter handler must end execution with any of the
following actions: accept, continue, discard,
reject, tempfail. By default, continue is
used.

It is not recommended to mix ‘try–catch’ constructs and
standalone catches. If a standalone catch appears within a
‘try–catch’ statement, its scope of visibility is undefined.

Upon entry to a handler-body, two implicit positional arguments
are defined, which can be referenced in handler-body as $1
and $2. The first argument gives the numeric code of the
exception that has occurred. The second argument is a textual string
containing a human-readable description of the exception.

The following is an improved version of the previous example, which
uses these parameters to supply more information about the failure:
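
A sketch of how $1 and $2 might be used in a handler body follows. The message wording is an assumption, and explicit concatenation is used to build the log line.

```mfl
try
do
  if primitive_hasmx(domainpart($f))
    accept
  fi
done
catch e_failure or e_temp_failure
do
  /* $1 is the numeric exception code, $2 its description. */
  echo "Caught exception " . $1 . ": " . $2
  continue
done
```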

All variables remain visible within the catch body, with the
exception of positional arguments of the enclosing handler. To access
positional arguments of a handler from the catch body, assign
them to local variables prior to the ‘try–catch’ construct, e.g.:
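
A minimal sketch of this technique, assuming a header handler whose two positional arguments are the header name and value; the variable names hname and hvalue are hypothetical:

```mfl
prog header
do
  /* Save the handler's positional arguments: inside the catch
     body, $1 and $2 refer to the exception instead. */
  string hname $1
  string hvalue $2
  try
  do
    …
  done
  catch *
  do
    echo "Exception " . $1 . " while processing header " . hname . ": " . hvalue
    echo $2
    tempfail
  done
done
```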

You can also generate (or raise) exceptions explicitly in the
code, using the throw statement:

throw excode descr

The arguments correspond exactly to the positional parameters of the
catch statement: excode gives the numeric code of the
exception, descr gives its textual description. This statement
can be used in complex scripts to create non-local exits from deeply
nested statements.

Notice that the excode argument must be an immediate
value: an exception identifier (either a built-in one or one declared
previously using a dclex statement).

4.20 Sender Verification Tests

The filter script language provides a wide variety of functions for
sender address verification or polling, for short. These
functions, which were described in SMTP Callout functions, can be
used to implement any sender verification method. The additional data
that can be needed is normally supplied by two global variables:
ehlo_domain, keeping the default domain for the EHLO
command, and mailfrom_address, which stores the sender address
for probe messages (see Predefined variables).
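
One way to use these functions is to wrap the call in a function that converts exceptions into return codes, and then to switch on the result. The following is only a sketch under stated assumptions: the wrapper name poll_wrapper is hypothetical, stdpoll is assumed to return true or false and to signal e_failure or e_temp_failure, and the reply codes and texts are illustrative.

```mfl
/* Hypothetical wrapper: turns the outcome of stdpoll into a code. */
func poll_wrapper(string email) returns number
do
  catch e_failure or e_temp_failure
  do
    /* $1 holds the numeric code of the exception. */
    return $1
  done
  if stdpoll(email, ehlo_domain, mailfrom_address)
    return success
  fi
  return not_found
done

prog envfrom
do
  switch poll_wrapper($f)
  do
  case success:
    pass
  case not_found or e_failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  case e_temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done
```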

Notice the way envfrom handles success and
not_found, which are not exceptions in the strict sense of the
word.

The above paradigm is so common that mailfromd provides a
special language construct to simplify it: the on statement.
Instead of manually writing the wrapper function and using it as a
switch condition, you can rewrite the above example as:
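
A minimal sketch of an on statement that polls the sender address and branches on the outcome; the reply codes and texts are illustrative assumptions:

```mfl
prog envfrom
do
  on stdpoll($f, ehlo_domain, mailfrom_address)
  do
  when success:
    pass
  when not_found or e_failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  when e_temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done
```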

The condition is either a function call or a special poll
statement (see below). The values used in when branches are
normally symbolic exception names (see exception names).

When the compiler processes the on statement it does the
following:

1. Builds a unique wrapper function, similar to that described in
Figure 4.1. The name of the function is constructed
from the condition function name and an unsigned number,
called the exception mask, that is unique for each combination of
exceptions used in when branches. To avoid name clashes with
user-defined functions, the wrapper name begins and ends with
‘$’, which is normally not allowed in identifiers.

2. Translates the on body to the corresponding switch
statement.

A special form of the condition is the poll keyword,
whose syntax is:

poll [for] email
[host host]
[from domain]
[as email]

The order of particular keywords in the poll statement is
arbitrary, for example as email can appear before
email as well as after it.

The simplest form, poll email, performs the standard
sender verification of email address email. It is translated
to the following function call:

stdpoll(email, ehlo_domain, mailfrom_address)

The construct poll email host host runs the
strict sender verification of address email on the given host.
It is translated to the following call:

strictpoll(host, email, ehlo_domain, mailfrom_address)

Other keywords of the poll statement modify these two basic
forms. The as keyword introduces the email address to be used
in the SMTP MAIL FROM command, instead of
mailfrom_address. The from keyword sets the domain
name to be used in the EHLO command. So, for example, the following
construct:

poll email host host from domain as addr

is translated to

strictpoll(host, email, domain, addr)

To summarize the above, the code described in Figure 4.2
can be written as:
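
A sketch of the shortest form, using the poll keyword as the on condition; the reply codes and texts are illustrative assumptions:

```mfl
prog envfrom
do
  /* Equivalent to calling stdpoll($f, ehlo_domain, mailfrom_address). */
  on poll $f
  do
  when success:
    pass
  when not_found or e_failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  when e_temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done
```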

4.21 Modules

A module is a logically isolated part of code that implements a
separate concern or feature and contains a collection of conceptually
united functions and/or data. Each module occupies a separate
compilation unit (i.e. file). The functionality provided by
a module is incorporated into another module or the main program by
requiring this module or by importing the desired
components from it.

4.21.1 Declaring Modules

A module file must begin with a module declaration:

module modname [interface-type].

Note the final dot.

The modname parameter declares the name of the module. It is
recommended that it be the same as the file name without the
‘.mf’ extension. The module name must be a valid MFL
literal. It also must not coincide with any defined MFL
symbol, therefore we recommend always quoting it (see example below).

The optional parameter interface-type defines the default
scope of visibility for the symbols declared in this module. If it is
‘public’, then all symbols declared in this module are made
public (importable) by default, unless explicitly declared otherwise
(see scope of visibility). If it is ‘static’, then all
symbols, not explicitly marked as public, become static. If the
interface-type is not given, ‘public’ is assumed.

The actual MFL code follows the ‘module’ line.

The module definition is terminated by the logical end of its
compilation unit, i.e. either by the end of file, or by the
keyword bye, whichever occurs first.

Special keyword bye may be used to prematurely end the current
compilation unit before the physical end of the containing file.
Any material between bye and the end of file is ignored by the
compiler.
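
A minimal sketch of a module file, assuming a hypothetical module named greet stored in greet.mf. The variable and function names are illustrative only, and the public qualifier overrides the default scope set in the declaration.

```mfl
module 'greet' static.

/* Static by default, per the module declaration above. */
string greeting "Hello, "

/* Explicitly public, so other modules can import it. */
public func greeting_for(string name) returns string
do
  return greeting . name
done

bye
This text follows the bye keyword and is ignored by the compiler.
```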

4.21.2 Scope of Visibility

The scope of visibility of a symbol defines where this symbol
may be referred to from. Symbols in MFL may have either of the
following two scopes:

Public

Public symbols are visible from the current module, as well as from
any external modules, including the main script file, provided that
they are properly imported (see import).

Static

Static symbols are visible only from the current module. There is
no way to refer to them from outside.

The default scope of visibility for all symbols declared within
a module is defined in the module declaration (see module structure). It may be overridden for any individual symbol by
prefixing its declaration with an appropriate qualifier: either
public or static.

4.21.3 Require and Import

Functions or variables declared in another module must be imported
prior to their actual use. MFL provides two ways of doing
so: by requiring the entire module or by importing selected
symbols from it.

Module Import: require modname

The require statement instructs the compiler to locate the
module modname and to load all public interfaces from it.

The compiler looks for the file modname.mf in the
current search path (see include search path). If no such file
is found, a compilation error is reported.

For example, the following statement:

require revip

imports all interfaces from the module revip.mf.

Another, more sophisticated way to import from a module is to use
the ‘from ... import’ construct:

from module import symbols.

Note the final dot. The ‘from’ and ‘module’ statements are
the only two constructs in MFL that require the delimiter.

The module has the same semantics as in the require
construct. The symbols is a comma-separated list of symbol
names to import from module. A symbol name may be given in
several forms:

Literal

Literals specify exact symbol names to import. For example,
the following statement imports from module A.mf symbols
‘foo’ and ‘bar’:

from A import foo,bar.

Regular expression

Regular expressions must be surrounded by slashes. A regular
expression instructs the compiler to import all symbols whose
names match that expression. For example, the following statement
imports from A.mf all symbols whose names begin with ‘foo’
and contain at least one digit after it:

from A import '/^foo.*[0-9]/'.

The type of regular expressions used in the ‘from’ statement is
controlled by #pragma regex (see regex).

Regular expression with transformation

A regular expression may be followed by an s-expression, i.e. a
sed-like expression of the form:

s/regexp/replace/[flags]

where regexp is a regular expression, replace is a
replacement for each part of the input that matches regexp.
S-expressions and their parts are discussed in detail in
s-expression.

The effect of such a construct is to import all symbols that match the
regular expression and apply the s-expression to their names.

For example:

from A import '/^foo.*[0-9]/s/.*/my_&/'.

This statement imports all symbols whose names begin with ‘foo’
and contain at least one digit after it, and renames them, by prefixing
their names with the string ‘my_’. Thus, if A.mf declared a
function ‘foo_1’, it becomes visible under the name of ‘my_foo_1’.

4.22 MFL Preprocessor

Before compiling the script file, mailfromd preprocesses
it. The built-in preprocessor handles only file inclusion
(see include), while the rest of traditional facilities, such as
macro expansion, are supported via m4, which is used as an
external preprocessor.

A detailed description of m4 facilities lies far beyond
the scope of this document. You will find a complete user manual in
GNU M4 macro processor. For the
rest of this section we assume the reader is sufficiently
acquainted with the m4 macro processor.

The external preprocessor is invoked with the -s flag, instructing
it to include line synchronization information in its output, which
is subsequently used by the MFL compiler for purposes of error
reporting. The initial set of macro definitions is supplied in the file
pp-setup, located in the library search path (16),
which is fed to the preprocessor input before the script file itself.
The default pp-setup file renames all m4 built-in
macro names so they all start with the prefix ‘m4_’ (17). It changes
the comment characters to the ‘/*’, ‘*/’ pair,
and leaves the default quoting characters, grave (‘`’) and acute
(‘'’) accents, unchanged. Finally, pp-setup defines the
following macros:

M4 Macro: boolean defined (identifier)

The identifier must be the name of an optional abstract
argument to the function. This macro must be used only within a function
definition. It expands to the MFL expression that yields
true if the actual parameter is supplied for identifier.
For example:
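
A sketch of a function with an optional argument, assuming the semicolon separates mandatory from optional parameters and that the built-in functions substr and length are available:

```mfl
func rcut(string text; number num)
  returns string
do
  if defined(num)
    /* Keep only the last num characters. */
    return substr(text, length(text) - num)
  else
    return text
  fi
done
```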

This function returns the last num characters of text if
num is supplied, and the entire text otherwise, e.g.:

rcut("text string") ⇒ "text string"
rcut("text string", 3) ⇒ "ing"

Invoking the defined macro with the name of a mandatory argument
yields true.

M4 Macro: printf(format, …)

Provides a printf statement that formats its optional
parameters in accordance with format and sends the resulting
string to the current log output (see Logging and Debugging).
See String formatting, for a description of format.

Example usage:

printf('Function %s returned %d', funcname, retcode)

M4 Macro: string _ (msgid)

A convenience macro. Expands to a call to gettext (see NLS Functions).

M4 Macro: string_list_iterate(list, delim, var, code)

This macro is intended to compensate for the lack of an array data type
in MFL. It splits the string list into segments
delimited by string delim. For each segment, the MFL
code code is executed. The code can use the variable var
to refer to the segment string.

For example, the following fragment prints names of all existing
directories listed in the PATH environment variable:
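
A sketch of such a fragment, meant to be placed inside a function or handler. It assumes that the getenv and access functions are available and that the F_OK constant is provided by the status module; note that F_OK tests only for existence, not for the file type.

```mfl
string path
string seg

set path getenv("PATH")
string_list_iterate(path, ":", seg, `
  if access(seg, F_OK)
    echo seg . " exists"
  fi')
```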

Care should be taken to properly quote the macro's arguments. In the code
below the string str is treated as a comma-separated list of
values. To avoid interpreting the comma as an argument delimiter, the
second argument must be quoted:

string_list_iterate(str, `","', seg, `
echo "next segment: " . seg')

M4 Macro: N_(msgid)

A convenience macro that expands to msgid verbatim. It is
intended to mark the literal strings that should appear in the
.po file, where an actual call to gettext (see NLS Functions) cannot be used. For example:
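
A minimal sketch, assuming that a variable initializer is one of the places where gettext cannot be called; the variable name and the message text are hypothetical:

```mfl
/* Only mark the string for extraction; no translation happens here. */
string message N_("Mail accepted")

prog envfrom
do
  /* Translate the marked string at run time. */
  echo gettext(message)
done
```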

You can obtain the preprocessed output, without starting actual
compilation, using the -E command line option:

$ mailfromd -E file.mf

The output is in the form of preprocessed source code, which is sent
to the standard output. This can be useful, among other things, for
debugging your own macro definitions.

Macro definitions and deletions can be made on the command line, by
using the -D and -U options. They have the
following format:

-D name[=value]

--define=name[=value]

Define a symbol name to have a value value. If
value is not supplied, the value is taken to be the empty
string. The value can be any string, and the macro can be
defined to take arguments, just as if it was defined from within the
input using the m4_define statement.

For example, the following invocation defines symbol COMPAT to
have a value 43:

$ mailfromd -DCOMPAT=43

-U name

--undefine=name

A counterpart of the -D option is the option -U
(--undefine). It undefines a preprocessor symbol whose name
is given as its argument. The following example undefines the symbol
COMPAT:

$ mailfromd -UCOMPAT

The following two options are supplied mainly for debugging purposes:

--no-preprocessor

Disables the external preprocessor.

--preprocessor=command

Use command as external preprocessor. Be especially careful
with this option, because mailfromd cannot verify whether
command is actually some kind of a preprocessor or not.

4.23 Example of a Filter Script File

In this section we will discuss a working example of a filter
script file. For ease of illustration, it is divided into several
sections. Each section is prefaced with a comment explaining its
function.

This filter assumes that the mailfromd.conf file contains the
following:

The next rule rejects all messages coming from hosts with dynamic IP
addresses. The regular expression used to catch such hosts is not 100%
fail-proof, but it tries to cover most existing host naming patterns:

Footnotes

Implementation note: actually, the references
are not interpreted within the string, instead, each such string is
split at compilation time into a series of concatenated atoms. Thus,
our sample string will actually be compiled as:

$f . " last connected from " . last_ip . ";"

See Concatenation, for a description of this construct. You can
easily see how various strings are interpreted by using
the --dump-tree option (see --dump-tree). In this case,
it will produce:

The only exception is ‘not’, whose precedence in MFL is
much lower than usual (in most programming languages it has the same
precedence as unary ‘-’). This makes it possible to write conditional
expressions in a more understandable manner. Consider the following
condition:

if not x < 2 and y = 3

It is understood as “if x is not less than 2 and y equals 3”,
whereas with the usual precedence for ‘not’ it would have meant
“if negated x is less than 2 and y equals 3”.