mvmf: mvmf language (MFL)

Executive summary: the mvmf language is called MFL, and is a mongrelization of a C-like language and a SIEVE-like language.

The syntax for mail filtering and mail disposition is mainly (entirely?) relegated to the "SIEVE" side of MFL. Sieve is a relatively easy language to understand and to write; much of what you might want to do with your mail can be done using Sieve constructs alone. It's easy enough that you can learn a lot about it just by browsing some examples. The C side of MFL is provided, in part, for those who want to orchestrate more elaborate control over mail delivery and over those SIEVE constructs.

This is not a language manual. This is more like a set of notes about MFL along with some simple examples. If you know "Sieve" (or know how to read the SIEVE RFC or look at some examples), and if you know "C" (or don't care about using the "C-like" side of MFL), these notes and examples should get you going.


Background

Although MFL may now be used in different utilities, it was developed for the mail delivery agent (now called mvmda). It was very tempting to invent something completely new for its language; for instance, logic-based or assertion-based languages seemed like they might fit the bill. But after a few of those flights of fancy, we decided that we wanted to find something that was easy to understand even by non-programmers, and yet might be of use to programmers as well. These utilities are really just for helping people deal with their mail, so we wanted something that at a basic level was fairly easy to use and configure and with which to achieve reasonable goals, but which could be used in more complicated ways for those who wanted to do so. We also wanted to get something accomplished and not go off on a language quest. So we decided to use a fairly simple syntax for the basic mail controls, but also allow the use of a procedural programming language, as well as other special extensions, to support more complex configuration. The procedural level we chose was like "C", so we call that part the "C-like" language.

We had run across the "SIEVE" language quite some years ago, when there was an internet draft put out by cyrusoft. In its form at that time, SIEVE looked reasonable: it provided some control structures that described some simple ways to look at mail, without being in itself a full-blown programming language. It seemed the ideal thing to wrap a procedural language around: making a nice union of a language providing control flow and complex evaluation and one providing basic mail handling syntax. There's been a bit of a cloud there, though: SIEVE was eventually codified into an RFC, and recently a lot of work has been going on to propose extensions to it in various ways. Some of the extensions make the integration into an enclosing language more difficult. However, one does not have to accept all extensions, and indeed some of the extensions make a lot of sense.

At any rate, we combined a C-like procedural language and a SIEVE-like mail filtering language, and called it "MFL." To get anything out of MFL, you need to use the SIEVE parts of the language-- and in fact the SIEVE parts can be used without using any C-like syntax. So we'll start there.

 

MFL is like SIEVE

The basic SIEVE definition is set out in RFC 5228, superseding RFC 3028 (see also the related reading area.) It is a control language that allows you to perform tests on parts of a mail message, and take actions that dispose of the mail message in various ways.

Because MFL combines a C-like syntax and a SIEVE syntax, all SIEVE language elements must be enclosed in a "sieve" block, which is the keyword sieve followed by a code block enclosed in curly braces. (Each utility using MFL may offer exceptions to this; for example the mvmda mail delivery agent can be instructed to assume that the script starts out in sieve mode.) A sieve block can appear anywhere in MFL that a C-like statement or an expression term can appear. (SIEVE constructs always return a value, even if that value is simply a completion status.)

Sieve statements fall into three broad categories: control, test, and action. A control statement affects the flow of control (e.g. by evaluating a test statement and conditionally executing other statements as a result). A test statement tests a condition, and an action statement performs some function such as saving mail into a mailbox. Any of these sorts of statements can be used as a SIEVE element, or SIEVE statements can be combined into a SIEVE program section.

For example, the following is a section of SIEVE code (enclosed, as normally required, in a "sieve" block):

    sieve {
        if header :is "From" "big@boss.com" {
            discard;
        }
        else {
            keep;
        }
    }

Whereas the following illustrates how a SIEVE element can be used as part of a C-like expression:

    int score = 0;

    /* Assign a big score for this */
    score += 64 * sieve { header :is "from" "big@boss.com" };

    if (score > 500)
        sieve { discard; }
    else
        sieve { keep; }
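
To make the arithmetic in the second example concrete, here is a small Python model (not MFL) of how a test result can feed a score. It assumes that a true test evaluates to 1 and a false one to 0, which the multiplication above implies but these notes do not state outright. The names header_is, score_message and disposition, and the dictionary used as a message, are hypothetical stand-ins.

```python
def header_is(message, name, expected):
    """Model of the Sieve test `header :is` under the default
    i;ascii-casemap comparator: for ASCII input, header names and
    values are compared without regard to case."""
    headers = {key.lower(): value for key, value in message.items()}
    value = headers.get(name.lower())
    return value is not None and value.lower() == expected.lower()


def score_message(message):
    score = 0
    # Assumption: a true test counts as 1 and a false test as 0.
    score += 64 * int(header_is(message, "from", "big@boss.com"))
    return score


def disposition(score):
    # Mirrors the if/else in the MFL example above.
    return "discard" if score > 500 else "keep"
```

Note that a single match contributes only 64, so the 500 threshold is reached only when other scoring (not shown) adds to it.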

Sieve implementation status

This is the current MFL implementation status for SIEVE language elements, and elements under consideration.

Statement Type Status Comments
RFC: rfc5228 (the fundamental SIEVE spec)
address Test Complete  
allof Test Complete  
anyof Test Complete  
discard Action Complete  
else Control Complete  
elsif Control Complete  
envelope Test Complete Requires capability "envelope"
exists Test Complete  
false Test Complete  
fileinto Action Complete Requires capability "fileinto"; also see "MV Extensions" note below.
header Test Complete  
if Control Complete  
keep Action Complete  
not Test Complete  
redirect Action Complete  
reject Action Complete Requires capability "reject"
require Action Complete  
size Test Complete  
stop Action Complete  
text lexical Complete multi-line text literal using the keyword "text:"
true Test Complete  
# lexical Complete See "Misc Notes" section.
RFC: rfc3431 (SIEVE extension: relational tests)
:count Tagged option Complete  
:value Tagged option Complete  
[capability] These elements require capability "relational"
RFC: rfc3598 (SIEVE extension: subaddress)
:detail Tagged option See notes  
:user Tagged option See notes  
[capability] Requires capability "subaddress"
[notes] Was formerly Internet Draft draft-murchison-sieve-subaddress. This RFC identifies a useful function: to be able to isolate the base recipient name from the extension part for mail systems such as qmail which allow extension addresses. Will probably be implemented in mvmf, no timeframe.
RFC: rfc3685 (SIEVE extension: spamtest and virustest)
spamtest Test Unsupported  
virustest Test Unsupported  
[capability] Requires capability "spamtest". Also requires that capability "relational" be enabled.
[notes] Was formerly Internet Draft draft-daboo-sieve-spamtest. This RFC specifies a couple of tests against any spam and virus analysis that may have been applied and normalized into simple status information by the underlying SIEVE implementation. We have other more fine-grained assessment in mind for mvmf, and so are not going to implement this any time soon, if at all.
RFC: rfc3894 (SIEVE extension: copying without side effects)
:copy Tagged option Complete  
[capability] Requires capability "copy"
[notes] Was formerly Internet Draft draft-degener-sieve-copy
Draft: draft-degener-sieve-editheader
addheader Action Complete  
deleteheader Action Complete  
replaceheader Action Complete  
:index Tagged option Complete  
:last Tagged option Complete  
:newname Tagged option Complete  
:newvalue Tagged option Complete  
[capability] These elements require capability "editheader"
[variables] See the notes in the section about the Sieve "variables" extension.
[draft] The specification is still a draft and as such is subject to change or removal.
[new] Note: new version draft-degener-sieve-editheader-01 has a couple of minor changes that we have not yet incorporated (nor necessarily agree with)
Draft: draft-ietf-sieve-imapflags-00.txt
[thoughts] We're interested in watching this, as it would be useful to manipulate IMAP flags; however, this draft still needs shaping up. No details to report here.
Draft: draft-murchison-sieve-regex
:regex Tagged option Complete  
[capability] Requires capability "regex"
[notes] Matched subparts are available via the C-like language elements.
[draft] The specification is still a draft and as such is subject to change or removal.
Draft: draft-ietf-sieve-refuse-reject
refuse Action Unsupported  
[capability] Requires capability "refuse"
[notes] Specifies an action "refuse" that will refuse an email message at SMTP-time, rather than trying to reject or discard it later. Since mvmda doesn't yet operate at SMTP-time, we don't have any support for this. However it's definitely worth implementing should mvmda be hooked into an SMTP engine in any way.
[draft] The specification is still a draft and as such is subject to change or removal.
Draft: draft-ietf-sieve-variables
set Action See notes  
setdate Action See notes  
string Test See notes  
:length Tagged option See notes  
:lower Tagged option See notes  
:lowerfirst Tagged option See notes  
:upper Tagged option See notes  
:upperfirst Tagged option See notes  
[capability] Requires capability "variables"
[thoughts] We have mixed feelings about this. It provides a reasonable facility for the SIEVE language, but MFL already provides much more powerful access to variables already. On the other hand, we could implement it fairly easily, so we may give it a go. Not to mention that I'd like to have the "string" test command (even if I don't like its name).
[draft] The specification is still a draft and as such is subject to change or removal.
Draft: draft-ietf-sieve-vacation
vacation Action Complete  
:addresses Tagged option Complete  
:days Tagged option Complete  
:from Tagged option Complete  
:handle Tagged option Complete  
:mime Tagged option Complete  
:subject Tagged option Complete  
[capability] Capability "vacation" required
[notes] This implementation requires the :from option as it does not want to guess the email address of the script owner.

The draft specifies that if :handle is omitted, one should be synthesized from the amalgamation of other options. This implementation does not do that; it assumes a handle of "default" if one is not given.

[draft] The specification is still a draft and as such is subject to change or removal. (Though that's not likely.)
Draft: draft-degener-sieve-body
[capability] Capability "body" required
[thoughts] We may implement this, but have other ideas on the matter that fit into the MFL framework a little tighter. However, this may be a useful first step in its proposed form.
[draft] The specification is still a draft and as such is subject to change or removal.
MV Extension: C
C lexical Complete Introduces a block of C-like code, which must be enclosed in curly braces. This is not a Sieve extension, as it is part of MFL, and does not need to be enabled via a Sieve "require" statement.
MV Extension: dnsbl
dnsbl Test Complete Requires capability "vnd.mvmf.dnsbl" . See notes below.
:ip Tagged option Complete Specify an IP address to be tested, overriding the default.
MV Extensions in general
  See below for notes about MV extensions.
MV Extension: sieve
sieve lexical Complete Introduces a block of sieve code, which must be enclosed in curly braces. This allows a script writer to write code that is guaranteed to be in "sieve" mode without having to know the encompassing context. Useful, for example, for script code that is meant to be included (via @include) by some other script. This is not a Sieve extension, as it is part of MFL, and does not need to be enabled via a Sieve "require" statement.
Misc Notes
:comparator "i;ascii-casemap" and "i;octet" are complete;
i;ascii-numeric is also implemented but is not appropriate in all cases. This comparator requires capability "comparator-i;ascii-numeric" .
i;ascii-casemap is the default.
# comments As of the 20050825 release, MFL supports the "#"-style end of line comment. This can conflict somewhat with MFL's preprocessor statements if you enable the '#' character as a preprocessor introducer character. However, also as of the 20050825 release the default preprocessor introducer character has been changed to '@' which does not conflict with this style of comment. Note that MFL also supports the C-like "//" syntax to begin a comment to the end of the line, as well as "/*..*/" bracketed comments.
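
The comparator note above can be illustrated with a short Python sketch (not MFL). It follows the published definitions of the three collations (RFC 4790) as I understand them; whether MFL matches them in every detail is not confirmed here. The function names are hypothetical. Each function returns a key, and two strings compare under a comparator the way their keys compare.

```python
import re


def octet_key(text):
    """i;octet: compare the raw octets, so case matters."""
    return text.encode("utf-8")


def ascii_casemap_key(text):
    """i;ascii-casemap: fold ASCII a-z to A-Z, leave other characters alone."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def ascii_numeric_key(text):
    """i;ascii-numeric: the leading run of digits read as an unsigned
    integer; a string with no leading digit counts as positive infinity."""
    digits = re.match(r"[0-9]+", text)
    return int(digits.group()) if digits else float("inf")
```

Under i;ascii-numeric, "9" sorts before "10" and "0012" equals "12", but any two strings that do not start with a digit compare equal to each other. That last property is one reason the comparator is not appropriate in all cases.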

MV Language extensions

MV extension: C
The C statement allows C-like code to be parsed and executed inside a SIEVE block. C-like code is included in curly braces. E.g.:
    sieve {
       if header "to" "user@example.com" {
           C { to_me = 1; }
           keep;
       }
    }
This is not a Sieve extension; it is simply part of the MFL implementation of Sieve as an embedded language, and thus is not enabled via a Sieve "require" statement.

 


MV extension: dnsbl
The dnsbl statement is used to test against one or more DNS-based blocklists/blacklists (called DNSBLs). It takes an optional ":ip" tagged option, plus two arguments, each of which may be a string or string list:
    dnsbl  [:ip <ipaddr: string>] <blnames: string-list>  <result-codes: string-list>

With the :ip option, the given IP address ipaddr is tested against the specified DNSBLs. No mail message need be open to use this form.

With no :ip option, the dnsbl statement tests each responsible IP address (each IP address that is believed to be responsible for transporting the message to the local server) against the specified DNSBLs.

The statement returns true if the IP address was found in one of the DNSBLs, and false if it was not.

Note: a list of responsible IP addresses is maintained by any application that includes and supports this language construct. Your MFL code may call a built-in-function "$msg_rip_add()" to add an IP address to this list. This would normally happen when the application calls a specifically-named MFL function, i.e. a "hook," at a relevant point in its processing. For example, the mvmda (Mail Delivery Agent) calls a hook when it has opened and scanned the incoming message. See each application's documentation for descriptions of any hooks supported.

DNSBL blacklist names and result code types are registered in a system-wide file dnsbl.conf normally located in /usr/local/share/mvmf . The blacklist name identifies a domain name suffix to be used for DNSBL lookups, and a result code is a mnemonic name for a result returned by that DNSBL. Generally all blnames have a result code "std" defined as their standard result. Some DNSBLs have various results indicating various things. Result code "*" will match any result returned by the DNSBL lookup.
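
As a rough illustration of what a lookup against a "domain name suffix" involves, here is a Python sketch (not MFL) of the usual DNSBL convention: the IPv4 octets are reversed and the zone suffix is appended to form the name that gets resolved. The format of dnsbl.conf is not shown in these notes, so the suffix "bl.example.org", the idea that a result code stands for a specific returned address, and the resolve callback are all assumptions made for the example.

```python
def dnsbl_query_name(ip, zone_suffix):
    """Build the name looked up for an IPv4 address in a DNSBL:
    the four octets in reverse order, followed by the zone suffix."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) < 256 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone_suffix.strip(".")


def is_listed(ip, zone_suffix, wanted_results, resolve):
    """Return True if the lookup yields one of the wanted results.

    `resolve` is a caller-supplied function mapping a DNS name to a list
    of answer strings (empty if the name does not exist). `wanted_results`
    holds the literal answers to accept, or "*" to accept any answer.
    """
    answers = resolve(dnsbl_query_name(ip, zone_suffix))
    if not answers:
        return False
    return "*" in wanted_results or any(a in wanted_results for a in answers)
```

Passing the resolver in as a function keeps the sketch free of network access; a real implementation would perform a DNS query at that point.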

The code section:

    sieve {
        if dnsbl ["spamcop", "njabl"] "std" {
            discard; stop;
        }
    }
tests all responsible IP addresses against the standard result codes of both the "spamcop" and "njabl" DNSBLs, discarding the message if one of the IP addresses is listed.

This code:

    int flag;
    flag = sieve { dnsbl :ip "127.0.0.2" "spamhaus" "sbl" };
sets the variable "flag" depending on whether the specific IP address 127.0.0.2 is found in the "spamhaus" sbl DNSBL, as does this code:
    string prefix = "127.";
    int flag = sieve { dnsbl :ip [ prefix + "0.0.2" ] "spamhaus" "sbl" };

The "dnsbl" capability must be enabled via SIEVE's "require" statement in order to use this statement, using a capability name of "vnd.mvmf.dnsbl" . Earlier MFL implementations used a capability name of "dnsbl" instead of the more proper vendor-specific name. When you configure and build the mvmf package you can still choose to support the old capability name as an alias.

 


MV Extension: pipe in "fileinto"
If the target of a "fileinto" statement begins with a vertical bar ("|"), it's taken to mean that the mail message should be piped into the command following the vertical bar. This capability is restricted, though; it works only if the "pipe_allow" control has been administratively enabled. (Administrative controls are addressed elsewhere.)
    // Assumes that this has been executed somewhere in admin mode:
    $admin_int_set( "pipe_allow", 1 );
             .
             .
             .
    sieve {
        fileinto "|process-report";
    }
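
The decision described above can be modeled in a few lines of Python (not MFL). This is only a sketch: the function name route_fileinto is hypothetical, and raising an error when piping is disabled is an assumption, since these notes do not say what happens in that case.

```python
def route_fileinto(target, pipe_allow):
    """Decide what a fileinto target means.

    Returns ("pipe", command) or ("mailbox", name). Raising PermissionError
    when piping is disabled is an assumption made for this sketch.
    """
    if target.startswith("|"):
        if not pipe_allow:
            raise PermissionError("pipe targets are not enabled")
        return ("pipe", target[1:].strip())
    return ("mailbox", target)
```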

MFL also has an interface to system-defined plugins using the $cusp_ family of built-in-functions. The CUSP interface is intended for helper applications that have a more complex interface than simply piping a message into an external program. See, for example, the clamdif interface to a clamav anti-virus daemon.

 


MV Extension: expressions in SIEVE constructs
Some SIEVE constructs have been extended so that an MFL expression can occur in some places. In general, anywhere that a string can be used in a SIEVE statement, you can place an MFL expression. However, because of some syntax conflicts between SIEVE string lists and MFL expressions (see below), such expressions must appear only within the string list format (i.e., enclosed in square brackets). For a rather contrived example:
  string my_other_domain;
  my_other_domain = "example.com";
  sieve {
     if not address :domain "To" [ my_other_domain ] {
         keep;
     }
     else {
         redirect [ (string)"myself@" + my_other_domain ];
     }
  }

What's the syntax conflict that requires that expressions only be used inside square brackets? The problem comes down to the fact that in SIEVE statements, terms outside of square brackets are separated by spaces, while terms inside of square brackets are separated by commas. Imagine allowing expressions anywhere, such as in this potential case:

    string header_name = "subject";
    sieve {
        if header :contains header_name ["ADV"] { discard; }
    }
If the parser is allowed to look for an expression outside of a string list (i.e., outside of square brackets), it can easily think that
    header_name [ "ADV" ]
follows the syntax of an array reference. While this simple case might seem easy to resolve, more complex cases are not. Fortunately, terms inside of string lists are separated by commas, removing that kind of ambiguity there. (Use of commas in that part of the SIEVE language seems a mite inconsistent, but I'm not complaining.)

 


MV Extension: message part selection

MFL knows about the MIME structure of messages, and has the concept of a "current message part." All header tests are done in the context of this current message part. In the default state, the top-level message part (i.e., the message headers) is selected. MFL scripts may select other message parts (e.g. the children of a multipart message part). Let's say you have a message whose top level content type is "multipart/alternative" with two children, one with content type "text/plain" and the next with "text/html". Consider these three statements:

    /* A */  sieve { header :matches "content-type" "multipart/*" }
    /* B */  sieve { header :matches "content-type" "text/plain*" }
    /* C */  sieve { header :matches "content-type" "text/html*" }
With the top (default) message part selected, only statement A evaluates to TRUE. With the first child selected, only B is true, and with the second child selected, only C is true.
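
The same scenario can be reproduced with Python's standard email package, as a sketch of the idea rather than of MFL's implementation. The sample message and the helper header_matches are made up for the illustration; the helper approximates the ":matches" test with a case-insensitive glob (Python's fnmatchcase also understands "[...]" character classes, which Sieve's ":matches" does not).

```python
from email import message_from_string
from fnmatch import fnmatchcase

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset=us-ascii

plain body
--b1
Content-Type: text/html; charset=us-ascii

<p>html body</p>
--b1--
"""

top = message_from_string(RAW)
first, second = top.get_payload()   # the two children of the multipart


def header_matches(part, name, pattern):
    """Glob-match one header of the selected part, ignoring case."""
    value = part.get(name, "")
    return fnmatchcase(value.lower(), pattern.lower())


# Evaluate statements A, B and C against each selected part.
patterns = ["multipart/*", "text/plain*", "text/html*"]
results = {
    label: [header_matches(part, "content-type", p) for p in patterns]
    for label, part in (("top", top), ("first", first), ("second", second))
}
```

Each entry in results holds the truth values of A, B and C for one selected part, and exactly one of the three is true in each case.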

 

MFL is like C

MFL's enveloping language for procedural and logic flow is C-like in nature. (We won't explain "C" here, but if you are reading this far you probably either know it or can find out about it.) We say "C-like" because it gets its data typing and control flow from C, but it doesn't implement a full C language.

What's in MFL's C-like component: fundamental and compound data types, expressions, control flow statements, variables, initializers, functions, and a cpp-like preprocessor.

What's not: switch statement (and case labels), function prototypes (except for function definitions), local variables inside any compound block (including functions).

Oddities: MFL C-like variables may contain "$". Thus "$a" and "a$" are legal variable names. Functions supplied as part of mvmf will always name variables and functions starting with '$' -- other script writers should avoid doing that.

Status of C-like implementation

The following table shows the implementation status of various C-like elements of MFL.

Thing Status Comments
Fundamental data types and modifiers
unsigned Supported May be used as a modifier to an integer type, or by itself as an abbreviation for unsigned int
short Supported May be used as a modifier to int, or by itself as an abbreviation for short int
long Supported May be used as a modifier to int, or by itself as an abbreviation for long int
char Supported A 1-byte value
int Supported A natural integer (currently 2-byte value)
$int4$ Supported MFL extension to guarantee 4-byte int
(short int is 2 bytes; long int is 4 bytes.)
float Supported Floating point number
double Supported Double precision floating point number
string Supported MFL extension for character strings.
Aggregates and metatypes
typedef Supported Defines a new type in terms of another type definition
struct Supported A data structure
union Supported A data overlay
enum Supported Enumerated types. See note "E".
[] Supported Arrays.
* Supported Pointers. See note "P".
Control statements
break Supported Exit loop.
continue Supported Next loop iteration.
do Supported Loop control.
if Supported Conditional execution
else Supported (implemented as part of "if")
for Supported Loop control
return Supported Return from a function (with optional return value)
switch..case Not supported Value detection -- no plan to support this
while Supported Loop control
pv$ Supported MFL extension to print to stdout. See note "PV".
[built-in functions] Supported Described here
[MFL functions] Supported User-written functions (see below)
Expressions and evaluation
C Supported MFL extension to introduce a C-like code block, which must be enclosed in curly braces. Useful to guarantee that a script is in C-like mode, e.g. for a code snippet that is intended to be included by another script.
sizeof Supported Returns number of bytes of a variable, storage element, type, or expression. See note "SO".
sieve Supported MFL extension to introduce a SIEVE code block, which must be enclosed in curly braces.
( ) Supported Parenthetical grouping for explicit precedence
? : Supported Conditional expression: test ? truth : falsth
! Supported "!" Operator (boolean not)
~ Supported "~" Operator (bitwise complement)
, Supported "," Operator (return second of two expressions)
= Supported "=" Operator (assignment)
== Supported "==" Operator (compare equal)
==^ Supported "==^" Operator (string compare equal, ignore case) See note "S".
!= Supported "!=" Operator (compare not equal)
!=^ Supported "!=^" Operator (string compare not equal, ignore case) See note "S".
=. Supported "=." Operator (regex matching, pattern on RHS) See note "S".
!=. Supported "!=." Operator (regex non-matching, pattern on RHS) See note "S".
=? Supported "=?" Operator (glob-style matching, pattern on RHS) See note "S".
=?^ Supported "=?^" Operator (glob-style matching, ignore case, pattern on RHS) See note "S".
!=? Supported "!=?" Operator (glob-style non-matching, pattern on RHS) See note "S".
!=?^ Supported "!=?^" Operator (glob-style non-matching, ignore case, pattern on RHS) See note "S".
< Supported "<" Operator (compare less than)
<^ Supported "<^" Operator (string compare less than, ignore case) See note "S".
<= Supported "<=" Operator (compare less than or equal)
<=^ Supported "<=^" Operator (string compare less than or equal, ignore case) See note "S".
<< Supported "<<" Operator (shift left)
<<= Supported "<<=" Assignment operator (shift left)
> Supported ">" Operator (compare greater than)
>^ Supported ">^" Operator (string compare greater than, ignore case) See note "S".
>= Supported ">=" Operator (compare greater than or equal)
>=^ Supported ">=^" Operator (string compare greater than or equal, ignore case) See note "S".
>> Supported ">>" Operator (shift right)
>>= Supported ">>=" Assignment operator (shift right)
+ Supported "+" Operator (add)
+= Supported "+=" Assignment operator (add)
++ Supported "++" Operator (increment)
- Supported "-" Operator (subtract)
-= Supported "-=" Assignment operator (subtract)
-- Supported "--" Operator (decrement)
* Supported "*" Prefix operator (pointer dereference)
* Supported "*" Infix operator (multiply)
*= Supported "*=" Assignment operator (multiply)
/ Supported "/" Operator (divide)
/= Supported "/=" Assignment operator (divide)
% Supported "%" Operator (modulo)
%= Supported "%=" Assignment operator (modulo)
[ Supported "[" Operator(kinda) (array reference)
& Supported "&" infix operator (bitwise AND)
& Supported "&" Prefix operator (address-of)
&& Supported "&&" Operator (boolean AND)
&= Supported "&=" Assignment operator (bitwise AND)
| Supported "|" Operator (bitwise OR)
|| Supported "||" Operator (boolean OR)
|= Supported "|=" Assignment operator (bitwise OR)
. Supported "." Operator (member reference)
-> Supported "->" Operator (member reference)
Preprocessor
MFL sports a basic cpp-like preprocessor; this section lists the preprocessor elements you might expect. The preprocessor is conceptually responsible for removing comments and interpreting preprocessor directives. Directives are indicated in a script by using '@' as the first character on the line (i.e., in the first column). (The use of '#', as with C, conflicts with the Sieve-mandated comment characters. Nevertheless when you configure mvmf you can enable the use of '#' instead of or in addition to the '@' character.)
There are more elaborate notes about the use of the preprocessor later in this document.
@define Supported Defines a preprocessor constant or macro.
@else Supported Starts the "else" part of a preprocessor conditional
@endif Supported Ends a preprocessor conditional block
@help Supported Prints the supported preprocessor statements (useful only in interactive mode)
@ifdef Supported Begins a conditional block that is executed if a preprocessor symbol is defined.
@ifndef Supported Begins a conditional block that is executed if a preprocessor symbol is not defined.
@include Supported Includes the contents of another MFL file at this point in the compilation/interpretation
/*..*/ Supported Block comment
// Supported Comment to end of line
Preprocessor extensions
MFL has some other preprocessor directives.
@ifdef_func Supported Begins a conditional block that is executed if an mfl function is defined.
@ifdef_var Supported Begins a conditional block that is executed if an mfl variable is defined.
@ifndef_func Supported Begins a conditional block that is executed if an mfl function is not defined.
@ifndef_var Supported Begins a conditional block that is executed if an mfl variable is not defined.
@include_noerr Supported Like @include, but doesn't complain if the file is not available. Useful for loading control files that don't have to exist.

Note E: A specific assignment to an enum member definition is not supported, e.g.:

    enum {
        aa,
        bb=3,
        cc }
does not work in MFL.

Note P: Pointers are supported inasmuch as you can point to some other data storage defined in an MFL program. At run time, each pointer is constrained to reference only a particular data object.


Note PV: pv$ is basically a hack to allow debugging printouts. You can print a single value e.g.

    int x;
    pv$ x;
or you can print a printf-like format string and a single argument, e.g.:
    int x;
    x = 23;
    pv$ "x is %d\n", x;

Note S: These string comparison operators are MFL extensions.


Note SO: The MFL interpreter does late type binding and late evaluation; there is currently no way for the interpreter to figure out the type of an expression without evaluating it. sizeof can give you the size of an expression, but note well that the expression will be evaluated in the process. E.g. in:

   int x = 0;
   int sx;

   sx = sizeof( x = 3 );
sx will be the size of the expression (an int), and x will be set to 3!

 

Initializers

Definitions of most variable types can include initializers (one exception is unions). Examples:
    int x = 3;
    int y = {3};
instantiate x and y and set both values to 3. (It's an inconsistency of C syntax (so we follow it) that scalar initializers can optionally be enclosed in braces, yet initializers for scalars within aggregates can not.)
    struct {
        int key;
        string val;
    } kvt[3] = {
        { 10, "key 1" },
        { 20, "key 2" },
        { 30, "key 3" } };

 

Statements and blocks in expressions

MFL allows compound blocks to be used as terms in an expression. The statements inside of the compound blocks still need to be fully-formed statements themselves. The value of the compound block is the value of the last statement executed in it.
    int i;
    i = { 3 + 7 };		// This is wrong
    i = { 3 + 7; };		// This is correct
The mvmf application may also be configured (when it is compiled) to allow the use of some native C-like statements as expression terms. These statements include do, for, if, pv$, and while. sizeof is always available as a term, while break, return, and continue never are. As with statements inside of compound blocks, the statement as an expression term must still be fully-formed, which can result in some odd-looking code, as in this contrived example:
    int i;
    i = if ( foo() ) 3; else 4; ;	// looks odd
    i = if ( foo() ) {3;} else {4;} ;	// perhaps better.

 

Strings

Strings are a native type in MFL. A string's basic type is a fixed length even though it refers to a string that may change size. For this reason a string variable can easily be included as a member of an aggregate (e.g. arrays or structs) -- the string data element is an anchor for the string, and not the string itself. (This may sound like a pointer, but it's not: it obeys native type semantics, not pointer semantics.)

Strings are implemented using something called a refstr, which is a view of a referenced string. Multiple views to common strings may be obtained via string pointers (i.e. (string *)) which can be dereferenced to access their reference target. When a string is modified, any views into the underlying string object are modified to reflect the change. For example, consider this MFL code sequence:

    string s = "I am a test";
    string *sp = $str_sub(s, 7, 4);       // points to "test"
    string *s1P = $str_sub(s, 2, 2);      // points to "am"
    *s1P = "used to be";
The string s is now
    "I used to be a test"
and the string pointer sp still points to
    "test"

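The behavior in this example can be modeled with a short Python sketch (not MFL). It is a guess at the bookkeeping, not a description of the real refstr code: the class names are hypothetical, the whole string is given explicit offsets instead of the -1 anchors, and views that overlap the replaced span are not handled.

```python
class BaseString:
    """The underlying text plus every view currently open on it."""

    def __init__(self, text):
        self.text = text
        self.views = []


class View:
    """A window [start, end) onto a BaseString, loosely modeling a refstr."""

    def __init__(self, base, start, length):
        self.base = base
        self.start = start
        self.end = start + length
        base.views.append(self)

    def get(self):
        return self.base.text[self.start:self.end]

    def set(self, new_text):
        base, old_end = self.base, self.end
        delta = len(new_text) - (self.end - self.start)
        base.text = base.text[:self.start] + new_text + base.text[old_end:]
        self.end = self.start + len(new_text)
        # Shift the offsets of other views that sat at or past the replaced span.
        for view in base.views:
            if view is self:
                continue
            if view.start >= old_end:
                view.start += delta
            if view.end >= old_end:
                view.end += delta


base = BaseString("I am a test")
s = View(base, 0, len(base.text))   # the whole string
sp = View(base, 7, 4)               # "test"
s1p = View(base, 2, 2)              # "am"
s1p.set("used to be")               # like *s1P = "used to be"
```

After the assignment, s.get() gives the full modified string and sp.get() still gives "test", because the offsets of sp were shifted by the growth of the replaced span.
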
Every string, including the targets of string pointers, has the following attributes:

Start offset
The offset into the base string at which this refstr begins. This offset is absolute, but it may be subject to change either explicitly or implicitly when some other refstr alters the base string (see discussion above). A value of -1 means that the offset is anchored at the beginning of the string, despite any attempts to move it.

End offset
Just like the start offset, this is the position in the base string at which the refstr ends. A value of -1 anchors the end offset at the end of the base string.

Current byte index aka "bx"
Some string operations keep track of a current byte index. For example, built-in-functions to find or extract a token from a string will use the bx as a start point, and will update it to provide the next start point -- this allows repeated "find token" operations to step through the string. The bx may be used internally in some cases as well. Note that the bx is associated with each refstr, including the refstr that is the target of a pointer. So:
      string s = "hello there";
      string *sP;
      sP = $str_sub(s, 4, 4);	// "o th"
      $str_bx_set(*sP,3);	// Sets to 3, the 'h' position
      $str_bx(s);		// will initially be 0
      $str_bx(*sP);		// is still 3.
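The stepping behaviour of the bx can be illustrated with a small Python model (not MFL). The names `Cursor` and `next_token`, and the choice of whitespace as the token delimiter, are hypothetical; the actual token built-ins are described in the separate built-in functions document. The sketch only shows that each object carries its own index and that repeated calls walk through the string.

```python
class Cursor:
    """Hypothetical stand-in for one refstr's text plus its own bx."""
    def __init__(self, text):
        self.text = text
        self.bx = 0

    def next_token(self):
        """Return the next whitespace-delimited token at or after bx, or None."""
        n = len(self.text)
        i = self.bx
        while i < n and self.text[i].isspace():
            i += 1                      # skip leading delimiters
        if i == n:
            self.bx = n
            return None
        j = i
        while j < n and not self.text[j].isspace():
            j += 1                      # scan to the end of the token
        self.bx = j                     # next search starts here
        return self.text[i:j]


a = Cursor("hello there world")
b = Cursor("hello there world")
first = a.next_token()
second = a.next_token()
print(first, second, a.bx, b.bx)
```

Advancing `a` leaves `b.bx` untouched, mirroring the way `s` and `*sP` keep separate bx values in the MFL example.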
  

 


Some operations on strings:

Conversions.
Converting from a non-string to a string will attempt to do the right thing. e.g. '(string)3' yields the string "3". Converting from a string to a non-string will also attempt to do something reasonable: e.g. assigning from a string to an int will perform a C-like 'atol()' function on the string.

"+"
Adding something to a string will convert the second term to a string if possible, then return the concatenation of the two strings. So '(string)"hello" + 3' yields the string '"hello3"' .

Comparisons
'==', '!=', '<', '<=', '>=', '>' do what you might expect.

Regex matches
'=.' performs a regex match on a string, with the pattern on the right hand side. e.g. in:
       string s = "hello";
       if ( s =. "h.*o" )
           some code here;
    
the test would succeed. '!=.' is the negated version of the test.

Wild matches
'=?' and '!=?' are similar to regex matching, but using more familiar (to some) matching where '?' matches exactly one character and '*' matches zero or more characters. This is just like the sieve ":matches" match-type.

Case insensitivity
Some of the string match operators have a case-insensitive mode which is indicated by using a '^' after the operator. These are: "==^", "!=^", "=?^", "!=?^", "<^", "<=^", ">^", and ">=^" . The '^' is supposed to suggest some kind of case shifting.
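The wild-match rule and its case-insensitive variant can be sketched in Python (not MFL) by translating the pattern to a regular expression. The function name `wild_match` is hypothetical, the sketch assumes the pattern must match the whole string (as with Sieve's ":matches"), and it ignores the backslash escaping that Sieve defines for literal '?' and '*'.

```python
import re


def wild_match(text, pattern, ignore_case=False):
    """Model of '=?' (and '=?^' when ignore_case is True)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")           # exactly one character
        elif ch == "*":
            parts.append(".*")          # zero or more characters
        else:
            parts.append(re.escape(ch)) # everything else is literal
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch("".join(parts), text, flags) is not None


print(wild_match("hello", "h*o"))
print(wild_match("HELLO", "h*o"))
print(wild_match("HELLO", "h*o", ignore_case=True))
```

Note that a '.' in a wild pattern is an ordinary character here, unlike in the regex operators '=.' and '!=.'.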

 


Notes about string pointers:

General
Every string is a refstr which is a view into an underlying string. The only way to manipulate this view is through string pointers. Various built-in functions such as $str_sub provide ways to create a refstr to another string; the reference is always done via a pointer. When you dereference the pointer, you are operating on the string.

"+" and "-"
If you add to or subtract from a string pointer, you shift its position in the underlying string. Both the start and ending positions will be adjusted, unless they run off the end of the reference string. Consider:
       string s = "abcdefgh";
       string *sp;
       sp = $str_sub( s, 3, 2 );	// Now points to "de"
       ++sp;				// Now points to "ef"
       sp - 4;				// points to "ab"
       sp - 5;				// points to "a"
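The arithmetic in this example can be reproduced with a short Python model (not MFL). It rests on two assumptions inferred from the comments above rather than from the implementation: offsets that would run off either end of the base string are clamped to it, and `sp - 4` and `sp - 5` are expressions that leave `sp` itself (still "ef") unchanged, as they would in C. The function name `shift` is hypothetical.

```python
def shift(start, end, n, length):
    """Move a (start, end) view by n positions, clamping to [0, length]."""
    def clamp(x):
        return max(0, min(length, x))
    return clamp(start + n), clamp(end + n)


base = "abcdefgh"
de = (3, 5)                              # $str_sub(s, 3, 2)
ef = shift(*de, 1, len(base))            # ++sp
ab = shift(*ef, -4, len(base))           # sp - 4
a = shift(*ef, -5, len(base))            # sp - 5: start clamps at 0
for start, end in (de, ef, ab, a):
    print(base[start:end])
```

Under these assumptions the last view shrinks to one character because only its start offset hits the beginning of the string.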
    


Note that a string literal is not a string until it is coerced into one. Since those kinds of coercions happen automatically in many places you might not notice the need for it. But be aware that:

    "abcdefg" + 3
is essentially an array reference to the character at offset 3 of the character array (not a string!) "abcdefg", while
    (string)"abcdefg" + 3
evaluates to the string "abcdefg3" since the first term is coerced via the typecast.

 

Built-in functions

A limited number of built-in functions are supported. They are described in a separate document.

 

MFL functions

An MFL function has a C-like syntax, with a declaration of a return type, a formal argument list, and a function body. One quirk of MFL functions is that a function is treated syntactically like a variable declaration, one side-effect of which is that it has to be terminated with a semicolon (or a comma and another declaration using the same type). An MFL function therefore looks something like this:

    /* Recursive function to return digits of an integer
       separated by spaces
    */
    string dp( int n ) {
        if ( n < 10 ) return (string)n;
        return dp( n/10 ) + " " + (string)(n%10);
    };

Using the above, dp(25821) returns the string "2 5 8 2 1".

 

Preprocessor

As with the "C-like" side of the language, we should call this "preprocessor-like" or more exactly "cpp-like" because it implements some of the functions of C's "cpp" preprocessor program. Because MFL includes a self-contained parser and interpreter, the preprocessor functions are built into the lexical input stage, and are thus not part of a separate preprocessor program. Nevertheless they do offer basic preprocessor capabilities. Here's a brief statement of what those capabilities are.

Comment removal. Conceptually, the preprocessor removes comments from the script before it is interpreted. There are two styles of comments: block comments and rest-of-line comments.

A block comment begins with the pair of characters /* and ends with the pair */. Comments do not nest: once a comment block is opened with /* the next */ closes it, even if another /* is encountered first.

A rest-of-line comment begins with the pair of characters // or with the single character # and ends at the end of the line. This is useful for annotating a single statement.

The following illustrates both kinds of comments:

    /* Basic SIEVE setup */
    sieve {
	require ["fileinto", "envelope"];
    }

    int score = 0;		// Declare and initialize score
    float f;			# A temporary

    /* Now look for a special "X-Spam-Score" header and adjust
       our integer score according to the floating point value
       found there.
    */
    if ( sieve { header :matches "X-Spam-Score" "*" } ) {
	f = $str_match(0);	// pick up the score value
	if ( ( f >= 9.0 ) && ( f <= 9.9 ) )
	    ++score;		// Significant value bumps score.
    }
As you can see: block comments can span lines, rest-of-line comments continue only to the end of the line, and (sometimes) too many comments can obscure the meaning rather than amplify it. (Unless we are illustrating how comments are implemented, of course.)

Macro substitution. The preprocessor has its own symbol table. Symbols in this table are variously thought of as preprocessor symbols, macro names, or manifest constants. The combination of a symbol and its value may be thought of as a macro. Whenever the preprocessor encounters one of these symbols in the input stream (e.g., in the script), the value that has been assigned to that symbol is used instead of the actual symbol. This is known as macro substitution or macro expansion. A macro is created via the "@define" preprocessor directive, described below.

There are two kinds of macros: those with arguments and those without. Actual arguments to a macro are supplied in a parenthesized list, with arguments separated by commas. (Depending on the way MFL is built, whitespace may or may not be allowed between the macro name and the opening parenthesis. To be safe, don't use whitespace here: follow the macro name immediately by an opening parenthesis.)

A simple example of macro definition and substitution:

    @define ALTADDR "fred@example.com"

    sieve {
        redirect ALTADDR;
    }
Macro substitution for ALTADDR occurs before the "redirect" statement is parsed: that statement is parsed exactly as if it were written:
	redirect "fred@example.com";

A macro with arguments:

    @define aab(a,b) (a+a+b)

    int i = aab(3,4);
The preprocessor turns that into:
    int i = (3 + 3 + 4);
initializing variable i with a value of 10.

Macro references cannot be recursive. While a macro is being expanded, it is prevented from further expansion until its value is completely substituted. (It's said that the macro is "painted blue" while it is ineligible for expansion.) This prevents a macro value from referring to the macro name itself, or to the name of another macro being expanded. Note also that macro substitution occurs on a token-by-token basis. Since a quoted string is an individual token, any macro names inside a quoted string are not substituted.
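The expansion rules described above (token-by-token substitution, arguments, and "painting blue") can be sketched in Python (not MFL). This is a simplified model, not the mvmf lexer: the names `tokenize`, `read_args` and `expand` are hypothetical, the tokenizer is deliberately crude, and arguments are substituted without being expanded separately first.

```python
import re

# A quoted string is one token; otherwise words, then single symbols.
TOKEN = re.compile(r'"[^"]*"|\w+|\S')


def tokenize(text):
    return TOKEN.findall(text)


def read_args(tokens, i):
    """Collect comma-separated argument token lists up to the matching ')'."""
    args, current, depth = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == ")" and depth == 0:
            args.append(current)
            return args, i
        if tok == "," and depth == 0:
            args.append(current)
            current = []
            continue
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        current.append(tok)
    raise ValueError("unterminated macro argument list")


def expand(tokens, macros, active=frozenset()):
    """Expand macros; names in `active` are 'painted blue' and left alone."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        macro = macros.get(tok)
        if macro is None or tok in active:
            out.append(tok)
            continue
        params, body = macro
        if params is not None:
            if i >= len(tokens) or tokens[i] != "(":
                out.append(tok)          # no argument list: not an invocation
                continue
            args, i = read_args(tokens, i + 1)
            mapping = dict(zip(params, args))
            body = [t for b in body for t in mapping.get(b, [b])]
        out.extend(expand(body, macros, active | {tok}))
    return out


macros = {
    "ALTADDR": (None, tokenize('"fred@example.com"')),
    "aab": (["a", "b"], tokenize("(a+a+b)")),
    "LOOP": (None, tokenize("LOOP + 1")),   # refers to itself
}
print(" ".join(expand(tokenize("int i = aab(3,4);"), macros)))
print(" ".join(expand(tokenize('redirect "ALTADDR";'), macros)))
print(" ".join(expand(tokenize("LOOP"), macros)))
```

The self-referential `LOOP` macro expands once and then stops, and the macro name inside the quoted string is left untouched because the whole string is a single token.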

Preprocessor directives. The preprocessor is otherwise commanded via preprocessor directives. A preprocessor directive is indicated by a @ character at the first character position on a line, followed by a recognized preprocessor command. For compatibility with older mvmf releases, when you build and install mvmf, you can choose to enable '#' as an alternative preprocessor introducer character, in place of or in addition to the use of '@'. Note that the indentation in various examples is for clarity only: the @ (or '#') must occur at column 1. Whitespace may occur between the @ and the command and in fact is encouraged to indicate a nesting level. However, when one talks about a preprocessor directive, it's usually with the concatenation of the @ and the command name. Preprocessor directives:

@define macroname[(arguments)] [value]
Define a macro, either with or without formal arguments. Its simplest form is:
    @define NAME  value
so that whenever NAME is encountered in the script after this, the value is used instead. Formal arguments may also be given by including them in parentheses directly following the macro name. Each occurrence of a formal argument in the macro body will be replaced by the actual argument when the macro is invoked. For example:
    @define RS(rcpt) sieve { redirect rcpt; }

    RS( "bozo@example.com" )
will cause this code to be used:
    sieve { redirect "bozo@example.com"; }
Macros can be used to stand in for commonly used strings or sequences, particularly for code that might be changed from time to time (thus you'd only have to change the macro definition rather than changing code in multiple places in your script). As with C guidelines, making your macro names uppercase gives a visual clue that macro names are being used.

@else
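Inverts the condition of the enclosing conditional block: the script between @else and the matching @endif is parsed only if the block's test failed.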

@endif
Ends a conditionally parsed block.

@help
Prints a list of valid preprocessor commands. Useful only in interactive mode.

@ifdef macroname
Tests whether a name exists in the preprocessor symbol table; if it does, allows interpretation of the script up to the next matching @else or @endif directive. If it doesn't, the script up to the next matching @else or @endif directive is not parsed. Example:
    /* @define DROPMAIL  */

    sieve {
    @ifdef DROPMAIL
	discard;
    @else
	keep;
    @endif
    }
i.e., if DROPMAIL is defined to the preprocessor, the discard statement is executed. Otherwise, the keep statement is executed.

@ifdef_func funcname
Tests whether an MFL function exists.

@ifdef_var varname
Tests whether an MFL variable exists.

@ifndef macroname
Tests whether a name doesn't exist in the preprocessor symbol table, and performs conditional script parsing based on that. i.e., this is the inverse of @ifdef.

@ifndef_func funcname
Tests whether an MFL function doesn't exist.

@ifndef_var varname
Tests whether an MFL variable doesn't exist.

@include fileref
Inserts another script file at the point in the script where the @include is encountered. fileref is either a quoted string, in which case it refers to a user-level file, or it's a name enclosed in angle brackets, in which case it refers to a system-level file. Each sort of access has its own include path, i.e. a list of directories that will be searched for the file (for a user-level file, the file is always looked for relative to the current directory before trying the user-level include path). Every application that uses MFL will establish at least one directory in the system-level path, and you can add directories to both paths, e.g. by using the built-in function $mfl_incdir_add().
    @include "data.mfl"
inserts the contents of the file data.mfl located in the current directory (e.g., your home directory) or in the user-level include path, and
    @include <common.mfl>
inserts the contents of the file common.mfl that is found along the system-level include path.

@include_noerr fileref
Like @include, but ignores any error accessing the file.
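The conditional directives in this list (@ifdef, @ifndef, @else, @endif) can be modelled as a line filter. The following Python sketch (not MFL) is an illustration only: the function name `preprocess` is hypothetical, it handles the macro-name tests and a bare @define but not the _func/_var variants, and it assumes directives start in column 1 as described above.

```python
def preprocess(lines, defined=()):
    """Return the lines that survive @ifdef/@ifndef/@else/@endif filtering."""
    defined = set(defined)
    stack = []   # truth value of each open conditional block, outermost first
    kept = []
    for line in lines:
        if line.startswith("@"):             # directive introducer in column 1
            words = line[1:].split()         # whitespace after '@' is allowed
            cmd, args = (words[0], words[1:]) if words else ("", [])
            if cmd in ("ifdef", "ifndef"):
                present = bool(args) and args[0] in defined
                stack.append(present if cmd == "ifdef" else not present)
                continue
            if cmd == "else":
                stack[-1] = not stack[-1]    # flip only the innermost block
                continue
            if cmd == "endif":
                stack.pop()
                continue
            if cmd == "define":
                if all(stack) and args:      # only define in active blocks
                    defined.add(args[0].split("(")[0])
                continue
            # Any other directive falls through and is kept if active.
        if all(stack):                       # active only if every block is true
            kept.append(line)
    return kept


script = ["sieve {", "@ifdef DROPMAIL", "discard;", "@else", "keep;", "@endif", "}"]
print(preprocess(script))
print(preprocess(script, {"DROPMAIL"}))
```

Because a line is kept only when every enclosing block is true, an @else inside a skipped outer block never re-enables parsing.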

 

Preprocessor statements are always acted on at parse time. You might have an elaborate MFL function that you store in its own file; using something like

    @include "bigfunction.mfl"
will always load and parse that file whether or not you ever need the function (assuming that this does not occur in a false preprocessor condition). If you only want to load the function when you know you are going to need it, you can include the file using a runtime parse and execute function, e.g.:
    sieve {
	if envelope :is "from" "monthly-report@example.com" {
	    C {
		$mfl_exec_string( "@include \"bigfunction.mfl\"" );
		bigfunction();
	    }
	}
    }

 

Misc notes

Depending on how it was compiled, each mvmf application may have the capability of executing system-wide or user-level MFL scripts when it starts. These can be used to define commonly-used functions, hook functions that are called automatically by the application at certain stages, variables, and so forth.

Each application may also call specially-named MFL functions at particular points in the utility's execution. You may supply these hook functions to affect some aspect of the application's operation at the point the hook is called.

Application-specific details such as these are described along with each utility that incorporates MFL.

 

Future plans and ideas

This section has been incorporated into the To Do page.

 

Examples

Examples are primarily relevant to each utility; please see the documentation for each mvmf application that uses MFL.