Metalua Manual


Chapter 1  Meta-programming in metalua

1.1  Concepts

Lua
Lua [1] is a very clean and powerful language, with everything the discriminating hacker will love: advanced data structures, true function closures, coroutines (a.k.a. collaborative multithreading), powerful runtime introspection and metaprogramming abilities, and ultra-easy integration with C.

The general approach in Lua's design is to implement a small number of very powerful concepts, and use them to easily offer particular services. For instance, objects can be implemented through metatables (which allow customizing the behavior of data structures), or through function closures. It's quite easy to develop a class-based system with single or multiple inheritance, or a prototype-based system à la Self [2], or the kind of more advanced and baroque things that only CLOS users could dream of...

Basically, Lua could be thought of as Scheme, with:
  • a conventional syntax (similar to Pascal's or Ruby's);
  • the associative table as basic datatype instead of the list;
  • no full continuations (although coroutines are actually one-shot semi-continuations);
  • no macro system.
Metalua
Metalua is an extension of Lua, which essentially addresses the lack of a macro system, by providing compile-time metaprogramming (CTMP) and the ability for users to extend the syntax from within Lua.

Runtime metaprogramming (RTMP) allows a program to inspect itself while running: an object can thus enumerate its fields and methods, their properties, maybe dump its source code; it can be modified on-the-fly, by adding a method, changing its class, etc. But this doesn't allow changing the shape of the language itself: you cannot use this to add exceptions to a language that lacks them, nor call-by-need (a.k.a. ``lazy'') evaluation to a traditional language, nor continuations, nor new control structures, new operators... To do this, you need to modify the compiler itself. It can be done, if you have the sources of the compiler, but that's generally not worth it, given the complexity of a compiler program and the portability and maintenance issues that ensue.

Metaprogramming
A compiler is essentially a system which takes sources (generally as a set of ASCII files), turns them into a practical-to-play-with data structure, does stuff on it, then feeds it to a bytecode or machine code producer. The source and bytecode stages are bad abstraction levels to do anything practical: the sensible way to represent code, when you want to manipulate it with programs, is the abstract syntax tree (AST). This is the practical-to-play-with abstraction level mentioned above: a tree in which each node corresponds to a control structure, where the inclusion relationship is respected (e.g. if an instruction I is in a loop's body B, then the node representing I is a subtree of the tree representing B)...

CTMP is possible if the compiler allows its user to read, generate and modify AST, and to splice these generated AST back into programs. This is done by Lisp and Scheme by making the programmer write programs directly in AST (hence the many parentheses in Lisp sources), and by offering a magic instruction that executes a piece of code during compilation, takes the AST this code generates, and inserts it into the source AST: that magic couple of instructions is the macro system.

Metalua has a similar execute-and-splice-the-result magic construct; the main difference is that it doesn't force the programmer to directly write in AST (although he's allowed to if he finds it most suitable for a specific task). However, supporting ``real language syntax'' adds a couple of issues to CTMP: there is a need for transformation from real syntax to AST and the other way around, as well as a need for a way to extend syntax.

This manual won't try to teach Lua: there's a wealth of excellent tutorials on the web for this. I highly recommend Roberto Ierusalimschy's ``Programming in Lua'' book [3], a.k.a. ``the blue PiL'', probably one of the best programming books since K&R's ``The C Programming Language''. Suffice to say that a seasoned programmer will be able to program in Lua in a couple of hours, although some advanced features (coroutines, function environments, function closures, metatables, runtime introspection) might take longer to master if you don't already know a language supporting them.

Among resources available online, my personal favorites would be:
  • The reference manual: http://www.lua.org/manual/5.1
  • The first edition of PiL, kindly put online by its author at http://www.lua.org/pil
  • A compact reference sheet (grammar and standard libraries) by Enrico Colombini: http://lua-users.org/wiki/LuaShortReference
  • Generally speaking, the Lua community wiki (http://lua-users.org/wiki) is invaluable.
  • The mailing list (http://www.lua.org/lua-l.html) and the IRC channel (irc://irc.freenode.net/#lua) are populated with a very helpful community.
  • You will also find a lot of useful programs and libraries for Lua hosted at http://luaforge.net: various protocol parsers, bindings to 2D/3D native/portable GUI, sound, database drivers...
  • A compilation of the community's wisdom will eventually be published as ``Lua Gems''; you can already check its ToC at http://www.lua.org/gems
So, instead of including yet another Lua tutorial, this manual will rather focus on the features specific to Metalua, that is mainly:
  • The couple of syntax extensions offered by Metalua over Lua;
  • The two CTMP magic constructs +{...} and -{...};
  • The libraries which support CTMP (mainly for syntax extension).
Metalua design philosophy
Metalua has been designed to occupy a vacant spot in the space of CTMP-enabled languages:
  • Lisp offers a lot of flexibility, at the price of a syntax that is macro-friendly rather than user-friendly. Besides the overrated problem of getting used to those lots of parentheses, it's all too tempting to mix macros and normal code in Lisp, in a way that doesn't visually stand out; this really doesn't encourage the writing of reusable, mutually compatible libraries. As a result of this extreme flexibility, large scale collaboration doesn't seem to happen, and Lisps lack a de facto comprehensive set of standard libs, besides those included in Common Lisp's specification. Comparisons have been drawn between getting Lispers to work together and herding cats...

  • Macro-systems bolted on existing languages (Template Haskell [4], CamlP5 [5], MetaML [6]...) tend to be hard to use: the syntax and semantics of these target languages are complex, and make macro writing much harder than necessary. Moreover, for some reason, most of these projects target statically typed languages: although static inference type systems à la Hindley-Milner are extremely powerful tools in many contexts, my intuition is that static types are more of a burden than a help for many macro-friendly problems.

  • Languages built from scratch, such as converge [7] or Logix [8], have to bear with the very long (often a decade) maturing time required by a programming language. Moreover, they lack the existing libraries and developers that come with an already successful language.
Lua presents many features that beg for a real macro system:
  • Its compact, clear, orthogonal, powerful semantics, and its approach of giving powerful generic tools rather than ready-made closed features to its users.
  • Its excellent support for runtime metaprogramming.
  • Its syntax, despite (or due to) its being very readable and easy to learn, is also extremely simple to parse. This means no extra technology gets in the way of handling syntax (no BNF-like specialized language, no byzantine rules and exceptions). Even more importantly, provided that developers respect a couple of common-sense rules, cohabitation of multiple syntax extensions in a single project is made surprisingly easy.
Upon this powerful and sane base, Metalua adds CTMP with the following design goals:
  • Simple things should be easy and look clean: writing simple macros shouldn't require an advanced knowledge of the language's internals. And since we spend 95% of our time not writing macros, the syntax should be optimized for regular code rather than for code generation.
  • Good coding practices should be encouraged. Among others, separation between meta-levels must be obvious, so that it stands out when something interesting is going on. Ideally, good code must look clean, and messy code should look ugly.
  • However, the language must be an enabler, not handcuffs: it should ensure that users know what they're doing, but it must provide them with all the power they're willing to handle.
Finally, it's difficult to talk about a macro-enabled language without making Lisp comparisons. Metalua borrows a lot from Scheme's love for empowering minimalism, through Lua. However, in many other respects, it's closer to Common Lisp: where Scheme insists on doing The Right Thing, CL and metalua assume that the programmer knows better than the compiler. Therefore, when a powerful but potentially dangerous feature is considered, metalua generally tries to warn the user that he's entering the twilight zone, but will let him proceed. The most prominent example is probably macro hygiene. Scheme pretty much constrains macro writing into a term rewriting system: it allows the compiler to enforce macro hygiene automatically, but is sometimes crippling when writing complex macros (although it is, of course, Turing-complete). Metalua opts to offer CL style, non-hygienic macros, so that AST are regular data manipulated by regular code. Hygienic safety is provided by an optional library, which makes it easy but not mandatory to write hygienic macros.

1.2  Metalua syntax extensions over Lua

Metalua is essentially Lua + code generation at compile time + extensible syntax. However, there are a couple of additional constructs, considered of general interest, which have been added to Lua's original syntax. These are presented in this section.

1.2.1  Anonymous functions

Lua lets you use anonymous functions. However, when programming in a functional style, where there are a lot of short anonymous functions simply returning an expression, the default syntax becomes cumbersome. Metalua being functional-style friendly, it offers a terser idiom: ``function(arg1, arg2, argn) return some_expr end'' can be written:
``|arg1,arg2,argn| some_expr''.

Notice that this notation is currying-friendly, i.e. one can easily write functions that return functions: ``function(x) return function(y) return x+y end end'' is simply written ``|x||y| x+y''.

Lua functions can return several values, but it appeared that supporting multiple return values in metalua's short lambda notation caused more harm than good. If you need multiple returns, use the traditional long syntax.

Finally, it's perfectly legal to define a parameterless function, as in | | 42. This makes a convenient way to pass values around in a lazy way.
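
For comparison, here is a small plain-Lua sketch of what the short forms above stand for. The names add, curried_add and lazy_answer are made up for the illustration; only the long function syntax is used, so it runs in a regular Lua interpreter.

```lua
-- Plain Lua equivalents of Metalua's short lambda notation.

-- |x,y| x+y
local add = function(x, y) return x + y end

-- |x||y| x+y   (curried: a function returning a function)
local curried_add = function(x) return function(y) return x + y end end

-- | | 42       (parameterless: the value is only produced when called)
local lazy_answer = function() return 42 end

print(add(20, 22))          --> 42
print(curried_add(20)(22))  --> 42
print(lazy_answer())        --> 42
```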

1.2.2  Functions as infix operators

In many cases, people would like to extend syntax simply to create infix binary operators. Haskell offers a nice compromise to satisfy this need without causing any mess, and metalua incorporated it: when a function is put between backquotes, it becomes infix. For instance, let's consider the plus function ``plus=|x,y|x+y''; this function can be called the classic way, as in ``plus (20, 22)''; but if you want to use it in an infix context, you can also write ``20 `plus` 22''.

1.2.3  Algebraic datatypes

This syntax for datatypes is of special importance to metalua, as it's used to represent source code being manipulated. Therefore, it has its dedicated section later in this manual.

1.2.4  Metalevel shifters

These two dual notations are the core of metaprogramming: one transforms code into a manipulable representation, and the other transforms the representation back into code. They are noted +{...} and -{...}, and due to their central role in metalua, their use can't be summed up adequately here: they are fully described in the subsequent sections about metaprogramming.

1.3  Data structures

1.3.1  Algebraic Datatypes (ADT)

(ADT is also the usual acronym for Abstract DataType. However, I'll never talk about abstract datatypes in this manual, so there's no reason to get confused about it. ADT always refers to algebraic datatypes).

Metalua's distinctive feature is its ability to easily work on program source code as trees, and this includes a proper syntax for tree manipulation. The generic table structure offered by Lua is definitely good enough to represent trees, but since we're going to manipulate them a lot, we give them a specific syntax which makes them easier to read and write.

So, a tree is basically a node, with:
  • a tag (a string, stored in the table field named ``tag'')
  • some children, which are either sub-trees, or atomic values (generally strings, numbers or booleans). These children are stored in the array-part [9] of the table, i.e. with consecutive integers as keys.
Example 1
The most canonical example of ADT is probably the inductive list. Such a list is described either as the empty list Nil, or a pair (called a cons in Lisp) of the first element on one side (car in Lisp), and the list of remaining elements on the other side (cdr in Lisp). These will be represented in Lua as { tag = "Nil" } and { tag = "Cons", car, cdr }. The list (1, 2, 3) will be represented as:
{ tag="Cons", 1, 
  { tag="Cons", 2, 
    { tag="Cons", 3, 
      { tag="Nil" } } } }
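
As a minimal sketch of how such a structure is used from plain Lua, the following builds the same list and walks it. The helper names cons, NIL and sum are hypothetical, chosen for this illustration only.

```lua
-- Build the list (1, 2, 3) as nested Cons/Nil nodes, in plain Lua.
local function cons(car, cdr) return { tag = "Cons", car, cdr } end
local NIL = { tag = "Nil" }

local list = cons(1, cons(2, cons(3, NIL)))

-- Walk the list and sum its elements: child 1 is the car, child 2 the cdr.
local function sum(l)
  local total = 0
  while l.tag == "Cons" do
    total = total + l[1]
    l = l[2]
  end
  return total
end

print(sum(list))  --> 6
```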
Example 2
Here is a more programming language oriented example: imagine that we are working on a symbolic calculator. We will have to work with:
  • literal numbers, represented as integers;
  • symbolic variables, represented by the string of their symbol;
  • formulae, i.e. numbers, variables and/or sub-formulae combined by operators. Such a formula is represented by the symbol of its operator, and the sub-formulae / numbers / variables it operates on.
Most operations, e.g. evaluation or simplification, will do different things depending on whether they are applied to a number, a variable or a formula. Moreover, the meaning of the fields in data structures depends on that data type. The datatype is given by the name put in the tag field. In this example, tag can be one of Number, Variable or Formula. The formula e^(i*pi)+1 would be encoded as:
{ tag="Formula", "Addition", 
  { tag="Formula", "Exponent", 
    { tag="Variable", "e" },
    { tag="Formula", "Multiplication", 
      { tag="Variable", "i" },
      { tag="Variable", "pi" } } },
  { tag="Number", 1 } }
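
To illustrate how an operation dispatches on the tag field, here is a sketch of an evaluator for such trees, written in plain Lua. The names env, operators and eval are assumptions made for this example, and it evaluates 2*x+1 rather than the formula above, since plain Lua numbers cannot represent the imaginary unit i.

```lua
-- Hypothetical variable bindings and operator implementations.
local env = { x = 20 }
local operators = {
  Addition       = function(a, b) return a + b end,
  Multiplication = function(a, b) return a * b end,
  Exponent       = function(a, b) return a ^ b end,
}

-- Evaluate a tree: what is done depends on the node's tag.
local function eval(node)
  if node.tag == "Number" then
    return node[1]
  elseif node.tag == "Variable" then
    local value = env[node[1]]
    if value == nil then error("unbound variable " .. node[1]) end
    return value
  elseif node.tag == "Formula" then
    local op = operators[node[1]]
    if op == nil then error("unknown operator " .. node[1]) end
    return op(eval(node[2]), eval(node[3]))
  else
    error("unexpected tag " .. tostring(node.tag))
  end
end

-- 2*x + 1
local formula =
  { tag="Formula", "Addition",
    { tag="Formula", "Multiplication",
      { tag="Number", 2 },
      { tag="Variable", "x" } },
    { tag="Number", 1 } }

print(eval(formula))  --> 41
```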
Syntax
The simple data above already has a quite ugly representation, so here are the syntax extensions we provide to represent trees in a more readable way:
  • The tag can be put in front of the table, prefixed with a backquote. For instance, { tag = "Cons", car, cdr } can be abbreviated as `Cons{ car, cdr }.
  • If the table contains nothing but a tag, the braces can be omitted. Therefore, { tag = "Nil" } can be abbreviated as `Nil (although `Nil{ } is also legal).
  • If there is only one element in the table besides the tag, and this element is a literal number or a literal string, braces can be omitted. Therefore { tag = "Foo", "Bar" } can be abbreviated as `Foo "Bar".
With this syntax sugar, the e^(i*pi)+1 example above would read:
`Formula{ "Addition", 
   `Formula{ "Exponent", 
      `Variable "e",
      `Formula{ "Multiplication", 
                `Variable "i",
                `Variable "pi" } },
   `Number 1 }
Notice that this is a valid description of some tree structure in metalua, but it's not a representation of metalua code: metalua code is represented as tree structures indeed, but a structure different from this example's one. In other words, this is an ADT, but not an AST.

For the record, the metalua (AST) representation of the code "1+e^(i*pi)" is:
`Op{ "add", `Number 1,
     `Op{ "pow", `Id "e", 
          `Op{ "mul", `Id "i", `Id "pi" } } }
After reading more about AST definition and manipulation tools, you'll hopefully be convinced that the latter representation is more powerful.
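
The three abbreviation rules of the Syntax paragraph can be sketched as a small printer, in plain Lua, which renders a tagged table in the backquote notation. The function name show is hypothetical, and this is only an illustration of the rules, not Metalua's own printing code.

```lua
-- Render a tree in the backquote notation, applying the three sugar rules.
local function show(node)
  if type(node) == "string" then return string.format("%q", node) end
  if type(node) ~= "table"  then return tostring(node) end

  local children = {}
  for i, child in ipairs(node) do children[i] = show(child) end
  local body = "{ }"
  if #children > 0 then body = "{ " .. table.concat(children, ", ") .. " }" end

  if not node.tag then return body end        -- untagged table: plain list
  local only = node[1]
  if #children == 0 then
    return "`" .. node.tag                    -- rule 2: nothing but a tag
  elseif #children == 1
     and (type(only) == "number" or type(only) == "string") then
    return "`" .. node.tag .. " " .. children[1]  -- rule 3: single literal
  end
  return "`" .. node.tag .. body              -- rule 1: tag in front
end

print(show{ tag="Cons", 1, { tag="Nil" } })  --> `Cons{ 1, `Nil }
print(show{ tag="Id", "x" })                 --> `Id "x"
```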

1.3.2  Abstract Syntax Trees (AST)

An AST is an Abstract Syntax Tree, a data representation of source code suitable for easy manipulation. AST are just a particular usage of ADT, and we will represent them with the ADT syntax described above.

Example
This is the tree representing the source code print(foo, "bar"):

`Call{ `Id "print", `Id "foo", `String "bar" }

Metalua tries, as much as possible, to shield users from direct AST manipulation, and a thorough knowledge of them is generally not needed. Metaprogrammers should know their general form, but it is reasonable to rely on a cheat-sheet to remember the exact details of AST structures. Such a summary is provided in the appendix of this tutorial, as a reference when dealing with them.

In the rest of this section, we will present the translation from Lua source to their corresponding AST.

1.3.3  AST <-> Lua source translation

This subsection explains how to translate a piece of Lua source code into the corresponding AST, and conversely. Most of the time, users will rely on a mechanism called quasi-quotes to produce the AST they will work with, but it is sometimes necessary to directly deal with AST, and therefore to have at least a superficial knowledge of their structure.

Expressions

The expressions are pieces of Lua code which can be evaluated to give a value. This includes constants, variable identifiers, table constructors, expressions based on unary or binary operators, function definitions, function calls, method invocations, and index selection from a table.

Expressions should not be confused with statements: an expression has a value which can be returned through evaluation, whereas statements just execute themselves and change the computer state (mainly memory and IO). For instance, 2+2 is an expression which evaluates to 4, but four=2+2 is a statement, which sets the value of variable four but has no value itself.

Number constants
A number is represented by an AST with tag Number and the number value as its sole child. For instance, 6 is represented by `Number 6 [10].

String constants
A string is represented by an AST with tag String and the string as its sole child. For instance, "foobar" is represented by:
`String "foobar".

Variable names
A variable identifier is represented by an AST with tag Id and the variable's name, as a string, as its sole child. For instance, variable foobar is represented by `Id "foobar".

Other atomic values
Here are the translations of other keyword-based atomic values:
  • nil is encoded as `Nil [11];
  • false is encoded as `False;
  • true is encoded as `True;
  • ... is encoded as `Dots.
Table constructors
A table constructor is encoded as:

`Table{ ( `Pair{ expr expr } | expr )* }

This is a list, tagged with Table, whose elements are either:
  • the AST of an expression, for array-part entries without an explicit associated key;
  • a pair of expression AST, tagged with Pair: the first expression AST represents a key, and the second represents the value associated to this key.
Examples
  • The empty table { } is represented as `Table{ };

  • {1, 2, "a"} is represented as:
    `Table{ `Number 1, `Number 2, `String "a" };

  • {x=1, y=2} is syntax sugar for {["x"]=1, ["y"]=2}, and is represented by `Table{ `Pair{ `String "x", `Number 1 }, `Pair{ `String "y", `Number 2} };

  • indexed and non-indexed entries can be mixed: { 1, [100]="foo", 3} is represented as `Table{ `Number 1, `Pair{ `Number 100, `String "foo"}, `Number 3 };
Binary Operators
Binary operations are represented by `Op{ operator, left, right }, where operator is the operator's name as one of the strings below, left is the AST of the left operand, and right the AST of the right operand.

The following table associates a Lua operator to its AST name:
Op.   AST        Op.   AST        Op.   AST         Op.   AST
+     "add"      -     "sub"      *     "mul"       /     "div"
%     "mod"      ^     "pow"      ..    "concat"    ==    "eq"
<     "lt"       <=    "le"       and   "and"       or    "or"

Operator names are the same as the corresponding Lua metatable entries, without the "__" prefix. There are no AST names for the operators ~=, >= and >: they can be simulated by swapping the arguments of <= and <, or by applying a not to the == operator.

Examples
  • 2+2 is represented as `Op{ 'add', `Number 2, `Number 2 };
  • 1+2*3 is represented as:
    `Op{ 'add', `Number 1, 
         `Op{ 'mul', `Number 2, `Number 3 } }
    
  • (1+2)*3 is represented as:
    `Op{ 'mul', `Op{ 'add', `Number 1, `Number 2 },
         `Number 3 }
  • x>=1 and x<42 is represented as:
    `Op{ 'and', `Op{ 'le', `Number  1, `Id "x" },
                `Op{ 'lt', `Id "x", `Number 42 } }
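
The treatment of ~=, >= and > described above can be sketched as a plain Lua helper that builds the AST of any comparison from the six Lua comparison operators. The function name compare is hypothetical; the AST are written as plain tables with a tag field.

```lua
-- Build the AST of a comparison; only "eq", "lt" and "le" exist as AST names,
-- so the other three operators are rewritten in terms of them.
local function compare(op, left, right)
  if     op == "==" then return { tag="Op", "eq", left, right }
  elseif op == "<"  then return { tag="Op", "lt", left, right }
  elseif op == "<=" then return { tag="Op", "le", left, right }
  elseif op == ">"  then return { tag="Op", "lt", right, left }  -- a > b  is  b < a
  elseif op == ">=" then return { tag="Op", "le", right, left }  -- a >= b is  b <= a
  elseif op == "~=" then
    return { tag="Op", "not", { tag="Op", "eq", left, right } }  -- not (a == b)
  else
    error("not a comparison operator: " .. tostring(op))
  end
end

-- x >= 1 becomes `Op{ 'le', `Number 1, `Id "x" }
local x, one = { tag="Id", "x" }, { tag="Number", 1 }
local ast = compare(">=", x, one)
print(ast[1], ast[2].tag, ast[3].tag)
```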
    
    
Unary Operators
Unary operations are similar to binary operators, except that they only take the AST of one subexpression. The following table associates a Lua unary operator to its AST:
Op.   AST        Op.   AST        Op.   AST
-     "unm"      #     "len"      not   "not"

Examples
  • -x is represented as `Op{ 'unm', `Id "x" };
  • -(1+2) is represented as:
    `Op{ 'unm', `Op{ 'add', `Number 1, `Number 2 } }
  • #x is represented as `Op{ 'len', `Id "x" }
Indexed access
Indexed accesses are represented by an AST with tag Index, the table's AST as first child, and the key's AST as second child.

Examples
  • x[3] is represented as `Index{ `Id "x", `Number 3 };
  • x[3][5] is represented as:
    `Index{ `Index{ `Id "x", `Number 3 }, `Number 5 }
  • x.y is syntax sugar for x["y"], and is represented as:
    `Index{ `Id "x", `String "y" }
Notice that index AST can also appear as left-hand side of assignments, as shall be shown in the subsection dedicated to statements.
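
As an illustration of how nested Index nodes are assembled, here is a plain Lua sketch that turns a dotted path such as "string.format" into the corresponding AST. The helper name index_path is an assumption made for this example.

```lua
-- Turn "a.b.c" into `Index{ `Index{ `Id "a", `String "b" }, `String "c" }.
local function index_path(path)
  local ast
  for name in string.gmatch(path, "[^.]+") do
    if ast == nil then
      ast = { tag="Id", name }                            -- first name: a variable
    else
      ast = { tag="Index", ast, { tag="String", name } }  -- x.y is x["y"]
    end
  end
  return ast
end

local ast = index_path("string.format")
print(ast.tag, ast[1][1], ast[2][1])
```

Each additional name wraps the AST built so far as the first child of a new Index node, which is why the innermost node corresponds to the leftmost name.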

Function call
Function call AST have the tag Call, the called function's AST as first child, and its arguments as remaining children.

Examples
  • f() is represented as `Call{ `Id "f" };
  • f(x, 1) is represented as `Call{ `Id "f", `Id "x", `Number 1 };
  • f(x, ...) is represented as `Call{ `Id "f", `Id "x", `Dots }.
Notice that function calls can be used as expressions, but also as statements.

Method invocation
Method invocation AST have the tag Invoke, the object's AST as first child, the string name of the method as a second child, and the arguments as remaining children.

Examples
  • o:f() is represented as `Invoke{ `Id "o", `String "f" };
  • o:f(x, 1) is represented as:
    `Invoke{ `Id "o", `String "f", `Id "x", `Number 1 };
  • o:f(x, ...) is represented as:
    `Invoke{ `Id "o", `String "f", `Id "x", `Dots };
Notice that method invocations can be used as expressions, but also as statements. Notice also that ``function o:m (x) return x end'' is not a method invocation, but syntax sugar for the statement ``o["m"] = function (self, x) return x end''. See the paragraph about assignment in the statements subsection for its AST representation.
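
The difference between invoking a method and defining one can be checked in plain Lua. In this sketch, the object o and the method names m and m2 are made up for the illustration.

```lua
local o = { name = "obj" }

-- Method definition: sugar for an assignment with an implicit self parameter.
function o:m(x) return self.name, x end
-- The same thing, written without the sugar.
o["m2"] = function(self, x) return self.name, x end

-- Method invocation: o:m(1) passes o as the first argument.
local a1, b1 = o:m(1)
local a2, b2 = o.m(o, 1)
local a3, b3 = o:m2(1)
print(a1, b1, a2, b2, a3, b3)
```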

Function definition
A function definition consists of a list of parameters and a block of statements. The parameter list, which can be empty, contains only variable names, represented by their `Id{...} AST, except for the last element of the list, which can also be a dots AST `Dots (to indicate that the function is a vararg function).

The block is a list of statement AST, optionally terminated with a `Return{...} or `Break pseudo-statement. These pseudo-statements will be described in the statements subsection.

FIXME: finally, return and break will be considered as regular statements: it's useful for many macros.

The function definition is encoded as `Function{ parameters, block }.

Examples
  • function (x) return x end is represented as:
    `Function{ { `Id "x" }, { `Return{ `Id "x" } } };

  • function (x, y) foo(x); bar(y) end is represented as:
    `Function{ { `Id "x", `Id "y" },
               { `Call{ `Id "foo", `Id "x" },
                 `Call{ `Id "bar", `Id "y" } } }
    
  • function (fmt, ...) print (string.format (fmt, ...)) end is represented as:
    `Function{ { `Id "fmt", `Dots },
               { `Call{ `Id "print",
                        `Call{ `Index{ `Id "string",
                                       `String "format" }, 
                               `Id "fmt", 
                               `Dots } } } }
    


  • function f (x) return x end is not an expression, but a statement: it is actually syntax sugar for the assignment f = function (x) return x end, and as such, is represented as:
    `Set{ { `Id "f" }, 
          { `Function{ { `Id "x" }, { `Return{ `Id "x" } } } } }
    
    (see assignment in the statements subsection for more details);
Parentheses
In Lua, parentheses are sometimes semantically meaningful: when the parenthesised expression returns multiple values, putting it between parentheses forces it to return only one value. For instance, ``local function f() return 1, 2, 3 end; return { f() }'' will return ``{1, 2, 3}'', whereas ``local function f() return 1, 2, 3 end; return { (f()) }'' will return ``{ 1 }'' (notice the parentheses around the function call).

Parentheses are represented in the AST as a node ```Paren{ }''. The second example above has the following AST:
{ `Localrec{ { `Id "f" },  
             { `Function{ { },
                          { `Return{ `Number 1,
                                     `Number 2,
                                     `Number 3 } } } } },
  `Return{ `Table{ `Paren{ `Call{ `Id "f" } } } } }
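
The two snippets discussed in this paragraph can be run side by side in plain Lua to observe the effect of the parentheses. The variable names all and first are chosen for this sketch.

```lua
local function f() return 1, 2, 3 end

local all   = { f() }    -- every returned value is kept
local first = { (f()) }  -- the parentheses truncate to the first value

print(#all, #first)
```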

Statements

Statements are instructions which modify the state of the computer. There are simple statements, such as variable assignments, local variable declarations, function calls and method invocations; there are also control structure statements, which take simpler statements and modify their action: these are if/then/else, repeat/until, while/do/end, for/do/end and do/end statements.

Assignment
Variable assignment a, b, c = foo, bar is represented by the AST `Set{ lhs, rhs }, with lhs being a list of variables or table indexes, and rhs the list of values assigned to them.

Examples
  • x[1]=2 is represented as:
    `Set{ { `Index{ `Id "x", `Number 1 } }, { `Number 2 } };

  • a, b = 1, 2 is represented as:
    `Set{ { `Id "a",`Id "b" }, { `Number 1, `Number 2 } };

  • a = 1, 2, 3 is represented as:
    `Set{ { `Id "a" }, { `Number 1, `Number 2, `Number 3 } };

  • function f(x) return x end is syntax sugar for:
    f = function (x) return x end. As such, it is represented as:
    `Set{ { `Id "f" }, 
          { `Function{ { `Id "x" }, { `Return{ `Id "x" } } } } }
    


  • function o:m(x) return x end is syntax sugar for:
    o["m"] = function (self, x) return x end, and as such, is represented as:
    `Set{ { `Index{ `Id "o", `String "m" } }, 
          { `Function{ { `Id "self", `Id "x" },
                       { `Return{ `Id "x" } } } } }
    
Local declaration
Local declaration local a, b, c = foo, bar works just as assignment, except that the tag is Local, and it is allowed to have an empty list as values.

Examples
  • local x=2 is represented as:
    `Local{ { `Id "x" }, { `Number 2 } };

  • local a, b is represented as:
    `Local{ { `Id "a",`Id "b" }, { } };
Recursive local declaration
In a local declaration, the scope of local variables starts after the statement. Therefore, it is not possible to refer to a variable inside the value it receives, and ``local function f(x) f(x) end'' is not equivalent to ``local f = function (x) f(x) end'': in the latter, the f call inside the function definition probably refers to some global variable, whereas in the former, it refers to the local variable currently being defined (f is therefore a forever looping function).

To handle this, the AST syntax defines a special `Localrec local declaration statement, in which the variables enter in scope before their content is evaluated. Therefore, the AST corresponding to local function f(x) f(x) end is:
`Localrec{ { `Id "f" }, 
           { `Function{ { `Id "x" },
                        { `Call{ `Id "f", `Id "x" } } } } }
Caveat: In the current implementation, both variable names list and values list have to be of length 1. This is enough to represent local function ... end, but should be generalized in the final version of Metalua.
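
The scoping difference that motivates `Localrec can be observed in plain Lua. In this sketch, the names probe, with_plain_local and with_local_function are hypothetical, and an outer local function stands in for the global variable mentioned above.

```lua
local function probe() return "outer" end

-- local probe = function ... : the new probe is not yet in scope in its body.
local function with_plain_local()
  local probe = function(first)
    if first then return probe(false) end  -- still the outer probe
    return "inner"
  end
  return probe(true)
end

-- local function probe ... : the new probe is in scope in its own body.
local function with_local_function()
  local function probe(first)
    if first then return probe(false) end  -- the probe being defined
    return "inner"
  end
  return probe(true)
end

print(with_plain_local())     --> outer
print(with_local_function())  --> inner
```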

Function calls and method invocations
They are represented the same way as their expression counterparts, see the subsection above for details.

Blocks and pseudo-statements
Control statements generally take a block of instructions as parameters, e.g. as the body of a for loop. Such statement blocks are represented as the list of the instructions they contain. As a list, the block itself has no tag field.

Example
foo(x); bar(y); return x,y is represented as:
{ `Call{ `Id "foo", `Id "x" },
  `Call{ `Id "bar", `Id "y" },
  `Return{ `Id "x", `Id "y" } }
Do statement
These represent do ... end statements, which limit local variables scope. They are represented as blocks with a Do tag.

Example
do foo(x); bar(y); return x,y end is represented as:
`Do{ `Call{ `Id "foo", `Id "x" },
     `Call{ `Id "bar", `Id "y" },
     `Return{ `Id "x", `Id "y" } }
While statement
while <foo> do <bar1>; <bar2>; ... end is represented as
`While{ <foo>, { <bar1>, <bar2>, ... } }.

Repeat statement
repeat <bar1>; <bar2>; ... until <foo> is represented as
`Repeat{ { <bar1>, <bar2>, ... }, <foo> }.

For statements
for x=<first>,<last>,<step> do <foo>; <bar>; ... end is represented as `Fornum{ `Id "x", <first>, <last>, <step>, { <foo>, <bar>, ... } }.

The step parameter can be omitted if equal to 1.
for x1, x2... in e1, e2... do
  <foo>;
  <bar>;
  ...
end
is represented as:
`Forin{ {`Id "x1",`Id "x2",...}, { <e1>, <e2>,... }, { <foo>, <bar>, ... } }.

If statements
``If'' statements are composed of a series of (condition, block) pairs, and optionally of a last default ``else'' block. The conditions and blocks are simply listed in an `If{ ... } ADT. Notice that an ``if'' statement without a final ``else'' block will have an even number of children, whereas a statement with a final ``else'' block will have an odd number of children.

Examples
  • if <foo> then <bar>; <baz> end is represented as:
    `If{ <foo>, { <bar>, <baz> } };

  • if <foo> then <bar1> else <bar2>; <baz2> end is represented as: `If{ <foo>, { <bar1> }, { <bar2>, <baz2> } };

  • if <foo1> then <bar1>; <baz1> elseif <foo2> then <bar2>; <baz2> end
    is represented as:
    `If{ <foo1>, { <bar1>, <baz1> }, <foo2>,{ <bar2>, <baz2> } };

  • if     <foo1> then <bar1>; <baz1> 
    elseif <foo2> then <bar2>; <baz2> 
    else               <bar3>; <baz3> end
    
    is represented as:
    `If{ <foo1>, { <bar1>, <baz1> }, 
         <foo2>, { <bar2>, <baz2> },
                 { <bar3>, <baz3> } }
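
The parity rule stated above is what lets a macro tell whether an `If node has an ``else'' block. A minimal plain Lua sketch, with a hypothetical helper named split_if and strings standing in for real condition AST:

```lua
-- Split an `If node into its (condition, block) clauses and optional else block.
local function split_if(node)
  assert(node.tag == "If", "expected an `If node")
  local n = #node
  local clauses, else_block = {}, nil
  if n % 2 == 1 then       -- odd number of children: the last one is the else block
    else_block = node[n]
    n = n - 1
  end
  for i = 1, n, 2 do
    clauses[#clauses + 1] = { cond = node[i], block = node[i + 1] }
  end
  return clauses, else_block
end

-- if c1 then b1 elseif c2 then b2 else b3 end, with placeholder children
local node = { tag="If", "c1", { "b1" }, "c2", { "b2" }, { "b3" } }
local clauses, else_block = split_if(node)
print(#clauses, else_block[1])
```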
    
Breaks and returns
Breaks are represented by the childless `Break AST. Returns are represented by a `Return AST, whose children are the (possibly empty) list of returned values.

Example
return 1, 2, 3 is represented as:
`Return{ `Number 1, `Number 2, `Number 3 }.

Extensions with no syntax

A couple of AST nodes do not exist in Lua, nor in Metalua native syntax, but are provided because they are particularly useful for writing macros. They are presented here.

Goto and Labels
Labels can be string AST, identifier AST, or plain strings; they indicate a target for goto statements. A very common idiom is ``local x = mlp.gensym(); ... `Label{ x } ''. You then jump to that label with ```Goto{ x } ''.

Identifiers, string AST or plain strings are equivalent: ```Label{ `Id "foo"}'' is synonymous with ```Label{ `String "foo"}'' and ```Label "foo"''. The same equivalences apply for gotos, of course.

Labels are local to a function; you can safely jump out of a block, but if you jump inside a block, you're likely to get into unspecified trouble, as local variables will be in a random state.

Statements in expressions
A common need when writing a macro is to insert a statement in the middle of an expression. It can be done by using an anonymous function closure, but that would be expensive, so Metalua offers a better solution. The `Stat node evaluates a statement block in the middle of an expression, then returns an arbitrary expression as its result. Notice one important point: the expression is evaluated in the block's context, i.e. if there are some local variables declared in the block, the expression can use them.

For instance, `Stat{ +{block: local x=3}, +{x} } evaluates to 3.

1.4  Splicing and quoting

As the previous section shows, ASTs are not very readable, and as promised, Metalua offers a way to avoid dealing with them directly. Well, to deal with them only rarely, anyway.

In this section, we will deal a lot with +{...} and -{...}; the only (but real) difficulty is not to get lost between meta-levels, i.e. not getting confused between a piece of code, the AST representing that piece of code, some code returning an AST that shall be executed during compilation, etc.

1.4.1  Quasi-quoting

Quoting an expression is extremely easy: just put it between quasi-quotes. For instance, to get the AST representing 2+2, just type +{expr: 2+2}. Since most quotes are expression quotes, you are even allowed to skip the ``expr:'' part: +{2+2} works just as well.

If you want to quote a statement, just substitute ``expr:'' with ``stat:'': +{stat: if x>3 then foo(bar) end}.

Finally, you might wish to quote a block of code. As you can guess, just type:

+{block: y = 7; x = y+1; if x>3 then foo(bar) end}.

A block is just a list of statements. That means that +{block: x=1} is the same as { +{stat: x=1} } (a single-element list of statements).

However, quoting alone is not really useful: if it's just about pasting pieces of code verbatim, there is little point in meta-programming. We want to be able to poke ``holes'' in quasi-quotes (hence the ``quasi''), and fill them with bits of AST coming from outside. Such holes are marked with a -{...} construct, called a splice, inside the quote. For instance, the following piece of Metalua puts the AST of 2+2 in variable X, then inserts it into the AST of an assignment, stored in Y:
X = +{ 2 + 2 }
Y = +{stat: four = -{ X } }
After this, Y will contain the AST representing four = 2+2. Because of this, a splice inside a quasi-quote is often called an anti-quote (as we shall see, splices also have a meaning, although a different one, outside quotes).
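
To make the hole-filling concrete, here is a minimal sketch in plain Lua that builds the same two values by hand, as tagged tables. The helper names Id, Number, Op, Let and show are assumptions made for this illustration only; the node shapes follow the `Let and `Op notation used in this section.

```lua
-- Hypothetical helpers mirroring the backquote notation with plain tables.
local function Id(name)      return { tag = "Id", name } end
local function Number(n)     return { tag = "Number", n } end
local function Op(op, a, b)  return { tag = "Op", { tag = op }, a, b } end
local function Let(lhs, rhs) return { tag = "Let", lhs, rhs } end

local X = Op("Add", Number(2), Number(2))  -- plays the role of +{ 2 + 2 }
local Y = Let({ Id "four" }, { X })        -- the hole is filled with X

-- Minimal printer, covering only the node kinds used in this sketch.
local function show(node)
  if node.tag == "Number" then return tostring(node[1]) end
  if node.tag == "Id"     then return node[1] end
  if node.tag == "Op"     then return show(node[2]) .. " + " .. show(node[3]) end
  if node.tag == "Let"    then return show(node[1][1]) .. " = " .. show(node[2][1]) end
  error("unsupported node: " .. tostring(node.tag))
end

print(show(Y))  -- four = 2 + 2
```

In this model the splice is nothing more than placing an existing table inside a larger one: Y refers to the very same table as X, not to a copy.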

Of course, quotes and antiquotes can be mixed with explicit AST. The following lines all put the same value in Y, although often in a contrived way:

-- As a single quote:
Y = +{stat: four = 2+2 }
-- Without any quote, directly as an AST:
Y = `Let{ { `Id "four" }, { `Op{ `Add, `Number 2, `Number 2 } } }
-- Various mixes of direct AST and quotes:
X = +{ 2+2 };                          Y = +{stat: four = -{ X } }
X = `Op{ `Add, +{2}, +{2} };           Y = +{stat: four = -{ X } }
X = `Op{ `Add, `Number 2, `Number 2 }; Y = +{stat: four = -{ X } }
Y = +{stat: four = -{ `Op{ `Add, `Number 2, `Number 2 } } }
Y = +{stat: four = -{ +{ 2+2 } } }
Y = `Let{ { `Id "four" }, { +{ 2+2 } } }
-- Nested quotes and splices cancel each other:
Y = +{stat: four = -{ +{ -{ +{ -{ +{ -{ +{ 2+2 } } } } } } } } }
The content of an anti-quote is expected to be an expression by default. However, it is legal to put a statement or a block of statements in it, provided that it returns an AST through a return statement. To do this, just add a ``block:'' (or ``stat:'') markup at the beginning of the antiquote. The following line is (also) equivalent to the previous ones:
Y = +{stat: four = -{ block: 
                      local two=`Number 2
                      return `Op{ `Add, two, two } } }
Notice that in a block, where a statement is expected, a sub-block is also accepted, and is simply merged into the enclosing one. Unlike `Do{ } statements, it doesn't create its own scope. For instance, you can write -{block: f(); g()} instead of -{stat:f()}; -{stat:g()}.

1.4.2  Splicing

Splicing is used in two, rather different contexts. First, as seen above, it's used to poke holes into quotations. But it is also used to execute code at compile time.

As can be expected from their syntaxes, -{...} undoes what +{...} does: quotes change a piece of code into the AST representing it, and splices cancel the quotation of a piece of code, including it directly in the AST (that piece of code therefore has to either be an AST, or evaluate to an AST. If not, the result of the surrounding quote won't be an AST).

But what happens when a splice is put outside of any quote? There is no explicit quotation to cancel, but there is actually a hidden AST generation. Compiling a Metalua source file consists of the following steps:

                  ______               ________
+-----------+    /      \    +---+    /        \    +--------+
|SOURCE FILE|-->< Parser >-->|AST|-->< Compiler >-->|BYTECODE|  
+-----------+    \______/    +---+    \________/    +--------+

So in reality, the source file is translated into an AST; when a splice is found, instead of just turning its AST into bytecode, we execute the corresponding program and put the AST it returns in place of the splice. This computed AST is the one which will be turned into bytecode in the resulting program. Of course, that means locally compiling the piece of code in the splice, in order to execute it:

                                                     +--------+
                  ______               ________   +->|BYTECODE|  
+-----------+    /      \    +---+    /        \  |  +--------+
|SOURCE FILE|-->< Parser >-->|AST|-->< Compiler >-+
+-----------+    \______/    +-^-+    \________/  |  +--------+
                              /|\      ________   +->|BYTECODE|  
                               |      /        \     +---+----+
                               +-----<   Eval   ><-------+
                                      \________/
As an example, consider the following source code, its compilation and its execution:

 
fabien@macfabien$ cat sample.mlua
-{block: print "META HELLO"
         return +{ print "GENERATED HELLO" } }
print "NORMAL HELLO"

fabien@macfabien$ metalua -v sample.mlua -o sample.luac
[ Param "sample.mlua" considered as a source file ]
[ Compiling `File "sample.mlua" ]
META HELLO
[ Saving to file "sample.luac" ]
[ Done ]
fabien@macfabien$ lua sample.luac
GENERATED HELLO
NORMAL HELLO
fabien@macfabien$ _
 
Thanks to the print statement in the splice, we see that the code it contains is actually executed during compilation. In more detail, what happens is that:
  • The code inside the splice is parsed and compiled separately;
  • it is executed: the call to print "META HELLO" is performed, and the AST representing
    print "GENERATED HELLO" is generated and returned;
  • in the AST generated from the source code, the splice is replaced by the AST representing
    print "GENERATED HELLO". Therefore, what is passed to the compiler is the AST representing
    print "GENERATED HELLO"; print "NORMAL HELLO".
Take time to read, re-read, play and re-play with the manipulation described above: understanding the transitions between meta-levels is the essence of meta-programming, and you must be comfortable with such transitions in order to make the best use of Metalua.

Notice that a splice outside a quote is allowed not to return anything. This makes it possible to execute code at compile time without adding anything to the AST, typically to load syntax extensions. For instance, this source will just print "META HELLO" at compile time, and "NORMAL HELLO" at runtime: -{print "META HELLO"}; print "NORMAL HELLO"
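
The compile-time behaviour described in this section can be mimicked in plain Lua with a toy model. Everything here is an assumption made for illustration: the function expand is hypothetical, statements are represented as strings rather than real ASTs, and a splice is modelled as a function that is run during the "compilation" pass. It is not how the Metalua compiler is implemented.

```lua
local compile_log = {}  -- records what happens at "compile time"

-- Toy expansion pass: run every splice, keep what it returns (if anything).
local function expand(source)
  local out = {}
  for _, item in ipairs(source) do
    if type(item) == "function" then
      local generated = item()       -- the splice is executed now
      if generated ~= nil then       -- a splice may return nothing
        out[#out + 1] = generated
      end
    else
      out[#out + 1] = item           -- ordinary code is passed through
    end
  end
  return out
end

local source = {
  function()  -- stands for a splice that prints, then returns generated code
    compile_log[#compile_log + 1] = "META HELLO"
    return 'print "GENERATED HELLO"'
  end,
  function()  -- stands for a splice that returns nothing
    compile_log[#compile_log + 1] = "SILENT"
  end,
  'print "NORMAL HELLO"',
}

local program = expand(source)
print(table.concat(compile_log, ", "))  -- META HELLO, SILENT
print(table.concat(program, "; "))      -- print "GENERATED HELLO"; print "NORMAL HELLO"
```

The two lists keep the meta-levels apart: compile_log holds the side effects of running the splices, while program holds only what would be handed on to the compiler.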

1.4.3  A couple of simple concrete examples

Ternary choice operator
Let's build something more useful. As an example, we will build here a ternary choice operator, equivalent to the _ ? _ : _ from C. Here, we will not deal yet with syntax sugar: our operator will have to be put inside splices. Extending the syntax will be dealt with in the next section, and then, we will coat it with a sweet syntax.

Here is the problem: in Lua, choices are made by using if _ then _ else _ end statements. It is a statement, not an expression, which means that we can't use it in, for instance:
local hi = if lang=="fr" then "Bonjour" 
           else "Hello" end -- illegal!
This won't compile. So, how to turn the ``if'' statement into an expression? The simplest solution is to put it inside a function definition. Then, to actually execute it, we need to evaluate that function. Which means that our pseudo-code local hi = (lang == "fr" ? "Bonjour" : "Hello") will actually be compiled into:
local hi = 
  (function ()
     if lang == "fr" then return "Bonjour"
                     else return "Hello" end end) ()
We are going to define a function building the AST above, filling holes with parameters. Then we are going to use it in the actual code, through splices.

 
fabien@macfabien$ cat sample.lua
-{stat:
  -- Declaring the [ternary] metafunction. As a 
  -- metafunction, it only exists within -{...}, 
  -- i.e. not in the program itself.
  function ternary (cond, b1, b2)
     return +{ (function() 
                    if -{cond} then
                       return -{b1} 
                    else
                       return -{b2}
                    end
                 end)() }
  end }

lang = "en"
hi = -{ ternary (+{lang=="fr"}, +{"Bonjour"}, +{"Hello"}) }
print (hi)

lang = "fr"
hi = -{ ternary (+{lang=="fr"}, +{"Bonjour"}, +{"Hello"}) }
print (hi)

fabien@macfabien$ mlc sample.lua
Compiling sample.lua...
...Wrote sample.luac
fabien@macfabien$ lua sample.luac
Hello
Bonjour
fabien@macfabien$ _
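
One reason to prefer a macro over an ordinary function here can be shown in plain Lua. The sketch below compares a hypothetical runtime function, eager_ternary, with a hand-written copy of the code that the ternary metafunction generates; noisy and calls are helper names invented for this illustration. With the function, both branch expressions are evaluated before the call; with the generated closure, only the selected branch is.

```lua
local calls = 0
-- Counts how many branch expressions actually get evaluated.
local function noisy(value) calls = calls + 1; return value end

-- Ordinary runtime function: its arguments are evaluated before the call.
local function eager_ternary(cond, a, b)
  if cond then return a else return b end
end

local lang = "en"
local hi1 = eager_ternary(lang == "fr", noisy("Bonjour"), noisy("Hello"))
local after_function = calls                    -- 2: both branches were evaluated

-- Hand-written equivalent of the macro expansion.
local hi2 = (function()
  if lang == "fr" then return noisy("Bonjour")
  else return noisy("Hello") end
end)()
local after_expansion = calls - after_function  -- 1: only the chosen branch

print(hi1, after_function)
print(hi2, after_expansion)
```

Both approaches yield "Hello", but the difference matters as soon as a branch has side effects or is expensive to compute.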
 
Incrementation operator
Now, we will write another simple example, which doesn't use quasi-quotes, just to show that we can. Another operator that C developers might miss in Lua is the ++ operator. As with the ternary operator, we won't show yet how to put the syntax sugar coating around it, just how to build the backend functionality.

Here, the transformation is really trivial: we want to encode x++ as x=x+1. We will only deal with ++ as a statement, not as an expression. However, ++ as an expression is not much more complicated to do. Hint: use the turn-statement-into-expr trick shown in the previous example. The AST corresponding to x=x+1 is `Let{ { `Id "x" }, { `Op{ `Add, `Id "x", `Number 1 } } }. From here, the code is straightforward:

 
fabien@macfabien$ cat sample.lua
-{stat:
   function plusplus (var) 
      assert (var.tag == "Id")
      return `Let{ { var }, { `Op{ `Add, var, `Number 1 } } }
   end }

x = 1;                  
print ("x = " .. tostring (x))
-{ plusplus ( +{x} ) }; 
print ("Incremented x: x = " .. tostring (x))

fabien@macfabien$ mlc sample.lua
Compiling sample.lua...
...Wrote sample.luac
fabien@macfabien$ lua sample.luac
x = 1
Incremented x: x = 2
fabien@macfabien$ _
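
The AST built by plusplus can also be inspected without Metalua, by writing the same function over plain tagged tables. This is only a sketch, under the convention that `Id "x" stands for { tag="Id", "x" }; the name plusplus_plain is hypothetical.

```lua
-- Plain-Lua counterpart of plusplus: builds the tables for `x = x + 1`.
local function plusplus_plain(var)
  assert(var.tag == "Id", "plusplus expects an identifier AST")
  local one = { tag = "Number", 1 }
  local sum = { tag = "Op", { tag = "Add" }, var, one }
  return { tag = "Let", { var }, { sum } }
end

local ast = plusplus_plain({ tag = "Id", "x" })
print(ast.tag, ast[1][1][1], ast[2][1][1].tag)

-- Anything other than an identifier is rejected by the assert.
local ok = pcall(plusplus_plain, { tag = "Number", 1 })
print(ok)  -- false
```

The same identifier table appears on both sides of the assignment, which is harmless here because the AST is only read, never modified in place.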
 
Now, all we lack is a decent syntax around this, and we are set! This is the subject of the next sections: gg is the generic grammar generator, which makes it possible to build and grow parsers. It's used to implement mlp, the Metalua parser, which turns Metalua sources into AST.

Therefore, the information needed to extend Metalua syntax is:
  • What the relevant entry points in mlp are, i.e. the methods which allow syntax extension.
  • How to use these methods: this means knowing the classes defined in gg, which offer dynamic extension possibilities.

1
http://www.lua.org
2
http://research.sun.com/self
3
Programming in Lua, 2nd edition.
Published by Lua.org, March 2006
ISBN 85-903798-2-5 Paperback, 328 pages
Distributed by Ingram and Baker & Taylor.
4
http://www.haskell.org/th/
5
http://pauillac.inria.fr/~ddr/camlp5/
6
http://www.cse.ogi.edu/pacsoft/projects/metaml
7
http://convergepl.org
8
http://www.livelogix.net/logix/index.html
9
Tables in Lua can be indexed by integers, as regular arrays, or by any other Lua data. Moreover, their internal representation is able to optimize both array-style and hashtable-style usage, and both kinds of keys can be used in the same table. In this manual, I'll refer to the integer-indexed part of a table as its array-part, and the other one as its hash-part.
10
As explained in the section about ADT, `Number 6 is exactly the same as `Number{ 6 }, or plain Lua { tag="Number", 6}
11
which is a short-hand for `Nil{ }, or { tag="Nil" } in plain Lua.
