Lua is a fun language to write a compiler for, because it’s relatively small but still very full-powered. The entire Lua grammar fits in something like 50 lines.1
Haskell is a fun language to write compilers in, because it has great support for handling abstract syntax trees, parsing, and the like. It’s a match made in heaven!
Here’s an AST for Lua:2
: { stat* }
block
:
stat`Do{ stat* }
| `Set{ {lhs+} {expr+} }
| `While{ expr block }
| `Repeat{ block expr }
| `If{ (expr block)+ block? }
| `Fornum{ ident expr expr expr? block }
| `Forin{ {ident+} {expr+} block }
| `Local{ {ident+} {expr+}? }
| `Localrec{ ident expr }
| `Goto{ <string> }
| `Label{ <string> }
| `Return{ <expr*> }
| `Break
| apply
expr:
`Nil | `Dots | `True | `False
| `Number{ <number> }
| `String{ <string> }
| `Function{ { ident* `Dots? } block }
| `Table{ ( `Pair{ expr expr } | expr )* }
| `Op{ opid expr expr? }
| `Paren{ expr }
| apply
| lhs
apply:
`Call{ expr expr* }
| `Invoke{ expr `String{ <string> } expr* }
: `Id{ <string> }
ident
lhs: ident | `Index{ expr expr }
: 'add' | 'sub' | 'mul' | 'div'
opid| 'mod' | 'pow' | 'concat'| 'eq'
| 'lt' | 'le' | 'and' | 'or'
| 'not' | 'len'
This already looks quite a bit like a Haskell ADT. We can directly
translate |
to |
, for example, and things like
opid:
just become data OpId =
Recall from your extensive experience writing and working with
grammars (or from ever having used a regex) that *
means “0
or more”, +
means “1 or more”, and ?
means “0
or 1”. This means that we can translate ident*
into
[Ident]
, lhs+
into NonEmpty Lhs
,
and expr?
into Maybe Expr
.
Just a few translations left. We separate out Dots
so we
can use it as both a type and a constructor. Table
has a
mini sum type inside it (Pair{ expr expr } | expr )
), so
we’ll separate that out as well. Finally, because each branch of an
If
before the else
has both an
expr
and a block
, we’ll just group those
together in a tuple (Expr, Block)
. We’ll add a
Show
instance to everything so it’s a little easier to work
with.
Here’s our final AST ADT:
module LuaAST where
import Data.List.NonEmpty
data Block
= Block [Statement]
deriving Show
data Statement
= Do [Statement]
| Set (NonEmpty Lhs) (NonEmpty Expr)
| While Expr Block
| Repeat Block Expr
| If (NonEmpty (Expr, Block)) (Maybe Block)
| Fornum Ident Expr Expr (Maybe Expr) Block
| Forin (NonEmpty Ident) (NonEmpty Expr) Block
| Local (NonEmpty Ident) (Maybe (NonEmpty Expr))
| Localrec Ident Expr
| Goto String
| Label String
| Return [Expr]
| Break
| StatementApply Apply
deriving Show
data Dots = Dots
deriving Show
data TableElement
= Pair Expr Expr
| TableExpr Expr
deriving Show
data Expr
= Nil
| ExprDots Dots
| True
| False
| Number Double
| String
| Function [Ident] (Maybe Dots) Block
| Table [TableElement]
| Op OpId Expr (Maybe Expr)
| Paren Expr
| ExprApply Apply
| ExprLhs Lhs
deriving Show
data Apply
= Call Expr [Expr]
| Invoke Expr String [Expr]
deriving Show
data Ident
= Id String
deriving Show
data Lhs
= LhsIdent Ident
| Index Expr Expr
deriving Show
data OpId
= Add
| Sub
| Mul
| Div
| Mod
| Pow
| Concat
| Eq
| Lt
| Le
| And
| Or
| Not
| Len
deriving Show
This is a full Lua abstract syntax tree representable in Haskell code! That wasn’t so bad, was it?
For bonus points, try representing the following programs using this AST. An answer is provided for the first one to get you started.
m, a = 100, 9.81
f = m * a
print("Hello world!")
function factorial(n)
if (n == 0) then
return 1
else
return n * factorial(n - 1)
end
end
print(factorial(5))
Answer:
Block
[ Set
(LhsIdent (Id "m") :| [LhsIdent (Id "a")] )
(Number 100 :| [Number 9.81] )
, Set
(LhsIdent (Id "f") :| [])
(Op Mul
(ExprLhs (LhsIdent (Id "m")))
(Just (ExprLhs (LhsIdent (Id "a"))))
:| [])
]