mitchell vitez

dark mode

blog about music art media

resume email github

Building an AST for Lua in Haskell

Lua is a fun language to write a compiler for, because it’s relatively small but still very full-powered. The entire Lua grammar fits in something like 50 lines.1

Haskell is a fun language to write compilers in, because it has great support for handling abstract syntax trees, parsing, and the like. It’s a match made in heaven!

Here’s an AST for Lua:2

block: { stat* }

stat:
  `Do{ stat* }
| `Set{ {lhs+} {expr+} }
| `While{ expr block }
| `Repeat{ block expr }
| `If{ (expr block)+ block? }
| `Fornum{ ident expr expr expr? block }
| `Forin{ {ident+} {expr+} block }        
| `Local{ {ident+} {expr+}? }             
| `Localrec{ ident expr }                 
| `Goto{ <string> }
| `Label{ <string> }
| `Return{ <expr*> }
| `Break
| apply

expr:
  `Nil  |  `Dots  |  `True  |  `False
| `Number{ <number> }
| `String{ <string> }
| `Function{ { ident* `Dots? } block }
| `Table{ ( `Pair{ expr expr } | expr )* }
| `Op{ opid expr expr? }
| `Paren{ expr }
| apply
| lhs

apply:
  `Call{ expr expr* }
| `Invoke{ expr `String{ <string> } expr* }

ident: `Id{ <string> }

lhs: ident | `Index{ expr expr }

opid: 'add'   | 'sub'   | 'mul'   | 'div'
    | 'mod'   | 'pow'   | 'concat'| 'eq'
    | 'lt'    | 'le'    | 'and'   | 'or'
    | 'not'   | 'len'

This already looks quite a bit like a Haskell ADT. We can directly translate | to |, for example, and things like opid: just become data OpId =

Recall from your extensive experience writing and working with grammars (or from ever having used a regex) that * means “0 or more”, + means “1 or more”, and ? means “0 or 1”. This means that we can translate ident* into [Ident], lhs+ into NonEmpty Lhs, and expr? into Maybe Expr.

Just a few translations left. We separate out Dots so we can use it as both a type and a constructor. Table has a mini sum type inside it (Pair{ expr expr } | expr )), so we’ll separate that out as well. Finally, because each branch of an If before the else has both an expr and a block, we’ll just group those together in a tuple (Expr, Block). We’ll add a Show instance to everything so it’s a little easier to work with.

Here’s our final AST ADT:

module LuaAST where

import Data.List.NonEmpty

data Block
  = Block [Statement]
  deriving Show

data Statement
  = Do [Statement]
  | Set (NonEmpty Lhs) (NonEmpty Expr)
  | While Expr Block
  | Repeat Block Expr
  | If (NonEmpty (Expr, Block)) (Maybe Block)
  | Fornum Ident Expr Expr (Maybe Expr) Block
  | Forin (NonEmpty Ident) (NonEmpty Expr) Block
  | Local (NonEmpty Ident) (Maybe (NonEmpty Expr))
  | Localrec Ident Expr
  | Goto String
  | Label String
  | Return [Expr]
  | Break
  | StatementApply Apply
  deriving Show

data Dots = Dots
  deriving Show

data TableElement
  = Pair Expr Expr
  | TableExpr Expr
  deriving Show

data Expr
  = Nil
  | ExprDots Dots
  | True
  | False
  | Number Double
  | String
  | Function [Ident] (Maybe Dots) Block
  | Table [TableElement]
  | Op OpId Expr (Maybe Expr)
  | Paren Expr
  | ExprApply Apply
  | ExprLhs Lhs
  deriving Show

data Apply
  = Call Expr [Expr]
  | Invoke Expr String [Expr]
  deriving Show

data Ident
  = Id String
  deriving Show

data Lhs
  = LhsIdent Ident
  | Index Expr Expr
  deriving Show

data OpId
  = Add
  | Sub
  | Mul
  | Div
  | Mod
  | Pow
  | Concat
  | Eq
  | Lt
  | Le
  | And
  | Or
  | Not
  | Len
  deriving Show

This is a full Lua abstract syntax tree representable in Haskell code! That wasn’t so bad, was it?

For bonus points, try representing the following programs using this AST. An answer is provided for the first one to get you started.

m, a = 100, 9.81
f = m * a
print("Hello world!")
function factorial(n)
    if (n == 0) then
        return 1
    else
        return n * factorial(n - 1)
    end
end

print(factorial(5))

Answer:

Block 
  [ Set
    (LhsIdent (Id "m") :| [LhsIdent (Id "a")] )
    (Number 100 :| [Number 9.81] )
  , Set 
    (LhsIdent (Id "f") :| [])
    (Op Mul
      (ExprLhs (LhsIdent (Id "m")))
      (Just (ExprLhs (LhsIdent (Id "a"))))
    :| [])
  ]