mitchell vitez blog music art media dark mode

Concatenative Programming

Some languages look like the key data structures used to implement them. A common example is lisp—all those parentheses build structures that look like parse trees. Languages like Forth, and like the little language we’ll build here, look like stacks.

“Concatenative” just means putting things together by putting them next to each other. A nice slogan for that might be “composition by juxtaposition”. In our language, for example, the program 1 2 + means “push a 1 to the stack, push a 2 to the stack, then pop the first two elements off the stack and push their sum on the stack”.

We’ll add debugging output that looks like this:

: [1] 1
: [2] 2, 1
: [+] 3

We put : in front of each line just to separate this debug output from normal printing output. The thing in square brackets is the current word (or command) that we’ve just added to the stack. On each line, we’ll then write out the stack separated by commas, which is essentially the entire state of the program at that time.

Input and Debugging Output

With a language this simple, you really can’t go wrong with your choice of implementation language, but we’ll use Python for extra simplicity. The first thing we need to do is read in each word from standard input. Any connected string of non-whitespace characters should be treated as a word, and we’ll just skip over whitespace.

import sys

for line in sys.stdin:
    for word in line.split():

Simple enough. The next thing we’ll want to do is add an empty stack. In this case, it’s just an empty Python list. We’ll choose to add new words to the beginning of that list. We also want to provide the debugging output we saw above so we can follow the trace of our programs. That can be done with a formatted string that takes the current word and the current stack and prints them out.

import sys

stack = []
for line in sys.stdin:
    for word in line.split():
        stack.insert(0, word)
        word, *rest = stack
        # ...
        print(f': [{word}] {", ".join(stack)}')


Now we can begin adding commands to our language. This will be a simple match on the current word. Usually, a word’s behavior involves manipulating the stack in some way, and then calling through to some primitive from Python. For print, we just take the print word off the stack (since it’s being used), and call Python’s print to see the first word of the new stack.

        if word == 'print':
            stack = rest

We implement addition likewise. Let’s deconstruct the stack by peeling off its first two elements, then add them and put them back on the stack.

        elif word == '+':
            a, b, *rest = rest
            stack = [int(a) + int(b)] + rest

Now we can run programs like this!

1 2 3 + + print

The debugging output shows us what our program is really doing at every step. It reads almost like a deductive proof, where each step follows from the previous one.

: [1] 1
: [2] 2, 1
: [3] 3, 2, 1
: [+] 5, 1
: [+] 6
: [print] 6


We can also add features that are usually “baked in” to a lexer or parser, but we can do it with only the stack. Let’s add a flag for whether we’re currently in a comment.

is_comment = False

Comments will begin with (. When we see the word (, we enter “comment mode” by setting the is_comment flag.

        elif word == '(':
            is_comment = True
            stack = rest

We continue reading in words until we see the “end comment” word ). Reset the is_comment flag, and continue on to reading the next word.

        if is_comment:
            if word == ')':
                is_comment = False
            stack = rest

We can now correctly read programs like this

1 2 ( 3 + commented out ) + print

Note that while in comment mode, there’s no debug output because the state of the stack is going to remain unchanged throughout the entire comment.

: [1] 1
: [2] 2, 1
: [(] 2, 1
: [+] 3
: [print] 3