Some languages look like the key data structures used to implement them. A common example is lisp—all those parentheses build structures that look like parse trees. Languages like Forth, and like the little language we’ll build here, look like stacks.
“Concatenative” just means putting things together by putting them
next to each other. A nice slogan for that might be “composition by
juxtaposition”. In our language, for example, the program
1 2 +
means “push a 1 to the stack, push a 2 to the stack,
then pop the first two elements off the stack and push their sum on the
stack”.
We’ll add debugging output that looks like this:
: [1] 1
: [2] 2, 1
: [+] 3
We put :
in front of each line just to separate this
debug output from normal printing output. The thing in square brackets
is the current word (or command) that we’ve just added to the stack. On
each line, we’ll then write out the stack separated by commas, which is
essentially the entire state of the program at that time.
With a language this simple, you really can’t go wrong with your choice of implementation language, but we’ll use Python for extra simplicity. The first thing we need to do is read in each word from standard input. Any connected string of non-whitespace characters should be treated as a word, and we’ll just skip over whitespace.
import sys
for line in sys.stdin:
for word in line.split():
print(word)
Simple enough. The next thing we’ll want to do is add an empty stack. In this case, it’s just an empty Python list. We’ll choose to add new words to the beginning of that list. We also want to provide the debugging output we saw above so we can follow the trace of our programs. That can be done with a formatted string that takes the current word and the current stack and prints them out.
import sys
= []
stack for line in sys.stdin:
for word in line.split():
0, word)
stack.insert(*rest = stack
word, # ...
print(f': [{word}] {", ".join(stack)}')
Now we can begin adding commands to our language. This will be a
simple match on the current word. Usually, a word’s behavior involves
manipulating the stack in some way, and then calling through to some
primitive from Python. For print
, we just take the
print
word off the stack (since it’s being used), and call
Python’s print
to see the first word of the new stack.
if word == 'print':
= rest
stack print(stack[0])
We implement addition likewise. Let’s deconstruct the stack by peeling off its first two elements, then add them and put them back on the stack.
elif word == '+':
*rest = rest
a, b, = [int(a) + int(b)] + rest stack
Now we can run programs like this!
1 2 3 + + print
The debugging output shows us what our program is really doing at every step. It reads almost like a deductive proof, where each step follows from the previous one.
: [1] 1
: [2] 2, 1
: [3] 3, 2, 1
: [+] 5, 1
: [+] 6
6
: [print] 6
We can also add features that are usually “baked in” to a lexer or parser, but we can do it with only the stack. Let’s add a flag for whether we’re currently in a comment.
= False is_comment
Comments will begin with (
. When we see the word
(
, we enter “comment mode” by setting the
is_comment
flag.
elif word == '(':
= True
is_comment = rest stack
We continue reading in words until we see the “end comment” word
)
. Reset the is_comment
flag, and continue on
to reading the next word.
if is_comment:
if word == ')':
= False
is_comment = rest
stack continue
We can now correctly read programs like this
1 2 ( 3 + commented out ) + print
Note that while in comment mode, there’s no debug output because the state of the stack is going to remain unchanged throughout the entire comment.
: [1] 1
: [2] 2, 1
: [(] 2, 1
: [+] 3
3
: [print] 3