Some languages look like the key data structures used to implement them. A common example is lisp—all those parentheses build structures that look like parse trees. Languages like Forth, and like the little language we’ll build here, look like stacks.
“Concatenative” just means putting things together by putting them next to each other. A nice slogan for that might be “composition by juxtaposition”. In our language, for example, the program 1 2 +
means “push a 1 to the stack, push a 2 to the stack, then pop the first two elements off the stack and push their sum on the stack”.
We’ll add debugging output that looks like this:
: [1] 1
: [2] 2, 1
: [+] 3
We put :
in front of each line just to separate this debug output from normal printing output. The thing in square brackets is the current word (or command) that we’ve just added to the stack. On each line, we’ll then write out the stack separated by commas, which is essentially the entire state of the program at that time.
With a language this simple, you really can’t go wrong with your choice of implementation language, but we’ll use Python for extra simplicity. The first thing we need to do is read in each word from standard input. Any connected string of non-whitespace characters should be treated as a word, and we’ll just skip over whitespace.
Simple enough. The next thing we’ll want to do is add an empty stack. In this case, it’s just an empty Python list. We’ll choose to add new words to the beginning of that list. We also want to provide the debugging output we saw above so we can follow the trace of our programs. That can be done with a formatted string that takes the current word and the current stack and prints them out.
import sys
stack = []
for line in sys.stdin:
for word in line.split():
stack.insert(0, word)
word, *rest = stack
# ...
print(f': [{word}] {", ".join(stack)}')
Now we can begin adding commands to our language. This will be a simple match on the current word. Usually, a word’s behavior involves manipulating the stack in some way, and then calling through to some primitive from Python. For print
, we just take the print
word off the stack (since it’s being used), and call Python’s print
to see the first word of the new stack.
We implement addition likewise. Let’s deconstruct the stack by peeling off its first two elements, then add them and put them back on the stack.
Now we can run programs like this!
1 2 3 + + print
The debugging output shows us what our program is really doing at every step. It reads almost like a deductive proof, where each step follows from the previous one.
: [1] 1
: [2] 2, 1
: [3] 3, 2, 1
: [+] 5, 1
: [+] 6
6
: [print] 6
We can also add features that are usually “baked in” to a lexer or parser, but we can do it with only the stack. Let’s add a flag for whether we’re currently in a comment.
Comments will begin with (
. When we see the word (
, we enter “comment mode” by setting the is_comment
flag.
We continue reading in words until we see the “end comment” word )
. Reset the is_comment
flag, and continue on to reading the next word.
We can now correctly read programs like this
1 2 ( 3 + commented out ) + print
Note that while in comment mode, there’s no debug output because the state of the stack is going to remain unchanged throughout the entire comment.
: [1] 1
: [2] 2, 1
: [(] 2, 1
: [+] 3
3
: [print] 3