Sum Types Are Better Than Others

I recently came across something like the following Python code in the wild:

def check_flag(flag):
    if flag == "BEGIN":
        ...
    elif flag == "MIDDLE":
        ...
    elif flag == "END":
        ...
    else:
        raise ValueError('Invalid flag')

There are a few immediate ways to improve this. First, we might use constants BEGIN, MIDDLE, and END instead of hardcoded strings. That doesn’t save us much in this part of the code, but it does help us avoid some kinds of bugs, like typos in these strings when we’re passing them around elsewhere. We’re still accepting any string at all as a flag, though. We could solve this with a custom class or even something like namedtuple. Let’s go even simpler, and use an Enum.

from enum import Enum

class StoryPart(Enum):
    BEGIN = "begin"
    MIDDLE = "middle"
    END = "end"

While it seems like just another way to organize our three constants, this actually gives us a pretty nice benefit: a new type. Once the flag parameter is annotated as a StoryPart, mypy can check that every caller passes a valid StoryPart, and we can do away with the string handling. Nothing yet forces us to handle every value of StoryPart when we branch on them, so we still have to remember them all ourselves, but it’s a good start.
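To make that concrete, here’s a rough sketch of what the typed version might look like. The return strings are placeholders made up for illustration, since the original branches are elided; the point is only the StoryPart annotation and the single place where a raw string gets converted.

```python
from enum import Enum


class StoryPart(Enum):
    BEGIN = "begin"
    MIDDLE = "middle"
    END = "end"


def check_flag(part: StoryPart) -> str:
    # The annotation lets mypy reject call sites that pass a plain string.
    if part is StoryPart.BEGIN:
        return "opening scene"  # hypothetical placeholder result
    elif part is StoryPart.MIDDLE:
        return "rising action"  # hypothetical placeholder result
    elif part is StoryPart.END:
        return "resolution"  # hypothetical placeholder result
    # Reached only if a new member is added and not handled above.
    raise ValueError(f"Unhandled story part: {part!r}")


# Raw strings are converted once, at the edge of the program.
# StoryPart("middle") looks the member up by value and raises
# ValueError for a string that matches no member.
part = StoryPart("middle")
print(check_flag(part))

# check_flag("middle")  # mypy would flag this: a str is not a StoryPart
```

Everything past that conversion point deals only in StoryPart values, so a typo like "midle" fails at the boundary instead of deep inside the branching logic.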

Of course, some languages handle these kinds of tagged unions (or “sum types”) natively. Here’s some sample Haskell, where we pattern match over a sum type:

data StoryPart = Begin | Middle | End

checkFlag :: StoryPart -> ...
checkFlag Begin = ...
checkFlag Middle = ...
checkFlag End = ...

If we turn on GHC’s -Wincomplete-patterns warning (spelled -fwarn-incomplete-patterns in older releases, and also enabled by -Wall), the compiler will even warn us when we’ve missed one of the possible cases.

Elm goes one step further and refuses to compile your code until every case is handled. While this can be a bit annoying, I’d usually much rather be somewhat annoyed at compile time than be even more annoyed chasing down bugs at runtime.
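For what it’s worth, Python can get partway there with a type checker. Below is a minimal sketch, assuming mypy (or a similar checker) is run over the code. The assert_never helper is defined by hand here, and describe and its return values are made up for illustration. Python 3.11 and later ship an equivalent typing.assert_never.

```python
from enum import Enum
from typing import NoReturn


class StoryPart(Enum):
    BEGIN = "begin"
    MIDDLE = "middle"
    END = "end"


def assert_never(value: NoReturn) -> NoReturn:
    # Hand-written helper (an assumption of this sketch): a type checker
    # only accepts a call to it when no possible values remain. At
    # runtime it fails loudly if it is ever reached.
    raise AssertionError(f"Unhandled case: {value!r}")


def describe(part: StoryPart) -> str:
    if part is StoryPart.BEGIN:
        return "begin"
    elif part is StoryPart.MIDDLE:
        return "middle"
    elif part is StoryPart.END:
        return "end"
    else:
        # If a branch above is removed, or a new member is added to
        # StoryPart, mypy should report an incompatible argument here.
        assert_never(part)


print([describe(p) for p in StoryPart])
```

Unlike Elm, nothing stops the program from running if we skip the checker, so this is a convention rather than a guarantee.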