A Few Simple Python Reductions

I talk to a lot of people who are in what you might call the “intermediate beginner” stage of learning programming. They know the basic control structures—loops, conditionals, etc.—and are familiar with all of the basic datatypes—strings, ints, lists, etc. Perhaps most importantly, they’ve acquired the skill of googling for answers when they run into something they don’t yet know how to do.

However, one important part of writing code is readability: not just whether your program runs or gets the correct answer, but whether other people can read and understand it. Readability is harder to learn by googling, and maybe the only way to get really good at it is lots of practice writing cleaner programs. Still, there are at least a few tips I can give that wouldn’t be obvious to a beginner and that help readability immensely.

I call what I’m doing here a “reduction” because we’re reducing the complexity of some expression in code. If, in math, I have the expression \((a + b)(a + b) - 2ab = c^2\), it’s a likely bet that reducing this to the equivalent \(a^2 + b^2 = c^2\) is going to make the expression easier to work with, and more readable for others. A similar concept applies to reducing complexity out of our programs.

Because so many people learn the basics with Python, I’ll use that for my examples.

Not using unnecessary keywords

One of the most basic examples of clutter is using structures that can be omitted without changing how the program works at all. For example, functions that don’t return a value don’t need the word return at the end. These two functions will do the same thing:

def a():
    pass
    return

def b():
    pass
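
A quick check, as a sketch, confirms this: a function that reaches its end without a return statement hands back None, exactly like one that ends with a bare return. The names result_a and result_b are made up here for illustration.

```python
def a():
    pass
    return  # explicit bare return, hands back None


def b():
    pass  # falls off the end, which also hands back None


result_a = a()
result_b = b()
print(result_a is None, result_b is None)  # True True
```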

A similar example is looping when indexing is really what you want. The following functions return the same thing for any non-empty list, but the second is much cleaner and easier to read.

def first(items):
    for index, item in enumerate(items):
        if index == 0:
            return item

def first(items):
    return items[0]

You might think that this is a silly example, but it comes from real-world code. Probably, the programmer intended to use a loop, but ended up only needing the first item and forgot to clean up the looping construct. It’s easy to spot that kind of thing in these simplified examples, but harder in real code.
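
One caveat is worth checking before making this kind of swap: the two versions agree on any non-empty list, but they differ on an empty one. The sketch below renames them (first_with_loop and first_with_index are names made up here) so they can be compared side by side.

```python
def first_with_loop(items):
    # For an empty list the loop body never runs, so this returns None.
    for index, item in enumerate(items):
        if index == 0:
            return item


def first_with_index(items):
    # For an empty list this raises IndexError.
    return items[0]


print(first_with_loop(['a', 'b']))   # a
print(first_with_index(['a', 'b']))  # a
print(first_with_loop([]))           # None

try:
    first_with_index([])
except IndexError:
    print('an empty list has no first item')
```

If callers rely on getting None back for an empty list, the indexed version needs an explicit check first. Otherwise, the loud IndexError is often the more helpful behavior.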

Moving repeated code into a single place

Here’s another lightly-modified example from real code:

def test(condition):
    return condition == 'correct'

if test(condition):
    data1 = input('Input: ')
else:
    data1 = 'default1'

if test(another_condition):
    data2 = input('Input: ')
else:
    data2 = 'default2'

if test(a_third_condition):
    data3 = input('Input: ')
else:
    data3 = 'default3'

Hopefully the repetition here is obvious. However, the way to fix it might not be obvious, especially in real code with other logic obscuring the nature of the repetitive bits. What we should do here, since we already have a function handling some of the repeated logic, is to push as much of the other logic into that function as we reasonably can. If such a function doesn’t already exist, make one! If we ever have to go back and make changes to how this code works, it’s much nicer to change it in just one place than to have to copy and paste all over, with the added risk of missing one of the changes.

def test(condition, default):
    if condition == 'correct':
        return input('Input: ')
    else:
        return default

data1 = test(condition, 'default1')
data2 = test(another_condition, 'default2')
data3 = test(a_third_condition, 'default3')

Use normal, not exceptional, control flow

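Try to avoid using try and except if it’s possible to use if and else instead. There’s a saying that exceptions are for exceptional cases, meaning that the more you can do with standard control statements, the better. Code built from if/else is typically easier to follow, because the reader can see every path through it, whereas an exception can jump out from any line inside the try block.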
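
As a small sketch of the difference, here is the same function written both ways. The function names and the choice to return None for an empty list are assumptions made for this illustration.

```python
def average_with_exception(numbers):
    # Relies on the division failing when the list is empty.
    try:
        return sum(numbers) / len(numbers)
    except ZeroDivisionError:
        return None


def average_with_check(numbers):
    # States the special case up front with an ordinary conditional.
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


print(average_with_exception([1, 2, 3]))  # 2.0
print(average_with_check([1, 2, 3]))      # 2.0
print(average_with_check([]))             # None
```

Both versions return the same results, but the second one tells the reader directly which case is being handled, rather than leaving them to work out which line might raise.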

Of course, there are some places where try/except is needed. If some function is throwing an exception that you want to handle right then and there, then by all means use try/except, since it’ll be cleaner.

do_stuff = True
while do_stuff:
    try:
        number = int(input('Input: '))
        do_stuff = False
    except ValueError:  # int() raises ValueError for non-numeric text
        print('Input must be a number')

However, the above example can be cleaned up further, leading me to my next point….

Avoid unnecessary flags

Roughly, let’s call a flag any variable whose only role is to keep track of some internal program state temporarily, before being discarded. A flag doesn’t model any fact about the real world; it does nothing other than internal state management.

It’s easy to see how getting rid of a flag also gets rid of some clutter. In the example above, do_stuff is a flag, and as it turns out, an unnecessary one. We could instead make use of the break keyword:

while True:
    try:
        number = int(input('Input: '))
        break
    except ValueError:
        print('Input must be a number')

Alternatively, it might be best to wrap all this logic in a new function and return the result instead:

def get_numeric_input():
    while True:
        try:
            return int(input('Input: '))
        except ValueError:
            print('Input must be a number')

Then, at the calling site, our code becomes this clean single call to a well-named function:

number = get_numeric_input()
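
To try the retry loop without typing at a prompt, one option is to let the function accept the reading function as a parameter. This is only a sketch: the read parameter and the scripted replies are assumptions added for the demonstration, not part of the original function.

```python
def get_numeric_input(read=input):
    # Keep asking until the reply parses as an integer.
    while True:
        try:
            return int(read('Input: '))
        except ValueError:
            print('Input must be a number')


# Scripted replies stand in for a user: two bad answers, then a good one.
replies = iter(['abc', '4.5', '42'])
number = get_numeric_input(read=lambda prompt: next(replies))
print(number)  # 42
```

Called with no arguments, the function still reads from the keyboard through the built-in input.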

Using the right data structures

A hugely important part of achieving Good, Clean Code is using the right data structures. With the right structures in place, nearly everything becomes easier to follow, and programming itself is more enjoyable. Without them, clutter tends to accumulate as we start doing unnecessary work that a better data structure could have handled for us.

One good example in Python is defaultdict. Let’s say you’re modeling a group of people and want to map each of them to their favorite fruits. We can set up a dictionary like:

fruits = {
    'Sam': [],
    'Frodo': ['banana', 'strawberry'],
    'Odin': ['blackberry'],
    'Frida': ['pomegranate', 'cherry'],
}

Now, to add a new person we need to do something like this:

fruits['Mitchell'] = ['blueberry', 'raspberry']

This doesn’t seem so bad, but what about the case where we’re not sure whether Mitchell is already accounted for? Then we need something like:

if 'Mitchell' not in fruits:
    fruits['Mitchell'] = ['blueberry', 'raspberry']

And what if we want to add a fruit to Mitchell’s list, but we’re not sure if that key is in the dictionary yet?

if 'Mitchell' in fruits:
    fruits['Mitchell'].append('orange')
else:
    fruits['Mitchell'] = ['orange']

Alternatively, we can reduce the above to:

if 'Mitchell' not in fruits:
    fruits['Mitchell'] = []
fruits['Mitchell'].append('orange')

However, you can imagine that this kind of extra logic compounds quickly, making for a more confusing program. It’d be nice if we could get away with just writing fruits['Mitchell'].append('orange'). It turns out we can, using defaultdict! If we create a defaultdict with list as its default factory, then whenever we look up a key that doesn’t yet exist, instead of getting an error, we get an empty list.

from collections import defaultdict

fruits = defaultdict(list)

Now, we can just append to whatever keys we want. We’ve removed the conditional on whether the key is already present, removing a level of nesting and making the code more readable:

fruits['Mitchell'].append('orange')
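
Put together, a minimal sketch of building the whole mapping might look like this. The favorites list of (person, fruit) pairs is made-up sample data for the illustration.

```python
from collections import defaultdict

# Hypothetical sample data: one (person, fruit) pair per favorite.
favorites = [
    ('Frodo', 'banana'),
    ('Frodo', 'strawberry'),
    ('Odin', 'blackberry'),
    ('Mitchell', 'orange'),
]

fruits = defaultdict(list)
for person, fruit in favorites:
    # No membership check needed: a missing key starts out as an empty list.
    fruits[person].append(fruit)

print(fruits['Frodo'])     # ['banana', 'strawberry']
print(fruits['Mitchell'])  # ['orange']

# Merely looking up a missing key creates it.
print('Sam' in fruits)  # False
fruits['Sam']
print('Sam' in fruits)  # True
```

That last behavior is worth keeping in mind: to test whether someone is present without adding them, use the in operator rather than indexing.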

Naming intermediate results

One last tip for now. If you have some complex expression you’re reusing, or partially reusing, it probably makes sense to give it a name instead of calling it twice. First of all, a good name will remind you (and help others quickly learn) about what the complex expression is supposed to be doing. If you ever need to change how it works, you now only need to edit it in one place instead of two. And finally, by running complicated code once instead of twice, we can speed up our program’s running time.

Here’s a quick and silly example:

if complex['condition'](which, may)['or']['may'] is not be(True):
    x = f(complex['condition'](which, may)['or']['may'])

Naming the repeated expression gives us:

result = complex['condition'](which, may)['or']['may']
if result is not be(True):
    x = f(result)
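
For a slightly more realistic sketch of the same idea, consider pulling a four-digit year out of some text. The function names and the regular expression are assumptions chosen for this illustration.

```python
import re


def extract_year_repeated(text):
    # The same search is written out, and run, twice.
    if re.search(r'\b(\d{4})\b', text) is not None:
        return int(re.search(r'\b(\d{4})\b', text).group(1))
    return None


def extract_year_named(text):
    # The search runs once, and its result has a descriptive name.
    year_match = re.search(r'\b(\d{4})\b', text)
    if year_match is not None:
        return int(year_match.group(1))
    return None


print(extract_year_repeated('Released in 1999.'))  # 1999
print(extract_year_named('Released in 1999.'))     # 1999
print(extract_year_named('No date given'))         # None
```

If the pattern ever needs to change, the named version has only one place to edit.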