I talk to a lot of people who are in what you might call the “intermediate beginner” stage of learning programming. They know the basic control structures—loops, conditionals, etc.—and are familiar with all of the basic datatypes—strings, ints, lists, etc. Perhaps most importantly, they’ve acquired the skill of googling for answers when they run into something they don’t yet know how to do.
One important part of writing code, though, is readability: not just whether your program runs, or whether it gets the correct answer, but whether it's readable and understandable to other people. Readability is harder to learn by googling, and the only way to get really good at it may be lots of practice writing cleaner programs. Still, there are a few tips I can offer that wouldn't be obvious to a beginner but help readability immensely.
I call what I'm doing here a "reduction" because we're reducing the complexity of some expression in code. If, in math, I have the expression \((a + b)(a + b) - 2ab = c^2\), it's a safe bet that reducing it to the equivalent \(a^2 + b^2 = c^2\) will make the expression easier to work with, and more readable for others. The same idea applies to removing complexity from our programs.
Because so many people learn the basics with Python, I’ll use that for my examples.
One of the most basic examples of clutter is using structures that can be omitted without changing how the program works at all. For example, a function that doesn't return a value doesn't need a bare `return` at the end. These two functions will do the same thing:
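A minimal sketch of such a pair, using a hypothetical `greet` function. In Python, a function that reaches the end of its body without a `return` statement returns `None` anyway, so the trailing `return` adds nothing.

```python
def greet_with_return(name):
    print(f"Hello, {name}!")
    return  # redundant: falling off the end already returns None


def greet(name):
    # Identical behavior, minus the clutter.
    print(f"Hello, {name}!")
```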
A similar example might be the use of looping when indexing is really what you want. The following functions have the same functionality (so to speak), but the second is way cleaner and easier to read.
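A sketch of the two versions, with hypothetical function names. The "so to speak" matters in one case: on an empty list the loop version quietly returns `None`, while indexing raises `IndexError`.

```python
def first_item_loop(items):
    # Looks like iteration, but always returns on the first pass.
    for item in items:
        return item


def first_item(items):
    # Says directly what we want: the element at index 0.
    return items[0]


print(first_item_loop(["a", "b", "c"]))  # a
print(first_item(["a", "b", "c"]))       # a
```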
You might think that this is a silly example, but it comes from real-world code. Probably the programmer intended to use a loop, but ended up needing only the first item and forgot to clean up the looping construct. It's easy to spot that kind of thing in these simplified examples, but harder in real code.
Here's another lightly modified example from real code:
Hopefully the repetition here is obvious. However, the way to fix it might not be obvious, especially in real code with other logic obscuring the nature of the repetitive bits. What we should do here, since we already have a function handling some of the repeated logic, is to push as much of the other logic into that function as we reasonably can. If such a function doesn’t already exist, make one! If we ever have to go back and make changes to how this code works, it’s much nicer to change it in just one place than to have to copy and paste all over, with the added risk of missing one of the changes.
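A hypothetical before-and-after sketch of this kind of refactoring. An existing helper, `format_amount`, handles only the number formatting, while the caller repeats the label padding around every call; the refactor pushes that padding into a helper (`format_line`, also hypothetical) so each line becomes a single call.

```python
def format_amount(amount):
    # Existing helper: handles only part of the repeated logic.
    return f"${amount:,.2f}"


def receipt_lines_repetitive(subtotal, tax, total):
    # Before: each line repeats the same pad-the-label-then-format pattern.
    return [
        "Subtotal".ljust(10) + format_amount(subtotal),
        "Tax".ljust(10) + format_amount(tax),
        "Total".ljust(10) + format_amount(total),
    ]


def format_line(label, amount):
    # After: one helper owns the whole line, so the padding lives in one place.
    return label.ljust(10) + format_amount(amount)


def receipt_lines(subtotal, tax, total):
    return [
        format_line("Subtotal", subtotal),
        format_line("Tax", tax),
        format_line("Total", total),
    ]


print("\n".join(receipt_lines(100, 8.5, 108.5)))
```

If the column width ever changes, only `format_line` needs editing.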
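Try to avoid using `try`/`except` if it's possible to use `if`/`else` instead. There's a saying that exceptions are for exceptional cases, meaning that the more you can do with standard control statements, the better. `if`/`else` is typically better behaved: it makes the expected cases explicit, and it can't accidentally swallow an unrelated error raised somewhere inside a `try` block.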
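A hypothetical sketch of the contrast, looking up an item in an `inventory` dictionary and recording the outcome in a `do_stuff` flag. Both functions behave the same, but the second states the condition it actually cares about (is the key present?) instead of relying on `KeyError`.

```python
def describe_try(inventory, item):
    # An exception handles an ordinary, expected case: a missing key.
    try:
        count = inventory[item]
        do_stuff = True
    except KeyError:
        do_stuff = False
    if do_stuff:
        return f"{item}: {count} in stock"
    return f"{item}: not carried"


def describe_if(inventory, item):
    # Same behavior, using a plain membership test instead of an exception.
    if item in inventory:
        do_stuff = True
    else:
        do_stuff = False
    if do_stuff:
        return f"{item}: {inventory[item]} in stock"
    return f"{item}: not carried"
```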
Of course, there are some places where `try`/`except` is needed. If some function is throwing an exception that you want to handle right then and there, then by all means use `try`/`except`, since it'll be cleaner.
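For instance, `int()` reports unparseable input by raising `ValueError`, and checking every possible malformed string beforehand would be clumsier than catching the exception. A sketch with a hypothetical `parse_port` helper:

```python
def parse_port(text, default=8080):
    # int() signals bad input by raising ValueError; catching that one
    # specific exception here is simpler than pre-validating the string.
    try:
        return int(text)
    except ValueError:
        return default
```

Catching only `ValueError`, rather than using a bare `except`, keeps unrelated bugs from being silently hidden.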
However, the above example can be cleaned up further, which leads me to my next point.
Roughly speaking, let's call a flag any variable whose only role is to temporarily keep track of some internal program state before being discarded. A flag doesn't model some fact about the real world or do anything other than manage internal state.
It's easy to see that getting rid of a flag also gets rid of some clutter. In the example above, `do_stuff` is a flag, and as it turns out, an unnecessary one. We could instead make use of the
Alternatively, it might be best to wrap all this logic in a new function and `return` a result instead.
Then, at the calling site, our code becomes this clean single call to a well-named function:
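A minimal sketch, with a hypothetical `stock_message` function and `inventory` dictionary. Once the logic lives in its own function, each branch can `return` its result directly, so no flag is needed, and the caller shrinks to one line.

```python
def stock_message(inventory, item):
    # Each branch returns directly, so no flag variable is needed.
    if item in inventory:
        return f"{item}: {inventory[item]} in stock"
    return f"{item}: not carried"


inventory = {"apples": 3, "pears": 0}

# At the calling site, the whole decision is one well-named call.
print(stock_message(inventory, "apples"))  # apples: 3 in stock
```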
A hugely important part of achieving Good, Clean Code is using the right data structures. With the right structures in place, nearly everything becomes easier to follow, and programming itself is more enjoyable. Without them, clutter tends to accumulate as we start doing unnecessary work that a better data structure could have handled for us.
One good example in Python is `defaultdict`. Say you're modeling a group of people and want to map each of them to their favorite fruits. We can set up a dictionary like this:
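A sketch with hypothetical people and fruits, storing each person's favorites as a list:

```python
fruits = {
    'Alice': ['apple', 'mango'],
    'Bob': ['banana'],
}

print(fruits['Alice'])  # ['apple', 'mango']
```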
Now, to add a new person we need to do something like this:
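Assuming a hypothetical `fruits` dictionary that maps names to lists, adding a new person means creating their list by hand:

```python
fruits = {'Alice': ['apple', 'mango']}  # hypothetical starting data

# A brand-new key needs its list created explicitly.
fruits['Mitchell'] = ['orange']
```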
This doesn't seem so bad, but what if we're not sure whether Mitchell is already accounted for? Then we need something like this:
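A sketch with hypothetical data in which Mitchell already has an entry. The check matters because assigning unconditionally would wipe out his existing list:

```python
fruits = {'Alice': ['apple', 'mango'], 'Mitchell': ['kiwi']}  # hypothetical data

# Only create an empty list if the key isn't there yet.
if 'Mitchell' not in fruits:
    fruits['Mitchell'] = []
```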
And what if we want to add a fruit to Mitchell's list, but we're not sure whether that key is in the dictionary yet?
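Sketched with hypothetical data, appending safely takes a separate branch for each case:

```python
fruits = {'Alice': ['apple', 'mango']}  # hypothetical data

if 'Mitchell' in fruits:
    # Key exists: add to the existing list.
    fruits['Mitchell'].append('orange')
else:
    # Key missing: fruits['Mitchell'].append(...) would raise KeyError,
    # so start a new list instead.
    fruits['Mitchell'] = ['orange']
```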
Alternatively, we can reduce the above to:
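A sketch of the reduced form, again with hypothetical data: make sure the key exists, then append unconditionally.

```python
fruits = {'Alice': ['apple', 'mango']}  # hypothetical data

# Guarantee the key has a list, then the append is always safe.
if 'Mitchell' not in fruits:
    fruits['Mitchell'] = []
fruits['Mitchell'].append('orange')
```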
However, you can imagine that this kind of extra logic compounds quickly, making for a more confusing program. It'd be nice if we could get away with just writing `fruits['Mitchell'].append('orange')`. It turns out we can, using `defaultdict`! If we create a `defaultdict` with `list` as its default factory, then whenever we look up a key that doesn't exist yet, instead of getting a `KeyError` we get a fresh empty list.
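A sketch using `collections.defaultdict` with `list` as the default factory (the people and fruits are hypothetical):

```python
from collections import defaultdict

# Any missing key is created on first lookup with list(), i.e. [].
fruits = defaultdict(list)
fruits['Alice'].extend(['apple', 'mango'])

# No membership check needed, whether or not Mitchell exists yet.
fruits['Mitchell'].append('orange')
fruits['Mitchell'].append('kiwi')

print(fruits['Mitchell'])  # ['orange', 'kiwi']
print(dict(fruits))        # {'Alice': ['apple', 'mango'], 'Mitchell': ['orange', 'kiwi']}
```

One thing to watch: even just reading a missing key, such as `fruits['Nobody']`, inserts an empty list for it. When you only want to check, use `in` or `.get()`, which don't create entries.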
Now we can just append to whatever keys we want. We've removed the conditional on whether the key is already present, eliminating a level of nesting and making the code more readable.
One last tip for now. If you have some complex expression you're reusing, or partially reusing, it probably makes sense to give it a name instead of writing it out twice. First of all, a good name will remind you (and help others quickly learn) what the complex expression is supposed to be doing. If you ever need to change how it works, you now only need to edit it in one place instead of two. And finally, evaluating an expensive expression once instead of twice can speed up your program.
Here’s a quick and silly example:
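A hypothetical sketch: a distance check that writes out the distance formula twice, followed by a version that computes it once and gives it a name.

```python
import math


def classify_before(x1, y1, x2, y2):
    # The distance formula is written out (and computed) twice.
    if math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2) > 10:
        return f"far: {math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2):.2f}"
    return "near"


def classify(x1, y1, x2, y2):
    # Naming the expression documents what it means and computes it once.
    distance = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    if distance > 10:
        return f"far: {distance:.2f}"
    return "near"


print(classify(0, 0, 9, 12))  # far: 15.00
```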