Write code and tests in tandem

07 Jan 2023 Gregory J. Stein

I often tell students in my lab that they need to put more emphasis on testing their research code. We have automated tests on quite a lot of our code to verify that it works as expected, which frequently catches issues before they cause trouble and makes refactoring code faster, easier, and more reliable. Even research code benefits greatly from automated testing. In discussions about testing with newer students, I've found that a big hurdle to writing tests is that they often don't quite know when to start writing a test. Here's my approach to getting started: If you need to write additional code to verify that some functionality is working, that additional code should be written as a test. It's a simple heuristic, but a powerfully useful one.

Rather than writing verification code in a one-off script, or by temporarily modifying another (code that is often deleted shortly after being written), the code is written as an automated test, which persists even after the functionality is considered stable. In my experience, it is pretty rare that code, particularly research code, is written without any debugging along the way. Many less-experienced researchers will write code alongside a large block of plotting code or print statements that exists only to convince themselves that their new code is working as expected. Once the code is believed to work, however, that block is commented out or deleted so that it is not run in production, a process that has always felt to me like wasted effort. Whenever I want to add a new feature or piece of functionality, I instead start by setting up a test and populating it with an example scenario as I write the code, evolving both in tandem until I achieve some broader aim.
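To make that concrete, here is a minimal sketch of what the switch looks like with pytest; the module, function, and scenario are hypothetical stand-ins, not code from my lab:

```python
# Before: a throwaway check, destined to be commented out or deleted.
#
#   grid = inflate_obstacles(occupancy_grid, robot_radius=0.3)
#   print(grid.min(), grid.max())   # "looks about right?"
#   plt.imshow(grid); plt.show()
#
# After: the same check, written once as a test that persists.

import numpy as np

from my_project.mapping import inflate_obstacles  # hypothetical module


def test_inflate_obstacles_grows_occupied_region():
    """Inflating a single occupied cell should mark its neighbors occupied."""
    occupancy_grid = np.zeros((11, 11))
    occupancy_grid[5, 5] = 1.0

    inflated = inflate_obstacles(occupancy_grid, robot_radius=0.3)

    # The original cell stays occupied and its immediate neighbors join it.
    assert inflated[5, 5] == 1.0
    assert inflated[5, 6] == 1.0
    # Cells far from any obstacle are untouched.
    assert inflated[0, 0] == 0.0
```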

These ideas are consistent with the philosophy of Test Driven Development (TDD), which broadly refers to the practice of writing tests for an API in advance of its implementation.

This guidance applies during initial development, when going back over older code, and when tracking down bugs or other unwanted behavior. Though it takes a bit of extra effort to write the boilerplate that sets up and tears down the scenario being tested, the code I would once have written and then discarded lives on in my testing suite and runs from time to time to ensure the new functionality is still performing as expected after I've moved on.
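pytest fixtures are one way to keep that set-up and tear-down boilerplate from being rewritten for every test; the sketch below assumes a hypothetical Simulator class and planner, purely for illustration:

```python
import pytest

from my_project.simulation import Simulator  # hypothetical module


@pytest.fixture
def small_world():
    """Set up a small simulated environment and tear it down afterwards."""
    sim = Simulator(map_name="two_rooms", seed=42)
    yield sim          # the test runs here
    sim.close()        # teardown happens even if the test fails


def test_planner_reaches_goal(small_world):
    path = small_world.plan(start=(0, 0), goal=(5, 5))
    assert path[-1] == (5, 5)
```

Every test that accepts small_world gets a fresh environment, so the boilerplate is written once rather than copied into each new test.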

What about visualization code? Keep that too! Sometimes there's no substitute for just looking at something when you're trying to write code. Many of my automated tests have an optional do_debug_plot flag that also generates plots, in case I ever need to inspect that functionality more closely again. Much of my work involves computer vision, so having the ability to lay my eyes on the result is sometimes just what I need when I'm trying to debug or understand something.
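One way to wire up such a flag (this is a generic pytest pattern, not necessarily my exact setup, and the vision module below is hypothetical) is to expose it as a command-line option that tests receive as a fixture:

```python
# conftest.py -- adds a --debug-plot option so any test can opt into plotting.
import pytest


def pytest_addoption(parser):
    parser.addoption("--debug-plot", action="store_true", default=False)


@pytest.fixture
def do_debug_plot(request):
    return request.config.getoption("--debug-plot")
```

```python
# test_segmentation.py -- a hypothetical test that can show its work.
import matplotlib.pyplot as plt
import numpy as np

from my_project.vision import segment_floor  # hypothetical module


def test_segment_floor_returns_per_pixel_mask(do_debug_plot):
    image = np.random.default_rng(0).random((64, 64, 3))  # stand-in image
    mask = segment_floor(image)

    if do_debug_plot:
        _, axes = plt.subplots(1, 2)
        axes[0].imshow(image)
        axes[1].imshow(mask)
        plt.show()

    # The mask should label every pixel of the input image.
    assert mask.shape == image.shape[:2]
```

Running pytest with --debug-plot pops up the figures for inspection; omitting the flag keeps the suite fast and headless.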

I have also found that, over time, I have come to write new code in a way that makes it as easy as possible to test. This observation matches a similar thought from kovarex, a developer of the wonderfully addictive game Factorio. Here is an excerpt from a developer blog post of his concerning TDD:

TDD actually is the constant fast switching between extending the tests and making them pass continuously. So as you write tests, you write code to satisfy them basically at the same time. This allows you to instantly test what you write, and mainly use tests as specification of what the code should actually do, which guides the thought process to make you think about where you are headed to, and to write code that is more structured and testable from the very beginning.

Starting my development process in tests has improved the overall readability of my code and has made it easier to refactor and extend it when that becomes necessary later on.

My guidance above is really only a stepping stone to more comprehensive testing practices.

See also my colleagues Paul Ammann and Jeff Offutt's popular (and excellent) book on Software Testing.

In addition to these sorts of integration tests, which exercise multiple parts of the code at once and test them in unison, it is still critical to take the time to write unit tests, which more rigorously test functions in relative isolation, and regression tests, which reproduce and squash bugs found later on. There's both an art and a science to testing, and my heuristics for when to write additional tests and how in-depth they should be vary with both time and application.
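A regression test in this spirit is often just the smallest input that reproduced a bug, frozen in place so the bug cannot quietly return; the function and the bug below are hypothetical examples:

```python
from my_project.geometry import ray_grid_intersection  # hypothetical module


def test_ray_along_cell_boundary_does_not_crash():
    """Regression test for a (hypothetical) bug in which rays exactly aligned
    with a grid cell boundary raised a divide-by-zero error."""
    hits = ray_grid_intersection(origin=(0.0, 0.0),
                                 direction=(1.0, 0.0),
                                 grid_size=(10, 10))
    assert len(hits) > 0
```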

I welcome thoughts and feedback on Twitter.