Testable APIs

By now you may have heard the news from Paris that a unit testing framework has landed in Drupal core. A huge shout-out goes to everyone involved. I particularly want to note the work that's been put in by former GHOP students and members of the GHOP team. It's amazing to see how far some people have come in a short time, despite still having homework to do. :-)

The next step, of course, is to make Drupal itself fully-tested. That poses a number of challenges, particularly for unit tests. Because I'm sure others will be singing the (well-deserved) praises of the testing team, I want to take a moment to focus on that next step and one important approach: Testable APIs.

Unit tests

The goal of unit tests is to isolate some small portion of the code base (a "unit") and test it exhaustively. Once you know that piece works correctly in all circumstances, you can move on to another area and test it, confident that, if you did your job right, any new bugs you find cannot be in the already-tested code: you have already tested every possible input and output combination for that code and confirmed that it behaves as it should.

Of course, such testing takes a lot of time, so you automate it with a testing framework (which we now have!) and can easily re-run all of those tests to see if anything changed. If you're trying to track down a weird bug and component Foo passes all of its tests, then you know that the bug is not in component Foo, because its input and output are behaving correctly. If the problem is actually in Foo, then you aren't fully testing it and need to write more tests to add to your battery of automated tests.

That brings us to a very important point: It's easier to write tests for code that has simple, predictable inputs and outputs. The smaller the black box you're testing, the better you can isolate input and output and confirm that a given input gives you the output it is supposed to. That small black box must also be a closed, isolated system. If it interacts with its environment too much, it becomes very difficult if not impossible to test because you can't control the complete set of inputs. Like any scientific experiment, you must be able to reliably replicate the entire environment and input to a system in order to derive useful information from the output.
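To make that concrete, here is a minimal, hypothetical example (the function name is invented for illustration): a pure function whose entire input is its parameter list and whose entire output is its return value, which makes black-box testing straightforward.

```php
<?php
// Hypothetical example: a pure formatting function. Its only input is
// its arguments and its only output is its return value, so a test
// controls the complete environment trivially.
function format_price($amount, $currency = 'USD') {
  return $currency . ' ' . number_format($amount, 2);
}

// A unit test is then just "given this input, expect this output".
```
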

State's rights

Naturally, some coding architectures are more testable than others. Code that relies heavily on circumstantial input is harder to test properly than code that relies on clearly defined input vectors. "Circumstantial input" is the fancy way of saying "state". State-sensitive code is harder to test than state-insensitive code, because the state is an input: you have to configure your state as well as your direct inputs in order to test the code's behavior properly. Stateless code is therefore most desirable from a testability perspective, and stateful code with easily-configured state is more desirable than stateful code whose state is hard to configure. State in this case includes everything from global variables to the database to the session to HTTP headers to even static variables.

Similarly, easily-testable code is bite-sized. A system with 5 possible inputs and 5 possible outputs can be tested with 5 unit tests, one for each discrete input set. A system with 50 possible inputs and 50 possible outputs requires 50 tests. A system with a variable number of inputs, with many of those coming from state, is ridiculously hard to test fully. If, however, you can break that system down into a series of several smaller components, each with only a few discrete inputs and expected outputs, you can test those pieces completely and reduce the amount of mystery in the variable, uncontrollable section.

We can therefore conclude that key attributes of testable code are:

  • It is broken down into discrete, single-purpose components.
  • Those components have clear, limited inputs with verifiable outputs.
  • Those components are easily separated from their environment.
  • Those components are, where possible, stateless, or at least the state is maintained separately from the algorithm to be tested.

Good code; don't have a $_COOKIE

Fortunately, those are all attributes not only of testable code, but also of readable, human-understandable code. It's easier to understand a bite-sized chunk of code that does not interact with its environment except through very clearly-defined channels than it is to follow (and debug, automatically or manually) code that does 3 things at the same time, using 4 different global variables and a few static variables. They are also frequently more flexible and powerful, because you have smaller, swappable pieces that can not only be tested independently but used or replaced independently as well.

Purely functional languages are very easy to unit test in this regard, because the language itself enforces statelessness and separation of concerns. With good coding techniques, however, we can still write code that is easy to test, for the most part, in procedural or OOP languages.

Of course, not all code can be single-purpose and stateless, especially when the whole point of your application is to maintain state. There are various techniques to help make code easier to unit test, such as mock objects or fake functions, but those are already documented far better elsewhere so I will not go into them in detail. However, what I will stress is the point I made above: Testable APIs are also Readable APIs and Flexible APIs. That is, more testable code and better quality code go hand-in-hand, even without considering the benefits of actually doing the testing. As one of the stated goals for Drupal 7 is better internal APIs, that makes writing testable code doubly important as doing so also results in better APIs.

Writing testable code

That's great. So how do we actually go about writing "testable code", especially when we kinda need state? Drupal's entire hook architecture is built around modules being able to inject code and data structures into each other's state. Doesn't that make writing testable code in Drupal rather hard, and counter-productive?

Not at all. It does, however, mean we need to take care to separate our concerns properly. There are two recommendations I will make here for "more testable code":

  1. Avoid deep, dependent function stacks.
  2. Use stateful driver functions.

Avoid deep, dependent function stacks.

Separating an algorithm or routine out into multiple functions (or multiple objects and methods, if writing OOP code) is generally understood to be a Good Thing(tm), as it lets you divide and conquer a problem. However, there is more than one way to divide and conquer. You can make sub-components of a routine dependent on each other, sub-calling each other ad nauseam, or you can make them independent of each other and have a wrapping routine that ties them together. The second approach is more testable, as there are fewer dependencies between the components. You can black-box test each of them individually, then black-box test the wrapping routine knowing that any bugs you find must be in the wrapper. There is less need to write mock objects or fake functions for sub-components, because the sub-components do not depend on each other.

As an example consider the code registry parser, which I hope will get committed soon. :-) The parser takes a PHP source code file by its full file name (including directory) and extracts from it a list of functions or classes it contains. For each function, it needs to derive the following information: The function name, the file the function is in, the module that function belongs to, and the hook (if any) that function is an implementation of. That information is then stored in the database. Some of that information is derived from other pieces of information; specifically, the file name of the function determines the module it belongs to, according to certain rules.

One way of writing that parser is to, in a single function, iterate over every function in the file and get its name, use the file name to determine the module, use the module to determine the hook, and then save that data. Great. And when there's a bug in the hook-detection output, how do you know whether it was a bug in the hook determination code, the module determination code, or the file name parsing? How do you know which of the 5 variables defined at the top of the function belongs to which step in the process? You don't. Such a mechanism has poor testability.

Another way is to pass a file handle to the file name parsing code, which then sub-calls the module detection code, which in turn sub-calls the hook detection code and returns the resulting information. That's a problem, because each step then depends not on the return value of another step but on its existence. To properly isolate the module detection code, you must write a fake subroutine for the hook detection code, which is annoying and time-consuming. It also means the hook detection code assumes it is being called from within the module detection code, because it's passed an internal data structure; that is, it depends on the state of the function stack. That's a "deep, dependent function stack".

Use stateful driver functions

The better approach, which is more testable as well as easier to read and re-use, is to have a driver function that does not itself do all that much, but hands off the processing to each sub-step independently. Let's have a look at the current version in the patch (and embarrass Larry greatly if it gets changed before the patch is committed):

<?php
function _registry_parse_directory($path, $patterns) {
  static $map = array(T_FUNCTION => 'function', T_CLASS => 'class', T_INTERFACE => 'interface');

  $active_modules = module_list();
  $active_modules['node_content'] = 'node_content';

  $files = file_scan_directory($path, '\.(inc|module|install)$');
  foreach ($files as $filename => $file) {
    $tokens = token_get_all(file_get_contents($filename));
    while ($token = next($tokens)) {
      // Ignore all tokens except for those we are specifically saving.
      if (is_array($token) && isset($map[$token[0]])) {
        $type = $map[$token[0]];
        if ($resource_name = _registry_get_resource_name($tokens, $type)) {
          $module = _registry_get_resource_module($resource_name, $filename, $type);
          if ($module != 'includes' && !isset($active_modules[$module])) {
            // If this is a disabled module then we skip the entire file.
            continue 2;
          }
          $hook = _registry_get_resource_hook($resource_name, $module, $patterns);

          // Now save the resource record to the database.
          $result = _registry_save_resource($resource_name, $type, $module, $hook, $filename);
          // We skip the body because classes may contain functions.
          _registry_skip_body($tokens);
        }
      }
    }
  }
}
?>

All this function does itself, really, is iterate over files. Once it hits a function or class, it first calls _registry_get_resource_name() to determine the name of the function. Then it calls _registry_get_resource_module() to get the module the function belongs to. Then it calls _registry_get_resource_hook() to determine what hook, if any, the function implements. Each of those steps is now decoupled from the others and can be tested "in a vacuum". Once each of those is verified to behave as expected, we can test _registry_parse_directory() to make sure they all work together, possibly without even writing fake functions for the sub-steps, as we already know (through testing) that they work. Because those are also stateless routines (they have no globals, session variables, or statics), we know that if we pass in a certain input, we must get back its corresponding output. Mocking them is therefore not necessary (or at least less necessary).

That approach applies to many areas of code. In general, separate out the state-sensitive parts (the file stream parsing) from the state-insensitive algorithm. You can then test the algorithm itself with tightly-controlled inputs to verify that it works as intended, then test the state-tracking parts of the code separately without worrying about the algorithm operating on that data. As an added bonus, you then get a cleanly re-usable algorithm all tied up in a neat function (or method) with a little pink bow, as well as more easily-understandable code since you can look at and wrap your head around the state managing code and the algorithm independently of each other.

The stateful portion includes anything that accesses the environment the code runs in: the database, the cache, variable_get(), the session, cookies, HTTP headers, arg() calls, potentially static variables, and so on. None of that should appear in the stateless algorithm code.

Even if you can't separate out code that far, using that model as an ideal helps to create better-factored code. Take the new Drupal 6 menu system, for instance. The title callback, access callback, loading callbacks, and page handler callback are all called independently of each other in serial rather than daisy-chained together. That means those routines can be tested independent of their presence in the menu system, and independent of the result of the other callbacks.
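The serial-not-daisy-chained shape can be sketched like so. This is a hypothetical miniature, with invented names, and not the real menu system: a tiny driver invokes each callback independently, so each callback can also be tested on its own.

```php
<?php
// Independent callbacks: neither one calls the other.
function example_access($account) {
  return !empty($account['admin']);
}

function example_title($item) {
  return 'Viewing ' . $item;
}

// The driver calls the callbacks in serial; the access check and the
// title generation never know about each other.
function example_dispatch($account, $item) {
  if (!example_access($account)) {
    return 'Access denied';
  }
  return example_title($item);
}
```
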

Better testability, better readability, better reusability, better modularity: All around better code. Score!

Comments

well done

this made some fuzzy ideals much more concrete for me. well done.

Doctest

I discovered this in the python documentation recently: http://docs.python.org/lib/module-doctest.html

It's a way of embedding small unit tests in the documentation of a module, class, or function, which helps make sure your documentation matches your implementation. There are also APIs into the Python unit testing framework... I haven't read it completely yet, but it looks interesting.