Dependency injection, testing, and Drupal

Submitted by Larry on 24 December 2008 - 2:08am

Via Planet PHP I stumbled across this article decrying Singletons. It's not a new argument, really, but one of the comments pointed me toward a Google Tech Talk video entitled "Global State and Singletons". To be honest I don't agree with everything said in either the article or the video, but both are spot on about the problems of global state, something I've lamented before in relation to testing.

That is especially relevant now, as we consider the question of Handlers in Drupal. Why? Because the most controversial part so far, the environment variable, is designed to address exactly this problem, a problem that is currently prevalent throughout all of Drupal.

Permit me to explain.

Code is state

As noted in all of the above linked articles, global state is bad. It's bad for testing and bad for developing stable code. (I'll assume you read the links above so that I don't need to explain why again.) However, global state includes code. If you can't separate out two pieces of code from each other, then you can't develop or test them separately. Two pieces of code are inseparable if you cannot make one run without the other from outside of those two pieces of code. Now, that's not always a bad thing; code generally requires other code and you really can't split every line of code out to be independent of every other. It is still something to keep in mind when architecting a system, however.

The way to keep two pieces of code separate is to introduce a layer of abstraction between them, and that is easily the first win of object-oriented code over procedural. The syntax itself is a layer of abstraction. Every method you call on an object is one layer of abstraction removed from calling a function that did the same thing, because you're calling it through a variable (the object), and if you change the object then the method's behavior can be replaced without changing the method call itself. OK, that sounds more complicated than it is, but consider:

<?php
// Example 1
$a = $something;
thing1($a);
thing2($a);
thing3($a);

// Example 2
$a = new Something();
$a->thing1();
$a->thing2();
$a->thing3();
?>

If we wanted to change the behavior of thing1(), thing2(), and thing3() (say, in order to test some other routine that uses this bit of code), then in the first example we'd need to change three lines of code, one for each function call. In the second, we'd change one line of code, the $a = new Something(); call, to a different class that also has those methods. That makes the second version inherently more loosely coupled, even if in this contrived example it doesn't help much.
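To make that a little less contrived, here is a minimal sketch of the second example with a swappable class. All of the names (Thing, FakeSomething, run_things()) are made up for illustration; the only point is that the routine being exercised never names a concrete class.

```php
<?php
// Hypothetical names throughout; only the shape matters.
interface Thing {
  public function thing1();
  public function thing2();
  public function thing3();
}

// The "real" implementation.
class Something implements Thing {
  public function thing1() { return 'real thing1'; }
  public function thing2() { return 'real thing2'; }
  public function thing3() { return 'real thing3'; }
}

// A stand-in with the same methods, e.g. for use in a test.
class FakeSomething implements Thing {
  public function thing1() { return 'fake thing1'; }
  public function thing2() { return 'fake thing2'; }
  public function thing3() { return 'fake thing3'; }
}

// This routine never names a concrete class, so it stays the same
// no matter which implementation it is handed.
function run_things(Thing $a) {
  return array($a->thing1(), $a->thing2(), $a->thing3());
}

$real = run_things(new Something());
$fake = run_things(new FakeSomething());
```

Only the line that does the `new` changes between the two runs; the three method calls are untouched.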

However, let's look at that situation slightly differently. Dependent code is a form of state. A Singleton, or global, is a dependency that you cannot change because you have no layer of indirection. If you call a function directly, then you cannot change what function is called without changing the code. In other words, it is an immutable dependency. Put bluntly:

Every function call is a singleton/global.

Think on that for a moment, and you begin to understand why staunch OO adherents are so anti-functions. :-)

Loose coupling

Of course, just as in OO code there are ways to introduce layers of indirection in procedural code. Some languages make it easier than others. In my experience it's a royal pain in C, as you need to use a function pointer (the syntax for which is totally nasty). In PHP, we have $function and call_user_func_array(). While they do add overhead, they are a very straightforward way of separating functions from each other just as one does in OO code by passing object references around.
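As a rough sketch of what that looks like in PHP, the snippet below passes a function name around as a string and calls it indirectly, once as a variable function and once through call_user_func_array(). The real_save(), fake_save(), and store_* functions are hypothetical; they only return strings so the effect is easy to see.

```php
<?php
// Two interchangeable functions with the same signature (made up for this sketch).
function real_save($id, $title) {
  return "saved $id ($title) to the database";
}

function fake_save($id, $title) {
  return "pretended to save $id ($title)";
}

// The caller decides which function is used; this code never names one.
function store_item($saver, $id, $title) {
  // Variable function: PHP calls whatever function $saver names.
  return $saver($id, $title);
}

// Same idea, but with the arguments supplied as an array.
function store_item_args($saver, array $args) {
  return call_user_func_array($saver, $args);
}

$live = store_item('real_save', 7, 'Hello');
$test = store_item_args('fake_save', array(7, 'Hello'));
```

The function name has become data, which is the same trick an object reference plays in OO code.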

In fact, Drupal's entire architecture is built around that indirection. Hooks are called indirectly. Theme functions are called indirectly. Once you have that layer of indirection you can do all sorts of exciting things without modifying someone else's code. In fact, the degree to which Drupal supports that sort of separation (when in doubt, make a hook) is its secret weapon, in a sense, and is what makes Drupal Drupal.

Hooks, as explained in Pro Drupal Development, are an implementation of the Inversion of Control design pattern, specifically the "Event-driven" variant. (Every instance of module_invoke_all() is an "event" to which one or more hooks respond.) In a procedural system, that is a very good way to go about keeping code loosely coupled. In the Google Tech Talk video linked above, the presenter talks about using a constructor to define what external dependencies an object has and forcing the caller to provide them. The procedural equivalent for hooks is the hook function signature; it declares what external data needs to be provided, such as a $node object to modify.
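For anyone who hasn't looked under the hood, the idea can be sketched in a few lines. This is not Drupal's actual implementation; sketch_invoke_all(), the "greeting" hook, and the alpha/beta/gamma modules are invented here, and the only thing borrowed from Drupal is the modulename_hookname naming convention.

```php
<?php
// Two hypothetical "modules", each implementing a made-up hook called "greeting".
// The function signature is the contract: the caller must supply a name.
function alpha_greeting($name) {
  return array("alpha says hi to $name");
}

function beta_greeting($name) {
  return array("beta says hi to $name");
}

// Fire the "event": call {module}_{hook}() for every module that has one
// and merge whatever the implementations return.
function sketch_invoke_all(array $modules, $hook, array $args = array()) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $results = array_merge($results, call_user_func_array($function, $args));
    }
  }
  return $results;
}

// "gamma" does not implement the hook, so it is silently skipped.
$messages = sketch_invoke_all(array('alpha', 'beta', 'gamma'), 'greeting', array('Larry'));
```

The caller knows nothing about alpha or beta beyond their names, and neither of them knows who fired the event.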

The downside is that it doesn't always go far enough. Code in a hook can still call out to pretty much anything it wants, so while it may be decoupled from its caller it is not fully decoupled from the system. The best examples are registry-style hooks, such as hook_menu(). Most menu hook implementations do not have any external dependencies, but those that build up menu items dynamically do; at the very least they generally have an external dependency on the db_query() singleton and the database behind it.

In OO code, one can do event-style inversion of control (typically with the Observer pattern or similar) but also Dependency Injection. Dependency injection relies on the built-in extra layer of abstraction provided by OO syntax, as described in the Google video, to "inject" dependent code and state into another object. As long as the dependent code meets the same interface that the object is expecting (which is where syntactic interfaces become really nice, as they provide a compile time check for just that), everything still works and the two systems have become decoupled with all the benefits that brings.
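A minimal sketch of constructor-based dependency injection, assuming a hypothetical Clock interface and Greeter class (neither comes from Drupal or from the video): the class declares what it needs in its constructor, and the caller decides which implementation to hand over.

```php
<?php
// The dependency, expressed as an interface (hypothetical).
interface Clock {
  public function now();
}

// What production code would inject.
class SystemClock implements Clock {
  public function now() { return time(); }
}

// What a test would inject: time stands still at a known value.
class FixedClock implements Clock {
  private $timestamp;
  public function __construct($timestamp) { $this->timestamp = $timestamp; }
  public function now() { return $this->timestamp; }
}

// The constructor declares the external dependency; Greeter never
// calls time() itself, so it cannot tell a real clock from a fake one.
class Greeter {
  private $clock;
  public function __construct(Clock $clock) { $this->clock = $clock; }
  public function greeting() {
    // gmdate() keeps the sketch independent of the server's timezone.
    $hour = (int) gmdate('G', $this->clock->now());
    return $hour < 12 ? 'Good morning' : 'Good afternoon';
  }
}

$production = new Greeter(new SystemClock());
$morning = new Greeter(new FixedClock(0));
$afternoon = new Greeter(new FixedClock(13 * 3600));
```

With time() called directly inside greeting(), the afternoon branch could only be tested in the afternoon.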

As described in the video, that requires a change in the way you think about your program. A given class needs to define what external systems (code or state) it requires and then it is "someone else's problem" to know what those are and to provide it with those. That someone else is the calling code, which could be the calling test, too. Of course, that someone else is usually you, so you still have a problem, just a different one, even though you're a step ahead because of the separation of concerns.

That "someone else's problem" chain is one of the reasons that OO code is frequently derided as being more complex and unapproachable than procedural code. It's also true. All of that indirection is weird, it's not at all intuitive, it takes up extra lines of code to write, and it makes understanding the "big picture" of the code more difficult than if you could just read through lines sequentially and see what they do.

But then, that exact same challenge exists with hooks, too. How many people are scared off of Drupal because hooks are just plain weird? Drupal does everything "sideways", which is totally unintuitive to someone who doesn't already understand it. Same problem, just different syntax.

Globals, globals, everywhere!

PHP, of course, doesn't make that separation of concerns easy. It is practically built upon global data. register_globals is finally getting put out of our misery in PHP 6, but we still have the dead-easy use of functions. We also have the "super-globals", global state so global that we can't avoid it.

That's great, that's wonderful, and it makes loose coupling ridiculously hard if you rely on it; that's because super-globals get injected for you, whether you want them or not, so you can't really control them or swap them out for alternate implementations. If you want to be able to really test your code properly, you need to not use functions or super-globals. (So why is PHP attracting all of these hard-core OO people these days? I totally don't get that.)

Wrap it up

In many languages and frameworks (Java, C#, even classic ASP if I understood my limited interaction with it properly) there is a "Request object" singleton that at least centralizes these state problems so you can deal with them in one place. Not so in PHP. So, as the video suggests when dealing with a lousy API, we need to add our own wrapper around the ugly bits and then train ourselves to use it properly. That is, we need an object that wraps and provides access to $_GET, $_POST, $_COOKIE, $_SERVER, and so forth. It provides centralized access to the PHP environment variables. And I suppose while we're at it, we may as well include the super-common Drupal environment stuff, too, like arg() (based on $_GET). Then we can inject that and train ourselves to not use the super-globals or the bare functions that can't be swapped out with mock versions for testing.

What shall we call such an object? How about the environment object, which gets passed into each Handler? (See, I got to the point eventually!)
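To give a feel for the shape of such a wrapper, here is a rough sketch. The class name, its methods, and the fromGlobals() factory are all assumptions made for illustration, not the proposed Handlers API; the arg() method is only loosely modeled on Drupal's arg(), which splits the path in $_GET['q'].

```php
<?php
// A hypothetical environment wrapper; not the actual proposed API.
class Environment {
  private $get;
  private $post;
  private $cookie;
  private $server;

  // Tests build one from plain arrays.
  public function __construct(array $get = array(), array $post = array(), array $cookie = array(), array $server = array()) {
    $this->get = $get;
    $this->post = $post;
    $this->cookie = $cookie;
    $this->server = $server;
  }

  // The one place in the system that touches the real super-globals.
  public static function fromGlobals() {
    return new self($_GET, $_POST, $_COOKIE, $_SERVER);
  }

  public function get($key, $default = NULL) {
    return array_key_exists($key, $this->get) ? $this->get[$key] : $default;
  }

  public function post($key, $default = NULL) {
    return array_key_exists($key, $this->post) ? $this->post[$key] : $default;
  }

  public function cookie($key, $default = NULL) {
    return array_key_exists($key, $this->cookie) ? $this->cookie[$key] : $default;
  }

  public function server($key, $default = NULL) {
    return array_key_exists($key, $this->server) ? $this->server[$key] : $default;
  }

  // Path component lookup, in the spirit of Drupal's arg().
  public function arg($index) {
    $parts = explode('/', $this->get('q', ''));
    return isset($parts[$index]) ? $parts[$index] : NULL;
  }
}

// In a test: no web request needed, just an array.
$env = new Environment(array('q' => 'node/42/edit'));
$node_id = $env->arg(1);
```

Real code would call Environment::fromGlobals() once, near the top of the request, and pass the result along from there.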

Pragmatism wins again

Arguably we should carry this line of thinking to its logical extreme. That is, wrap everything. Even database calls. Then have each class (handler?) declare in its constructor what external systems it needs, and inject several different objects to use depending on which ones the handler will actually need. The request object, the database connection object, the variable/system-settings object, etc. So why don't we? Simple. It makes life a lot more difficult, and complicates the implementation.

In fact, the origin of the environment object was the realization that the external dependencies were not consistent between different implementations of the same interface. That is, the CVS and SVN implementations of the VersionControlSystem handler needed different variables; a database-backed Cache handler needs a database connection while a file-system based one needs a file handle or directory hook and a memcache-based one wants something else; and so on.

So by passing in a facade object through which the handler can pull its dependencies, we get the benefits of dependency injection (testability, loose coupling, mock objects, etc.) while still allowing for variable dependencies. When testing the database-based Cache implementation we can pass it a mocked environment object that behaves normally except for having an alternate, fake database connection that the handler then requests, not knowing that it's a fake. (Yes, that does mean that we will have to access the database connection through the environment object, not directly through db_query(), as it is extremely difficult to mock db_query() properly.)
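Sketched out under heavy assumptions, that test setup might look something like the following. FakeConnection, MockEnvironment, DatabaseCache, and every method on them are hypothetical names invented for this example; the real handler and environment interfaces are still up for discussion.

```php
<?php
// Stand-in for a database connection; it just keeps rows in an array.
class FakeConnection {
  public $rows = array();

  public function write($key, $value) {
    $this->rows[$key] = $value;
  }

  public function read($key) {
    return isset($this->rows[$key]) ? $this->rows[$key] : NULL;
  }
}

// The facade: a handler asks it for whichever dependency it happens to need.
class MockEnvironment {
  private $connection;

  public function __construct($connection) {
    $this->connection = $connection;
  }

  public function database() {
    return $this->connection;
  }
}

// A database-backed cache handler. It only knows about the environment,
// and pulls its connection from there instead of calling db_query().
class DatabaseCache {
  private $env;

  public function __construct($env) {
    $this->env = $env;
  }

  public function set($key, $value) {
    $this->env->database()->write($key, serialize($value));
  }

  public function get($key) {
    $raw = $this->env->database()->read($key);
    return $raw === NULL ? NULL : unserialize($raw);
  }
}

$connection = new FakeConnection();
$cache = new DatabaseCache(new MockEnvironment($connection));
$cache->set('front_page', array('title' => 'Home'));
$hit = $cache->get('front_page');
$miss = $cache->get('nope');
```

A file-based cache handler would take the very same environment object and simply ask it for something other than database().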

In addition, I know full well that much of this is new to a lot of Drupal folks. Having to rethink all of Drupal into dependency injection objects at once is, um, impossible. Heck, I don't want to do it either, and we'd probably break everything if we tried. I don't want to have to explain to people why they need to instantiate 5 objects themselves and pass them to an object that gets passed to an object that gets passed to an object that does something, do you?

There are plenty of good ways to do good loosely coupled code procedurally; we already do many of them. Let's keep those, but extend our ability to extend Drupal in another direction, using tried-and-true object-oriented techniques. And in the process, we get much more testable code at the expense of a thin layer of indirection.

The article is well balanced and accepts that all programming is like Perl or religions, with multiple paths to efficient ecstasy.

Programming rules are only guidelines that need to be broken now and then. More than one programmer has taken OO rules or "separating logic from presentation" to the extreme. The resulting fragmented and fragile code is as bad as the spaghetti code from the past.

A programmer's job, in my view, is to work within a well defined (discrete) box but know when to bend or break the rules. Too many programmers are hell-bent on the rules and ignore the reality that the rules can get in the way of solving the problem.

The paradigm of Drupal 5/6 is adequate but maybe 7 will be better. Or maybe it will be worse, especially considering the relearning involved, which may leave previous module developers out in the cold. Thank God I am somewhat new.

If you are already thinking about injected dependencies for Drupal, I think it would be a logical step to also consider dependency injection containers as a possible piece of the design.

This pattern allows or encourages very explicit constructor signatures, asking for exactly those things the object really needs, but automates the process of providing these things to the constructor method. This could be a way to make the overly big $env parameter unnecessary.

You can have a look at symfony's container, just to name one.
http://components.symfony-project.org/dependency-injection/documentation