Beyond Abstract classes

Submitted by Larry on 7 January 2014 - 1:18am

Recently, Anthony Ferrara has been posting a periodic "Beyond" series about software design philosophy. Some in particular have hinted at concepts I've been pondering as well. With his blessing, therefore, consider this a continuation of that series.

PHP 5.4 is not exactly new, but it's finally starting to see actual usage by a decent number of people. Its most notable new feature is Traits, which in PHP are implemented as, essentially, compile-time copy-paste. Conceptually, though, they're a way to mix functionality into a class without using inheritance, and without requiring a separate distinct object for composition. (At least in PHP; the term "trait" appears in other languages for similar but subtly different tools.) That's not to say that they're a surrogate for composition; they most certainly are not. They serve a different purpose, that is, providing code for a class to reuse without using inheritance.

Recently, I was reading an article discussing the implementation of inheritance, such as it is, in Go, Rust, and other new-wave concurrent languages. (Thanks to twistor for helping me track down the link.) It made an interesting point that crystallized for me why it is I am so excited about traits. Specifically, it noted that there are not one but two kinds of reuse: interface reuse and code reuse.

Classic inheritance

In traditional class-based languages (C++, Java, PHP, etc.), the most readily apparent form of reuse is the "is a" relationship. That is, thing A "is a" thing B, or "is a" special case of thing B, or "is a" subclass of thing B. That is generally represented syntactically by means of class inheritance. Cat is-a special case of Animal, for the classic example, meaning class Cat extends Animal {}. In Java and PHP, though, you have to be careful with that because both languages use single inheritance; that is, a class can extend only one other class. (Largely this is to avoid the confusion that C++'s multiple inheritance brought.)

But is the "is a" relationship interface reuse or code reuse? That's the problem: It's a little of each. In fact, "is a" is too concrete a concept. I don't care what another object is. I care how I can treat it, and how it will behave when I do so. That is, I care about its interface and only its interface.

Interfaces: Just what I care about and nothing more

Instead, let's limit our relationship to "can behave as a" (CBAA), which is all a consumer of an object really cares about. I don't care that Cat "is a" Animal; I care that I can treat Cat like an animal and have it behave the way I expect. That is, Cat will have the methods I expect any Animal to have, and they will do what I expect them to. We have a language construct for that CBAA relationship; it's called Interface. It means, exclusively, "this object can be treated as a... and I won't promise anything else."
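
As a minimal sketch of what "can behave as a" looks like to a consumer (AnimalInterface, AnimatronicCat, and poke() are hypothetical names made up for this illustration), the calling code below depends on nothing but the interface:

<?php
// Hypothetical names, for illustration only.
interface AnimalInterface {
    public function speak();
}

class Cat implements AnimalInterface {
    public function speak() { return 'Meow'; }
}

// Shares no parent class with Cat, but it can behave as an animal.
class AnimatronicCat implements AnimalInterface {
    public function speak() { return 'Meow (recorded)'; }
}

// The consumer type-hints the interface and nothing else.
function poke(AnimalInterface $animal) {
    return $animal->speak();
}

echo poke(new Cat()), "\n";
echo poke(new AnimatronicCat()), "\n";
?>

Neither class tells poke() what it "is"; each only promises how it can be treated.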

So that's interface reuse, nicely baked into the language. What about code reuse? Well, we have class inheritance. But class inheritance implies CBAA and code reuse. Class inheritance isn't code reuse; it's "is a special case of", which implies CBAA. We want to separate these concepts cleanly so that we can use them independently.

We have composition, but that's not code reuse, either. Composition is "makes use of". But composition is actually quite tedious to make line up with "can be used as a"; for one, it's not actually the same thing. For another, as a practical matter it requires manually forwarding method calls from the composing object to the composed object, and reimplementing the interface. That can be an unpleasant amount of boilerplate just to get what could easily be an unnatural fit.
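
To make that boilerplate concrete, here is a rough sketch under made-up names (Speaker, Dog, and RobotDog are hypothetical). The composing class has to re-declare the interface and forward each call by hand:

<?php
// Hypothetical names, for illustration only.
interface Speaker {
    public function speak();
    public function name();
}

class Dog implements Speaker {
    public function speak() { return 'Woof'; }
    public function name() { return 'Rex'; }
}

// RobotDog "makes use of" a Dog. To be usable as a Speaker it must
// implement the interface again and forward every method manually.
class RobotDog implements Speaker {
    private $dog;

    public function __construct(Dog $dog) {
        $this->dog = $dog;
    }

    public function speak() { return $this->dog->speak(); }
    public function name() { return $this->dog->name(); }
}

$robot = new RobotDog(new Dog());
echo $robot->name(), ' says ', $robot->speak(), "\n";
?>

With two methods that is tolerable; with a wide interface the forwarding code quickly outweighs the class's own logic.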

Abstract classes: virtually unneeded?

We have a concept in PHP called an abstract class. An abstract class is a class that is actively missing one or more methods that it nevertheless declares it needs. That is, it's both a class and an interface at the same time. What is its relationship? Like normal inheritance, it's both "can behave as" and "is a special case of". Except it's not really a special case of something, because the something it's a special case of isn't a real thing in the first place. It's incomplete by definition! So its child classes are "a special case that actually works"?

For a practical example, let's look at the PSR-3 logging specification published by the Framework Interoperability Group. It defines, primarily, an interface for a log channel; application code can take any object that declares it "can behave as a" Psr\Log\LoggerInterface and treat it the exact same way, not caring about the implementation details. The interface is quite simple: There's one meaningful method, log(), and several utility methods that simply alias the first parameter of log():

<?php
interface LoggerInterface
{
    public function emergency($message, array $context = array());
    public function alert($message, array $context = array());
    public function critical($message, array $context = array());
    public function error($message, array $context = array());
    public function warning($message, array $context = array());
    public function notice($message, array $context = array());
    public function info($message, array $context = array());
    public function debug($message, array $context = array());
    public function log($level, $message, array $context = array());
}
?>

Any object can be a logger if it implements that interface. Of course, the implementation of the utility methods is, by design, rather mundane and doesn't vary between implementations. Therefore, the package also defines an abstract base class that trivially implements all of the utility methods (only one shown here for brevity; they're all the same):

<?php
abstract class AbstractLogger implements LoggerInterface {
    // ...
    public function debug($message, array $context = array()) {
        $this->log(LogLevel::DEBUG, $message, $context);
    }
    abstract public function log($level, $message, array $context = array());
}
?>

Such utility base classes are quite common in many frameworks, and for what it's worth quite useful, because now different implementers don't need to all repeat the exact same mundane code for 8 different methods. There are two big problems, however:

It wastes inheritance, the one reuse tool that can be used only once per class. That means my logger implementation, if I want to avoid pointless boilerplate code, must extend from this class and no other, ever. That severely limits my flexibility in designing my library.
It's not the right relationship. MyLogger "is a" AbstractLogger doesn't make sense at all. MyLogger "is a special case of" AbstractLogger is only slightly less weird sounding.

See why it's problematic?

Traits: Just gimme the codez

Enter traits. Traits don't imply a relationship. They're purely code reuse. As far as an object's type is concerned, traits no longer exist at runtime. They say nothing about the objects that use them: how those objects behave, what you can do with them, or anything else. In that sense, they are completely orthogonal to interfaces; whereas an interface is purely interface reuse, a trait is purely code reuse.
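
A small sketch of that point, using hypothetical names (Greets and Person are not from any library): the trait's method ends up on the class, but the trait never becomes a type the object can be checked against.

<?php
// Hypothetical names, for illustration only.
trait Greets {
    public function greet() {
        return 'Hello from ' . get_class($this);
    }
}

class Person {
    use Greets;
}

$p = new Person();

// The method body was copied into Person, so $this is a Person.
echo $p->greet(), "\n";

// The trait is not part of the object's type.
var_dump($p instanceof Greets);

// The only trace left is the class's own record of what it used.
var_dump(class_uses($p));
?>

The last call is there only to show that the trait is bookkeeping on the class, not a relationship that consuming code can rely on.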

The smart cookies in FIG also provided, in the same package, a trait that implements the same boilerplate as the abstract class:

<?php
trait LoggerTrait {
    // ...
    public function debug($message, array $context = array()) {
        $this->log(LogLevel::DEBUG, $message, $context);
    }
    abstract public function log($level, $message, array $context = array());
}
?>

Now, a class can use that trait to avoid pointless boilerplate without either of the problems noted above. Score!

Now look again. You should notice that the body of the abstract class is... identical to that of the trait. Which means, in turn, the abstract class could be rewritten as follows:

<?php
abstract class AbstractLogger implements LoggerInterface {
    use LoggerTrait;
}
?>

At runtime, that has the exact same result. And that result is... exactly the same as if an implementing class did so itself. Compare:

<?php
class MyLogger extends AbstractLogger {
    public function log($level, $message, array $context = array()) {}
}
?>

<?php
class MyLogger implements LoggerInterface {
    use LoggerTrait;
    public function log($level, $message, array $context = array()) {}
}
?>

At first glance, the runtime implications of those two nearly identical pieces of code are exactly the same: MyLogger "can behave as a" LoggerInterface as far as application code is concerned, and still avoids writing the same boilerplate all over again. With the second approach, however, the addition of one extra line that will not even exist at runtime avoids both of the problems we noted above, and means that MyLogger can extend from whatever class I feel like, including none.
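
Here is a sketch of what that freedom buys. To keep it runnable on its own it uses trimmed-down stand-ins rather than the real PSR-3 types; MiniLoggerInterface, MiniLoggerTrait, ArrayStore, and MemoryLogger are all hypothetical names.

<?php
// Trimmed-down, hypothetical stand-ins for the PSR-3 interface and trait.
interface MiniLoggerInterface {
    public function debug($message, array $context = array());
    public function log($level, $message, array $context = array());
}

trait MiniLoggerTrait {
    public function debug($message, array $context = array()) {
        $this->log('debug', $message, $context);
    }
    abstract public function log($level, $message, array $context = array());
}

// An unrelated, hypothetical base class.
class ArrayStore {
    protected $items = array();

    public function all() {
        return $this->items;
    }
}

// The single "extends" slot is spent on ArrayStore; the logging
// boilerplate still comes for free from the trait.
class MemoryLogger extends ArrayStore implements MiniLoggerInterface {
    use MiniLoggerTrait;

    public function log($level, $message, array $context = array()) {
        $this->items[] = $level . ': ' . $message;
    }
}

$logger = new MemoryLogger();
$logger->debug('cache miss');
print_r($logger->all());
?>

With an abstract utility base class, MemoryLogger would have had to choose between ArrayStore and the logging helpers.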

As an added bonus, the interface(s) that MyLogger can be counted on to obey are listed directly on the class so we don't have to go looking up a class hierarchy for them. That improves readability for the developer. (In this case we only implement one interface but it is completely legitimate to implement multiple interfaces, especially for domain objects.)

Lessons learned

We can generalize this realization to a universal rule (at least within PHP):

Any abstract class may be represented as a trivial intersection of an interface and a trait that happens to fulfill that interface.

So easy is that win that I would argue it is never worth using an abstract class. In fact, I would argue the forced mental separation of "can behave as a" from "I don't have to rewrite this over and over again" is always superior, because forcing them into separate syntactic constructs (for nearly the same number of lines of code) encourages mentally separating interface from implementation; quite literally.

Put another way, as of PHP 5.4, abstract classes are vestigial and should not be used. The clear separation of interfaces and traits provides all the same DRY benefits with less munging of concepts and less restriction on the developer.

That doesn't mean class inheritance in general should never be used. "Is a special case of" is still a valid relationship, and does exist. When divorced from "I just want to not have to type this over and over again", though, its use cases shrink considerably. Similarly, interfaces can inherit from each other; that's fine too. It's not inheritance that is a problem; it's using inheritance as a poor-man's code reuse, which is not what it is for.

PS: If traits are so much better than abstract classes, why does PSR-3 include both? The abstract class is there solely to support developers on PHP 5.3 who do not yet have access to traits. Once those poor lost souls upgrade to PHP 5.4 I would recommend avoiding the abstract utility class and using the traits exclusively.

grafts

Back when traits were proposed, there was also an alternative proposal to help with object composition, which I prefer over traits:
https://wiki.php.net/rfc/horizontalreuse#grafts_-_class_composition_not…

The referenced article.

I believe this was the article you were talking about. For posterity. https://lwn.net/Articles/548560/

Yes!

That's it! I'll add it to the article, thanks.

So...

Drupal 8 will be switching from abstract classes to traits then?
;-]

That's the hope

We're discussing that right now in https://drupal.org/node/2134513

We can't yet, since testbot is still on 5.3; core hasn't actually switched to 5.4 yet, just committed to doing so. Yes, this is problematic. :-/ But there are a number of key places where using traits instead of utility base classes, as described here, should make Drupal 8 a lot nicer. We likely won't get to all of them, but I hope we get to most of them.

Interfaces do not necessarily guarantee CBAA

Thanks for the fantastic write-up, Larry. As almost always, I learned a lot. I would like to add one caveat, however, that does justify the usage of abstract classes in some cases. I would be interested in your response to this.

The problem is that "can behave as a" cannot always be represented by an interface in PHP, depending on the behavior you are trying to enforce. You can only enforce a list of methods that must be implemented and their parameters. Especially with type-hinting this is already a very useful language-level feature and in many cases quite sufficient.

The LoggerInterface example that you give is perfectly well-defined with this as it is completely up to the implementation what logging actually means, so it is arbitrary what actually happens in the log() method. So far, so good.

There are other use-cases, though, where "can behave as a" forces certain implementation details onto the object which cannot be satisfied by the mere interface. An entity in Drupal 8, for example, only makes sense as such if it invokes certain hooks. Otherwise other modules that interact generically with entities do not work. A class could implement EntityInterface yet never invoke those hooks. In the specific Drupal 8 sense it does not behave as an entity, though. The same could be said if it did not invoke the storage controller's save() method.

In this particular case it would be quite valid to provide an abstract class of the form:

<?php
abstract class Entity implements EntityInterface {
  abstract protected function preSave();
  final public function save() {
    $this->moduleHandler->invoke('presave', $this);
    $this->preSave();
    $this->storageController->save($this);
    $this->postSave();
    $this->moduleHandler->invoke('postsave', $this);
  }
  abstract protected function postSave();
}
?>

In terms of type-hinting one could then propose the following duality (although that admittedly might be a can of worms):

Code relying on an entity that behaves in the specific way a Drupal 8 entity should type-hint the abstract Entity base class
Code that simply wants to call methods such as id() on an entity can type-hint EntityInterface as that is completely sufficient in that case

That doesn't enforce it either

The point here is that there may be "behavior" you want to enforce that cannot be enforced syntactically. That's certainly true, as PHP interfaces can only hard-enforce method names and signatures; not even return types. (Java's can also enforce return types, but the rest of the problem still stands.)

However, they can also serve as a place for soft-enforcement via documentation. For example, if a hypothetical EntityInterface::save() were documented as:

<?php
/** 
 * Save the entity and notify other modules of that fact.
 */
?>

then it would be a violation of the contract (as far as the human is concerned) to not notify other modules. If that's not documented, then that's not, in fact, part of the contract of being an Entity.

Similarly, is calling $this->storageController->save() really part of the contract of being an entity? No, it's not. The contract of the entity is simply that when I call $entity->save(), the entity is persisted. That it is persisted with the help of another object is irrelevant to me as the caller.

If we wanted to enforce that entities only get saved via a storage controller, then we should make Entity::save() just call the storage controller and document it as such; then the hook logic goes in the storage controller, not the entity. Incidentally, I have become increasingly convinced that's the case and we should move that logic out of the entity itself as it doesn't belong there.
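
A rough sketch of that arrangement follows. None of these classes are actual Drupal 8 APIs; HookRecorder, StorageController, and SimpleEntity are hypothetical stand-ins, and the in-memory array stands in for a real database.

<?php
// Hypothetical sketch; these are not real Drupal 8 classes.
class HookRecorder {
    public $invoked = array();

    // Stand-in for a module handler: it only records which hooks fired.
    public function invoke($hook, $entity) {
        $this->invoked[] = $hook;
    }
}

class StorageController {
    private $hooks;
    private $rows = array();

    public function __construct(HookRecorder $hooks) {
        $this->hooks = $hooks;
    }

    // Persistence and notification live together in the storage layer.
    public function save(SimpleEntity $entity) {
        $this->hooks->invoke('presave', $entity);
        $this->rows[$entity->id()] = $entity;
        $this->hooks->invoke('postsave', $entity);
    }
}

class SimpleEntity {
    private $id;
    private $storage;

    public function __construct($id, StorageController $storage) {
        $this->id = $id;
        $this->storage = $storage;
    }

    public function id() {
        return $this->id;
    }

    /**
     * Save the entity via its storage controller.
     */
    public function save() {
        $this->storage->save($this);
    }
}

$hooks = new HookRecorder();
$entity = new SimpleEntity(42, new StorageController($hooks));
$entity->save();
print_r($hooks->invoked);
?>

The entity's only job here is to hand itself to the storage controller, so swapping in a storage controller that skips the hooks would not touch the entity class at all.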

Also, even using a base class doesn't enforce that those hooks get called:

<?php
class ShyEntity extends Entity {
  public function save() {
    $this->db->insert(...);
  }
}
?>

That's still completely legal syntactically. The only way to force it is to make save() final, which is a rather sledge-hammer way of enforcing implementation. (Let's not do that.)

What you're changing is not the behavior, but the implementation. You shouldn't enforce implementation. If you want a broader definition of behavior than PHP interface syntax allows, that's what documentation is for. Just remember to keep it as generic as possible to give future-you as much flexibility as possible. There most certainly are cases where you wouldn't want to fire two hooks on every entity save. (Think: an import you're trying to make faster and know that you don't need those hooks; implement a different class, use it instead, and skip all that work. You could make other optimizations, too.)

Methods have side effects

What you're saying is that methods can (and very often do) have side effects which are part of the behaviour of the object. In the example that you give the side effect is that two hooks are called, but the PSR-3 log() method also has a side effect: it writes to a log somewhere (even if "somewhere" isn't defined by the interface).

While final methods/classes

While final methods/classes are theoretically nice, they are always problematic in real life.

One problem is that you can never be sure there is no legitimate use case for overriding the method.
Additionally, using final blocks mocking, which can be helpful in some cases.

Clearly explained, thanks!

I have always been uneasy with abstract classes, multiple inheritance, and composition; I can now understand exactly why. I am anxious to start using traits!

Light dawns over traits

Thanks, Larry. I started reading your article with the opinion that traits were another gizmo in PHP's ever growing kitchen sink. By the time I was finished, I was convinced that traits are useful code reuse tools, and look forward to finding ways to ingrain them in my PHP coding habits and idioms.

Close, but maybe a poor example.

Although I am using abstract classes less and less, as I move more and more to composition, the example you've used is not a good one.

If you are defining the log function in the class and pulling in all the other boilerplate functions through traits, then any code inspection tool will report an error for all of those trait methods, as:

i) The function 'log' is not in the trait.

ii) It is not possible for code-inspection tools to analyze every place where the trait is used, and inspect if there will be a 'log' function available.

It also just isn't possible for a human to inspect: if I encounter LoggerTrait while reading the code, I have absolutely no indication of which class it is required to be used with. There is absolutely no indication of what 'contract' must be fulfilled to allow the trait to be used.

That is what abstract classes exist for; to define the contract that each class must fulfill to allow the class to be instantiated without finding out problems at run time caused by missing methods.

Abstract methods

Actually traits can have abstract methods for exactly the case you specify. LoggerTrait does, but I had left it out of the sample here. I've added it now.
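
For a self-contained illustration of the same idea (ShoutsTrait and BangFormatter are hypothetical names, a cut-down analogue of LoggerTrait), the abstract method states the trait's requirement where both people and tools can see it:

<?php
// Hypothetical names, for illustration only.
trait ShoutsTrait {
    public function shout($message) {
        return $this->format(strtoupper($message));
    }

    // The contract: any class using this trait must supply format().
    abstract protected function format($message);
}

class BangFormatter {
    use ShoutsTrait;

    protected function format($message) {
        return $message . '!';
    }
}

$f = new BangFormatter();
echo $f->shout('hello'), "\n";

// The requirement can be read straight off the trait.
$method = new ReflectionMethod('ShoutsTrait', 'format');
var_dump($method->isAbstract());
?>

If BangFormatter left out format(), PHP would refuse to load the class with a fatal error about an unimplemented abstract method, rather than failing later when shout() is called.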