Pluggable systems HowTo

Submitted by Larry on 7 April 2008 - 10:57pm

I recently had a discussion with Peter Wolanin about pluggable subsystems. (You can tell this is going to be an exciting entry already, can't you?) Drupal has supported a few pluggable subsystems for a long time, namely the database and cache systems. In both cases, they work on a very simple principle: Conditionally include one of two (or more) files that defines the same set of functions but with different bodies.

That's all well and good and simple, but has some very serious limitations. Most notably, because the same function name is defined twice you can never load multiple versions at the same time. That becomes a problem if you want to, say, connect to a MySQL and PostgreSQL database in the same page request. In addition, Drupal 7 is on track to include a fully introspective code registry for conditional code loading, which, based on earlier benchmarks, should be a huge performance boost. The Registry, however, assumes that all code resources (functions, classes, and interfaces) are globally unique throughout Drupal. Having a given function name defined twice will confuse the poor thing.

That is not an insurmountable problem, or even, truth be told, a difficult one. It simply requires switching from a simple include to a more formal mechanism. There are, in fact, several ways that can be done, so to further the education of the world at large (and probably embarrass myself a bit in front of other architecture buffs) I decided to write a brief survey of simple pluggable mechanisms.

Factories

There is no problem in computer science that cannot be solved by adding another layer of indirection.

Any pluggable system is based on the idea of indirection. Instead of calling a routine directly, you call a helper routine that then calls the routine you want, or more specifically the appropriate version of it. Although pluggable systems are usually implemented using classes and objects, they do not have to be. There is no problem that cannot be solved using procedural code or OOP code, although the implementation is frequently cleaner one way or the other for a given problem.

Let's take the example of a simple caching system, loosely modeled on Drupal's caching system. How would we get a procedural-only, name-unique, pluggable caching system? The naive implementation would look something like this:

<?php
/* cache.inc */
function cache_set($id, $data) {
  $cache = variable_get('cache_system', 'database');
  include_once('includes/cache.'. $cache .'.inc');
  $function = 'cache_set_'. $cache;
  return $function($id, $data);
}
function cache_get($id) {
  $cache = variable_get('cache_system', 'database');
  include_once('includes/cache.'. $cache .'.inc');
  $function = 'cache_get_'. $cache;
  return $function($id);
}
/* cache.database.inc */
function cache_set_database($id, $data) {
  // Save to the database.
}
function cache_get_database($id) {
  // Load from the database.
}
/* cache.memcache.inc */
function cache_set_memcache($id, $data) {
  // Save to the memcache server.
}
function cache_get_memcache($id) {
  // Load from the memcache server.
}
/* main.php */
cache_set('diamonds', $some_expensive_data);
$data = cache_get('diamonds');
?>

And voila, we now have a pluggable cache system that only loads the code we'll be using and has globally unique names, so we could (in theory) switch cache types in the middle of the request.

Of course, all of those include_once() calls are non-free, especially with an opcode cache. (Opcode caches and include_once() do not get along.) The code is also kinda sloppy: if you ever reorganize your code, you have to change the include snippet in every function. If your API has a dozen or two functions, that's going to be problematic.
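
One way to soften the reorganization problem while staying procedural is to push the lookup into a single helper, so only one function knows how a backend is located. The following is a minimal sketch, not anything in Drupal: cache_invoke() and the array-backed "database" backend are hypothetical, variable_get() is stubbed so the snippet runs on its own, and the include is left as a comment because the backend lives in the same file here.

<?php
/* Stand-in for Drupal's variable_get() so the sketch is self-contained. */
function variable_get($name, $default) {
  return $default;
}

/* Hypothetical helper: the only place that knows how a backend is found. */
function cache_invoke($operation, array $args) {
  $cache = variable_get('cache_system', 'database');
  // In a real tree the backend file would be loaded here, and only here:
  // include_once('includes/cache.'. $cache .'.inc');
  $function = 'cache_'. $operation .'_'. $cache;
  return call_user_func_array($function, $args);
}

function cache_set($id, $data) {
  return cache_invoke('set', array($id, $data));
}

function cache_get($id) {
  return cache_invoke('get', array($id));
}

/* Toy backend: keeps values in a static array instead of a real database. */
function _cache_database_storage($id, $data = NULL, $write = FALSE) {
  static $storage = array();
  if ($write) {
    $storage[$id] = $data;
  }
  return isset($storage[$id]) ? $storage[$id] : NULL;
}

function cache_set_database($id, $data) {
  _cache_database_storage($id, $data, TRUE);
}

function cache_get_database($id) {
  return _cache_database_storage($id);
}

/* main.php */
cache_set('diamonds', array('carats' => 2));
$data = cache_get('diamonds');
?>

That removes the duplicated snippet, but every call still pays for a variable lookup, a string concatenation, and a dynamic function call.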

In the OO world, that is, naturally, handled via a loadable object. A first-blush all-OOP implementation would look something more like this:

<?php
/* cache.inc */
class Cache {
  static protected $instance;
  public static function instance() {
    if (empty(self::$instance)) {
      $cache = variable_get('cache_system', 'database');
      $class = 'Cache_'. $cache;
      self::$instance = new $class;
    }
    return self::$instance;
  }
}
abstract class CacheManager {
  abstract public function cacheSet($id, $data);
  abstract public function cacheGet($id);
}
/* cache.database.inc */
class Cache_database extends CacheManager {
  public function cacheSet($id, $data) {
    // Save to the database.
  }
  public function cacheGet($id) {
    // Load from the database.
  }
}
/* cache.memcache.inc */
class Cache_memcache extends CacheManager {
  public function cacheSet($id, $data) {
    // Save to the memcache server.
  }
  public function cacheGet($id) {
    // Load from the memcache server.
  }
}
/* main.php */
Cache::instance()->cacheSet('diamonds', $some_expensive_data);
$data = Cache::instance()->cacheGet('diamonds');
?>

That is more to type, of course, but it is quite easy to add utility functions:

<?php
function cache_set($id, $data) {
  return Cache::instance()->cacheSet($id, $data);
}
function cache_get($id) {
  return Cache::instance()->cacheGet($id);
}
?>

In pattern-speak, the Cache class is a "Factory", and we are implementing the "Factory method pattern". It also implements a Singleton. Arguably we could merge Cache and CacheManager into a single class, and many implementations do. The take-away here, though, is that picking the right implementation to use has been centralized. We have also encapsulated the various pieces of the cache system into an object, which allows us to have non-global implementation-specific state. It also gives us, via the abstract CacheManager class (which could just as easily be an interface), clear documentation of what a new implementation needs to offer. As an added benefit, we could potentially take advantage of PHP 5's autoload capabilities and eliminate the need for an include_once(). A slight variation would also allow us to request a specific implementation if desired. To wit:

<?php
class Cache {
  static protected $instance = array();
  static protected $defaultImplementation;
  public static function implementation($implementation = NULL) {
    if (!empty($implementation)) {
      self::$defaultImplementation = $implementation;
    }
    if (empty(self::$defaultImplementation)) {
      self::$defaultImplementation = variable_get('cache_system', 'database');
    }
    return self::$defaultImplementation;
  }
  public static function instance($implementation = NULL) {
    if (empty($implementation)) {
      $implementation = self::implementation();
    }
    if (empty(self::$instance[$implementation])) {
      $class = 'Cache_'. $implementation;
      self::$instance[$implementation] = new $class;
    }
    return self::$instance[$implementation];
  }
}
Cache::instance('memcache')->cacheSet('diamonds', $some_expensive_data);
?>
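
As for the autoloading mentioned earlier, here is a rough sketch of how the factory's "new $class" could pull in the right file on demand. The naming rule (class Cache_memcache lives in includes/cache.memcache.inc) and both function names are assumptions made for illustration. spl_autoload_register() itself is a standard PHP 5 function.

<?php
/* Hypothetical naming rule: Cache_memcache => includes/cache.memcache.inc. */
function cache_autoload_path($class) {
  if (strpos($class, 'Cache_') !== 0) {
    // Not one of ours; let any other registered autoloader handle it.
    return NULL;
  }
  return 'includes/cache.'. substr($class, strlen('Cache_')) .'.inc';
}

/* Called by PHP the first time an unknown class name is used. */
function cache_autoload($class) {
  $path = cache_autoload_path($class);
  if ($path !== NULL && is_file($path)) {
    include $path;
  }
}

spl_autoload_register('cache_autoload');
?>

With that registered, the factory needs no include logic of its own: the first use of a cache class loads its file, and a backend that is never used never gets loaded.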

For a simple case, the factory class needn't even be a class. In our first OOP version, we could very easily replace the Cache class with a function like so:

<?php
function cache_instance() {
  static $instance;
  if (empty($instance)) {
    $cache = variable_get('cache_system', 'database');
    $class = 'Cache_'. $cache;
    $instance = new $class;
  }
  return $instance;
}
?>

In the second, multi-implementation case, it's a bit more difficult. It could be done, but the code wouldn't be quite as clean and may require using a global variable. I am of the mind that a global variable is a sure sign that you're doing something wrong, although there are others who disagree.
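
To give a feel for the trade-off, here is one possible function-only shape for the multi-implementation factory. It is a sketch under assumptions: the $make_default flag is my own invention, the two cache classes are empty placeholders, and variable_get() is stubbed. It avoids a global by using static variables, at the cost of one function doing two jobs.

<?php
/* Stand-in for Drupal's variable_get() so the sketch is self-contained. */
function variable_get($name, $default) {
  return $default;
}

/* Placeholder implementations; real ones would do actual caching. */
class Cache_database {}
class Cache_memcache {}

function cache_instance($implementation = NULL, $make_default = FALSE) {
  static $instances = array();
  static $default;
  // Hypothetical flag: lets a caller change the default implementation.
  if (!empty($implementation) && $make_default) {
    $default = $implementation;
  }
  if (empty($implementation)) {
    if (empty($default)) {
      $default = variable_get('cache_system', 'database');
    }
    $implementation = $default;
  }
  // Create each implementation at most once per request.
  if (empty($instances[$implementation])) {
    $class = 'Cache_'. $implementation;
    $instances[$implementation] = new $class();
  }
  return $instances[$implementation];
}
?>

It works, but the flag parameter is exactly the sort of wart that the class version avoids by having separate implementation() and instance() methods.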

In general, I favor the OOP implementation. It offers the potential for cleaner expansion (such as multi-implementation access), and allows us to leverage autoloading. It is also simpler than the procedural version, I would argue. After the initial creation of the object, really all that happens is simple object dereferencing.

Back to Drupal

At present, Drupal 7/HEAD has three pluggable systems: Database, Cache, and Password. The cache initializes before the Registry will, so it won't actually break the registry. The database layer does as well, but is being refactored to use the OOP-style singleton-factory anyway as part of the larger Database API TNG rewrite. The new password hashing system, however, initializes after the Registry so it would be affected. My recommendation is to refactor it into a class-based pluggable implementation as described here. It is probably simple enough that it could use a factory function instead of a factory class, but I leave that to more password-minded developers to decide.
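
For the curious, a factory-function version of a pluggable password system might look roughly like the following. To be clear, this is not Drupal's actual password API: the PasswordHasher interface, the Password_sha256 class, the 'password_system' variable, and password_instance() are all names made up for this sketch, and an unsalted SHA-256 is only a toy stand-in for a real hashing scheme.

<?php
/* Stand-in for Drupal's variable_get() so the sketch is self-contained. */
function variable_get($name, $default) {
  return $default;
}

/* Hypothetical contract that every password implementation must offer. */
interface PasswordHasher {
  public function hash($password);
  public function check($password, $stored);
}

/* Toy implementation for illustration only; not safe for real passwords. */
class Password_sha256 implements PasswordHasher {
  public function hash($password) {
    return hash('sha256', $password);
  }
  public function check($password, $stored) {
    return $this->hash($password) === $stored;
  }
}

/* Factory function: picks the implementation once, then reuses it. */
function password_instance() {
  static $instance;
  if (empty($instance)) {
    $class = 'Password_'. variable_get('password_system', 'sha256');
    $instance = new $class();
  }
  return $instance;
}

$stored = password_instance()->hash('correct horse');
$ok = password_instance()->check('correct horse', $stored);
?>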

Happy coding!

Password hashing system?

Offtopic:
Could you please add a link to the note on the "new password hashing system"? In Gallery 2 we are dealing with some integration issues related to different applications using different password hashing schemes and we're discussing solutions. I'm interested in Drupal's approach.

@Pluggable systems:
Gallery 2 is using a central registry / factory for lots of things and it's working pretty well. The only issue that needs to be revised is versioning of interfaces. But that will be a small change.
And we're looking forward to our next development cycle to finally use PHP 5's OO features. :)

PS: Are you using Mollom? It thought this was spam and I got a CAPTCHA.

Issue queue

The issue where the new password system was added is here: http://drupal.org/node/29706 . It's very new, so there are no general docs on it yet that I'm aware of.

Yes, I'm using Mollom now. By answering the CAPTCHA, you're making the system more accurate. Thanks!

And yes, PHP 5 is all purdy. :-)

more pluggable systems

nice article ... fyi there are 2 more pluggable systems in core: smtp and sessions. search for

variable_get('smtp_library', '')
variable_get('session_inc', './includes/session.inc');

Routers

Have you looked at slantview's Cache Router module?

http://drupal.org/project/cacherouter

I'd like to know how this approach fits into the subsystem architecture that you discuss.

Is this approach the type of thinking we should be adopting?

i'm not a great OOP

i'm not a great OOP programmer, but that is basically exactly what i did with CacheRouter. I tried to avoid factory and abstract classes to be PHP4 safe, but any patches are always welcome :)

the other thing i did was abstract it so that you can assign different "bins" to different "implementations"

-s

Looks like it

From a quick glance through the code, yes, it looks like it's doing something very similar, using a hybrid mechanism that looks closer to the procedural mechanism at the front. It's a bit more complicated than I'd like, due probably to the need to work with PHP 4 and existing code, but I didn't look into it that closely.

For systems where we want to have pluggable logic, especially if we want to have multiple implementations active at once, then yes I believe this is a good way to go. However, there is a performance cost. It's fairly small per-call if implemented properly, but it does add up. A pluggable implementation of l(), for instance, would be a huge performance hit. :-) For the database, it's an acceptable one given savings elsewhere. For the cache system, maybe it makes sense, maybe it doesn't.

I generally favor the OOP-centric approach with a wrapper function, for various reasons listed above.

In my testing there was

In my testing there was virtually no difference in speed for database caching with CacheRouter and standard caching. There is however a HUGE performance gain by using it with file, memcache, apc or xcache. See the stats listed on the project page.

Most of the complications are due to the bug fixes we've done with memcache, apc, etc or due to some weirdness with Drupal core (Hi page_fast_cache!). I refactored most of the caching modules into this module, and hope that we could eventually get this or something very similar into core.

Basically we could have a global $cache object that is the cache router and this way we could extend the caching system to have _get, _set, _flush and _delete, instead of the wacky cache_clear_all. It is an oddly named, and strange behaving function. cache_clear_all is sort of like a spork, it's not really a spoon and not really a fork, and does pretty poorly at handling each function.

Anyway, thanks for writing this article. i would love to see more people active with trying to get a better caching subsystem.

-s

Abstract implementation possible?

I have not learned OOP yet, hence my question: Currently, I'm developing the Migrator module, which aims to convert an existing (non-Drupal) site to Drupal. Basically, I've implemented support for multiple external systems in a way similar to your procedural example. So regarding Drupal core: Would a central, abstract registry of factories be possible, one that contrib modules could use to register their own factories, too? I think we could avoid plenty of duplicate code (in core and contrib) if that were possible.

Not really

A factory, like most design patterns, isn't copy/paste-able code. It's a general style of approach. You don't want to have a "factory class" you inherit from, especially in PHP 5.2 as it lacks late static binding. You can, however, always use a similar "flavor" of factory (there are 3 options listed above) to make it easier to transfer knowledge from one subsystem to another.