Pluggable systems HowTo

I recently had a discussion with Peter Wolanin about pluggable subsystems. (You can tell this is going to be an exciting entry already, can't you?) Drupal has supported a few pluggable subsystems for a long time, namely the database and cache systems. In both cases, they work on a very simple principle: Conditionally include one of two (or more) files that define the same set of functions but with different bodies.

That's all well and good and simple, but has some very serious limitations. Most notably, because the same function name is defined twice you cannot load multiple versions at the same time. That becomes a problem if you want to, say, connect to a MySQL and PostgreSQL database in the same page request. In addition, Drupal 7 is on track to include a fully introspective code registry for conditional code loading, which, based on earlier benchmarks, should be a huge performance boost. The Registry, however, assumes that all code resources (functions, classes, and interfaces) are globally unique throughout Drupal. Having a given function name defined twice will confuse the poor thing.

That is not an insurmountable problem, or even, truth be told, a difficult one. It simply requires switching from a simple include to a more formal mechanism. There are, in fact, several ways that can be done, so to further the education of the world at large (and likely embarrass myself a bit in front of other architecture buffs) I decided to write a brief survey of simple pluggable mechanisms.

Factories

There is no problem in computer science that cannot be solved by adding another layer of indirection.

Any pluggable system is based on the notion of indirection. Instead of calling a routine directly, you call a helper routine that then calls the routine you want, or more specifically the appropriate version of it. Although usually implemented using classes and objects, they don't have to be. There is no problem that cannot be solved using procedural code or OOP code, although the implementation is frequently cleaner one way or the other for a given problem.

Let's take the example of a simple caching system, loosely modeled on Drupal's caching system. How would we get a procedural-only, name-unique, pluggable caching system? The naive implementation would look something like this:

<?php
/* cache.inc */
function cache_set($id, $data) {
  $cache = variable_get('cache_system', 'database');
  include_once 'includes/cache.' . $cache . '.inc';
  // Build the backend-specific function name, e.g. cache_set_database().
  $function = 'cache_set_' . $cache;
  return $function($id, $data);
}

function cache_get($id) {
  $cache = variable_get('cache_system', 'database');
  include_once 'includes/cache.' . $cache . '.inc';
  // Build the backend-specific function name, e.g. cache_get_database().
  $function = 'cache_get_' . $cache;
  return $function($id);
}

/* cache.database.inc */
function cache_set_database($id, $data) {
  // Save to the database.
}

function cache_get_database($id) {
  // Load from the database.
}

/* cache.memcache.inc */
function cache_set_memcache($id, $data) {
  // Save to the memcache server.
}

function cache_get_memcache($id) {
  // Load from the memcache server.
}

/* main.php */
cache_set('diamonds', $some_expensive_data);
$data = cache_get('diamonds');
?>

And voila, we now have a pluggable cache system that only loads the code we'll be using and has globally unique names, so we could (in theory) switch cache types in the middle of the request.

Of course, all of those include_once() calls are non-free, especially with an opcode cache. (Opcode caches and include_once() don't get along.) The code is also kinda sloppy. It also means if you ever reorganize your code, you have to change the include snippet in every function. If your API has a dozen or two functions, that's going to be problematic.
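Staying procedural for a moment, one way to ease the "change the include snippet in every function" problem is to push the lookup into a single helper. What follows is only a rough sketch under stated assumptions: cache_invoke() and the inline "memory" backend are hypothetical names invented for illustration, and variable_get() is stubbed so the snippet runs on its own.

```php
<?php
// Stand-in for Drupal's variable_get(), so the sketch runs on its own.
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    return $default;
  }
}

// Hypothetical "memory" backend, defined inline to keep the example
// self-contained. In the layout above it would live in its own .inc file.
function &cache_memory_store() {
  static $store = array();
  return $store;
}

function cache_set_memory($id, $data) {
  $store = &cache_memory_store();
  $store[$id] = $data;
  return TRUE;
}

function cache_get_memory($id) {
  $store = &cache_memory_store();
  return isset($store[$id]) ? $store[$id] : NULL;
}

// Hypothetical helper: the only place that knows where backends live
// and how their functions are named.
function cache_invoke($op, array $args) {
  $backend = variable_get('cache_system', 'memory');
  $function = 'cache_' . $op . '_' . $backend;
  if (!function_exists($function)) {
    // Only touch the file system when the backend is not loaded yet.
    $file = 'includes/cache.' . $backend . '.inc';
    if (is_file($file)) {
      include_once $file;
    }
  }
  return call_user_func_array($function, $args);
}

function cache_set($id, $data) {
  return cache_invoke('set', array($id, $data));
}

function cache_get($id) {
  return cache_invoke('get', array($id));
}

cache_set('diamonds', array('carats' => 3));
$data = cache_get('diamonds');
```

Reorganizing the files now means touching one function instead of a dozen, though every call still pays for the extra hop through call_user_func_array().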

In the OO world, that is, naturally, handled via a loadable object. A first-blush all-OOP implementation would look something more like this:

<?php
/* cache.inc */
class Cache {
  static protected $instance;

  public static function instance() {
    if (empty(self::$instance)) {
      $cache = variable_get('cache_system', 'database');
      $class = 'Cache_' . $cache;
      self::$instance = new $class;
    }
    return self::$instance;
  }
}

abstract class CacheManager {
  abstract public function cacheSet($id, $data);
  abstract public function cacheGet($id);
}

/* cache.database.inc */
class Cache_database extends CacheManager {

  // Concrete methods must not be declared abstract.
  public function cacheSet($id, $data) {
    // Save to the database.
  }

  public function cacheGet($id) {
    // Load from the database.
  }
}

/* cache.memcache.inc */
class Cache_memcache extends CacheManager {

  public function cacheSet($id, $data) {
    // Save to the memcache server.
  }

  public function cacheGet($id) {
    // Load from the memcache server.
  }
}

/* main.php */
Cache::instance()->cacheSet('diamonds', $some_expensive_data);
$data = Cache::instance()->cacheGet('diamonds');
?>

Of course, because that's more to type it is quite easy to add utility functions:

<?php
function cache_set($id, $data) {
  return Cache::instance()->cacheSet($id, $data);
}

function cache_get($id) {
  return Cache::instance()->cacheGet($id);
}
?>

In pattern-speak, the Cache class is a "Factory", and we are implementing the "Factory method pattern". It also implements a Singleton. Arguably we could merge Cache and CacheManager into a single class, and many implementations do. The take-away here, though, is that picking the right implementation to use has been centralized. We have also encapsulated the various pieces of the cache system into an object, which allows us to have non-global implementation-specific state. It also gives us, via the abstract CacheManager class (which could also just as easily be an interface), clear documentation of what a new implementation needs to offer. As an added benefit, we could potentially take advantage of PHP 5's autoload capabilities and eliminate the need for an include_once(). A slight variation would also allow us to request a specific implementation if desired. To wit:

<?php
class Cache {
  // Instantiated backends, keyed by implementation name.
  static protected $instance = array();
  static protected $defaultImplementation;

  // Return the default implementation name, optionally setting it first.
  public static function implementation($implementation = NULL) {
    if (!empty($implementation)) {
      self::$defaultImplementation = $implementation;
    }
    if (empty(self::$defaultImplementation)) {
      self::$defaultImplementation = variable_get('cache_system', 'database');
    }
    return self::$defaultImplementation;
  }

  public static function instance($implementation = NULL) {
    if (empty($implementation)) {
      $implementation = self::implementation();
    }
    if (empty(self::$instance[$implementation])) {
      $class = 'Cache_' . $implementation;
      self::$instance[$implementation] = new $class;
    }
    // Return the requested backend, not the whole array of backends.
    return self::$instance[$implementation];
  }
}

Cache::instance('memcache')->cacheSet('diamonds', $some_expensive_data);
?>
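As noted above, the abstract CacheManager class could just as easily be an interface. A minimal sketch of that variant follows. The names CacheBackendInterface, Cache_memory, Cache_broken, and cache_backend_create() are all hypothetical, chosen only so the example runs standalone; the point is that the factory can verify the contract when it builds an object.

```php
<?php
// Hypothetical interface name; the abstract CacheManager class plays the
// same documenting role in the examples above.
interface CacheBackendInterface {
  public function cacheSet($id, $data);
  public function cacheGet($id);
}

// Hypothetical in-memory backend so the sketch runs without a database.
class Cache_memory implements CacheBackendInterface {
  protected $data = array();

  public function cacheSet($id, $data) {
    $this->data[$id] = $data;
  }

  public function cacheGet($id) {
    return isset($this->data[$id]) ? $this->data[$id] : NULL;
  }
}

// A class that does NOT honor the contract, to show the check failing.
class Cache_broken {
}

// Hypothetical factory function that enforces the interface at creation time.
function cache_backend_create($implementation) {
  $class = 'Cache_' . $implementation;
  $object = new $class();
  if (!($object instanceof CacheBackendInterface)) {
    throw new InvalidArgumentException($class . ' does not implement CacheBackendInterface.');
  }
  return $object;
}

$cache = cache_backend_create('memory');
$cache->cacheSet('diamonds', 'shiny');
```

An interface leaves implementations free to extend whatever base class they like, at the cost of not being able to share helper code the way an abstract class can.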

For a simple case, the factory class needn't even be a class. In our first OOP version, we could very easily replace the Cache class with a function like so:

<?php
function cache_instance() {
  static $instance;
  if (empty($instance)) {
    $cache = variable_get('cache_system', 'database');
    $class = 'Cache_' . $cache;
    $instance = new $class;
  }
  return $instance;
}
?>

In the second case, it's a bit more difficult. It could be done, but the code wouldn't be quite as clean and may require using a global variable. I am of the mind that a global variable is a sure sign that you're doing something wrong, although there are others that disagree.
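For what it's worth, one possible way to do it without a global is to keep both the default name and the instances in static variables of a single function. This is only a sketch: the $make_default parameter and the empty stub classes are assumptions made for illustration, and variable_get() is stubbed so the snippet runs on its own.

```php
<?php
// Stand-in for Drupal's variable_get(), so the sketch runs on its own.
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    return $default;
  }
}

// Empty stub backends, for illustration only.
class Cache_database {
}

class Cache_memcache {
}

// Function-based factory that supports several implementations at once.
// $make_default is a hypothetical extra parameter: when TRUE, the requested
// implementation also becomes the default for later argument-less calls.
function cache_instance($implementation = NULL, $make_default = FALSE) {
  static $instances = array();
  static $default = NULL;

  if ($default === NULL) {
    $default = variable_get('cache_system', 'database');
  }
  if ($implementation === NULL) {
    $implementation = $default;
  }
  elseif ($make_default) {
    $default = $implementation;
  }

  if (!isset($instances[$implementation])) {
    $class = 'Cache_' . $implementation;
    $instances[$implementation] = new $class();
  }
  return $instances[$implementation];
}

$first = cache_instance();
$second = cache_instance('memcache');
```

The extra boolean doing double duty is exactly the sort of thing that makes this less clean than the class-based version, where setting the default and fetching an instance are separate methods.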

In general, I favor the OOP implementation. It offers the potential for cleaner expansion (such as multi-implementation access), and allows us to leverage autoloading. It is also simpler than the procedural version, I would argue. After the initial creation of the object, really all that happens is simple object dereferencing.
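To make the autoloading point a little more concrete, here is a rough sketch using PHP 5's spl_autoload_register(). The helper names cache_class_file() and cache_autoload() are hypothetical, and the includes/cache.NAME.inc layout is simply assumed from the earlier examples; this is not how Drupal's Registry does it.

```php
<?php
// Hypothetical helper: map a class name such as Cache_memcache to the file
// that would define it, following the includes/cache.NAME.inc layout.
function cache_class_file($class) {
  if (strpos($class, 'Cache_') !== 0) {
    // Not one of ours; let other autoloaders deal with it.
    return NULL;
  }
  $name = substr($class, strlen('Cache_'));
  // Refuse anything that could escape the includes directory.
  if (!preg_match('/^[a-z0-9_]+$/', $name)) {
    return NULL;
  }
  return 'includes/cache.' . $name . '.inc';
}

// Autoload callback: include the backend file the first time its class
// is referenced, if such a file exists.
function cache_autoload($class) {
  $file = cache_class_file($class);
  if ($file !== NULL && is_file($file)) {
    include_once $file;
  }
}

spl_autoload_register('cache_autoload');
```

With something like this registered, a factory can simply do new $class and the right file gets pulled in on demand, with no include_once() in the factory itself.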

Back to Drupal

At present, Drupal 7/HEAD has three pluggable systems: Database, Cache, and Password. The cache initializes before the Registry will, so it won't actually break the registry. The database layer does as well, but is being refactored to use the OOP-style singleton-factory anyway as part of the larger Database API TNG rewrite. The new password hashing system, however, initializes after the Registry so it would be affected. My recommendation is to refactor it into a class-based pluggable implementation as described here. It is likely simple enough that it could use a factory function instead of a factory class, but I leave that to more password-minded developers to decide.

Happy coding!

Comments

Password hashing system?

Offtopic:
Could you please add a link to the note on the "new password hashing system"? In Gallery 2 we are dealing with some integration issues related to different applications using different password hashing schemes and we're discussing solutions. I'm interested in Drupal's approach.

@Pluggable systems:
Gallery 2 is using a central registry / factory for lots of things and it's working pretty well. The only issue that needs to be revised is versioning of interfaces. But that will be a small change.
And we're looking forward to our next development cycle to finally use PHP 5's OO features. :)

PS: Are you using Mollom? It thought this was spam and I got a CAPTCHA.

Issue queue

The issue where the new password system was added is here: http://drupal.org/node/29706 . It's very new, so there are no general docs on it yet that I'm aware of.

Yes, I'm using Mollom now. By answering the CAPTCHA, you're making the system more accurate. Thanks!

And yes, PHP 5 is all purdy. :-)

more pluggable systems

nice article ... fyi there are 2 more pluggable systems in core: smtp and sessions. search for

variable_get('smtp_library', '')
variable_get('session_inc', './includes/session.inc');

Routers

Have you looked at slantview's Cache Router module?

http://drupal.org/project/cacherouter

I'd like to know how this approach fits into the subsystem architecture that you discuss.

Is this approach the type of thinking we should be adopting?

i'm not a great OOP

i'm not a great OOP programmer, but that is basically exactly what i did with CacheRouter. I tried to avoid factory and abstract classes to be PHP4 safe, but any patches are always welcome :)

the other thing i did was abstract it so that you can assign different "bins" to different "implementations"

-s

Looks like it

From a quick glance through the code, yes, it looks like it's doing something very similar, using a hybrid mechanism that looks closer to the procedural mechanism at the front. It's a bit more complicated than I'd like, due likely to the need to work with PHP 4 and existing code, but I didn't look into it that closely.

For systems where we want to have pluggable logic, especially if we want to have multiple implementations active at once, then yes I believe this is a good way to go. However, there is a performance cost. It's fairly small per-call if implemented properly, but it does add up. A pluggable implementation of l(), for instance, would be a huge performance hit. :-) For the database, it's an acceptable one given savings elsewhere. For the cache system, maybe it makes sense, maybe it doesn't.

I generally favor the OOP-centric approach with a wrapper function, for various reasons listed above.

In my testing there was

In my testing there was virtually no difference in speed for database caching with CacheRouter and standard caching. There is however a HUGE performance gain by using it with file, memcache, apc or xcache. See the stats listed on the project page.

Most of the complications are due to the bug fixes we've done with memcache, apc, etc or due to some weirdness with Drupal core (Hi page_fast_cache!). I refactored most of the caching modules into this module, and hope that we could eventually get this or something very similar into core.

Basically we could have a global $cache object that is the cache router and this way we could extend the caching system to have _get, _set, _flush and _delete, instead of the wacky cache_clear_all. It is an oddly named, and strange behaving function. cache_clear_all is sort of like a spork, it's not really a spoon and not really a fork, and does pretty poorly at handling each function.

Anyway, thanks for writing this article. i would love to see more people active with trying to get a better caching subsystem.

-s

Abstract implementation possible?

I did not learn OOP yet, thus my question: Currently, I'm developing Migrator module, which aims to convert an existing (non-Drupal) site to Drupal. Basically, I've implemented support for multiple external systems in a similar way to your procedural example. So regarding Drupal core: Would a central, abstract factory registry for factories be possible, which could be leveraged by contrib modules to register their factories, too? I think we could avoid plenty of duplicate code (in core and contrib) if that would be possible.

Not really

A factory, like most design patterns, isn't copy/paste-able code. It's a general approach style. You don't want to have a "factory class" you inherit from, especially in PHP 5.2 as it lacks late static binding. You can, however, always use a similar "flavor" of factory (there are 3 options listed above) to make it easier to transfer knowledge from one subsystem to another.