Recently I've been talking up various ideas for pluggable subsystems in Drupal in IRC and the other usual haunts. Ideas have been percolating in my head, but so far I have been remiss in actually writing them down. Yesterday, however, I had an epiphany to solve the primary issue I was trying to work out, so I present a hopefully workable RFC (for real, not IETF version) for pluggable subsystems in Drupal.
I am posting this over to Planet PHP as well to invite commentary from those who aren't already embedded in the Drupal mindset. :-)
I have been running Mollom as my spam-fighter on this site for not quite two months now. It's been fairly effective overall. The nifty flash meter shows me just how bad the spam problem is (good grief, 593 blocked spam messages just on 15 May!), and I haven't gotten any spam in my comment list yet.
That is, until today, when a new form appeared.
Some time ago, on a lark, I wrote a Drupal module called Google Search. It was mostly so I could experiment with the then-new Forms API, and was one of those "one night" projects we all do from time to time.
At DrupalCon Sunnyvale 2007, Rasmus Lerdorf chided Drupal for spending over half of its request time on just the bootstrap process. As a GHOP task, Cornil did a performance analysis of Drupal and found its two largest performance drains were the bootstrap process and the theming layer. Quite simply, Drupal spends too much time including code.
Drupal 6 has the beginnings of a solution. Page handlers, the least frequently used code in Drupal, can now be split out into conditional include files, and the menu system is able to load just the file it needs for a given page request. Based on earlier benchmarks, just that code shuffling netted Drupal 6 a 20% performance boost. The downside, however, is that it requires the module author to explicitly specify the file to be included, and the syntax for it is just a little bit annoying, what with the file name and file path being separate keys on the menu handler.
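For readers outside the Drupal world, here is a rough sketch of the idea of loading a page handler's file only when its path is requested. It is written in Python rather than PHP purely to keep it self-contained, and it is not Drupal's actual implementation: the `FILES` table stands in for include files on disk, and `MENU`, `include_once`, and `dispatch` are hypothetical names made up for the illustration. Only the "file" and "file path" keys echo what the menu handler described above uses.

```python
import posixpath

# Hypothetical stand-ins for include files on disk: source text keyed by path.
FILES = {
    "modules/blog/blog.pages.inc": "def blog_page():\n    return 'blog listing'\n",
    "modules/admin/admin.pages.inc": "def admin_page():\n    return 'admin settings'\n",
}

# Router table loosely modelled on a menu hook's return array.
# "file" and "file path" are separate keys, as described above.
MENU = {
    "blog": {
        "page callback": "blog_page",
        "file": "blog.pages.inc",
        "file path": "modules/blog",
    },
    "admin/settings": {
        "page callback": "admin_page",
        "file": "admin.pages.inc",
        "file path": "modules/admin",
    },
}

loaded = {}  # full path -> namespace of the code defined in that "file"


def include_once(path):
    """Execute a file's source the first time it is needed, then reuse it."""
    if path not in loaded:
        namespace = {}
        exec(FILES[path], namespace)
        loaded[path] = namespace
    return loaded[path]


def dispatch(request_path):
    """Load only the file that holds the handler for this request, then call it."""
    item = MENU[request_path]
    full_path = posixpath.join(item["file path"], item["file"])
    namespace = include_once(full_path)
    return namespace[item["page callback"]]()
```

Calling `dispatch("blog")` runs only the blog file's code; the admin handlers are never parsed for that request, which is the saving the split is after.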
For those who haven't noticed yet, the latest in an expected long line of Drupal books for this year has been published: David Mercer's verbosely-named "Building Powerful and Robust Websites with Drupal 6". It is not a book for the experienced Drupaler; its target market is people picking up Drupal, and the web for that matter, for the very first time.
Personally I think David has done a great job with it, but then I am biased; I was the tech reviewer for the book. :-) If you want an unbiased opinion, pick up a copy yourself and give it a read. Then you'll know how good it is. As an added bonus, 5% of all sales through Packt's web site are donated to the Drupal Association. Everybody wins!
By now you may have heard the news from Paris that a unit testing framework has landed in Drupal core. A huge shout-out goes to everyone involved. I particularly want to note the work that's been put in by former GHOP students and members of the GHOP team. It's amazing to see how far some people have come in a short time, despite still having homework to do. :-)
The next step, of course, is to make Drupal itself fully-tested. That poses a number of challenges, particularly for unit tests. Because I'm sure others will be singing the (well-deserved) praises of the testing team, I want to take a moment to focus on that next step and one important approach: Testable APIs.
I recently had a discussion with Peter Wolanin about pluggable subsystems. (You can tell this is going to be an exciting entry already, can't you?) Drupal has supported a few pluggable subsystems for a long time, namely the database and cache systems. In both cases, they work on a very simple principle: Conditionally include one of two (or more) files that defines the same set of functions but with different bodies.
That's all well and good and simple, but has some very serious limitations. Most notably, because the same function name is defined twice you can never load multiple versions at the same time. That becomes a problem if you want to, say, connect to a MySQL and PostgreSQL database in the same page request. In addition, Drupal 7 is on track to include a fully introspective code registry for conditional code loading, which, based on earlier benchmarks, should be a huge performance boost. The Registry, however, assumes that all code resources (functions, classes, and interfaces) are globally unique throughout Drupal. Having a given function name defined twice will confuse the poor thing.
That is not an insurmountable problem, or even, truth be told, a difficult one. It simply requires switching from a simple include to a more formal mechanism. There are, in fact, several ways that can be done, so to further the education of the world at large (and probably embarrass myself a bit in front of other architecture buffs) I decided to write a brief survey of simple pluggable mechanisms.
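As one illustration of what a more formal mechanism could look like (not necessarily one of the approaches the survey covers), each backend can live behind its own uniquely named class that implements a shared interface, with a lookup table choosing among them. The sketch below is in Python rather than PHP to keep it self-contained, and every name in it (`DatabaseDriver`, `DRIVERS`, `get_driver`, and so on) is hypothetical rather than an actual Drupal API.

```python
from abc import ABC, abstractmethod


class DatabaseDriver(ABC):
    """Shared interface: every backend answers the same calls."""

    @abstractmethod
    def quote_identifier(self, name: str) -> str:
        ...


class MySQLDriver(DatabaseDriver):
    def quote_identifier(self, name: str) -> str:
        # MySQL quotes identifiers with backticks.
        return f"`{name}`"


class PostgreSQLDriver(DatabaseDriver):
    def quote_identifier(self, name: str) -> str:
        # PostgreSQL quotes identifiers with double quotes.
        return f'"{name}"'


# Hypothetical lookup table. Each class name is globally unique, so both
# backends can be defined at once without clashing.
DRIVERS = {"mysql": MySQLDriver, "pgsql": PostgreSQLDriver}


def get_driver(kind: str) -> DatabaseDriver:
    """Return a new driver instance for the requested backend."""
    return DRIVERS[kind]()


# Two different backends alive in the same "page request".
mysql = get_driver("mysql")
pgsql = get_driver("pgsql")
```

Because nothing is defined twice under the same name, both drivers coexist, and a registry that expects globally unique names has nothing to trip over.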
I've been meaning to upgrade the Akismet module on this site for a while now. Of course, I waited so long that another option just appeared, one I've been waiting to see for a while: Mollom.
The other other project from apparent insomniac Dries Buytaert, Mollom is a content filtering service similar to Akismet. I've actually been familiar with it for a long time, as the GoPHP5 project has been running a Mollom private beta since last June. In fact, when Acquia was announced my first thought was "wait, what happened to Mollom?"
A recent thread on the Drupal documentation list has brought up once again the "why don't we have a wiki on Drupal.org?" question. It comes up regularly; you can set your watch by it.
What I've never understood is why anyone would want to take a step backwards from Drupal's handbooks to a simple wiki. How would removing features and capabilities help Drupal's huge pool of documentation?
What exactly makes something "a wiki"? Let's examine:
One of the major changes in Drupal 6 (where "major" is defined as "worthy of a mention in Dries' keynote") was a new feature of the menu and theme hooks. The newly introduced "file" and "file path" keys in those hooks' respective return arrays allow them to define files that get included conditionally, only when needed. In theory, that should be a big performance boost; page handlers are virtually never called except on the page they handle, so loading all of that code on every other page is a waste of CPU cycles. Of course, there is also the added cost of the extra disk hit to load that one extra file we need. Modern operating systems should do a pretty good job of caching the file load, but that may vary with the configuration.
So just how much benefit did we get from two dozen fragile patches that were a glorified cut and paste? And is it worth doing more of it? Let's benchmark it and find out.