Drupal 7 battle plans

Submitted by Larry on 2 February 2008 - 1:58am

So, Dries wants to know what our Drupal 7 battle plans are. I think this is the first version where I'll have explicit battle plans beforehand rather than just "whatever I come up with along the way". :-) So, for those playing along at home, here are my goals for Drupal 7:

  • Move Drupal to PDO
  • Introduce a function registry
  • Begin to solve the "Data API problem"
  • Whatever I come up with along the way

Move Drupal to PDO

This should come as no surprise to anyone, especially given previous blog entries and DrupalCon presentations I've given. :-) There is little doubt at this point that PDO is the way forward for PHP database handling in general. It's not perfect, and there are still some inconsistencies between different databases (most notably with Oracle and LOB handling), but it's a much cleaner interface than the many disparate APIs that came before it. It also offers a number of nifty features that people have either been asking for or that we could find use for.

Work on this front is about 50% complete already. I need to refactor a few things yet for easier maintenance, and then get to work on Postgres and SQLite drivers for a PDO-based extension. How these changes will affect the Schema API I am not sure yet, and that's one thing I plan to discuss with Barry Jaspan next week as part of the Data API pow-wow/sprint Palantir is hosting.

And before anyone asks, Oracle is not on my immediate radar for two key reasons: 1) It's a difficult system to support, because it requires different field handling than other database systems; 2) It's not Free/free, therefore I have no way of testing against it in the first place. My hope is to be able to offer an API behind which someone else can implement Oracle support.

Introduce a function registry

I've bounced this idea off of a few other developers, and the response so far has been positive. In Drupal 6, we introduced the concept of menu handler files, where functions called from a menu handler could live in a separate file that is included conditionally only when needed, thus decreasing the amount of code to be parsed on any given page load. (We later added it to theme functions, too.)

The next step is to remove the manual declaration step. PHP is capable of parsing itself, so we can have Drupal introspect itself and build up a list of all functions or classes that are called indirectly. We can then lazy-load anything that is called indirectly: page handlers, theme functions, forms, nearly any hook, nearly any alter-function, any class or interface (which PHP 5 supports properly now, recall)... almost all of Drupal. Think of it as the love child of Drupal 6 menu-split and Karoly's old "split mode" concept.
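To make the idea concrete, here is a minimal sketch of "introspect, then lazy-load". It is written in Python rather than Drupal's PHP purely for illustration, and it is not the proof-of-concept module mentioned in this post. The `Registry` class, its `invoke` method, and the file and function names in `demo` are all hypothetical.

```python
import ast
import importlib.util
import tempfile
from pathlib import Path


class Registry:
    """Hypothetical registry: maps top-level function/class names to files."""

    def __init__(self, root):
        self.where = {}    # name -> Path of the file that defines it
        self.loaded = {}   # Path -> module object, filled in lazily
        for path in sorted(Path(root).rglob("*.py")):
            # Parse the source without executing it (the introspection step).
            tree = ast.parse(path.read_text())
            for node in tree.body:  # top-level definitions only
                if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                    self.where[node.name] = path

    def invoke(self, name, *args):
        """Indirect call: load the defining file on first use, then call it."""
        path = self.where[name]
        if path not in self.loaded:
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            self.loaded[path] = module
        return getattr(self.loaded[path], name)(*args)


def demo():
    """Build a two-file 'site', call one handler, report which files loaded."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "blog_pages.py").write_text(
            "def blog_page(title):\n    return 'Blog: ' + title\n")
        Path(root, "forum_pages.py").write_text(
            "def forum_page(title):\n    return 'Forum: ' + title\n")
        registry = Registry(root)
        output = registry.invoke("blog_page", "hello")
        loaded = sorted(p.name for p in registry.loaded)
        return output, sorted(registry.where), loaded
```

Calling `demo()` builds a registry that knows about both handlers but only ever executes the file containing the one that was actually requested. No manual declaration of "this function lives in that file" is needed.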

With proper code organization, then, we can eliminate most of the bootstrap phase of a Drupal page request, which is well-documented (Yay, GHOP!) as being the slowest part of the request lifecycle because we're loading so much code that we never actually use. As a nice bonus, we also get a fully-dynamic registry of what modules implement what hooks; there's all sorts of things we could do once we know that.

I already have working proof-of-concept code in my sandbox, implemented as a module. The only catch at the moment is that it suffers from the same "death by a thousand small queries" problem as the path module currently does. Feedback on how to solve that issue is most welcome.
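For readers unfamiliar with the "thousand small queries" pattern, the sketch below contrasts it with one possible mitigation: reading the whole lookup table in a single query and answering from memory. This is only an illustration in Python with an in-memory SQLite database, not a proposal for how the registry module should work; the `registry` table layout and all function names here are assumptions.

```python
import sqlite3


def make_db(rows):
    """Create a throwaway in-memory table of (function name, file) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE registry (name TEXT PRIMARY KEY, file TEXT)")
    db.executemany("INSERT INTO registry VALUES (?, ?)", rows)
    return db


def lookup_one_by_one(db, names):
    """One SELECT per name: the 'thousand small queries' pattern."""
    files, queries = {}, 0
    for name in names:
        row = db.execute(
            "SELECT file FROM registry WHERE name = ?", (name,)).fetchone()
        queries += 1
        files[name] = row[0]
    return files, queries


def lookup_preloaded(db, names):
    """One SELECT for the whole table, then plain dictionary lookups."""
    table = dict(db.execute("SELECT name, file FROM registry"))
    return {name: table[name] for name in names}, 1
```

Both functions return the same mapping; the difference is only in how many round trips to the database they make. The trade-off is memory: preloading pays for rows that a given request may never use.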

The Data API problem

I am far from the first person to want to tackle this issue. It's well-accepted that we need to somehow clean up our "entity systems": Nodes, Users, Files, Comments, etc. The APIs are wildly inconsistent and in current incarnations are very fragile, slow, and too-tightly-coupled to other systems like the Form API.

I have no expectation of being able to solve this challenge alone, of course. That's why next week several Drupal developers will be camping out in the conference room at Palantir.net here in Chicago to see if seven heads are better than one in establishing long-term battle plans for data handling. The guest list includes Larry Garfield, Barry Jaspan, Karoly Negyesi, Karen Stevenson, Yves Chedemois, Moshe Weitzman, and Nedjo Rogers. That is, six really smart luminaries of the Drupal world plus me. :-) Afterward, Karoly and I will be spending some quality time together to try and hash out code along whatever lines we come up with at the data-sprint.

Expect a full run-down on what we're able to accomplish (if anything) in Boston. You are going to be there, right?

Whatever I come up with along the way

On the off chance I actually have time to think about anything else, I'd also like to see about getting more fine-grained filter control into core. In nearly every case, I want filters to apply to only certain node types. Right now I'm solving that with a very touchy module I wrote called filter by node type, but various other people have similar modules floating about contrib. That to me is a sure sign that such functionality belongs in core. Ideally, every textarea on a node, including the body as well as any CCK textfields, could specify a limited set of input formats that it supports. If anyone else wants to pick up this issue, please do so! I won't be offended, trust me. :-)

What else? I'll figure it out once I use Drupal 6 enough to know what annoys me so that I can fix it.

Drupal 6 is going to rock. It's going to rock so much that it will be really hard to make Drupal 7 rock even more, and that's going to take the combined effort of over 1000 Drupal developers.

I love a challenge...

tostinni (not verified)

2 February 2008 - 12:26pm

Hi Larry,
These are really great steps for (we hope) Drupal 7.
Regarding Oracle/PDO support, yes, there are some problems with Oracle's special handling of LOB columns. But the good news is that there's a free version of Oracle to try these changes against.
It's called Oracle 10g Express Edition, and you can get it here: http://www.oracle.com/technology/products/database/xe/index.html . It comes with some limitations, but it should be enough to debug. Also, if you want, every Oracle version can be tried for free for 30 days, but they're far heavier ;)
I'm looking forward to seeing these changes coming.
Good luck.

I wonder whether the PHP 5 OO autoloading mechanism is something like your suggested function registry. Besides the fact that PHP 5 OO and Drupal ... don't like each other too much: would you say a function registry does "the same" job as the autoload mechanism, and do you have any idea which of the two may be faster?

First off, it is untrue that "PHP 5 OO and Drupal don't like each other too much". Until Drupal 7, PHP 5 OO has not even been a consideration, and for a variety of reasons both technical and cultural we have shied away from PHP 4's OO implementation. Drupal 7 will open up the far more powerful PHP 5 OO implementation, which I believe should be leveraged in places where appropriate.

Also, the registry, as I have implemented it, handles class autoloading, too. Most __autoload() implementations assume a Java-style mapping of class -> file. That is, class Foo_Bar_Baz will live in includes/Foo/Bar/Baz.class.php. That's actually rather limiting, especially as one-class-per-file is sometimes too slow because classes don't need to be split out that far. Instead, we include classes in the introspection phase and register an spl_autoload callback. That callback, which gets called by the PHP engine automatically when a class is being autoloaded, will then use the registry to include the class wherever it lives; we're then not bound to the class -> file mapping.
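The difference between the two lookup strategies can be sketched as follows. This is a language-neutral illustration in Python, not the actual registry code: `conventional_path` mirrors the Java-style mapping described above, while `make_registry_resolver` and the sample registry entries are hypothetical.

```python
def conventional_path(class_name):
    """Java-style rule: the class name alone dictates the file location."""
    return "includes/" + class_name.replace("_", "/") + ".class.php"


def make_registry_resolver(registry):
    """Return an autoload-style callback backed by a name -> file registry."""
    def resolve(class_name):
        # The registry was filled in during introspection, so a class can
        # live in any file, and several classes can share one file.
        return registry.get(class_name)
    return resolve


# Hypothetical registry contents: two classes deliberately share a file.
sample_registry = {
    "Foo_Bar_Baz": "modules/foo/foo.classes.inc",
    "Foo_Bar_Qux": "modules/foo/foo.classes.inc",
}
resolve = make_registry_resolver(sample_registry)
```

With the convention, `Foo_Bar_Baz` can only ever live in `includes/Foo/Bar/Baz.class.php`. With the registry-backed callback, the same name resolves to whatever file the introspection pass found it in, so related classes can be grouped together.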

The language will auto-load classes for us as needed. For functions, we will need to do so ourselves. Fortunately most of Drupal is called indirectly, so we can insert our own "manual autoload" into the indirection routines. So it's sort of the same job. That doesn't mean it's a direct replacement, however. Switching something to classes just to get autoload functionality is bad design, as objects are not always a good "fit" for a given problem.

I don't think there's any one page that lists all the problems with the current implementation, since they're so widespread. There are a dozen issues in the queue, pages on groups.drupal.org, etc. that go into detail about parts of it. My recent Data API Musings are one example, but by no means the only one. Basically it's just understood by anyone who's tried to do anything particularly fancy or complex with the current node structure that it's very limiting and fragile.