Drupalcon rocks even more than Drupal, and how to make PHP 5 happen

Submitted by Larry on 26 March 2007 - 11:13pm

So I'm back from OSCMS 2007, and it was a blast. I'll provide a more complete (and illustrated) writeup later, but for now suffice it to say that Drupal developers are by and large totally cool people on top of being very smart cookies.

A lot of people have been blogging about PHP 5, too, and how Drupal needs to move to it or keep PHP 4 compatibility or whatever. One of the most important things to come out of this Drupalcon, as far as I'm concerned, is that I think we really do have a picture of how we can make it happen.

I had the opportunity to speak to Rasmus Lerdorf over lunch on Thursday about PHP 5, and pointed out the third point of the chicken-and-egg upgrade problem: the PHP development team is still providing security updates for PHP 4. It isn't only application developers who could drop PHP 4 support; the PHP team could as well, and that would put even more pressure on web hosts to upgrade than Drupal ever could. Of course, he pointed out that with 80% of the installed PHP base still running PHP 4, dropping security support for it would be rather irresponsible. So we're all stuck.

The subject came up again during Dries' keynote. (OK, it wasn't officially a keynote, but close enough.) The PHP dev team wants Drupal, Joomla, and siblings to drop support for PHP 4 and use their weight to strangle the life out of PHP 4 so they can stop supporting it, but most of the Drupal team doesn't think we have that sort of weight. Can't we have our cake and eat it too? I say we can. Specifically, there are three things we can do:

Explicitly allow PHP 5-only contribs
There's a handful of contrib modules now that require PHP 5, but there's no special way to specify that. If we can make it explicit that contrib modules can be PHP 5-only, then we can slowly, in bits and pieces, create a growing pressure to move to PHP 5. Drupal will still work in PHP 4, but some modules may not. You want the fancy XML-slicing-and-dicing modules? Well, then you'll need to upgrade. I already suggested this on the devel list. It looks like there are some technical issues to work out with project module, but I don't think they're unreasonable. Hopefully we can make that happen and provide a small nudge at a small (and easily adjustable) compatibility cost.
All your arrays are belong to Drupal
Jeff Eaton was shopping around an idea on Saturday that got the Dries seal of approval on Sunday and is already partially in HEAD. He can explain it far better, but in short it involves using nested arrays all the way down to the page and node templates, replacing node_view(). While that has various advantages, one of them is that it allows themers to use Steven Wittens' fQuery module to make it easier to slice and dice a template. fQuery allows jQuery-like CSS 3 syntax to traverse PHP arrays, but is PHP 5-only for various reasons. Want to use your CSS-skillz to make your site look awesome? Oh, well, you need PHP 5 for that.
PDO
I started shopping around an idea of my own on Saturday, too. After getting a nod from Dries, I started work on Sunday and made decent progress. The idea is to add a PDO database driver for Drupal 6, with small mini-drivers for MySQL, Postgres, and potentially other databases. PDO is the new PHP database access API, which provides a common user-space interface to different database drivers. More importantly, it provides unified support for prepared statements. Drupal implements its prepared statements in user-space using preg_replace_callback(), which is, as Dries has noted before, quite slow. By pushing the prepared statement syntax down into the C engine level, we get a nice performance boost as well as better security (prepared statements inherently know how to escape different data types) and easier support for new databases by just making sure our queries work. In the future, using PDO exclusively means simpler code (user-space doesn't have to care about data types) and the potential for using named parameters (those keyed arrays Drupal uses everywhere). So what's the catch? PDO is PHP 5-only. :-) Want your database layer to get faster? Use PHP 5. I have high hopes I can get this ready in time for Drupal 6.
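
To make the PDO point concrete, here is a rough sketch of the two approaches side by side. It is not Drupal's actual code: sketch_query_substitute() is a made-up stand-in for the user-space placeholder handling, the node table is a simplified hypothetical schema, and the in-memory SQLite database is only there so the snippet runs without a server (the idea is the same for the MySQL and Postgres PDO drivers). The anonymous function also assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper, NOT Drupal's real db_query(): replaces %d, %s and %%
// placeholders in user-space, calling back into PHP once per placeholder.
function sketch_query_substitute($query, array $args) {
  return preg_replace_callback('/%[ds%]/', function ($match) use (&$args) {
    switch ($match[0]) {
      case '%d':
        // Force integers so nothing else can sneak into the query.
        return (string) (int) array_shift($args);
      case '%s':
        // Stand-in escaping for this sketch only; a real driver would call
        // the database's own escaping function here.
        return str_replace("'", "''", (string) array_shift($args));
      default:
        // '%%' is a literal percent sign.
        return '%';
    }
  }, $query);
}

$sql = sketch_query_substitute(
  "SELECT title FROM node WHERE nid = %d AND type = '%s'",
  array(1, 'story')
);

// The PDO version: the driver binds the values, so user-space never has to
// escape anything or care about data types. Named parameters map directly
// onto a keyed array.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE node (nid INTEGER, type TEXT, title TEXT)");
$pdo->exec("INSERT INTO node VALUES (1, 'story', 'Hello')");

$stmt = $pdo->prepare("SELECT title FROM node WHERE nid = :nid AND type = :type");
$stmt->execute(array(':nid' => 1, ':type' => 'story'));
$title = $stmt->fetchColumn();
```

In the first half, every placeholder costs a trip through the regex engine and back into PHP. In the second, the query string is handed to the driver untouched and the values travel separately, which is where the speed and security gains are supposed to come from.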

So what does all of that buy us? It allows us to have our PHP 5 and eat it, too. Drupal itself and most of contrib will run just fine under PHP 4, but Drupal will run better with PHP 5. And you want your site to run better, don't you?

I believe this plan is achievable in the Drupal 6 timeframe, and without the overhead of my ill-fated PHP_Compat patch. Drupal puts pressure on the user-base and host-base to use PHP 5, finally, while those stuck on lame hosts that don't yet support PHP 5 aren't abandoned. Everyone wins.

Think we can do it? I'll try to get patches ready to be reviewed as soon as I can. :-)

Regarding PDO and your notes and observations: it's true that preg_replace_callback() is somewhat expensive. This comes from two fronts. Firstly, it's actually a pretty heavy-handed function when all you really need to look for is %d, %s, %b, %%, etc. Secondly, it makes callbacks into user-space from the C level, and that's expensive (a callback for every token processed).

I wrote a C PECL extension that does db_table_prefix() and db_query() (and then commented them out of core to avoid the obvious "cannot redefine function" error). My basic benchmark tests showed a 55% to 80% increase in speed for the PHP portion of the code. So there's room for improvement, but I can't see that room in user-space. I tried various other methods of implementing db_query() and found the "slow" preg_replace_callback() method to be the fastest.

Regarding PDO and prepared statements. The idea of prepared statements sounds good (save on the DB layer having to compile your SQL each time, that's a good thing). But the biggest drawback is that prepared statements don't get into the SQL cache (well, in MySQL anyway). And that's pretty bad as the MySQL SQL result caching can give a not insignificant boost. I like prepared statements but I like the cache more ;)

I didn't know that about MySQL. That would suck, although I suppose it depends on the site. The individual pages that tend to be slowest for me are usually the same set of queries (such as a node_load()) getting called a zillion times. That's where prepared statements would have the biggest boost. I guess we'll have to find out from benchmarks.

Perhaps there's a setting on the driver that would affect the query cache. I'll have to look into it further.

Oh if only we could require custom PECL code. Port Drupal to C; it would be much faster then. :-)