Programming language trade-offs

Submitted by Larry on 30 September 2010 - 7:04pm

This article is also available in Serbo-Croatian

This article is also available in Dutch

There has been much discussion of late in Drupal about Object-Oriented Programming. That's not really surprising, given that Drupal 7 is the first version that has really tried to use objects in any meaningful way (viz., as something other than arrays that pass strangely). However, too much of the discussion has boiled down to "OMG objects are inflexible so they're evil!" vs. "OMG objects are cool, yay!" Both positions are harmfully naive.

It is important for us to take a step back and examine why one particular programming paradigm is useful, and to do that we must understand what we mean by "useful".

Programming paradigms, like software architecture, have trade-offs. In fact, many of the same methods for comparing architectural designs apply just as well to language design. To do that, though, we need to take a step back and look at more than just PHP-style objects.

Warning: Hard-core computer science action follows. If you're a coder, I recommend getting a cup of $beverage before continuing, as it could take a bit to digest although I've tried to simplify it as much as possible. There's fairly little Drupal-specific stuff here so hopefully it should be useful to any PHP developer.

Approaches to programming

Every programming language is, fundamentally, a way of encoding logic. There are different ways to encode logic in such a way that a computer can process it, and every language has a slightly different spin on the subject. In the abstract, though, we can identify several general paradigms, a number of which are relevant to us as PHP developers. (The Wikipedia links below have far more detail than I can go into here.)

Procedural programming
Procedural is frequently the first programming style taught. A program is organized into "procedures" (aka functions or subroutines) that then operate on data. The code is generally written in an imperative fashion: Do X, then do Y, then do Z. Those subroutines frequently have side-effects; that is, some program state has permanently changed in a way that lasts beyond the life of the subroutine.
Functional programming
Functional programming is as old as procedural programming. In fact, the two approaches were the original schism in programming conceptualization. In purely functional programming one does not write a set of steps for the program to follow. Instead, one writes mathematical functions that relate to each other. That has a number of important attributes: Most notably, purely functional programs are incapable of side-effects. Functions have no state of their own, and in fact once a variable has a value it may not be changed again. Ever. The output of a function depends exclusively on its explicit inputs. Generally, functional languages also treat functions as first-class objects along-side other more familiar variables like ints, strings, and so forth.
Declarative programming
Declarative languages avoid specifying how the computer should do something in favor of saying what it should do, somehow. This is extremely powerful for simplifying common tasks, but can make developing new-and-innovative capabilities harder; they still need to be translated from the declarative form into an executable approach somewhere. SQL itself is the most obvious example for web developers, but there are many more. Even some markup languages can, arguably, be considered declarative programming. (See also: HTML5 or SVG)
Classic Object-Oriented programming
In class-based OO (or "Classic" as I tend to call it), rather than logic being encapsulated into simple subroutines that operate on arbitrary data passed to them, logic and data are bundled together into "objects". These objects are treated by the outside world as a single black box. Manipulation of data doesn't happen directly but through the logic bound to that data, that is, via methods on the object. Popular examples here include C++, Java, and PHP.
Prototypical OO programming
This classless variant of OO is in some ways closer to functional programming. JavaScript is the most widely known example of this style. Since it's not really possible in PHP we'll leave it be for the time being. I separate it from "Classic" OO mostly to point out that PHP's approach to OO is by no means universal.
Aspect-oriented programming
A relative newcomer on the scene, AOP is based around creating join points between different parts of a system. That is, places where one logical routine can inject itself into another without having to modify either routine directly.
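
To make the first few styles concrete, here is a minimal sketch of the same trivial task (adding up two amounts) written three ways in PHP. All of the names (add_to_total(), sum_amounts(), Tally) are made up for illustration; this is a sketch of the styles, not a recommendation of any one of them.

```php
<?php
// Procedural: a subroutine that changes shared state (a side effect
// that outlives the call).
$GLOBALS['total'] = 0;

function add_to_total($amount) {
  $GLOBALS['total'] += $amount;
}

add_to_total(5);
add_to_total(7);

// Functional style: the result depends only on the explicit input and
// nothing outside the function is touched.
function sum_amounts(array $amounts) {
  return array_reduce($amounts, function ($carry, $amount) {
    return $carry + $amount;
  }, 0);
}

$functional_total = sum_amounts(array(5, 7));

// Class-based OO: the data and the logic that may touch it are bundled
// together, and the outside world only sees the methods.
class Tally {
  private $total = 0;

  public function add($amount) {
    $this->total += $amount;
    return $this;
  }

  public function total() {
    return $this->total;
  }
}

$tally = new Tally();
$tally->add(5)->add(7);
```

All three arrive at the same number; what differs is where the state lives and who is allowed to touch it.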

(Note: I'm sure some purists will say that I'm grossly over-simplifying one or more of the styles above. They're probably right, but for the sake of argument I'm only looking at some aspects of those approaches. If you want a more complete treatment, that's what the links are for. I do recommend reading them with an open mind.)

There's one fact about the above designs that is important to keep in mind: They're equivalent. It's been proven mathematically that procedural and functional languages are equally expressive; that is, any algorithm you can implement in one can be implemented in the other. OOP and AOP are essentially just outgrowths of procedural programming and frequently are implemented in multi-paradigm languages, so anything you can do in a procedural language can be done in an OOP language or AOP language, or vice-versa. Declarative programming is the odd man out, as many declarative languages are not Turing-complete, although many are if you try hard enough.

So if all of these programming paradigms are equivalent, why bother using one over another?

Language focus

Simple: Each approach has different trade-offs that make it easier or harder to write certain types of algorithms. Not "possible": any functionality you can implement in a functional language can be implemented in an aspect-oriented language, and vice-versa. They make it "easier". The amount of code and the amount of incomprehensible complexity involved will vary greatly. What makes different paradigms easier to apply to certain algorithms? What they don't let you do.

That's right. What makes a programming style easier is what it doesn't let you do. Because if you know for a fact that certain things are not possible, you can make assumptions based on that impossibility that make other tasks easier.

Functional: Logic centric

For example, in purely functional languages you know for a fact that the same set of inputs to a given function will always produce the same result. That means the compiler itself is free to optimize away multiple calls to the same function with the same parameters. Go ahead and call a function multiple times; the language semantics make it safe to reuse the result without any thought on your part. Haskell, I believe, can do that in some cases, although it does not automatically cache every call. It also makes Verifiability easy; you can mathematically prove the correctness of a particular function independent of the rest of the system. That can be very useful in a fault-intolerant system, such as, say, nuclear reactors or air traffic control where bugs become really really dangerous.
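
PHP makes no such guarantee, so there the caching has to be done by hand. A minimal sketch, assuming a hypothetical memoize() helper (not a PHP built-in), shows why the trick is only safe when the wrapped function is pure:

```php
<?php
// memoize() is a hypothetical helper: it wraps a callable so repeated
// calls with the same arguments reuse the first result. That is only
// correct if the wrapped function has no side effects.
function memoize($fn) {
  $cache = array();
  return function () use ($fn, &$cache) {
    $args = func_get_args();
    $key = serialize($args);
    if (!array_key_exists($key, $cache)) {
      $cache[$key] = call_user_func_array($fn, $args);
    }
    return $cache[$key];
  };
}

// A pure function: the output depends only on the input.
function slow_square($n) {
  return $n * $n;
}

// The call counter exists purely to make the caching visible here.
$calls = 0;
$counted = function ($n) use (&$calls) {
  $calls++;
  return slow_square($n);
};

$fast_square = memoize($counted);
$first = $fast_square(12);
$second = $fast_square(12); // Served from the cache; $counted is not run again.
```

The counter itself is a side effect, which is exactly the kind of thing a purely functional language would forbid and the reason such a language can apply this optimization without asking.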

Since you know, for a fact, that a function will not affect any code outside of itself (aside from its return value), there's no requirement that a function have access to any data except its own parameters. It doesn't even have to be in the same memory space as the rest of the program... or even on the same computer. Witness Erlang, where every function can run in its own thread, or its own process, or even in a process on a different computer without syntactic changes. That makes programs written in Erlang extremely distributable and scalable, precisely because functions are incapable of producing side-effects.

Of course, for some use-cases the program structure functional languages require becomes horribly nasty. That's especially true for programs that are based around manipulating state over a long term. That's the trade-off: a clear, logically simple structure that makes complex algorithms easy to build right and scales well but makes stateful systems harder to build.

Procedural: Instruction centric

Procedural programs were the other major fork in programming language theory. Procedural programming can start very simple as just a list of instructions, broken up into chunks (subroutines, or functions). Most also include global state in some form or another in addition to locally scoped state.

What procedural languages don't let you do is heavily segment your program. There is one big pool of subroutines that can be called pretty much at any time. You cannot bind a given subroutine to just certain data or vice versa. That makes the code highly unpredictable, as your function could be called from quite literally anywhere at any time. You can't make assumptions about the environment you're running in, especially if your system makes use of global variables (or their close cousin, static variables). There's no way to hide data, there's no way to control when a given routine can or cannot be executed, there's no way to protect yourself against another developer hacking his way into a subroutine that wasn't designed to be hacked into.
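
A small sketch of that unpredictability, using made-up names (price_with_tax(), apply_holiday_promo()) and integer amounts to keep the arithmetic exact: the same call with the same argument returns a different answer because some other routine touched a global in between.

```php
<?php
// Shared global state that any routine may read or write.
$GLOBALS['tax_percent'] = 10;

function price_with_tax($price) {
  // The result silently depends on a value that is not a parameter.
  return $price + (int) round($price * $GLOBALS['tax_percent'] / 100);
}

// Some unrelated routine, possibly written by another developer.
function apply_holiday_promo() {
  $GLOBALS['tax_percent'] = 0;
}

$before = price_with_tax(100);
apply_holiday_promo();
$after = price_with_tax(100); // Same input, different output.
```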

Which of course is also its power. Because it's so low-level and has no safeguards, you can hack your way into (or out of) most situations with enough effort. Because you're prevented from hiding data, you get a great deal of flexibility. That can be very good in some cases. On the other hand, that means that in general procedural programming lets you make no assumptions about the context of your system or its state.

Because you have such limited control, it's extremely difficult to do any meaningful form of unit testing. You can do functional testing (that is, testing of functionality at a high level) or integration testing, but you have no clearly separable units to work from.

Object-oriented: Behavior centric

There are lots of variations on object-oriented languages, each with their own subtleties. For the moment, we're concerned only with Class-and-Interface languages such as PHP. The interface part is important: In a Classic OO language, individual primitive values are irrelevant. The interface to an object, as defined by its class and interfaces, is what matters. The class forms a completely new data type with its own semantics.

Just as a string primitive has its semantics (e.g., length) and possible operations (split, concatenate, etc.), so too does an object. The internal implementation of the string is irrelevant: it may be a hash table, it may be a straight character array. As someone making use of it you don't know, nor do you care (except in C, of course). Similarly with an object, it has behaviors as defined by its methods. The underlying implementation is irrelevant.

The data within the class is tightly coupled to the class; the class itself is (if done correctly) loosely coupled to anything else. Data within the class is irrelevant to anything but that class. Because it is hidden away ("encapsulated" in the academic lingo), you know, for a fact, that only selected bits of code (in the same class) are able to modify it. Unlike in procedural code, you can rely on the data not changing out from under you. You can even completely restructure the code. As long as the interface doesn't change, that is, the behavior, you're fine.
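
As a rough illustration, assuming a made-up Counter interface with two interchangeable implementations, calling code written against the interface cannot tell how the data is stored:

```php
<?php
// The interface is the behavior that callers code against.
interface Counter {
  public function increment();
  public function value();
}

// One implementation stores a plain integer.
class IntCounter implements Counter {
  private $count = 0;

  public function increment() {
    $this->count++;
  }

  public function value() {
    return $this->count;
  }
}

// Another keeps a list of events and counts them on demand.
class LogCounter implements Counter {
  private $events = array();

  public function increment() {
    $this->events[] = TRUE;
  }

  public function value() {
    return count($this->events);
  }
}

// Calling code relies only on the interface, never on the private data.
function count_three(Counter $counter) {
  $counter->increment();
  $counter->increment();
  $counter->increment();
  return $counter->value();
}

$from_int = count_three(new IntCounter());
$from_log = count_three(new LogCounter());
```

Either class could be restructured internally again without count_three() noticing.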

That's an important distinction. In OO, you are not coding to data. You're coding to behavior. Data is secondary to the behavior of an object.

Because you have isolated data behind behavioral walls, you can verify and unit test each class independently. That is, assuming you've properly isolated your object. A lot of code doesn't properly do so, which defeats the purpose. (See my previous rants on dependency injection.)
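
A minimal sketch of what "properly isolated" can look like, assuming a hypothetical Session class that needs the current time. Because the clock is passed in rather than fetched from the environment, a test can substitute a fake one:

```php
<?php
interface Clock {
  public function now();
}

// Production implementation: asks the system.
class SystemClock implements Clock {
  public function now() {
    return time();
  }
}

// Test double: time only moves when the test says so.
class FixedClock implements Clock {
  private $timestamp;

  public function __construct($timestamp) {
    $this->timestamp = $timestamp;
  }

  public function advance($seconds) {
    $this->timestamp += $seconds;
  }

  public function now() {
    return $this->timestamp;
  }
}

// The class under test receives its dependency instead of reaching
// out to a global function or variable for it.
class Session {
  private $clock;
  private $expires;

  public function __construct(Clock $clock, $lifetime) {
    $this->clock = $clock;
    $this->expires = $clock->now() + $lifetime;
  }

  public function isExpired() {
    return $this->clock->now() >= $this->expires;
  }
}

$clock = new FixedClock(1000);
$session = new Session($clock, 60);
$expired_at_start = $session->isExpired();
$clock->advance(60);
$expired_after_minute = $session->isExpired();
```

Had Session called time() directly, the only way to test expiry would be to wait.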

Aspect-oriented: Side-effect centric

And finally we come to the new kid on the block, aspect-oriented programming. In some ways, AOP is the diametric opposite of functional programming. Where functional programming tries to eliminate side effects, AOP is based on them. Every join-point in AOP is a big red flag saying "please do side-effects here". Those side-effects could be all sorts of things. They could modify data, they could change program flow, they could initiate some other sideband logic and even trigger further side-effects.

What AOP offers is exactly that: The ability to modify a program without modifying a program. Once a join-point is established, you can alter the data or program logic at that point without changing any existing code. That provides a great deal of flexibility and extensibility, but at the expense of control.

Once you introduce a way to allow 3rd party code to modify your logic flow or data, you surrender any ability to control that logic flow or data. You can no longer make assumptions about your state, because you've built in a mechanism to allow your state to change out from under you. The way you compartmentalize your code is to make it impossible to fully compartmentalize your code. (Ponder that one for a moment...)
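
The loss of control is easy to see in a toy example. The sketch below assumes a hypothetical JoinPoints registry (nothing like it ships with PHP): build_order() is never edited, yet what it returns changes as soon as someone else attaches code to its join point.

```php
<?php
// Hypothetical registry mapping join point names to attached callbacks.
class JoinPoints {
  private static $advice = array();

  public static function attach($name, $callback) {
    self::$advice[$name][] = $callback;
  }

  public static function run($name, array &$data) {
    $callbacks = isset(self::$advice[$name]) ? self::$advice[$name] : array();
    foreach ($callbacks as $callback) {
      // $data is handed over by reference, so callbacks may rewrite it.
      $callback($data);
    }
  }
}

// The original routine only declares where others may step in.
function build_order($price) {
  $order = array('price' => $price, 'currency' => 'USD');
  JoinPoints::run('order_alter', $order);
  return $order;
}

$plain = build_order(100);

// Third-party code, added later, without touching build_order().
JoinPoints::attach('order_alter', function (array &$order) {
  $order['price'] = 0;
});

$altered = build_order(100);
```

From inside build_order() there is no way to know what $order will contain after the join point runs.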

Trade-offs

Functional approaches emphasize Verifiability, Testability, and Scalability at the expense of Modifiability, Extensibility, and in some cases Understandability.

Procedural approaches emphasize Modifiability, Understandability, and Expediency at the expense of Testability, Verifiability, and if you're not careful Maintainability.

Object-oriented approaches emphasize Testability, Modifiability, and Scalability at the expense of Extensibility, Expediency, and if you have a poor design Understandability.

Aspect-oriented approaches emphasize Modifiability, Extensibility, and Expediency at the expense of Testability, Verifiability, and arguably Understandability.

Oh great, so which one do we want to use? Which approach is best? The one that best fits your use case and priorities, of course.

Multi-paradigm languages

Because all of these approaches are perfectly viable depending on your use case, most major programming languages today are multi-paradigm. That is, they support, at least to some extent, multiple approaches and ways of thinking about program logic.

PHP began life as an entirely procedural language. With PHP 4 it started adding object-oriented capabilities, although those didn't really come into their own as a viable alternative until PHP 5. PHP 5.3 introduced anonymous first-class functions, which, while not pure functional programming since they still allow variables to be changed, do allow programmers who are so inclined to write in a more functional way.
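
A short sketch of what that looks like in practice, using only built-in functions plus a made-up multiplier() helper. The last few lines show why this is not purely functional: a closure can still capture a variable by reference and change it.

```php
<?php
// Anonymous functions can be passed around like any other value.
$titles = array('  hello ', 'world  ');
$clean = array_map(function ($title) {
  return ucfirst(trim($title));
}, $titles);

// A function that builds and returns another function.
function multiplier($factor) {
  return function ($n) use ($factor) {
    return $n * $factor;
  };
}

$double = multiplier(2);
$doubled = array_map($double, array(1, 2, 3));

// Not pure: capturing by reference lets a closure mutate outside state.
$count = 0;
$counter = function () use (&$count) {
  return ++$count;
};
$counter();
$counter();
```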

Although most aspect-oriented implementations are built atop object-oriented models, PHP supports procedural-based AOP. In Drupal, we call it hooks. module_invoke_all() becomes a join point, the hook name acts as the pointcut, and a hook implementation becomes the advice that runs there.
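
For readers who have never looked under the hood, the core idea can be sketched in a few lines. demo_invoke_all() below is a simplified stand-in written for this article, not Drupal's actual implementation, and the module and hook names are invented:

```php
<?php
// Simplified stand-in for a hook dispatcher: for each module, look for
// a function named {module}_{hook} and collect whatever it returns.
function demo_invoke_all(array $modules, $hook) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $results = array_merge($results, (array) call_user_func($function));
    }
  }
  return $results;
}

// Two invented "modules" implement the invented hook; "poll" does not.
function blog_menu_links() {
  return array('blog');
}

function forum_menu_links() {
  return array('forum', 'forum/new');
}

$links = demo_invoke_all(array('blog', 'forum', 'poll'), 'menu_links');
```

Nothing in the calling code names blog or forum; the function naming convention alone decides what runs.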

(I am by no means the first to call Drupal's hook system a form of AOP. I just think it's a particularly good way of describing them.)

To be fair, without native syntactic support hooks are a rather clunky, hacked-up poor man's AOP, but conceptually it is still AOP. They have the same implicit trade-offs: Extremely flexible when used appropriately but totally destroy any hope of isolating a system to unit test it or do interface-driven development.

The fact that they're also bolted on top of a non-AOP language but not documented as being AOP, or applied consistently, is also a major stumbling block for new developers, especially those who have been brought up in a predominantly OO world.

Just as it's possible to emulate AOP in procedural code, it's possible in object-oriented code as well. There are many OOP patterns that give you all the same flexibility as AOP, albeit sometimes in more verbose ways. Observer and Visitor patterns come to mind in particular. Again, it's not a question of can you implement a given design but how easily you can do so, and at what cost.
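
As one example, here is a bare-bones Observer sketch. The SaveListener interface and the ArticleStore and AuditLog classes are invented for illustration; the point is that the extension point is an explicit, typed part of the object's interface rather than a naming convention.

```php
<?php
// Anything that wants to react to a save implements this interface.
interface SaveListener {
  public function saved($title);
}

class ArticleStore {
  private $listeners = array();
  private $titles = array();

  public function listen(SaveListener $listener) {
    $this->listeners[] = $listener;
  }

  public function save($title) {
    $this->titles[] = $title;
    // The explicit extension point: notify every registered listener.
    foreach ($this->listeners as $listener) {
      $listener->saved($title);
    }
  }
}

class AuditLog implements SaveListener {
  public $entries = array();

  public function saved($title) {
    $this->entries[] = 'saved: ' . $title;
  }
}

$store = new ArticleStore();
$log = new AuditLog();
$store->listen($log);
$store->save('Trade-offs');
```

It takes more lines than a hook, but listeners receive only what the store chooses to hand them.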

Nothing forbids the mixing and matching of different approaches, either. Take Drupal 7's Database layer. It is mostly straight up OO -- Modular, dependency-injected, self-contained, interface-driven -- but throws in some AOP in the form of hook_query_alter() and has procedural convenience wrappers such as db_query(). I certainly don't claim that it's a perfect balance, but it does show how multiple approaches can be leveraged together.

Decisions, decisions

When considering how to tackle a given problem, or how to use a particular language feature, it's not enough to say "well I like X" or "approach Y is stupid". That is a naive approach, and tends to lead to spaghetti code. (Pasta exists in all languages.) Instead, we should ask what our priorities are, what we're willing to give up, and what we're willing to do in order to mitigate it. We always have to give up something. Always.

Which cost you want to pay is not always an easy balance to strike. Do you favor robustness (Testability, Verifiability, Scalability, data hiding, encapsulation, etc.), flexibility (Modifiability, Extensibility, bare data, etc.) or simplicity (Expediency, Maintainability, possibly Performance, etc.)?

Pick two.

My thanks to Bec White and Matt Farina for their input on this article.

Stay tuned for part 3 in this series, where we apply these principles of trade-off and balance specifically to OOP, and to that ever-popular subject of property visibility. :-)

Larry, thank you for the writeup.

I would appreciate it if you could explain how you came up with the comparative list of trade-offs for each approach. I mean - aside from the gut feel :) I am having a hard time understanding why you are saying that the object-oriented approach emphasizes scalability and sacrifices extensibility. I don't see anything in OO that inherently helps scalability (which you had defined as serving high-traffic) and I am wondering how a paradigm that has inheritance at its core can be inferiorly extensible.

Also, I am not sure that putting Aspect-oriented paradigm at the same level as procedural, functional and OO is fair. There are purely procedural, purely functional and purely OO languages, but I don't know of a purely AO language. AO is usually an add-on to object-oriented languages. You can not write only aspects by definition - aspects need to hook into something. Maybe a technical detail but seemed important to me.

Thank you.

I actually had a hard time with that section, because I was trying to make a balanced statement. :-) Much of it is just gut feel, or in relation to other paradigms mentioned. For OO, I went with the fact that making "alternate backends" is really really really simple in OO, moreso than in most other paradigms. An important part of scalability is being able to simply swap out an implementation of one part of the system for something more appropriate for a given use case without breaking the rest of the system. That's an area where OO excels. For extensibility, OO can be very extensible if used properly but I would argue that AOP is, by nature, more extensible. In this case I was using "extensible" to mean "can be modified without changing existing code", something that AOP, by design, does better than traditional Classic OO (at its own costs, of course, as noted above).

As far as giving AOP "top billing", I think that is reasonable. I've never worked in a language that was "purely OO" that wasn't also, at some level, procedural. Within a method, all OO languages I've seen are still procedural, at least insofar as they're not locked down the way functional languages are with regards to state. They also tend to have a procedural-style call stack, whereas many functional languages take advantage of their in-built assumptions to have a call chain instead, so that there is no "call stack overflow". (That's a gross over-simplification since I have not done major work in a functional language; I'm going by what I've read about them.)

So if OOP qualifies as a first-class approach alongside procedural and functional, AOP should as well. Both of them are largely extensions of procedural approaches that introduce new capabilities. There are other paradigms that build off of functional models that I didn't go into here since they're irrelevant for most PHP developers.

Excellent article, thanks for taking the time to write it!

This part about the difference between OO and AO in terms of scalability and extensibility reminded me of the explanation from the rough cut of the Miles' Drupal Building Blocks, which is available on Safari Books Online. They define hooks as part of the procedural architecture, while I think it makes more sense if it is clear that it is paradigmatically AO and independent of the procedural model, but I think the distinction they make in vertical vs. horizontal is a good way to explain the difference.

The procedural model is often said to allow extension horizontally, whereas the OO model allows extension vertically. What this means is that in procedural, siblings can modify each other’s behavior, but in the object oriented model the extension is parent to child and siblings aren’t really allowed to get involved.

—p. 151, Views API

The Miles (how does one pluralize Miles anyway?) did a good job of making the OO paradigm understandable to layfolk throughout the section, definitely worth a read.

Berdir (not verified)

1 October 2010 - 2:02am

A great read as usual, a few comments... :)

AOP. I've read a few times about it but never really got it. I always thought of something that allows you to inject validation and logging into methods. Because these are the only two examples I've ever seen on that topic I think. Until I've read your short explanation and I automatically thought "OMGWTFBBQ, hooks!" ;)

"There's no way to hide data" (... in procedural languages). Not sure what exactly you mean with "hide" here, but I partly disagree. At least in PHP, where we have the static keyword which is imho similar to a private/protected property in a class. Examples:

- The reason we introduced drupal_static() in Drupal 7: The thing that we call "static cache" in Drupal. Many functions in Drupal have a $reset argument to clear that information, many others don't (most of them do now in Drupal 7). Sure the point of existence of these caches is not to hide information from outside (since it's mostly just a summary of information that exists publicly somewhere else) it is more a side effect of the real purpose.

- Other functions actually use static to hide information from outside, an example is http://api.drupal.org/api/function/db_set_active/6: "static $db_conns, $active_name = FALSE;". There is no way you can access for example the name of the currently active connection name without changing it to something (then you get the old name back) and there is really no way to see which db connections are active :)

- We have had many getter/setter *functions* in Drupal for probably a long time, some examples: drupal_set/get_messages(), drupal_set/get_title, drupal_add/get_js/css, menu_set/get_active_*. Sure, the point of these functions is more to *store* information than hide it, but if it were only that, we could also use $GLOBALS. The reason is that most of these do some processing (order by weight, for example) or have logic for a default value if nobody called the setter function.

Last, about the encapsulation in our new database layer: I think the encapsulation we have is very limited, even with all those "evil" protected/private modifiers. This is because most getter methods we have return the information *by reference*, so you can change it directly instead of going through setter/adder functions. Sure, we *want* that, but we have to be aware that we lose the ability to change the way that data is stored internally, one of the main reasons the information hiding design principle exists :)

PS: Hm, mollom thinks my comment is spam :(

I love those. :-) I'm glad this article helped with one.

As far as data hiding, statics (at least as they exist in PHP) are not really a protected/private property because the function has no existence past the end of the last curly brace. An object does. Static variables in PHP are more akin to scope-restricted globals, and in fact I believe (although I'm not certain, and if someone knows otherwise please correct me) that they are even implemented that way. One could call that information hiding to an extent, but it's a rather backwards way of doing so.

It also means you have only two scopes for shared data: One function and global. There's no way to say "this data is local to just these three functions, but no others". Of course, we work around that with drupal_get_title() sub-calling to drupal_set_title() which has a special way that it can be called to make it return an internal static; that's one of those workarounds I mentioned to emulate a different paradigm, which is by nature uglier than doing so in a natively-supported way. (There's almost always a workaround, just not always a pretty one.)
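
The workaround looks roughly like this. The demo_set_title() and demo_get_title() functions are invented names that only imitate the pattern described; they are not the actual Drupal functions.

```php
<?php
// The static variable is visible only inside this one function, so the
// setter doubles as the sole access point for reading it.
function demo_set_title($title = NULL) {
  static $stored = 'Untitled';
  if ($title !== NULL) {
    $stored = $title;
  }
  return $stored;
}

// The "getter" cannot see $stored, so it calls the setter with no
// argument and relies on it returning the current value.
function demo_get_title() {
  return demo_set_title();
}

$first = demo_get_title();
demo_set_title('Home');
$second = demo_get_title();
```

It works, but sharing that one static with a third function means routing yet another call through the setter.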

As for DBTNG, note that the get-by-reference methods are only on SelectQuery, and only because we couldn't figure out a sensible API to allow complex manipulation of the query state in an alter hook otherwise. I'd rather we have done that but we couldn't figure out how. :-) All other query types do not do that. However, because there is a method we still have an access point if needed. In theory, we COULD change the internal structure and then if someone calls, say, $query->getFields() rewrite that method to create a copy of that internal structure in the legacy format and return that, then set an internal flag. Then on $query->execute() we check for that flag and mutate the altered legacy structure back into the new format, then continue with query compilation. Voila, we've changed the internal format without affecting the "API". It might be ugly, but it's possible. If we just exposed those properties directly then there's nowhere that we could hook (no pun intended) into it and allow such a change, no matter how ugly.

So in this case, having the method there gives us more flexibility than exposing the property directly; that is, it gives us more flexibility than a procedural approach would.
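
A much-simplified sketch of that idea, using an invented DemoQuery class rather than the real SelectQuery, and leaving out the by-reference and flag handling described above: the internal structure has changed, but the accessor still hands callers the old shape.

```php
<?php
class DemoQuery {
  // Newer internal format: one associative array per column.
  private $columns = array();

  public function addField($table, $field) {
    $this->columns[] = array('table' => $table, 'field' => $field);
    return $this;
  }

  // Callers written against the older format expect "table.field"
  // strings, so the accessor rebuilds that shape on request.
  public function getFields() {
    $legacy = array();
    foreach ($this->columns as $column) {
      $legacy[] = $column['table'] . '.' . $column['field'];
    }
    return $legacy;
  }
}

$query = new DemoQuery();
$query->addField('node', 'nid')->addField('node', 'title');
$fields = $query->getFields();
```

With a public property instead of getFields() there would be nowhere to put that translation.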

How's that for irony?

PS: I take no responsibility for Mollom. It thinks everything I post is spam for some reason. I think Dries just doesn't like me or something...

Rob Knight (not verified)

1 October 2010 - 7:12am

Personally, I assign a lot of weight to the readability of code. In a loosely-typed language like PHP, where there aren't even any real conventions about how to annotate type information, traditional Drupal procedural code is very easy to read compared to the OO alternative. I can give you a code fragment containing a call to node_view() and you know exactly what's happening there. If I give you a fragment containing $obj->view(), it can be very hard to figure out what's going on if $obj was defined outside of the fragment. It gets even worse if $obj might belong to a subclass that has overridden the view() method. The need to maintain PHP 4 compatibility has limited the scope for objects in Drupal prior to 7, so these issues have simply never arisen.

The new compromise seems to be that modules can be "OO on the inside, procedural on the outside", which does at least provide some artificial limit on the scope of objects, but I think it might become increasingly difficult to do Drupal's traditional procedural/AOP mix alongside "full" OO with inheritance and overrides. If I want to change how something behaves, should I do this with hooks or should I subclass it and use some kind of dependency injection to insert my object in? What if another module wants to do the same thing? Pity the poor developer who has to debug this without any of the static typing and code analysis tools available in Java.

I've borrowed a couple of my points above from Linus Torvalds, who recently re-stated the case against C++ in the Linux kernel here: http://www.realworldtech.com/forums/index.cfm?action=detail&id=110618&t… . I think many of the points he makes are equally valid for Drupal (at least for Drupal core), and whether by luck or design Drupal has managed to copy many of the cultural approaches to code that have worked so well for Linux. I think it's seriously worth asking why Linux and Drupal have prospered whilst architecturally "better" solutions (BeOS, Zend Framework/Symfony/Cake/etc.) have largely failed, and I think the code readability and ease of patching is a massive factor which OO might have a negative impact on.

In the previous article I called readability "Understandability". Viz., how easy it is for the next developer to grok what's going on. It's also closely related to Maintainability. That can be an important factor or not, depending on your goal. (Perl totally fails on the Understandability front, for instance, but has advantages in other areas. Perl fans: don't hate me, it's true.)

However, it's not true that an OO language is inherently less readable. Take for instance, hooks. :-) Or worse yet, pseudo-hooks, which are really magic callbacks that look like hooks but don't behave like hooks. How many times have new developers confused hook_nodeapi (a hook) with hook_load (a magic callback)? They look the same. They're even documented the same. They even use some of the same internal dispatch mechanisms. Yet they're not the same, nor even remotely close to being the same thing. What pseudo-hooks are, in practice, are a very bad way to emulate... an object. That's less readable.

Similarly, look at Field API. It has an enormous number of new features, but try to follow the array structure without a debugger when writing a widget. I've written a chapter of a book on the subject and I still can't comprehend them without copying and pasting. They're completely unreadable precisely because they are not encapsulated. It's a problem space that is by nature very context-sensitive, but we're trying to use a context-free language syntax for it. That creates a major DX problem.

Or consider Views 2. Could it have been implemented entirely procedurally? Certainly. Would anyone have been able to read it, to say nothing of maintain or extend it? I'll wager even Earl wouldn't have been able to, even if he wrote it. :-) Yet Views being OO hasn't hurt its adoption at all, and I'd argue is one reason that Views 2 has been so much more successful and more frequently extended than Views 1. In that case, the OO code is far more readable than its procedural equivalent.

As for when a system should be built in an OO way or an AOP way or even a functional way, that's not always an easy problem. Deciding which is appropriate can be difficult, and is one of the primary reasons for this blog series: We need to carefully consider these questions and make informed decisions about which trade-offs are appropriate when, and how we can mitigate the costs. In order to do so, we need to have a common vocabulary and understanding of these trade-offs in order to make the hard decisions.

I also wouldn't entirely agree with your final statement. I would hardly call Zend Framework, Symfony, or Cake failures. They may not get the attention Drupal does in the circles we usually run in, but they're all very successful projects in their own right. They're also aimed at different markets, so their architectural priorities are different. BeOS died, but for reasons that had very little to do with whether they were OO under the hood or not. It was more that BeOS was hard for developers because of a "threads are hard, everything is a thread, good luck" attitude (Note: that's what I have heard from other developers, not something I have any experience in myself) and an inability to find a market niche.

Consider KDE, which is incredibly successful with an enormous and active community and yet is completely OO under the hood. Qt is actually a fairly nice architecture.

@Larry Thanks for the great article.

Again, it's not a question of can you implement a given design but how easily you can do so, and at what cost.

Indeed, it's all about the expressivity of a paradigm (after all, the languages should be Turing-complete). Sometimes you use structures for it, but you don't always need to. I'm not sure how to use functions to create an object in PHP, but it's freaking easy in Lisp. That said, I see what you mean with "without native syntactic support hooks are a rather clunky, hacked-up poor man's AOP". So in what way has poor-man's AOP enabled many people to code aspects? Shouldn't we have much more poor-man stuff?

@Rob Knight

I think it's seriously worth asking why Linux and Drupal have prospered whilst architecturally "better" solutions (BeOS, Zend Framework/Symfony/Cake/etc.)

This phenomenon has been fascinating me for the past 4 years. It is incredible how much "poor-man stuff" is actually winning the battle. This relates to economic concepts such as creative destruction and disruptive innovation. From a software engineering point of view, I wonder how we can make it easier to express innovation. I don't expect an answer to that yet with Drupal 8, but I hope it will build a foundation that makes switching between different paradigms and architectures easier. It would be interesting to see if Drupal could become a living lab for paradigms. I would like to investigate a "poor-man oriented paradigm" (POP ;-)