As we begin a new year, it seems appropriate that the discussion of backward compatibility has come up yet again in Drupal. It's a perennial question, and you can tell when a new Drupal core version is ready for prime time when people start complaining about lack of backward compatibility. It's like clockwork.
However, most of these discussions don't actually get at the root issue: Drupal is architecturally incapable of backward compatibility. Backward incompatibility is baked into the way Drupal is designed. That's not a deliberate decision, but rather an implication of other design decisions that have been made.
Drupal developers could not, even if they wanted to, decide to support backward compatibility or "cleanup only" type changes in Drupal 8. It is possible to do so in Drupal 9. If we want to do that, however, then we need to decide, now, in Drupal 8, to rearchitect in ways that support backward compatibility. Backward compatibility is a feature you have to design for.
First, we need to understand what backward compatibility even means. As some comments in the post linked above note, backward compatibility is not the same thing as easy upgradeability. It is also not the same thing as being easy to relearn. Drupal 7's UI is not at all backward compatible with Drupal 6's, for instance, but it is easier to learn.
For now, we are speaking strictly of API backward compatibility. It is related to, but not the same thing as, easy upgradeability of modules (but unrelated to data; more on that some other time). The fewer APIs change from one version to the next, the easier it is to upgrade. But you can also have API changes that are easy to upgrade to. More than one module developer, when porting a module from Drupal 6 to Drupal 7, noted that they were able to throw out (and therefore not have to spend time upgrading) hundreds of lines of code by switching to the Drupal 7 query builders. That's an API change (and thus lack of backward compatibility) that made upgrading easier, as well as code better. As Dries has noted before, if we don't let ourselves do that then we will never advance, and will just carry around dead weight baggage forever. That's how Windows Me happens, and no one wants that.
Backward compatibility is also a factor in how long major modules take to be upgraded, but not as much as many people think. Views, Panels, and Context are often cited as examples of "Drupal not being ready yet" because those modules don't have stable releases. That is a specious argument, however. In the case of Views and Panels, the initial ports to Drupal 7 were done over a year ago. The lack of an official release was because the developers decided to make other overhauls and changes and feature additions at the same time, and those took a while to stabilize. In the case of Context, it's because the development team behind it changed when DevSeed moved on and the new team has been swamped. (Also, Context didn't have a stable D6 release for most of its existence, either.)
So if that's what backward compatibility is not, then what is it?
Backward compatibility of a code component is a measure of how many changes (good or bad) code that depends on it needs to make in order to move from one version to the next version.
The fewer changes that need to be made, the more backward compatible the code is, with an obvious upper bound of "no changes needed." Note that this says nothing about the quality of the code or the value of those changes; often a backward incompatible change is absolutely necessary for certain improvements.
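What does it mean for one code component to depend on another as far as compatibility goes? It means that those two code components touch. There are two ways that systems could "touch":
1. One component calls another component's functions or methods.
2. One component directly accesses another component's data structures.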
Of those two, the second is a tighter coupling because it means dealing with implementation details.
There is also the general question of how much one component touches another. In general, the more two components interact, the larger their shared "surface area."
The larger the surface area, the tighter the coupling. The tighter the coupling, the more likely a change in one component is going to necessitate a change in another component.
Backward compatibility happens when the touch points between two components do not change. Anything else can change, but those touch points do not. As a result, the smaller the surface area between two components, the more you can change without breaking compatibility.
As noted in the Wikipedia article linked above, accessing raw data structures is always tight coupling. Raw data structures are an implementation detail. When implementation details get shared, that is a classic Code Smell known as Inappropriate Intimacy.
Inappropriate intimacy is a massive amount of surface area. It covers the entire data structure. That is, any change to the data structure whatsoever is a potential, if not actual, backward compatibility break. The reason is that you do not know what parts of that data structure some other component may care about. You have thrown the doors (and your proverbial pants) wide open, and said "have at it, world!" Once you do that, every change, no matter how slight, could break backward compatibility because you simply do not know who is doing what with your private data structures. (You should be squirming about now.)
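As a minimal PHP sketch of that surface-area problem, consider a hypothetical component whose internal refactor renamed a single array key. The names component_load(), component_title(), intimate_caller(), and polite_caller() are invented for illustration and are not Drupal APIs. The caller that reached into the raw array breaks silently; the caller that went through the component's one explicit function does not.

```php
<?php
// Hypothetical component after an internal refactor: the title used to be
// stored under 'title' and is now stored under 'label'.
function component_load(): array {
  return array('nid' => 1, 'label' => 'Hello');
}

// The component's explicit touch point. It was updated in the same refactor,
// so its name, parameters, and return value are unchanged for callers.
function component_title(array $record): string {
  return $record['label'];
}

// Caller written against the old internals ("inappropriate intimacy").
// After the refactor it silently gets NULL instead of the title.
function intimate_caller(array $record): ?string {
  return $record['title'] ?? NULL;
}

// Caller written against the touch point. The refactor did not affect it.
function polite_caller(array $record): string {
  return component_title($record);
}
```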
Rather, the first step in making backward compatibility possible is to put your pants back on, protect your data structures, and take control of your surface area.
By far the easiest way to take control of, and reduce, your surface area is to define it explicitly. That is the essence of an Application Programming Interface (API): It is the explicit definition of the surface area of your component.
That could take the form of function calls or method calls on an object. Both could suffice as an API. The latter, however, has the added bonus of a language structure called Interface, which explicitly and in code defines the surface area of an object. By design, it defines that surface area independently of implementation details.
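A minimal sketch of an interface as an explicit surface area, assuming PHP 7.4 or later for typed properties. TitleSourceInterface and its two implementations are hypothetical. The two classes store their data in entirely different ways, yet render_title() works with either, because it depends only on what the interface declares.

```php
<?php
// Hypothetical interface: the whole surface area callers may rely on.
interface TitleSourceInterface {
  public function getTitle(): string;
}

// Implementation 1 keeps its data in an associative array.
class ArrayTitleSource implements TitleSourceInterface {
  private array $data;

  public function __construct(string $title) {
    $this->data = array('title' => $title);
  }

  public function getTitle(): string {
    return $this->data['title'];
  }
}

// Implementation 2 uses different internals behind the same interface.
class ObjectTitleSource implements TitleSourceInterface {
  private stdClass $record;

  public function __construct(string $title) {
    $this->record = new stdClass();
    $this->record->label = $title;
  }

  public function getTitle(): string {
    return $this->record->label;
  }
}

// Calling code is typed against the interface, never an implementation.
function render_title(TitleSourceInterface $source): string {
  return 'Title: ' . $source->getTitle();
}
```

Either implementation could be rewritten, or a third added, without render_title() changing at all.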
What will not suffice, however, regardless of whether one uses classes and objects or not, is defining ways by which one will get complete access to a component's internal implementation details and raw data structures.
Access to raw data structures does not constitute an API. It constitutes avoiding the responsibility of defining an API.
Now, sometimes there are good reasons to do that. Sometimes. Not often. But when that's done, it must be understood that backward compatibility is made effectively impossible.
With a clearly defined interface, we know what we can change and what we cannot, if we want to preserve backward compatibility. We also can more easily document where things are going to break, and offer documentation or automation to make it easier to migrate.
One of the popular arguments in favor of object-oriented code is "data hiding". That is, you can explicitly demand, in code, that certain data is kept hidden from certain other components, and is not part of your surface area. That can only be done in procedural code by convention.
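Here is a small sketch of what enforcing that in code looks like in PHP (7.4 or later assumed). The Form class is hypothetical; the point is that the private $elements property is excluded from the surface area by the language itself, so outside code that tries to read it gets an Error rather than a silent dependency.

```php
<?php
// Hypothetical class. Only the public methods form its surface area.
class Form {
  // Private: not part of the surface area, and PHP enforces that.
  private array $elements = array();

  public function addElement(string $name, string $type): void {
    $this->elements[$name] = array('type' => $type);
  }

  public function countElements(): int {
    return count($this->elements);
  }
}

$form = new Form();
$form->addElement('title', 'textfield');

try {
  // Outside code cannot reach the internal array at all.
  $raw = $form->elements;
  $blocked = FALSE;
}
catch (Error $e) {
  // PHP throws "Cannot access private property Form::$elements".
  $blocked = TRUE;
}
```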
That convention has been used before, of course. In a previous life I developed for Palm OS, which used an entirely procedural C-based API. It passed around a lot of raw data structures, because in C that's all you can do. However, it was extremely bad form to ever touch them directly. Rather, there were plenty of functions that were, in any practical sense, methods, just called inside out. So you'd call Form_Add_Element(form, …) rather than form.AddElement(...). Doing anything with form directly, while it would compile, was not guaranteed to continue working even in minor bugfix releases of the OS. Here be dragons.
That's a viable option, but doesn't really change the amount of work that has to be done to define an API. Unless you say, either by convention or code, that implementation details and data structures are off limits and not guaranteed, you do not have a controlled surface area and therefore you do not have an API.
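For comparison, a sketch of the same idea in procedural PHP, in the Palm OS style described above. The form_*() functions and the '_elements' key are hypothetical. Here the array is private only by convention: the functions act as methods called inside out, and nothing but discipline keeps other code out of the array.

```php
<?php
// Hypothetical procedural API: the array is treated as an opaque handle, and
// by convention only the form_*() functions may look inside it.
function form_new(): array {
  return array('_elements' => array());
}

// Equivalent to $form->addElement(...), written "inside out".
function form_add_element(array &$form, string $name, string $type): void {
  $form['_elements'][$name] = array('type' => $type);
}

function form_count_elements(array $form): int {
  return count($form['_elements']);
}

$form = form_new();
form_add_element($form, 'title', 'textfield');
// Nothing in the language stops code from reading $form['_elements'];
// only the convention does, so such code gets no compatibility promise.
```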
In many recent presentations I have used this example from the Drupal 7 database layer. A "raw data structure procedural" implementation of the new select builder would look like this:
// Hypothetical "raw data structure" select API. Every key below is
// effectively public, because callers must build these arrays by hand.
$fields = array('n.nid', 'n.title', 'u.name');
$tables = array(
  'n' => array(
    'type' => NULL,
    'table' => 'node',
    'alias' => 'n',
    'condition' => array(),
    'arguments' => NULL,
    'all_fields' => FALSE,
  ),
  'u' => array(
    'type' => 'INNER JOIN',
    'table' => 'users',
    'alias' => 'u',
    'condition' => 'u.uid = n.uid',
    'arguments' => array(),
    'all_fields' => FALSE,
  ),
);
// One array per condition.
$where = array(
  array(
    'field' => 'u.status',
    'value' => 1,
    'operator' => '=',
  ),
  array(
    'field' => 'n.created',
    'value' => REQUEST_TIME - 3600,
    'operator' => '>',
  ),
);
$order_by = array(
  'n.title' => 'ASC',
);
// Hypothetical positional signature: what do the NULL and array() arguments mean?
db_select($tables, $fields, $where, NULL, $order_by, array(), NULL, array(0, 5));
Aside from the obvious DX problems that approach has, mostly in terms of not being self-documenting, it makes the entire implementation public. Do we use "INNER JOIN" or just "INNER" to specify the join type? And if we change it, does that not break every single query in the system? Yes, it does.
However, we did in fact change from INNER JOIN to INNER at some point during Drupal 7's development cycle, and it was not an API change. That's because the query builders use an object-oriented, interface-driven, non-raw-data API:
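$select = db_select('node', 'n');
$select->join('users', 'u', 'u.uid = n.uid'); // join() returns the alias, not $select.
$select->fields('n', array('nid', 'title'))
  ->fields('u', array('name'))
  ->condition('u.status', 1)
  ->condition('n.created', REQUEST_TIME - 3600, '>')
  ->orderBy('n.title', 'ASC')
  ->range(0, 5);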
We're separating the internal data structure from the join() method. As long as that method's signature doesn't change, the internal implementation could change today, midway through Drupal 7's lifetime, without breaking an API. That is possible only because it eschews raw data structures in favor of a well-thought-out, abstracted, interface-driven API.
That is what it takes to be backward compatible.
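To illustrate the INNER JOIN to INNER change, here is a heavily simplified, hypothetical select builder. MiniSelect is not Drupal's actual SelectQuery class, and its SQL generation is a toy. The join type is stored internally as 'INNER' and turned into SQL in one place, so switching that internal value (or its format) would not require touching any calling code, as long as the join() signature stays the same.

```php
<?php
// Hypothetical, heavily simplified select builder. This is NOT Drupal's
// SelectQuery; it only shows a private representation behind a public method.
class MiniSelect {
  private string $table;
  private string $alias;
  // Internal data structure: not part of the surface area.
  private array $joins = array();

  public function __construct(string $table, string $alias) {
    $this->table = $table;
    $this->alias = $alias;
  }

  // Public touch point: callers depend on this signature only.
  public function join(string $table, string $alias, string $condition): string {
    // The join type is stored as 'INNER'. Storing 'INNER JOIN' instead would
    // only require changing toSql(), never the calling code.
    $this->joins[] = array(
      'type' => 'INNER',
      'table' => $table,
      'alias' => $alias,
      'condition' => $condition,
    );
    return $alias;
  }

  // The one place where the internal representation is turned into SQL.
  public function toSql(): string {
    $sql = 'SELECT * FROM ' . $this->table . ' ' . $this->alias;
    foreach ($this->joins as $join) {
      $sql .= ' ' . $join['type'] . ' JOIN ' . $join['table'] . ' ' . $join['alias']
        . ' ON ' . $join['condition'];
    }
    return $sql;
  }
}

$select = new MiniSelect('node', 'n');
$select->join('users', 'u', 'u.uid = n.uid');
$sql = $select->toSql();
```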
Of course, sometimes we need to change the API. That happens, often for very good reason. If the system has been designed properly, however, it may still be possible to retain backward compatibility, at least for a time. KDE is a good example. Most KDE 3 apps work under KDE 4, albeit not looking as cool as they would as native KDE 4 apps, and without integrating with the awesome new plumbing that KDE 4 offers. As a result, nearly all applications have been rewritten for KDE 4 by now, as it's in their interest to do so.
It is possible to support multiple versions of an API at the same time, but there is a cost to doing so. First and foremost, that must be planned for in advance. Here are a few approaches; there are likely others beyond the ones I list here, and I would appreciate feedback in the comments on other good approaches.
Microsoft DirectX took the approach of "version everything and keep it". When requesting a new Draw object, for instance, you specify a version of it. So if you code to DirectX 6, you ask for the DrawV6 object. If a user has DirectX 7, the DrawV6 object is still in there, just as it was, and still works. On the upside, this means code almost never breaks. On the downside, it's a lot of baggage to carry around indefinitely, and that baggage only increases every version. It also requires that you have APIs that are broken up into very clear discrete objects (not function calls), and that your version-request mechanism is baked in from the start.
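A rough sketch of the "version everything and keep it" approach, reusing the DrawV6 naming from the example above. The interfaces, the drawLine() method, and the create_draw() factory are hypothetical illustrations, not DirectX's actual API (which is COM-based). Note that the version request has to be part of the API from day one, and every old implementation stays in the codebase.

```php
<?php
// Hypothetical versioned interfaces. Each version is frozen once released.
interface DrawV6 {
  public function drawLine(int $x1, int $y1, int $x2, int $y2): string;
}

interface DrawV7 {
  public function drawLine(int $x1, int $y1, int $x2, int $y2, string $color): string;
}

// The V6 implementation ships unchanged in every later release.
class DrawV6Impl implements DrawV6 {
  public function drawLine(int $x1, int $y1, int $x2, int $y2): string {
    return "line($x1,$y1)-($x2,$y2)";
  }
}

class DrawV7Impl implements DrawV7 {
  public function drawLine(int $x1, int $y1, int $x2, int $y2, string $color): string {
    return "line($x1,$y1)-($x2,$y2) in $color";
  }
}

// Callers name the version they were written against; this request
// mechanism has to exist from the very first release.
function create_draw(int $version): object {
  switch ($version) {
    case 6:
      return new DrawV6Impl();
    case 7:
      return new DrawV7Impl();
    default:
      throw new InvalidArgumentException("Unsupported Draw version: $version");
  }
}
```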
If you can keep your general class structure the same, then you can also simply expand your API. As long as the existing interface doesn't change, adding more operations to it breaks no existing code. In the simplest case, this is simply adding optional parameters to the end of a function signature. My very first Drupal patch did exactly that, in fact. When you have a lot of parameters, though, or a more complex case, it's easier to add more methods to a language interface and object.
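The simplest case above, an optional parameter appended to the end of a signature, might look like this in PHP. format_title() and its $max_length parameter are hypothetical. Because the default value reproduces the old behavior, calls written against the first version keep working.

```php
<?php
// Version 1 of this hypothetical helper was:
//   function format_title(string $title): string
// Version 2 appends an optional parameter. Its default (0 = no limit)
// reproduces version 1's behavior exactly.
function format_title(string $title, int $max_length = 0): string {
  if ($max_length > 0 && strlen($title) > $max_length) {
    return substr($title, 0, $max_length) . '...';
  }
  return $title;
}

$old_call = format_title('Backward compatibility');     // Written against v1.
$new_call = format_title('Backward compatibility', 8);  // Uses the new option.
```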
If you need to change the way a given method behaves, or change its signature, then it's also possible to simply add a new method that does the new thing instead, and leave the old one in place. Perhaps the old one could be reimplemented internally to use the new one, but calling code doesn't care, because the surface area it touches hasn't changed. The old method could then be explicitly marked deprecated, giving developers time to migrate over to the new one before it is removed. The downside, of course, is that you need a new name for the new method, which is one of the two hardest problems in computer science. (There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.)
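A sketch of the add-a-new-method-and-deprecate-the-old-one pattern, using a hypothetical NodeStorage class. The old getTitle() is reimplemented on top of the new loadTitle(), marked @deprecated in its docblock, and emits an E_USER_DEPRECATED notice through PHP's trigger_error(). The @ operator keeps that notice out of normal output, but a custom handler registered with set_error_handler() still receives it, which is one way deprecations can be logged.

```php
<?php
// Hypothetical storage class with one old and one new method.
class NodeStorage {
  private array $titles = array(1 => 'Hello');

  /**
   * New method with the new behavior: returns NULL for unknown IDs.
   */
  public function loadTitle(int $nid): ?string {
    return $this->titles[$nid] ?? NULL;
  }

  /**
   * Old method, kept so that existing callers do not break.
   *
   * @deprecated Use loadTitle() instead; to be removed in a later version.
   */
  public function getTitle(int $nid): string {
    // Signal the deprecation, then delegate to the new implementation while
    // preserving the old contract (an empty string for unknown IDs).
    @trigger_error('NodeStorage::getTitle() is deprecated, use loadTitle().', E_USER_DEPRECATED);
    return $this->loadTitle($nid) ?? '';
  }
}

$storage = new NodeStorage();
```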
Another alternative is to simply fold both versions into a single call. As a trivial example, consider everyone's favorite PHP WTF, implode(). At some point in the past, it took $pieces and then $glue to turn an array into a string. That was inconsistent with explode(), which was a major DX problem. To resolve that, implode() was enhanced to take $pieces and $glue in either order. That provided both backward compatibility and a more consistent API moving forward... except that the old version was never removed, and is still not even marked as deprecated, so there's still plenty of old code out there using the old parameter order, which makes it impossible to remove the old and inconsistent baggage.
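A sketch of folding two argument orders into one call, using a hypothetical join_pieces() function that mirrors the implode() story above. Unlike that story, this version flags the legacy order as deprecated, so there is a path to eventually removing it.

```php
<?php
// Hypothetical join_pieces(): version 1 took ($pieces, $glue); version 2
// prefers ($glue, $pieces) to match explode(), but still accepts the old order.
function join_pieces($a, $b): string {
  if (is_string($a) && is_array($b)) {
    // New, preferred order: ($glue, $pieces).
    $glue = $a;
    $pieces = $b;
  }
  elseif (is_array($a) && is_string($b)) {
    // Legacy order: ($pieces, $glue). Still works, but is flagged so that it
    // can eventually be removed.
    @trigger_error('join_pieces($pieces, $glue) is deprecated; pass $glue first.', E_USER_DEPRECATED);
    $pieces = $a;
    $glue = $b;
  }
  else {
    throw new InvalidArgumentException('Expected one string and one array.');
  }
  // Internally, always use implode()'s documented (separator, array) order.
  return implode($glue, $pieces);
}

$new_style = join_pieces(', ', array('a', 'b'));
$old_style = join_pieces(array('a', 'b'), ', ');
```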
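Backward compatibility, then, requires a number of things: clearly defined APIs in place of shared raw data structures; APIs designed from the start with future extension in mind; and a long-term plan, made in advance, for how multiple versions will be supported or bridged.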
That's why, no matter how much users want it and no matter how much Drupal developers may want to do so, Drupal 8 will not be, cannot be, API backward compatible with Drupal 7. Drupal today is based on passing around raw data structures rather than having clear APIs. (Render API, Form API, etc. thus do not technically qualify as APIs by this definition.) When we have APIs, they're generally not designed with future extensions in mind (except in so far as conventions for adding more stuff to a raw data structure). We have no long term strategy for future development or how we're going to maintain compatibility between versions, even through legacy add-ons.
Again, quoting Dries:
But what to do if many of your users slowly force you to change one of your core values? It seems inevitable that sooner than later, we will have to be a lot more careful about breaking peoples' code. And when that happens, I fear that this will be the end of Drupal as we have come to know it.
Dries wrote that around the release of Drupal 4.7, another very hard release. But really, it already is the end of Drupal as we knew it then. Drupal as we knew it then does not exist, and the Drupal of today is quite different, both in terms of APIs and in ways entirely unrelated to code. Thinking about API compatibility is not a death-knell for Drupal.
But, if we want increased API compatibility and stability, we need to take steps, now, to ensure that there is a structure to support that. That means, first and foremost, designing APIs, not raw data structures, and not just as an outgrowth of a particular implementation. That's a cultural shift as much as a technical one, but a technical one as well. It means changing the way we think about software design. It means, quite simply, interface-driven development.
Fortunately, such thought has already been happening in both the WSCCI and Multilingual initiatives at least. Interface-driven development, complete with real language interfaces, is where Drupal is headed anyway. We should embrace it, fully, and allow ourselves to be open to the potential for improving compatibility between Drupal versions that will result... if we take it.