As if on cue, the public vs. private debate has sprung up again within Drupal. The timing is fitting given my last blog post on programming language paradigms. Of course, property visibility is not a new debate, and the PHP community debates this subject from time to time (sometimes humorously).
What I believe is usually missing from these discussions, and what I hope to offer here, is a broader picture view of the underlying assumptions that lead to different conclusions about when different visibility is appropriate (if ever).
In short: It's the difference between procedural-think and object-think.
Those coming from a procedural background, I find, tend to think in terms of what is possible in a procedural language. Any procedural language has a concept of a variable (duh), and any procedural language worth using has some way of creating more complex data structures to be used as variables. In C there is the concept of a Struct, for instance. In PHP one can largely emulate the same structure (no pun intended) using PHP's ridiculously flexible arrays and good documentation.
In either case, what one is doing is simply clustering variables together to make them easier to work with. To actually operate on that variable cluster requires a function that understands that structure.
To be sure, one can do an incredible amount with such an approach. In a past life I was a Palm OS developer, and the Palm OS API was entirely procedural. Most functions took some sort of struct as their first parameter and would manipulate that struct in some way. Occasionally you could manipulate the struct directly but it was usually documented as unsupported, "do this and it will probably break", although the API authors tried hard to not break the struct definition. In a sense it was poor-man's classes and methods but without the class or method.
With this approach, you're at best talking yourself into not making use of functionality that's right in front of you because it could break (or because a platform owner like Palm says "naughty naughty", or, like Apple, rejects your app). However, even with the ability to name structs in C, you're doing nothing more than clustering variables to make the syntax cleaner.
In a procedural approach, the position of pre-created variables within the structure is the contract between caller and callee. The behavior of that structure is completely undefined.
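To make that concrete, here is a minimal sketch of the struct-style approach in PHP. The function names and array keys are made up for illustration; the point is only that the array's shape is the entire contract.

```php
<?php
// Hypothetical "struct": a plain array whose keys are the whole contract.
function customer_create($first, $last) {
  return array('first_name' => $first, 'last_name' => $last);
}

// A function that understands the structure. Nothing stops a caller from
// bypassing it and reading or rewriting the keys directly.
function customer_full_name(array $customer) {
  return $customer['first_name'] . ' ' . $customer['last_name'];
}

$customer = customer_create('Ada', 'Lovelace');
echo customer_full_name($customer), "\n";

// Direct manipulation works too, which is exactly why renaming a key
// later breaks every caller that did this.
$customer['last_name'] = 'Byron';
echo customer_full_name($customer), "\n";
```

Nothing in the language distinguishes the "supported" route through customer_full_name() from poking at the keys by hand; only documentation does.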
As noted in the previous article, in an OOP paradigm one doesn't think about clustering variables. One thinks about objects as being a single opaque entity that has behaviors. Instead of thinking in terms of integers and strings, one thinks in terms of Domain objects: Nodes, or Posts, or Users, or Dates, or Customers.
These objects are atomic; they're just as atomic as an int or string would be in a procedural approach. And therein lies the power.
If I'm accessing a string in, say, PHP, I don't know, or care, if it's being stored internally as a character array. Or if I concatenate something to the string I don't know or care if the character array is being extended in memory, moved to a new location in memory that is bigger, or if an entirely new variable is created and both strings are then put into it, destroying the first string. That's not my problem. When writing my code, I should not have to care about the memory management that goes on under the hood. If I could access it and tried to manipulate it myself for some reason I'm more likely than not to cause a fatal error and segfault my entire PHP process, if not now then as soon as the next minor point release of PHP tweaks the string management code. Don't think that's a problem in practice? Why do you think some people hate C, where you're doing that sort of code back-stabbing all the time? :-)
In an OO model, if I'm accessing a Customer object then I don't know, or care, that it's being persisted internally to SQL or to a flat file. When I call $customer->getName(), I don't know or care if that name is stored as a single string within the object, as multiple strings that get concatenated together, or if it generates a new request to a SOAP service to look up the name. In fact, it's a design flaw if I even need to care which it is. If I start mucking with it directly, then as soon as someone switches from local MongoDB to a remote SOAP service my code falls apart and fatals.
There is no $customer->name variable. As far as I, in calling code, am concerned, it doesn't exist. All that exists is the defined contract of the interface methods.
In an OOP approach, the methods that are exposed in the interface are the contract between the caller and callee. The underlying logic and primitive variables are completely undefined.
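A rough sketch of what that looks like in PHP follows. The interface and class names are hypothetical, invented for this example; only getName() comes from the discussion above. Two implementations store the name differently, and the calling code cannot tell them apart.

```php
<?php
// Hypothetical interface: the only thing calling code may rely on.
interface CustomerInterface {
  public function getName();
}

// One implementation keeps the name as a single string.
class SimpleCustomer implements CustomerInterface {
  private $name;

  public function __construct($name) {
    $this->name = $name;
  }

  public function getName() {
    return $this->name;
  }
}

// Another assembles the name from parts. It could just as well ask a
// remote service; the caller would never know.
class SplitNameCustomer implements CustomerInterface {
  private $first;
  private $last;

  public function __construct($first, $last) {
    $this->first = $first;
    $this->last = $last;
  }

  public function getName() {
    return $this->first . ' ' . $this->last;
  }
}

// Calling code depends on the interface method, never on a property.
function greet(CustomerInterface $customer) {
  return 'Hello, ' . $customer->getName();
}

echo greet(new SimpleCustomer('Ada Lovelace')), "\n";
echo greet(new SplitNameCustomer('Ada', 'Lovelace')), "\n";
```

Because the properties are private, swapping one implementation for the other cannot break greet(), and that is the whole argument in miniature.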
Once again, I blame Sun for why many people don't "get" that distinction. In Java, there is a concept of a JavaBean. A Bean is a class in Java that has a no-parameter constructor, is serializable, and has a getX() and setX() method that corresponds to every property X. That is, it is an object that is really a Struct and offers little if any advantage in being an object in the first place.
In so many conversations about OOP, I hear people ask "well where are your getters and setters?" Which is usually followed by "if you have to have getters and setters then why bother making properties non-public?" The latter is a perfectly valid question, but is backwards. Why bother having matching getters and setters for every property? That breaks encapsulation, one of the key reasons for using an object approach in the first place, and is one of the reasons why JavaBeans are, in most cases, a horribly bad model to follow. They are a terrible example of OO. They are Naked Objects wearing a see-through negligee. While there are valid use cases for such a design they are not representative of "good" OO. I encourage most PHP developers to forget they exist as the Bean approach in most cases defeats the entire purpose of using objects.
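For a sense of the difference, consider this small sketch. Both classes are hypothetical and exist only for illustration: the first is a Bean-style class with a matching getter and setter, the second exposes behavior and guards its own rules.

```php
<?php
// Bean style: effectively a struct with extra typing. Any caller can put
// the object into any state, valid or not.
class AccountBean {
  private $balance = 0;

  public function getBalance() {
    return $this->balance;
  }

  public function setBalance($balance) {
    $this->balance = $balance;
  }
}

// Behavior style: callers say what they want done, and the object
// enforces its own invariant (the balance never goes negative).
class Account {
  private $balance = 0;

  public function deposit($amount) {
    if ($amount <= 0) {
      throw new InvalidArgumentException('Deposit must be positive.');
    }
    $this->balance += $amount;
  }

  public function withdraw($amount) {
    if ($amount <= 0 || $amount > $this->balance) {
      throw new InvalidArgumentException('Invalid withdrawal amount.');
    }
    $this->balance -= $amount;
  }

  public function balance() {
    return $this->balance;
  }
}

$bean = new AccountBean();
$bean->setBalance(-500); // Nothing objects; the "encapsulation" is cosmetic.

$account = new Account();
$account->deposit(100);
$account->withdraw(30);
echo $account->balance(), "\n";
```

The second class still lets you read the balance, but there is no setter to match, so the rules about how a balance may change live in exactly one place.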
It is quite unfortunate that so many schools teach textbook Java as their primary programming language, as in so many ways bad API architecture like Beans encourages bad OO programmers, who then give good OO design a bad name.
When viewed from that standpoint, it's easy to see where the debate about public vs. private/protected comes from. To a procedural way of thinking, hiding properties doesn't make the slightest bit of sense. Those are the data, let me at the data dagnabbit!
From an OO standpoint, however, exposing internal properties makes no sense at all. You're just begging for someone to break your code, or worse yet open up a security hole, and it means you cannot refactor your code when you need to for fear that someone is relying on the current implementation. You cannot improve the API without breaking it. You may as well ship runkit with every copy of PHP and encourage people to change the language syntax out from under you. (This is a step more evil than eval().) You're making me care about the underlying complexity, stop that, just let me tell you what to do dagnabbit!
Certainly both concepts can be taken to an extreme. I did once work on a procedural system where the entire communication mechanism between different parts of the system was global variables. That was a horrid system, let me tell you, despite being the ultimate in "flexible and bare data". I haven't seen anything quite that horrid in OO code myself, unless you count JavaBeans, but I have faith that it exists.
If you're using your Classes as structs, then public properties make total sense and anything else is silly.
If you're using your Classes as objects, then protected/private properties make total sense and anything else is silly.
It has been suggested that Drupal should adopt a "public only" policy, on the grounds that "Drupal is in the business of throwing doors wide open". Both that and the article the suggestion references miss the point; more specifically, the suggestion is to use classes as structs, not as objects.
However, that approach also runs afoul of another part of the Drupal business: Being modular. Exposing implementation details of that sort is, as explained above, an inherently non-modular approach. Exposing properties publicly encourages their direct use, which in turn means that we are using classes-as-structs: as little more than arrays that pass funnily. That means losing all of the benefits of encapsulation, abstraction, modularity, and portability that using classes-as-objects offers.
Given the markets that Drupal is moving into, where swappability of components (a mainstay of classes-as-objects) is critical, I believe that to be a very bad trade-off.
There is only one way I can see for such an approach to work, and that would be to adopt and rigorously enforce the following policy:
All public properties are to be treated as internal implementation details and not accessed unless no alternative is available. Accessing a public property is not supported in any circumstance, and the structure, definition, or existence of such properties may change at any time, even in a point-release, without notice. Changes to an object property are never considered an API change.
If we could hold to that, we would essentially be able to have our cake and eat it too: classes-as-objects but with an "out" in case the defined API only does 98% of what we need.
To be perfectly honest, however, I do not think we could pull that off. That's not a slight against Drupal developers but against human beings. The temptation to say "oh, well, it's easier to just grab this property than to file a bug report" is just too high. And then when we refactor something and inevitably break the "unofficial API" (aka those public properties), someone (who may or may not have read the documentation) will come screaming to the issue queues that we broke their site. It doesn't matter if it's core or contrib, it will happen.
What do we say then? "Sorry, you didn't read the docs, go away?" No, we'll find ourselves avoiding changing properties to avoid those sorts of issues. We'll find ourselves saying "Eh, I don't need to think through an API here, people can just grab the property and do what they need". We'll finally give in and, informally, consider properties to be part of the API and try to not change them, even if it would make the code better or provide some new feature.
And we will have gained nothing.
Does that leave us with no resolution? No way forward to decide how to build the next generation of Drupal? Hardly. Rather, it puts the onus on us as Drupal developers to decide, situationally, which sort of flexibility we value more.
There are cases where "opaque domain objects" are absolutely the right fit, and we want to think in terms of behavior. The Drupal 7 database layer comes to mind, obviously, but any system where we want to flexibly exchange or chain together disparate components is also a strong fit for classes-as-objects, with all of the benefits and trade-offs that come with it.
There are other cases where we want to turn it inside out, cases where we want to think in terms of raw data. Cases where we have highly unstructured or irregular data that we need to amorphously aggregate over time come to mind, such as Render API or FAPI. While those systems certainly could be built with a purely OO classes-as-objects approach, I suspect the result would be even more complex than FAPI is now (which is already pretty complicated) and, with all the extra function calls involved, much slower. Any sort of info hook is another place where bare data wins, because it's all about definition rather than active behavior.
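As a rough illustration of why bare data suits the info-hook case, here is a simplified sketch. It is modeled only loosely on the info-hook pattern; the "widget_info" hook, the module names, and collect_widget_info() are all invented for this example and are not real Drupal APIs.

```php
<?php
// Hypothetical info hooks: each module just returns plain definition arrays.
function alpha_widget_info() {
  return array('clock' => array('title' => 'Clock', 'weight' => 0));
}

function beta_widget_info() {
  return array('weather' => array('title' => 'Weather', 'weight' => 5));
}

// Hypothetical alter step: any module may reshape the aggregated data.
function beta_widget_info_alter(array &$info) {
  $info['clock']['weight'] = 10;
}

// Collect definitions from every module, then let modules alter the result.
function collect_widget_info(array $modules) {
  $info = array();
  foreach ($modules as $module) {
    $function = $module . '_widget_info';
    if (function_exists($function)) {
      $info = array_merge($info, $function());
    }
  }
  foreach ($modules as $module) {
    $function = $module . '_widget_info_alter';
    if (function_exists($function)) {
      // $info is passed by reference because the alter function declares it so.
      $function($info);
    }
  }
  return $info;
}

print_r(collect_widget_info(array('alpha', 'beta')));
```

There is no behavior here to encapsulate, only definitions to merge and tweak, which is why wrapping each entry in an opaque object would add ceremony without adding safety.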
To be sure, that puts a lot of pressure on us as software architects to think through our APIs and figure out the appropriate technique. The right tool for the right job is never an easy decision, but it is from those decisions that really powerful designs are born.
That is a challenge I do believe we are up for.