Composer vs. Linux Distributions: A Mental Model Battle

Recently, Gentoo documented what they view as the Composer Problem: basically, PHP projects using Composer can't be packaged the way the distribution wants to package them, with system-level shared libraries. This is not a new complaint; other distributions have complained about Composer's impact before. But fundamentally, I think the issue stems from having the wrong mental model of how modern PHP works when viewed from a distribution or sysadmin perspective.

In a recent heated GitHub thread, several people referred to PHP "linking" to 3rd party libraries, as if they were shared C libraries. That is simply not the case.

Neither "static linking" nor "dynamic linking" really applies to PHP. From a sysadmin perspective, PHP is closer to highly complicated bash scripts than anything else. However, since the terminology has already been used, I will continue with it.

A bit of history: In the BC era (Before Composer), there were two ways that PHP developers used 3rd party code:

  1. Copy and paste from a blog post into your own repository. This is the "statically linked" approach.
  2. Use PEAR, which was installed globally on the system. This is the "dynamically linked" approach.

Both of these approaches had issues, but PEAR had far more. Basically, PEAR sucked balls for anyone who didn't have root on their server, which for PHP devs was the 95% use case. It was hard to use, hard to contribute to, and if you depended on a version of a package that wasn't what was pre-installed, or was different from what some other application on the same server needed, you were SOL. In short, it was unusable for the majority of the market.

And then there were manual require statements, which Gentoo is asking PHP developers to go back to. Those suck. Period. Full stop. Telling PHP devs to go back to PHP 4-era manual require statements is akin to telling them they need to cut off their thumbs to type and may use nothing newer than a 386, while sitting in a dark room whose only lightbulb is pointed straight at their eyes. No, really, we're a decade past that era and it is not going to come back, ever. For a distribution to even make the request is bordering on offensive.

What Composer did was make approach 1 ("static linking") far easier and saner. It did NOT address the system-wide-library question; in fact, the net result has been that the PHP community has abandoned the "dynamic linking" approach wholesale, and has been much happier for it. Although PHP is interpreted at runtime, modern PHP really doesn't function without compiling an autoloader for locally available files.

Let me repeat that, because it's the crux of the disagreement: from a sysadmin/distribution maintainer point of view, Composer is the compile step of PHP, and it only supports static linking.

Once viewed that way, the situation becomes much more obvious. The directory in which composer install is run is equivalent to the "static binary" of a compiled language. Or to a directory tree of .class files for Java.
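To make the "compile step" idea concrete: one thing Composer generates is an autoloader that turns a class name into a file path inside the project's own tree. Below is a minimal Python sketch of the PSR-4 lookup rule, not Composer's actual ClassLoader (which is written in PHP). The prefix map is hypothetical, shaped like the "autoload" / "psr-4" section of a composer.json file.

```python
from typing import Dict, Optional

# Hypothetical PSR-4 prefix map, shaped like composer.json's
# "autoload": {"psr-4": {...}} section. Not taken from a real project.
PSR4_PREFIXES: Dict[str, str] = {
    "App\\": "src/",
    "App\\Legacy\\": "lib/legacy/",
    "Symfony\\Component\\Yaml\\": "vendor/symfony/yaml/",
}


def resolve_class(fqcn: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Return the first file path a PSR-4 autoloader would try for a class.

    The longest matching namespace prefix wins; the remainder of the class
    name becomes a relative path, with namespace separators turned into '/'.
    """
    best = None
    for prefix in prefixes:
        if fqcn.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return None  # no registered prefix: this autoloader cannot load it
    relative = fqcn[len(best):].replace("\\", "/")
    return prefixes[best] + relative + ".php"


print(resolve_class("App\\Entity\\Entry", PSR4_PREFIXES))  # src/Entity/Entry.php
```

Every path this kind of lookup can produce points into the project directory or its vendor/ subdirectory, which is why the result behaves like a self-contained "binary" rather than something that links against system-wide libraries.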

Of course, PHP is not the only language to take that approach. Go, for instance, is only statically linked. Shared system-level libraries are just not a thing in Go, by design, yet Go gets by just fine, and distributions manage to ship Go packages without trouble.

What that means for distribution packagers is this:

  1. For PHP apps that are frameworky in nature, or intended to have the end user add/remove add-on packages (Symfony, Zend Framework, Drupal, Laravel, etc.)... please ignore us. DO NOT try to package these. Please. We don't want you to. We got this. Just ignore them and we'll all be fine.
  2. For PHP apps that are a complete bespoke system (phpMyAdmin, Wallabag, etc.), treat composer install the same way you would make or javac or go build.

That is, for a "binary" distribution (Red Hat, Debian, Ubuntu, etc.), run this:

composer create-project wallabag/wallabag wallabag --no-dev --optimize-autoload

And then treat the resulting "wallabag" directory as a binary. Tarball it, package it, have a nice day. If the project later updates its dependencies due to security releases... recreate the "binary" and package that up. Or, if one of its dependencies has a security update that doesn't cause wallabag to update its own version, the distro can recompile it itself and release a 1.2.3-1distroname version, as distros already do for hundreds of packages.
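The "recompile when a dependency gets a security fix" step can be sketched as a check against the project's composer.lock file, which records the exact version of every installed package. This is a minimal Python sketch under several assumptions: the lock file is trimmed (real ones carry many more fields per package), the advisory list is a hypothetical dict keyed by package name and invented for illustration (it does not describe a real vulnerability), and distro releases follow the 1.2.3-1distroname shape.

```python
import json
import re
from typing import Dict, List, Set

# Trimmed, illustrative composer.lock content. Real lock files also carry
# "source", "dist", "require" and other fields for every package.
LOCK_JSON = """
{
  "packages": [
    {"name": "symfony/yaml", "version": "v2.7.9"},
    {"name": "twig/twig", "version": "v1.24.0"}
  ],
  "packages-dev": [
    {"name": "phpunit/phpunit", "version": "4.8.21"}
  ]
}
"""


def affected_packages(lock: dict, advisories: Dict[str, Set[str]]) -> List[str]:
    """Return "name@version" for each shipped package listed in an advisory.

    Only "packages" is checked: a --no-dev build never installs the
    "packages-dev" entries, so they are not part of the "binary".
    """
    hits = []
    for pkg in lock.get("packages", []):
        vulnerable = advisories.get(pkg["name"], set())
        if pkg["version"] in vulnerable:
            hits.append(f"{pkg['name']}@{pkg['version']}")
    return hits


def next_release(release: str) -> str:
    """Bump the distro revision, e.g. '1.2.3-1distroname' -> '1.2.3-2distroname'."""
    match = re.fullmatch(r"(.+)-(\d+)([A-Za-z]*)", release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    upstream, revision, suffix = match.groups()
    return f"{upstream}-{int(revision) + 1}{suffix}"


if __name__ == "__main__":
    lock = json.loads(LOCK_JSON)
    # Hypothetical advisory: invented for illustration, not a real CVE.
    advisories = {"symfony/yaml": {"v2.7.9"}}
    if affected_packages(lock, advisories):
        print("rebuild as", next_release("1.2.3-1distroname"))
```

The sketch only decides *when* a rebuild is needed. The rebuild itself would be something like a `composer update` of the affected package followed by the same tarball-and-package step as the original build.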

For a "source" distribution (Gentoo, Arch, etc.), the "source" package is... probably just that composer create-project command, I wager. So when a user installs that package, the "compile" step runs that command and "compiles" their application, downloading the appropriate packages. If they want to recompile later, that's no different than recompiling a Go app.

The mistake here is trying to treat dependent packages of modern PHP applications like shared libraries. They're not. The community has spoken, and PHP simply doesn't work that way anymore. Fighting that is a losing battle. But by viewing Composer as a compiler, distributions can still slot PHP into their typical workflows and get all of the security update ease that they're looking for.

At which point the only remaining objection is disk space usage. The answer on this front is simple: It's 2016. I can buy a 3 TB hard drive for under $100 after shipping. Your point is invalid.

Comments

Must disagree

Sorry, but I must disagree. I don't see any valid reason why the PHP stack should be different from other language stacks (C, Perl, Python...).

Shared PHP libraries just work.

For example: composer requires symfony ^2.5, phpunit requires symfony ^2.1, and phpcompatinfo requires symfony ^2.5, so by upstream design they can all work with a single shared symfony 2.7.9. And yes, it works.
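The point rests on how caret constraints work: ^2.5 and ^2.1 both accept any 2.x release from their lower bound up to, but not including, 3.0.0, so one copy of 2.7.9 satisfies all three packages. Here is a minimal Python sketch of that check. It assumes plain numeric versions (no stability flags, pre-release suffixes, or other constraint operators) and follows Composer's documented caret rules, where ^0.3 means >=0.3.0 <0.4.0.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse 'X', 'X.Y' or 'X.Y.Z' (optional leading 'v'), padding with zeros."""
    parts = [int(p) for p in version.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def caret_range(constraint: str) -> Tuple[Version, Version]:
    """Half-open [lower, upper) range for a caret constraint like '^2.5'.

    The upper bound bumps the major version, unless the major is 0, in which
    case it bumps the first non-zero component that was written out.
    """
    written = [int(p) for p in constraint.lstrip("^").lstrip("v").split(".")]
    lower = parse(constraint.lstrip("^"))
    if written[0] != 0 or len(written) < 2:
        upper = (written[0] + 1, 0, 0)
    elif written[1] != 0 or len(written) < 3:
        upper = (0, written[1] + 1, 0)
    else:
        upper = (0, 0, written[2] + 1)
    return lower, upper


def satisfies(version: str, constraint: str) -> bool:
    lower, upper = caret_range(constraint)
    return lower <= parse(version) < upper


def shared_version_ok(version: str, constraints: List[str]) -> bool:
    """True if one shared copy of `version` meets every constraint."""
    return all(satisfies(version, c) for c in constraints)


# Constraints quoted in the comment: composer, phpunit, phpcompatinfo.
print(shared_version_ok("2.7.9", ["^2.5", "^2.1", "^2.5"]))  # True
```

The same check also shows the flip side: as soon as one of those packages moves to ^3.0, no single shared version can satisfy all of them.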

Of course, this requires some integration work, but this is exactly the "job" of a downstream distribution.

This also requires a lot of QA; for example, read http://blog.remirepo.net/post/2014/08/12/Koschei-continuous-integration-...
This has allowed us to find interesting issues that upstream will have to fix in any case. Recent examples:
* https://github.com/composer/composer/pull/4756 json-schema 1.6 breaks composer test suite
* https://github.com/zendframework/zend-file/pull/14 zend-filter 2.6 breaks zend-file

But other changes can also have a huge impact on PHP applications, such as non-PHP libraries, the PHP version, or PHP extensions:
* https://github.com/zendframework/zend-i18n/issues/14 ICU 56 breaks zend-i18n

This is about upstream / downstream communication and collaboration, a huge and exciting challenge.

Accepting "bundle everything" as the rule means we might as well drop the semver standard, stop caring about API stability, and just require the exact version of everything used. Sounds like a big fail.

Moreover, having 10 copies of Symfony on a web server seems terrible for performance (10 times the memory used by the opcode cache).

I'm terribly sad reading such a blog post, which mostly says "we don't care" about clean redistribution of our work. And this is probably one reason why so many sysadmins hate the PHP stack.

Thanks for this post i can

Thanks for this post; I absolutely agree. That opcode argument is just as old as Gentoo's ideas. In the era of Docker and beyond, applications are isolated, and they decide which versions of libraries they are compatible with. We won't go back to the "sorry, this version is not available here" days of the past. And this topic isn't only about PHP: what about npm, Chef, and so on?

Besides, Composer does allow shared dependencies for projects, but that's up to the project maintainer, just as it should be.

Other static applications

Tell me, how do distributions handle Go-based apps? Or C[++]-based apps that use statically compiled libraries? Or Java apps distributed as JAR/WAR files? The "static compile" problem is not even remotely unique to PHP, so why is PHP being made out to be the lone "bad actor" for saying that we, as developers, prefer static compile? I am genuinely curious, as someone who wishes he had more time to get into Go. :-)

I don't get at all how you get from "please static compile" to "pfft, semver". If anything, Composer has pushed PHP to follow semver MORE carefully, precisely because Composer's resolution logic depends on it.

For memory usage, yes, having 10 copies of the same code lying around is bad for the opcode cache. However... that's not even a concern anymore for most of us. That's an issue pretty much nowhere except for shared hosting. The last time I deployed a PHP application that wasn't to a dedicated VM or container was 2007, and the production server was running PHP 4. Using PEAR-style deployment for that wouldn't have any memory benefit anyway.

It's not that PHP devs "don't care about clean redistribution". It's that we do not consider "make it work like glibc" to be "clean distribution". PHP is not C. "Shared libraries" simply work differently in PHP, and no amount of begging or shaming is going to change that. We DO want clean distribution of our work; our definition of "clean distribution" is "statically compile it, or just let us do it". If anything, we're begging you to stop trying to solve this problem for us because we've already got it handled. Please, just let us take care of it. There are plenty of other, more impactful things that distro packagers could be doing than trying to shame PHP developers into bringing back PEAR/PHP 4. As a Linux user (laptop and personal server), I value the time distro packagers put into making my system "just work". Please, don't waste your time on something we don't want or need you to fix for us.

(Incidentally, I hate working with Ruby for exactly this reason: Gems still install globally like PEAR used to, and distros like to re-package their own things, so every time I try to upgrade Sass I run into this exact problem where there's what's in Gems, there's what the distro has a package for, and there's what my Ruby app requires, and none of them line up at all. If anything, as an end user PHP is *easier* for me because I have one and only one source to think about: Composer.)

Who is prioritized?

There has been a shift over the years to prioritize application developers and speed of development. This first took off with startups, small companies, and those wanting to make a large impact quickly. It has since spread everywhere, including large enterprises.

Application-level development moves far faster than any operating system release cycle. And different services running on a system may move at different speeds.

To enable developers to move fast in that space, dependencies need to be tied to the project rather than the system. That way, projects running on a system aren't held back by the least up-to-date tool on the system.

The shift is about enabling developers, which has become a business priority, and business is what pays for a lot of the work at the end of the day.

There are definitely tradeoffs. You do have to use a little extra memory if there are multiple instances of the same package on the system. The priority is speed and developer enablement. The memory usage is subservient to that.

Ruby, Python, Go, Java, server-side JavaScript, and others have all moved in this same direction. Just look at all the applications using virtualenv for Python.

I respect the desire for optimized operation. I really do. Given the drop in price of memory and processing for what we're running, it's simply a lower priority than fast development and enabling developer productivity.

@remi I understand your feelings, but that boat has sailed

As another commenter posted above, the problem is not in the packaging system, but in the speed of evolution. The PHP apps and runtime are not special compared to other languages for their linking paradigm or runtime requirements - they are just moving too fast compared to the operating system.

As far as I can recall (and I've been doing PHP daily for the last 9 years), there has always been a huge amount of friction between PHP developers and distribution packagers, even before Composer was born.
PEAR never made it, not because of an unwieldy interface, but precisely because it defaulted to a 'shared library' model. Distributions packaged it, and none of the people deploying PHP apps ever used it.
Red Hat, with its long release cycle and cast-in-stone PHP version, was also the bane of many PHP developers, who built apps on a modern language runtime only to find sysadmins refusing to install them because they 'must run on the pristine vendor software stack'. This was a real problem, as the number of unofficial RPM repos that sprang up with more modern PHP versions can certainly attest (I'm quite sure you must have heard of some... ;-) )

But even with EPEL and Remi Collet's repos around, the friction was still high, and when Composer came along, it spread like wildfire because it solved the problems that so many developers had.

I don't dispute that an ecosystem of shared libraries, where each developer cross-tests his own lib for API and ABI stability, is a good thing, but in my experience few PHP devs actually do that. If you ask them to release version 1.2.33 and restore compatibility with app xxx, which was broken in version 1.2.32, they often respond with 'sorry mate, I'm now working on version 8.9.0'.

In my own experience, a *single* modern PHP app can have hundreds of dependencies, and it can be hard just to make sure there is one combination of dependencies that makes the install possible. Want to update one dependency because it adds feature X or bugfix Y? Sorry, you will find it impossible unless you take over the composer.json file and override the version numbers yourself.
The idea of having a set of dependencies which can satisfy the needs of not only one app, but many apps at the same time seems just unrealistic.
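The difficulty can be put in set terms. One system-wide copy of a library only works if the version ranges accepted by every app on the machine overlap, and that has to hold for every shared library at once. Here is a minimal Python sketch, assuming each app's requirement has already been reduced to a half-open version range. The package names and ranges are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]
Range = Tuple[Version, Version]  # half-open: lower <= v < upper


def intersect(ranges: List[Range]) -> Optional[Range]:
    """Overlap of all ranges, or None if no single version fits them all."""
    lower = max(r[0] for r in ranges)
    upper = min(r[1] for r in ranges)
    return (lower, upper) if lower < upper else None


def conflicts(requirements: Dict[str, List[Range]]) -> List[str]:
    """Libraries for which no one shared version satisfies every app."""
    return [lib for lib, ranges in requirements.items() if intersect(ranges) is None]


# Hypothetical requirements from three apps on one server.
REQUIREMENTS: Dict[str, List[Range]] = {
    "vendor/yaml-lib": [((2, 5, 0), (3, 0, 0)), ((2, 1, 0), (3, 0, 0)), ((2, 3, 0), (3, 0, 0))],
    "vendor/http-lib": [((5, 3, 0), (6, 0, 0)), ((6, 1, 0), (7, 0, 0)), ((6, 0, 0), (7, 0, 0))],
}

print(conflicts(REQUIREMENTS))  # ['vendor/http-lib']
```

Each extra app adds another range to every list, and a single non-overlapping pair is enough to rule out a shared install of that library. Per-project vendor directories sidestep the problem because they never need to intersect anything.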

Is it wise to try to fight the mindset of the whole ecosystem, or is it better to find ways in which it can be easier to embrace it?