Periodically, there is a complaint that PHP conferences are just "the same old faces": that the PHP community is insular, elitist, just a good ol' boys club, and so forth.
It's not the first community I've been part of that has had such accusations made against it, so rather than engage in such debates I figured, let's do what any good scientist would do: Look at the data!
Update 2015-08-25: The Joind.in folks have given me permission to release the source code. See link inline. I also updated the report to include a breakdown by continent.
Joind.in me and rule the Internet!
The first step, of course, is getting data to analyze. While we don't have perfect data for all PHP events around the world, we have a reasonably good proxy. Joind.in has become the de facto session review site for most of the PHP community over the past few years, and has even branched out beyond the PHP world more recently. That allows us to get a very detailed picture of the PHP event ecosystem. We will use that as our data source.
There are still numerous caveats to that data.
- Joind.in's data only goes back to late 2008, and there have been conferences since long before that. For the purpose of this analysis, then, I am looking only at events in Joind.in from 1 January 2010 onward; that gives about a year's worth of data to "prime" the list of existing speakers.
- Not all events are listed with Joind.in. While most of the general-PHP world uses it, not all PHP communities use it. Nearly all Drupal events are missing, for instance. That means that some established speakers look like newbies to Joind.in, and speakers who frequently attend events not listed here are under-reported. For instance, Joind.in thinks my first presentation was at Symfony Live Paris in 2012, and knows nothing about the dozen or more presentations I've given at DrupalCons since then. (Seriously, I've given way more than 23 sessions in the last 5 years!)
- Joind.in doesn't differentiate between different sizes or types of event. It includes 3,000-person conferences and 20-person user group meetups. For the purpose of this analysis I defined a "conference" as any event Joind.in knew about that had 5 or more presentations listed. A different cutoff may produce slightly different data.
- Joind.in allows a session to have more than one speaker. However, the overwhelming majority of sessions have only a single speaker, and trying to account for multiple speakers would have made my job a lot harder. Therefore, I am only tracking a single speaker per session. If Joind.in lists multiple speakers I am only counting the first one listed. I don't believe that greatly impacts the overall conclusions, but it may impact certain speakers' rankings.
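The rules in the caveats above can be sketched in a few lines. This is an illustration in Python rather than the actual import code; the `events` structure and the function name are hypothetical, and letting small meetups count toward the "already seen" speaker set is an assumption the post does not settle either way.

```python
from datetime import date

MIN_TALKS = 5              # "conference" cutoff from the caveats above
START = date(2010, 1, 1)   # earlier events only prime the known-speaker set


def first_timer_stats(events):
    """Return one summary row per qualifying conference.

    `events` is a hypothetical list of dicts shaped like
    {"name": str, "start": date, "talks": [[speaker, ...], ...]}.
    """
    seen = set()  # every speaker credited at any earlier event
    rows = []
    for event in sorted(events, key=lambda e: e["start"]):
        # Only the first listed speaker of each session is counted.
        speakers = {talk[0] for talk in event["talks"] if talk}
        new = speakers - seen
        # Assumption: every event, even a small meetup, updates the seen set.
        seen |= speakers
        if event["start"] >= START and len(event["talks"]) >= MIN_TALKS:
            pct = round(100 * len(new) / len(speakers), 1) if speakers else 0.0
            rows.append({
                "event": event["name"],
                "sessions": len(event["talks"]),
                "speakers": len(speakers),
                "first_timers": len(new),
                "pct_first": pct,
            })
    return rows
```

Changing `MIN_TALKS` is the quickest way to see how sensitive the numbers are to the cutoff.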
Also, while downloading the data and loading it into MySQL there was some character set corruption (despite everything being UTF-8 as far as I know). That means a few speaker and event names have been garbled, particularly those in Asian scripts. According to the Joind.in folks, that's probably bad data in their database to begin with, so there's not much I can do about it. My apologies to those affected.
Joind.in offers a very nice JSON API that allowed me to download essentially their entire dataset for local analysis. To download the data I used Guzzle, PHP's leading HTTP client, and Doctrine DBAL, the de facto standard, for writing to the database. (Raw PDO would likely have worked too, but Doctrine has some nice schema management tools and I wanted an excuse to play with Doctrine more.)
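For readers who want a feel for that kind of bulk download, here is a minimal sketch of walking a paginated JSON listing. It is written in Python, not the Guzzle-based code used for this post, and the response shape (a list under `key` plus a `meta.next_page` link) is an assumption to check against the API's own documentation. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def http_get_json(url):
    """Fetch one URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all(url, key, get_json=http_get_json):
    """Follow pagination links and yield every record found under `key`.

    Assumes each page looks like {key: [...], "meta": {"next_page": url}}.
    """
    seen_urls = set()
    while url and url not in seen_urls:
        seen_urls.add(url)  # guard against a pagination loop
        page = get_json(url)
        records = page.get(key, [])
        if not records:
            break  # an empty page ends the listing
        yield from records
        url = page.get("meta", {}).get("next_page")
```

Passing the fetcher in as `get_json` keeps the loop testable offline, and it is also the natural place to add a pause between requests so the download stays polite.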
The import code itself follows a mostly-functional style, with a little procedural thrown in for lack of interest in refining it any further. :-)
The full source is available on GitHub. Pull requests for more reports welcome, but please don't abuse the API. :-)
Enough talking, where's the data!
I've collected the reported data on a separate page to make it easier to read, which is attached to this post. Go ahead and have a look. I'll be here when you get back.
I want to call out especially the summary at the bottom of the first table:
Average total sessions: 31.6
Average number of speakers: 24.8
Average number of first-time speakers: 13.1
Average percent first-time speakers: 50.6
That is, across the entire spectrum of Joind.in's available data, events average half of their speakers as first-timers. Half.
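One note on reading that figure: it is presumably the mean of the per-event percentages, which is not the same as dividing the two averages above (13.1 / 24.8 comes to roughly 52.8%). The small Python sketch below shows how the two calculations diverge; the function name is hypothetical and the input numbers in the usage note are invented purely for illustration.

```python
def summarize(rows):
    """Compare two ways of averaging first-timer share.

    `rows` is a list of (speakers, first_timers) pairs, one per event.
    Returns (mean of per-event percentages, pooled percentage).
    """
    rows = [(s, f) for s, f in rows if s > 0]  # skip events with no speakers
    # Every event counts equally, whatever its size.
    mean_of_pcts = sum(100 * f / s for s, f in rows) / len(rows)
    # Every speaker slot counts equally, so big events dominate.
    pooled_pct = 100 * sum(f for _, f in rows) / sum(s for s, _ in rows)
    return round(mean_of_pcts, 1), round(pooled_pct, 1)
```

With a made-up input of `[(10, 2), (40, 30)]`, the per-event mean is 47.5 while the pooled figure is 64.0, because the larger event carries more weight in the second calculation.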
Of course, there's plenty of variability in that, with events ranging anywhere from 0% to 100%. However, most are at least in double digits, even up through and including 2015.
That also appears to be about the same across regions (as defined by the timezone code the event used). The one exception is Asia, which I suspect is due to being newer to Joind.in so its data is skewed.
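A regional breakdown of that kind can be sketched as follows, assuming each event carries an IANA-style timezone name such as "Europe/Amsterdam". That format, and both function names, are assumptions for illustration; the sketch is in Python rather than the code behind the report.

```python
from collections import defaultdict


def continent_of(tz_name):
    """Return the region prefix of a timezone name, or "Unknown"."""
    if not tz_name or "/" not in tz_name:
        return "Unknown"
    return tz_name.split("/", 1)[0]


def average_by_region(events):
    """Average first-timer percentage per region.

    `events` is an iterable of (tz_name, pct_first_timers) pairs.
    """
    buckets = defaultdict(list)
    for tz_name, pct in events:
        buckets[continent_of(tz_name)].append(pct)
    return {region: round(sum(v) / len(v), 1) for region, v in buckets.items()}
```

Note that timezone prefixes are coarse: "America" covers both North and South America, so this grouping is only as fine as the timezone data allows.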
So to the claims that PHP conferences just select the "same tired old speakers, year after year", I would say that is patently false and we have the data to prove it. Are there other issues with session/speaker selection, diversity, and so forth? Quite possibly. But claims that there's not a diversity of names, period, are provably untrue. Even if the data is skewed a bit because of the sampling process or recent non-PHP additions, there's still a huge churn.
Also, everyone loves Derick. :-) In fact, Europeans seem to dominate the top-speakers list. There is only one American in the top 10, Matthew Weier O'Phinney. So much for the loudmouth American stereotype.
Any other data I should crunch? See a bug in my analysis? Let me know! I'm happy to crunch additional reports out of the data as long as it's not too difficult to add. I will also add Errata to this post if anything turns out to be totally wrong. :-)